RE: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig

2014-11-12 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Hongbo Zhang [mailto:hongbo.zh...@linaro.org]
> Sent: Wednesday, November 12, 2014 4:09 PM
> To: Bhushan Bharat-R65777
> Cc: Antonios Motakis; open list:VFIO DRIVER; will.dea...@arm.com;
> alex.william...@redhat.com; open list; io...@lists.linux-foundation.org;
> t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu
> Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to
> Kconfig
> 
> On 12 November 2014 18:05, bharat.bhus...@freescale.com
>  wrote:
> > Hi,
> >
> >
> >
> > This is not yet supported on Freescale PowerPC. I am still in process
> > of upstreaming the FSL PAMU specific patches for same.
> >
> > Initial plan is to test with PCIe devices and then with Platform devices.
> >
> 
> I see there is already driver/iommu/fsl_pamu.c, doesn't it work?

We need VFIO iommu driver for same.

> Could you explain briefly what is wrong? I've heard that the vfio pci works on
> powerpc platforms.

Yes, patches are in Freescale internal git repository. But those patches are 
yet to be upstreamed. I have started working on same.

Thanks
-Bharat

> 
> >
> >
> > Thanks
> >
> > -Bharat
> >
> >
> >
> > From: kvmarm-boun...@lists.cs.columbia.edu
> > [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Hongbo
> > Zhang
> > Sent: Wednesday, November 12, 2014 3:08 PM
> > To: Antonios Motakis
> > Cc: open list:VFIO DRIVER; will.dea...@arm.com;
> > alex.william...@redhat.com; open list;
> > io...@lists.linux-foundation.org; t...@virtualopensystems.com;
> > kvm...@lists.cs.columbia.edu
> > Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM
> > module to Kconfig
> >
> >
> >
> >
> >
> >
> >
> > On 28 October 2014 02:07, Antonios Motakis
> >  wrote:
> >
> > Enable building the VFIO PLATFORM driver that allows to use Linux
> > platform devices with VFIO.
> >
> > Signed-off-by: Antonios Motakis 
> > ---
> >  drivers/vfio/Kconfig   | 1 +
> >  drivers/vfio/Makefile  | 1 +
> >  drivers/vfio/platform/Kconfig  | 9 +
> > drivers/vfio/platform/Makefile | 4 
> >  4 files changed, 15 insertions(+)
> >  create mode 100644 drivers/vfio/platform/Kconfig  create mode 100644
> > drivers/vfio/platform/Makefile
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
> > a0abe04..962fb80 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -27,3 +27,4 @@ menuconfig VFIO
> >   If you don't know what to do here, say N.
> >
> >  source "drivers/vfio/pci/Kconfig"
> > +source "drivers/vfio/platform/Kconfig"
> > diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index
> > 0b035b1..dadf0ca 100644
> > --- a/drivers/vfio/Makefile
> > +++ b/drivers/vfio/Makefile
> > @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> >  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> >  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
> >  obj-$(CONFIG_VFIO_PCI) += pci/
> > +obj-$(CONFIG_VFIO_PLATFORM) += platform/
> > diff --git a/drivers/vfio/platform/Kconfig
> > b/drivers/vfio/platform/Kconfig new file mode 100644 index
> > 000..c51af17
> > --- /dev/null
> > +++ b/drivers/vfio/platform/Kconfig
> > @@ -0,0 +1,9 @@
> > +config VFIO_PLATFORM
> > +   tristate "VFIO support for platform devices"
> > +   depends on VFIO && EVENTFD && ARM
> >
> >
> >
> > Hi Antonios,
> >
> > Is this only for ARM? how about X86 and PowerPC?
> >
> > On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral
> > Access Management Unit), and I am trying to use this VFIO framework on it.
> >
> >
> >
> > +   help
> > + Support for platform devices with VFIO. This is required to make
> > + use of platform devices present on the system using the VFIO
> > + framework.
> > +
> > + If you don't know what to do here, say N.
> > diff --git a/drivers/vfio/platform/Makefile
> > b/drivers/vfio/platform/Makefile new file mode 100644 index
> > 000..279862b
> > --- /dev/null
> > +++ b/drivers/vfio/platform/Makefile
> > @@ -0,0 +1,4 @@
> > +
> > +vfio-platform-y := vfio_platform.o vfio_platform_common.o
> > +
> > +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
> > --
> > 2.1.1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > linux-kernel" in the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
> >


RE: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig

2014-11-12 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Hongbo Zhang [mailto:hongbo.zh...@linaro.org]
 Sent: Wednesday, November 12, 2014 4:09 PM
 To: Bhushan Bharat-R65777
 Cc: Antonios Motakis; open list:VFIO DRIVER; will.dea...@arm.com;
 alex.william...@redhat.com; open list; io...@lists.linux-foundation.org;
 t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu
 Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to
 Kconfig
 
 On 12 November 2014 18:05, bharat.bhus...@freescale.com
 bharat.bhus...@freescale.com wrote:
  Hi,
 
 
 
  This is not yet supported on Freescale PowerPC. I am still in process
  of upstreaming the FSL PAMU specific patches for same.
 
  Initial plan is to test with PCIe devices and then with Platform devices.
 
 
 I see there is already driver/iommu/fsl_pamu.c, doesn't it work?

We need VFIO iommu driver for same.

 Could you explain briefly what is wrong? I've heard that the vfio pci works on
 powerpc platforms.

Yes, patches are in Freescale internal git repository. But those patches are 
yet to be upstreamed. I have started working on same.

Thanks
-Bharat

 
 
 
  Thanks
 
  -Bharat
 
 
 
  From: kvmarm-boun...@lists.cs.columbia.edu
  [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Hongbo
  Zhang
  Sent: Wednesday, November 12, 2014 3:08 PM
  To: Antonios Motakis
  Cc: open list:VFIO DRIVER; will.dea...@arm.com;
  alex.william...@redhat.com; open list;
  io...@lists.linux-foundation.org; t...@virtualopensystems.com;
  kvm...@lists.cs.columbia.edu
  Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM
  module to Kconfig
 
 
 
 
 
 
 
  On 28 October 2014 02:07, Antonios Motakis
  a.mota...@virtualopensystems.com wrote:
 
  Enable building the VFIO PLATFORM driver that allows to use Linux
  platform devices with VFIO.
 
  Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
  ---
   drivers/vfio/Kconfig   | 1 +
   drivers/vfio/Makefile  | 1 +
   drivers/vfio/platform/Kconfig  | 9 +
  drivers/vfio/platform/Makefile | 4 
   4 files changed, 15 insertions(+)
   create mode 100644 drivers/vfio/platform/Kconfig  create mode 100644
  drivers/vfio/platform/Makefile
 
  diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
  a0abe04..962fb80 100644
  --- a/drivers/vfio/Kconfig
  +++ b/drivers/vfio/Kconfig
  @@ -27,3 +27,4 @@ menuconfig VFIO
If you don't know what to do here, say N.
 
   source drivers/vfio/pci/Kconfig
  +source drivers/vfio/platform/Kconfig
  diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index
  0b035b1..dadf0ca 100644
  --- a/drivers/vfio/Makefile
  +++ b/drivers/vfio/Makefile
  @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
   obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
   obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
   obj-$(CONFIG_VFIO_PCI) += pci/
  +obj-$(CONFIG_VFIO_PLATFORM) += platform/
  diff --git a/drivers/vfio/platform/Kconfig
  b/drivers/vfio/platform/Kconfig new file mode 100644 index
  000..c51af17
  --- /dev/null
  +++ b/drivers/vfio/platform/Kconfig
  @@ -0,0 +1,9 @@
  +config VFIO_PLATFORM
  +   tristate VFIO support for platform devices
  +   depends on VFIO  EVENTFD  ARM
 
 
 
  Hi Antonios,
 
  Is this only for ARM? how about X86 and PowerPC?
 
  On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral
  Access Management Unit), and I am trying to use this VFIO framework on it.
 
 
 
  +   help
  + Support for platform devices with VFIO. This is required to make
  + use of platform devices present on the system using the VFIO
  + framework.
  +
  + If you don't know what to do here, say N.
  diff --git a/drivers/vfio/platform/Makefile
  b/drivers/vfio/platform/Makefile new file mode 100644 index
  000..279862b
  --- /dev/null
  +++ b/drivers/vfio/platform/Makefile
  @@ -0,0 +1,4 @@
  +
  +vfio-platform-y := vfio_platform.o vfio_platform_common.o
  +
  +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
  --
  2.1.1
 
  --
  To unsubscribe from this list: send the line unsubscribe
  linux-kernel in the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
 
 


RE: [PATCH v8 07/18] vfio/platform: return info for device memory mapped IO regions

2014-10-21 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
> boun...@lists.linux-foundation.org] On Behalf Of Alex Williamson
> Sent: Tuesday, October 21, 2014 10:05 PM
> To: Antonios Motakis
> Cc: open list:VFIO DRIVER; eric.au...@linaro.org; marc.zyng...@arm.com;
> will.dea...@arm.com; open list; io...@lists.linux-foundation.org;
> t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu;
> christoffer.d...@linaro.org
> Subject: Re: [PATCH v8 07/18] vfio/platform: return info for device memory
> mapped IO regions
> 
> On Mon, 2014-10-13 at 15:10 +0200, Antonios Motakis wrote:
> > This patch enables the IOCTLs VFIO_DEVICE_GET_REGION_INFO ioctl call,
> > which allows the user to learn about the available MMIO resources of a
> > device.
> >
> > Signed-off-by: Antonios Motakis 
> > ---
> >  drivers/vfio/platform/vfio_platform_common.c  | 93
> > +--
> > drivers/vfio/platform/vfio_platform_private.h | 22 +++
> >  2 files changed, 111 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vfio/platform/vfio_platform_common.c
> > b/drivers/vfio/platform/vfio_platform_common.c
> > index 1e4073f..8a7e474 100644
> > --- a/drivers/vfio/platform/vfio_platform_common.c
> > +++ b/drivers/vfio/platform/vfio_platform_common.c
> > @@ -27,17 +27,84 @@
> >
> >  #include "vfio_platform_private.h"
> >
> > +static int vfio_platform_regions_init(struct vfio_platform_device
> > +*vdev) {
> > +   int cnt = 0, i;
> > +
> > +   while (vdev->get_resource(vdev, cnt))
> > +   cnt++;
> > +
> > +   vdev->regions = kcalloc(cnt, sizeof(struct vfio_platform_region),
> > +   GFP_KERNEL);
> > +   if (!vdev->regions)
> > +   return -ENOMEM;
> > +
> > +   for (i = 0; i < cnt;  i++) {
> > +   struct resource *res =
> > +   vdev->get_resource(vdev, i);
> > +
> > +   if (!res)
> > +   goto err;
> > +
> > +   vdev->regions[i].addr = res->start;
> > +   vdev->regions[i].size = resource_size(res);
> > +   vdev->regions[i].flags = 0;
> > +
> > +   switch (resource_type(res)) {
> > +   case IORESOURCE_MEM:
> > +   vdev->regions[i].type = VFIO_PLATFORM_REGION_TYPE_MMIO;
> > +   break;
> > +   case IORESOURCE_IO:
> > +   vdev->regions[i].type = VFIO_PLATFORM_REGION_TYPE_PIO;
> > +   break;
> 
> Ok, we have support for PIO in platform now (thanks!), does the user know what
> type of region they're dealing with?  Do they care?  For PCI the user tests 
> the
> PCI BAR in config space to determine which type it is.  I'm guessing that
> platform would do something similar against the device tree or ACPI, right?

Interested in knowing what type of PIO region in a platform device, is this for 
I2C/SPI type of device? 


Thanks
-Bharat

> 
> > +   default:
> > +   goto err;
> > +   }
> > +   }
> > +
> > +   vdev->num_regions = cnt;
> > +
> > +   return 0;
> > +err:
> > +   kfree(vdev->regions);
> > +   return -EINVAL;
> > +}
> > +
> > +static void vfio_platform_regions_cleanup(struct vfio_platform_device
> > +*vdev) {
> > +   vdev->num_regions = 0;
> > +   kfree(vdev->regions);
> > +}
> > +
> >  static void vfio_platform_release(void *device_data)  {
> > +   struct vfio_platform_device *vdev = device_data;
> > +
> > +   if (atomic_dec_and_test(>refcnt))
> > +   vfio_platform_regions_cleanup(vdev);
> > +
> > module_put(THIS_MODULE);
> >  }
> >
> >  static int vfio_platform_open(void *device_data)  {
> > +   struct vfio_platform_device *vdev = device_data;
> > +   int ret;
> > +
> > if (!try_module_get(THIS_MODULE))
> > return -ENODEV;
> >
> > +   if (atomic_inc_return(>refcnt) == 1) {
> > +   ret = vfio_platform_regions_init(vdev);
> > +   if (ret)
> > +   goto err_reg;
> > +   }
> > +
> > return 0;
> > +
> > +err_reg:
> > +   module_put(THIS_MODULE);
> > +   return ret;
> 
> Note that if vfio_platform_regions_init() fails then your refcnt is wrong.  We
> switched to a mutex in vfio-pci to avoid this.  See 61d792562b53.
> 
> >  }
> >
> >  static long vfio_platform_ioctl(void *device_data, @@ -58,15 +125,33
> > @@ static long vfio_platform_ioctl(void *device_data,
> > return -EINVAL;
> >
> > info.flags = vdev->flags;
> > -   info.num_regions = 0;
> > +   info.num_regions = vdev->num_regions;
> > info.num_irqs = 0;
> >
> > return copy_to_user((void __user *)arg, , minsz);
> >
> > -   } else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
> > -   return -EINVAL;
> > +   } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
> > +   struct vfio_region_info info;
> > +
> > +   minsz = offsetofend(struct vfio_region_info, offset);
> > +
> > +   if (copy_from_user(, (void __user *)arg, minsz))
> > +   

RE: [PATCH v8 07/18] vfio/platform: return info for device memory mapped IO regions

2014-10-21 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
 boun...@lists.linux-foundation.org] On Behalf Of Alex Williamson
 Sent: Tuesday, October 21, 2014 10:05 PM
 To: Antonios Motakis
 Cc: open list:VFIO DRIVER; eric.au...@linaro.org; marc.zyng...@arm.com;
 will.dea...@arm.com; open list; io...@lists.linux-foundation.org;
 t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu;
 christoffer.d...@linaro.org
 Subject: Re: [PATCH v8 07/18] vfio/platform: return info for device memory
 mapped IO regions
 
 On Mon, 2014-10-13 at 15:10 +0200, Antonios Motakis wrote:
  This patch enables the IOCTLs VFIO_DEVICE_GET_REGION_INFO ioctl call,
  which allows the user to learn about the available MMIO resources of a
  device.
 
  Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
  ---
   drivers/vfio/platform/vfio_platform_common.c  | 93
  +--
  drivers/vfio/platform/vfio_platform_private.h | 22 +++
   2 files changed, 111 insertions(+), 4 deletions(-)
 
  diff --git a/drivers/vfio/platform/vfio_platform_common.c
  b/drivers/vfio/platform/vfio_platform_common.c
  index 1e4073f..8a7e474 100644
  --- a/drivers/vfio/platform/vfio_platform_common.c
  +++ b/drivers/vfio/platform/vfio_platform_common.c
  @@ -27,17 +27,84 @@
 
   #include vfio_platform_private.h
 
  +static int vfio_platform_regions_init(struct vfio_platform_device
  +*vdev) {
  +   int cnt = 0, i;
  +
  +   while (vdev-get_resource(vdev, cnt))
  +   cnt++;
  +
  +   vdev-regions = kcalloc(cnt, sizeof(struct vfio_platform_region),
  +   GFP_KERNEL);
  +   if (!vdev-regions)
  +   return -ENOMEM;
  +
  +   for (i = 0; i  cnt;  i++) {
  +   struct resource *res =
  +   vdev-get_resource(vdev, i);
  +
  +   if (!res)
  +   goto err;
  +
  +   vdev-regions[i].addr = res-start;
  +   vdev-regions[i].size = resource_size(res);
  +   vdev-regions[i].flags = 0;
  +
  +   switch (resource_type(res)) {
  +   case IORESOURCE_MEM:
  +   vdev-regions[i].type = VFIO_PLATFORM_REGION_TYPE_MMIO;
  +   break;
  +   case IORESOURCE_IO:
  +   vdev-regions[i].type = VFIO_PLATFORM_REGION_TYPE_PIO;
  +   break;
 
 Ok, we have support for PIO in platform now (thanks!), does the user know what
 type of region they're dealing with?  Do they care?  For PCI the user tests 
 the
 PCI BAR in config space to determine which type it is.  I'm guessing that
 platform would do something similar against the device tree or ACPI, right?

Interested in knowing what type of PIO region in a platform device, is this for 
I2C/SPI type of device? 


Thanks
-Bharat

 
  +   default:
  +   goto err;
  +   }
  +   }
  +
  +   vdev-num_regions = cnt;
  +
  +   return 0;
  +err:
  +   kfree(vdev-regions);
  +   return -EINVAL;
  +}
  +
  +static void vfio_platform_regions_cleanup(struct vfio_platform_device
  +*vdev) {
  +   vdev-num_regions = 0;
  +   kfree(vdev-regions);
  +}
  +
   static void vfio_platform_release(void *device_data)  {
  +   struct vfio_platform_device *vdev = device_data;
  +
  +   if (atomic_dec_and_test(vdev-refcnt))
  +   vfio_platform_regions_cleanup(vdev);
  +
  module_put(THIS_MODULE);
   }
 
   static int vfio_platform_open(void *device_data)  {
  +   struct vfio_platform_device *vdev = device_data;
  +   int ret;
  +
  if (!try_module_get(THIS_MODULE))
  return -ENODEV;
 
  +   if (atomic_inc_return(vdev-refcnt) == 1) {
  +   ret = vfio_platform_regions_init(vdev);
  +   if (ret)
  +   goto err_reg;
  +   }
  +
  return 0;
  +
  +err_reg:
  +   module_put(THIS_MODULE);
  +   return ret;
 
 Note that if vfio_platform_regions_init() fails then your refcnt is wrong.  We
 switched to a mutex in vfio-pci to avoid this.  See 61d792562b53.
 
   }
 
   static long vfio_platform_ioctl(void *device_data, @@ -58,15 +125,33
  @@ static long vfio_platform_ioctl(void *device_data,
  return -EINVAL;
 
  info.flags = vdev-flags;
  -   info.num_regions = 0;
  +   info.num_regions = vdev-num_regions;
  info.num_irqs = 0;
 
  return copy_to_user((void __user *)arg, info, minsz);
 
  -   } else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
  -   return -EINVAL;
  +   } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
  +   struct vfio_region_info info;
  +
  +   minsz = offsetofend(struct vfio_region_info, offset);
  +
  +   if (copy_from_user(info, (void __user *)arg, minsz))
  +   return -EFAULT;
  +
  +   if (info.argsz  minsz)
  +   return -EINVAL;
  +
  +   if (info.index = vdev-num_regions)
  +   return -EINVAL;
  +
  +   /* map offset to the physical 

RE: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device

2014-08-20 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
> On Behalf Of Yijing Wang
> Sent: Wednesday, August 20, 2014 11:59 AM
> To: Bhushan Bharat-R65777; Basu Arnab-B45036
> Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
> paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
> ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org;
> virtualizat...@lists.linux-foundation.org; Hanjun Guo; linux-
> ker...@vger.kernel.org
> Subject: Re: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device
> 
> >> The key difference between PCI device and Non-PCI MSI is the
> >> interfaces to access hardware MSI registers.
> >> for instance, currently, msi_chip->setup_irq() to setup MSI irq and
> >> configure the MSI address/data registers, so we need to provide
> >> device specific
> >> write_msi_msg() interface, then when we call msi_chip->setup_irq(),
> >> the device MSI registers can be configured appropriately.
> >
> > What if we can register/override the setup_irq() from bus-driver (not sure,
> but may be device-driver itself). Example PCI bus-driver will provide
> setup_irq() (or the part of setup_irq which set address and data in h/w) by 
> PCI
> bus, which configure address/data in h/w as per PCI standard.
> >
> > We in Freescale will be using MSI for the devices behind a new-bus (which is
> not PCI based), We have a separate bus driver for same. And this new bus 
> driver
> register/provide its own address/data write function which is based on that
> specific bus protocol.
> 
> Hi Bharat, I'm glad to know your MSI device working mode.
> Provide the private MSI setup functions in bus-driver layer can't apply to all
> Non-PCI MSI devices, because we can not guarantee Non-PCI MSI devices are 
> always
> on a bus. The existing HPET, DMAR device both have no bus bind.

Yes, that's why I was not sure of bus-driver or device-driver model.

> I'm working on a
> new MSI setup framework, as you mentioned before, in device-driver model.
> 
> I abstracted a new virtual device (called struct msi_dev), this msi_dev will
> manage all MSI info,

Will this "struct msi_dev" will be part of "struct device"?

> and a new bus named msi_bus, also introduced a new driver
> msi_driver, msi_bus is responsible for binding msi_dev and msi_driver.
> All MSI devices will be classified into different MSI device types, like
> MSI_TYPE_PCI, MSI_TYPE_HPET, MSI_TYPE_DMAR, etc..
> 
> Each MSI type device should provide a private struct msi_driver. msi_driver
> should contain the type specific MSI ops functions to help setup and enable 
> MSI
> device, request MSI irq.
> 
> I almost finish the first draft, and will post out next week in plan :)

Will be looking forward to next version.

Thanks
-Bharat

> 
> 
> Thanks!
> Yijing.
> 
> >
> > Thanks
> > -Bharat
> >
> >>
> >> My patchset is just a RFC draft, I will update it later, all we want
> >> to do is make kernel support Non-PCI MSI devices.
> >>
> >> Thanks!
> >> Yijing.
> >>
> >>
> >>>
> >>> Thanks
> >>> Arnab
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe
> >>> linux-kernel" in the body of a message to majord...@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>> Please read the FAQ at  http://www.tux.org/lkml/
> >>>
> >>> .
> >>>
> >>
> >>
> >> --
> >> Thanks!
> >> Yijing
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-pci"
> >> in the body of a message to majord...@vger.kernel.org More majordomo
> >> info at http://vger.kernel.org/majordomo-info.html
> >
> > .
> >
> 
> 
> --
> Thanks!
> Yijing
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in the 
> body
> of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH 11/11] x86/MSI: Refactor x86 MSI code

2014-08-20 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
> On Behalf Of Yijing Wang
> Sent: Saturday, July 26, 2014 8:39 AM
> To: linux-kernel@vger.kernel.org
> Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
> paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
> ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org; Basu
> Arnab-B45036; virtualizat...@lists.linux-foundation.org; Hanjun Guo; Yijing 
> Wang
> Subject: [RFC PATCH 11/11] x86/MSI: Refactor x86 MSI code

Please provide description about what this refactoring is? Also does other 
architecture also need similar refactoring ?

Thanks
-Bharat

> 
> Signed-off-by: Yijing Wang 
> ---
>  arch/x86/include/asm/io_apic.h   |2 +-
>  arch/x86/include/asm/irq_remapping.h |4 +-
>  arch/x86/include/asm/pci.h   |6 ++--
>  arch/x86/include/asm/x86_init.h  |   10 +++---
>  arch/x86/kernel/apic/io_apic.c   |   23 +++
>  arch/x86/kernel/x86_init.c   |   12 
>  drivers/iommu/amd_iommu.c|   16 ++
>  drivers/iommu/intel_irq_remapping.c  |9 --
>  drivers/iommu/irq_remapping.c|   51 -
>  drivers/iommu/irq_remapping.h|6 ++--
>  drivers/msi/msi.c|3 +-
>  11 files changed, 72 insertions(+), 70 deletions(-)
> 
> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> index 90f97b4..692a90f 100644
> --- a/arch/x86/include/asm/io_apic.h
> +++ b/arch/x86/include/asm/io_apic.h
> @@ -158,7 +158,7 @@ extern int native_setup_ioapic_entry(int, struct
> IO_APIC_route_entry *,
>struct io_apic_irq_attr *);
>  extern void eoi_ioapic_irq(unsigned int irq, struct irq_cfg *cfg);
> 
> -extern void native_compose_msi_msg(struct pci_dev *pdev,
> +extern void native_compose_msi_msg(struct msi_irqs *msi,
>  unsigned int irq, unsigned int dest,
>  struct msi_msg *msg, u8 hpet_id);
>  extern void native_eoi_ioapic_pin(int apic, int pin, int vector);
> diff --git a/arch/x86/include/asm/irq_remapping.h
> b/arch/x86/include/asm/irq_remapping.h
> index b7747c4..a10003d 100644
> --- a/arch/x86/include/asm/irq_remapping.h
> +++ b/arch/x86/include/asm/irq_remapping.h
> @@ -47,7 +47,7 @@ extern int setup_ioapic_remapped_entry(int irq,
>  int vector,
>  struct io_apic_irq_attr *attr);
>  extern void free_remapped_irq(int irq);
> -extern void compose_remapped_msi_msg(struct pci_dev *pdev,
> +extern void compose_remapped_msi_msg(struct msi_irqs *msi,
>unsigned int irq, unsigned int dest,
>struct msi_msg *msg, u8 hpet_id);
>  extern int setup_hpet_msi_remapped(unsigned int irq, unsigned int id);
> @@ -77,7 +77,7 @@ static inline int setup_ioapic_remapped_entry(int irq,
>   return -ENODEV;
>  }
>  static inline void free_remapped_irq(int irq) { }
> -static inline void compose_remapped_msi_msg(struct pci_dev *pdev,
> +static inline void compose_remapped_msi_msg(struct msi_irqs *msi,
>   unsigned int irq, unsigned int dest,
>   struct msi_msg *msg, u8 hpet_id)
>  {
> diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
> index 0892ea0..04c9ef6 100644
> --- a/arch/x86/include/asm/pci.h
> +++ b/arch/x86/include/asm/pci.h
> @@ -96,10 +96,10 @@ extern void pci_iommu_alloc(void);
>  #ifdef CONFIG_PCI_MSI
>  /* implemented in arch/x86/kernel/apic/io_apic. */
>  struct msi_desc;
> -int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
> +int native_setup_msi_irqs(struct msi_irqs *msi, int nvec, int type);
>  void native_teardown_msi_irq(unsigned int irq);
> -void native_restore_msi_irqs(struct pci_dev *dev);
> -int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
> +void native_restore_msi_irqs(struct msi_irqs *msi);
> +int setup_msi_irq(struct msi_irqs *msi, struct msi_desc *msidesc,
> unsigned int irq_base, unsigned int irq_offset);
>  #else
>  #define native_setup_msi_irqsNULL
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index e45e4da..8e42f17 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -170,18 +170,18 @@ struct x86_platform_ops {
>   void (*apic_post_init)(void);
>  };
> 
> -struct pci_dev;
> +struct msi_irqs;
>  struct msi_msg;
>  struct msi_desc;
> 
>  struct x86_msi_ops {
> - int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
> - void (*compose_msi_msg)(struct pci_dev *dev, unsigned int irq,
> + int (*setup_msi_irqs)(struct msi_irqs *msi, int nvec, int type);
> + void (*compose_msi_msg)(struct msi_irqs *msi, 

RE: [RFC PATCH 10/11] PCI/MSI: Split the generic MSI code into new file

2014-08-20 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
> On Behalf Of Yijing Wang
> Sent: Saturday, July 26, 2014 8:39 AM
> To: linux-kernel@vger.kernel.org
> Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
> paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
> ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org; Basu
> Arnab-B45036; virtualizat...@lists.linux-foundation.org; Hanjun Guo; Yijing 
> Wang
> Subject: [RFC PATCH 10/11] PCI/MSI: Split the generic MSI code into new file
> 
> MSI interrupt will not only used in PCI device, more
> and more Non-PCI device also want to use MSI. ARM
> GIC v3 spec says in ARM platform with GIC v3 controller,
> Non-PCI device can also be design to support MSI to
> simplify interrupt wires, for the existing Non-PCI
> device, consolidator is designed and used to translate
> legacy interrupt to MSI. So for support Non-PCI MSI
> device, generic MSI driver is needed. Split the generic
> MSI code into new location, drivers/msi/msi.c. Then
> MSI driver does not depend PCI anymore.
> 
> Signed-off-by: Yijing Wang 
> ---
>  drivers/Kconfig  |1 +
>  drivers/Makefile |1 +
>  drivers/msi/Kconfig  |8 +
>  drivers/msi/Makefile |1 +
>  drivers/msi/msi.c|  540 
> ++
>  drivers/pci/Kconfig  |6 +-
>  drivers/pci/msi.c|  500 ---
>  include/linux/msi.h  |   31 +++-
>  8 files changed, 617 insertions(+), 471 deletions(-)
>  create mode 100644 drivers/msi/Kconfig
>  create mode 100644 drivers/msi/Makefile
>  create mode 100644 drivers/msi/msi.c
> 
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 0e87a34..4d05749 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -176,4 +176,5 @@ source "drivers/powercap/Kconfig"
> 
>  source "drivers/mcb/Kconfig"
> 
> +source "drivers/msi/Kconfig"
>  endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index f98b50d..47ae3d1 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -158,3 +158,4 @@ obj-$(CONFIG_NTB) += ntb/
>  obj-$(CONFIG_FMC)+= fmc/
>  obj-$(CONFIG_POWERCAP)   += powercap/
>  obj-$(CONFIG_MCB)+= mcb/
> +obj-$(CONFIG_MSI)+= msi/
> diff --git a/drivers/msi/Kconfig b/drivers/msi/Kconfig
> new file mode 100644
> index 000..739bd13
> --- /dev/null
> +++ b/drivers/msi/Kconfig
> @@ -0,0 +1,8 @@
> +config MSI
> + bool "Message Signaled Interrupts (MSI and MSI-X)"
> + default y
> + help
> + This allows device drivers to use generic MSI(Message
> + Signaled Interrupt). Message Signaled Interrupts enable
> + a device to generate an interrupt using an inbound Memory
> + Write to a specific target address.
> diff --git a/drivers/msi/Makefile b/drivers/msi/Makefile
> new file mode 100644
> index 000..39cb026
> --- /dev/null
> +++ b/drivers/msi/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_MSI) += msi.o
> diff --git a/drivers/msi/msi.c b/drivers/msi/msi.c
> new file mode 100644
> index 000..3fbd539
> --- /dev/null
> +++ b/drivers/msi/msi.c
> @@ -0,0 +1,540 @@
> +/*
> + * File: msi.c
> + * Purpose:  Message Signaled Interrupt (MSI)
> + *
> + * Copyright (C) 2014 Huawei Ltd.
> + * Copyright (C) Yijing Wang 
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Arch hooks */
> +
> +int __weak arch_setup_msi_irq(struct msi_irqs *msi, struct msi_desc *desc)
> +{
> + struct pci_dev *dev = msi->data;
> + struct msi_chip *chip = dev->bus->msi; //TO BE DONE: rework msi_chip to
> support Non-PCI MSI
> + int err;
> +
> + if (!chip || !chip->setup_irq)
> + return -EINVAL;
> +
> + err = chip->setup_irq(chip, dev, desc);
> + if (err < 0)
> + return err;
> +
> + irq_set_chip_data(desc->irq, chip);
> + return 0;
> +}
> +
> +void __weak arch_teardown_msi_irq(unsigned int irq)
> +{
> + struct msi_chip *chip = irq_get_chip_data(irq);
> +
> + if (!chip || !chip->teardown_irq)
> + return;
> +
> + chip->teardown_irq(chip, irq);
> +}
> +
> +int __weak arch_msi_check_device(struct msi_irqs *msi, int nvec, int type)
> +{
> + struct pci_dev *dev = msi->data;
> + struct msi_chip *chip = dev->bus->msi; //TO BE DONE: rework msi_chip to
> support Non-PCI MSI
> +
> + if (!chip || !chip->check_device)
> + return 0;
> +
> + return chip->check_device(chip, dev, nvec, type);
> +}
> +
> +int __weak arch_setup_msi_irqs(struct msi_irqs *msi, int nvec, int type)
> +{
> + struct msi_desc *entry;
> + int ret;
> +
> + /*
> +  * If an architecture wants to support multiple MSI, it needs to
> +  * override arch_setup_msi_irqs()
> +  */
> + 

RE: [RFC PATCH 09/11] PCI/MSI: refactor PCI MSI driver

2014-08-20 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
> On Behalf Of Yijing Wang
> Sent: Saturday, July 26, 2014 8:39 AM
> To: linux-kernel@vger.kernel.org
> Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
> paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
> ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org; Basu
> Arnab-B45036; virtualizat...@lists.linux-foundation.org; Hanjun Guo; Yijing 
> Wang
> Subject: [RFC PATCH 09/11] PCI/MSI: refactor PCI MSI driver
> 
> Use struct msi_ops to hook PCI MSI operations,
> and use struct msi_irqs to refactor PCI MSI drvier.
> 
> Signed-off-by: Yijing Wang 
> ---
>  drivers/pci/msi.c   |  351 
> ++-
>  include/linux/msi.h |   14 +-
>  include/linux/pci.h |   11 +-
>  3 files changed, 222 insertions(+), 154 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 41c33da..f0c5989 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -29,8 +29,9 @@ static int pci_msi_enable = 1;
> 
>  /* Arch hooks */
> 
> -int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
> +int __weak arch_setup_msi_irq(struct msi_irqs *msi, struct msi_desc *desc)
>  {
> + struct pci_dev *dev = msi->data; //TO BE DONE: rework msi_chip to 
> support
> Non-PCI
>   struct msi_chip *chip = dev->bus->msi;
>   int err;
> 
> @@ -56,8 +57,9 @@ void __weak arch_teardown_msi_irq(unsigned int irq)
>   chip->teardown_irq(chip, irq);
>  }
> 
> -int __weak arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
> +int __weak arch_msi_check_device(struct msi_irqs *msi, int nvec, int type)
>  {
> + struct pci_dev *dev = msi->data; //TO BE DONE: rework msi_chip to 
> support
> Non-PCI
>   struct msi_chip *chip = dev->bus->msi;
> 
>   if (!chip || !chip->check_device)
> @@ -66,7 +68,7 @@ int __weak arch_msi_check_device(struct pci_dev *dev, int
> nvec, int type)
>   return chip->check_device(chip, dev, nvec, type);
>  }
> 
> -int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> +int __weak arch_setup_msi_irqs(struct msi_irqs *msi, int nvec, int type)
>  {
>   struct msi_desc *entry;
>   int ret;
> @@ -78,8 +80,8 @@ int __weak arch_setup_msi_irqs(struct pci_dev *dev, int 
> nvec,
> int type)
>   if (type == MSI_TYPE && nvec > 1)
>   return 1;
> 
> - list_for_each_entry(entry, >msi_list, list) {
> - ret = arch_setup_msi_irq(dev, entry);
> + list_for_each_entry(entry, >msi_list, list) {
> + ret = arch_setup_msi_irq(msi, entry);
>   if (ret < 0)
>   return ret;
>   if (ret > 0)
> @@ -93,11 +95,11 @@ int __weak arch_setup_msi_irqs(struct pci_dev *dev, int
> nvec, int type)
>   * We have a default implementation available as a separate non-weak
>   * function, as it is used by the Xen x86 PCI code
>   */
> -void default_teardown_msi_irqs(struct pci_dev *dev)
> +void default_teardown_msi_irqs(struct msi_irqs *msi)
>  {
>   struct msi_desc *entry;
> 
> - list_for_each_entry(entry, >msi_list, list) {
> + list_for_each_entry(entry, >msi_list, list) {
>   int i, nvec;
>   if (entry->irq == 0)
>   continue;
> @@ -110,22 +112,22 @@ void default_teardown_msi_irqs(struct pci_dev *dev)
>   }
>  }
> 
> -void __weak arch_teardown_msi_irqs(struct pci_dev *dev)
> +void __weak arch_teardown_msi_irqs(struct msi_irqs *msi)
>  {
> - return default_teardown_msi_irqs(dev);
> + return default_teardown_msi_irqs(msi);
>  }
> 
> -static void default_restore_msi_irq(struct pci_dev *dev, int irq)
> +static void default_restore_msi_irq(struct msi_irqs *msi, int irq)
>  {
>   struct msi_desc *entry;
> 
>   entry = NULL;
> - if (dev->msix_enabled) {
> - list_for_each_entry(entry, >msi_list, list) {
> + if (msi->msix_enabled) {
> + list_for_each_entry(entry, >msi_list, list) {
>   if (irq == entry->irq)
>   break;
>   }
> - } else if (pci_dev_msi_enabled(dev, MSI_TYPE))  {
> + } else if (msi->msi_enabled)  {
>   entry = irq_get_msi_desc(irq);
>   }
> 
> @@ -133,20 +135,9 @@ static void default_restore_msi_irq(struct pci_dev *dev,
> int irq)
>   write_msi_msg(irq, >msg);
>  }
> 
> -void __weak arch_restore_msi_irqs(struct pci_dev *dev)
> +void __weak arch_restore_msi_irqs(struct msi_irqs *msi)
>  {
> - return default_restore_msi_irqs(dev);
> -}
> -
> -static void msi_set_enable(struct pci_dev *dev, int enable)
> -{
> - u16 control;
> -
> - pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, );
> - control &= ~PCI_MSI_FLAGS_ENABLE;
> - if (enable)
> - control |= PCI_MSI_FLAGS_ENABLE;
> - pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, 

RE: [RFC PATCH 09/11] PCI/MSI: refactor PCI MSI driver

2014-08-20 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
 On Behalf Of Yijing Wang
 Sent: Saturday, July 26, 2014 8:39 AM
 To: linux-kernel@vger.kernel.org
 Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
 paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
 ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org; Basu
 Arnab-B45036; virtualizat...@lists.linux-foundation.org; Hanjun Guo; Yijing 
 Wang
 Subject: [RFC PATCH 09/11] PCI/MSI: refactor PCI MSI driver
 
 Use struct msi_ops to hook PCI MSI operations,
 and use struct msi_irqs to refactor PCI MSI drvier.
 
 Signed-off-by: Yijing Wang wangyij...@huawei.com
 ---
  drivers/pci/msi.c   |  351 
 ++-
  include/linux/msi.h |   14 +-
  include/linux/pci.h |   11 +-
  3 files changed, 222 insertions(+), 154 deletions(-)
 
 diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
 index 41c33da..f0c5989 100644
 --- a/drivers/pci/msi.c
 +++ b/drivers/pci/msi.c
 @@ -29,8 +29,9 @@ static int pci_msi_enable = 1;
 
  /* Arch hooks */
 
 -int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
 +int __weak arch_setup_msi_irq(struct msi_irqs *msi, struct msi_desc *desc)
  {
 + struct pci_dev *dev = msi-data; //TO BE DONE: rework msi_chip to 
 support
 Non-PCI
   struct msi_chip *chip = dev-bus-msi;
   int err;
 
 @@ -56,8 +57,9 @@ void __weak arch_teardown_msi_irq(unsigned int irq)
   chip-teardown_irq(chip, irq);
  }
 
 -int __weak arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
 +int __weak arch_msi_check_device(struct msi_irqs *msi, int nvec, int type)
  {
 + struct pci_dev *dev = msi-data; //TO BE DONE: rework msi_chip to 
 support
 Non-PCI
   struct msi_chip *chip = dev-bus-msi;
 
   if (!chip || !chip-check_device)
 @@ -66,7 +68,7 @@ int __weak arch_msi_check_device(struct pci_dev *dev, int
 nvec, int type)
   return chip-check_device(chip, dev, nvec, type);
  }
 
 -int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 +int __weak arch_setup_msi_irqs(struct msi_irqs *msi, int nvec, int type)
  {
   struct msi_desc *entry;
   int ret;
 @@ -78,8 +80,8 @@ int __weak arch_setup_msi_irqs(struct pci_dev *dev, int 
 nvec,
 int type)
   if (type == MSI_TYPE  nvec  1)
   return 1;
 
 - list_for_each_entry(entry, dev-msi_list, list) {
 - ret = arch_setup_msi_irq(dev, entry);
 + list_for_each_entry(entry, msi-msi_list, list) {
 + ret = arch_setup_msi_irq(msi, entry);
   if (ret  0)
   return ret;
   if (ret  0)
 @@ -93,11 +95,11 @@ int __weak arch_setup_msi_irqs(struct pci_dev *dev, int
 nvec, int type)
   * We have a default implementation available as a separate non-weak
   * function, as it is used by the Xen x86 PCI code
   */
 -void default_teardown_msi_irqs(struct pci_dev *dev)
 +void default_teardown_msi_irqs(struct msi_irqs *msi)
  {
   struct msi_desc *entry;
 
 - list_for_each_entry(entry, dev-msi_list, list) {
 + list_for_each_entry(entry, msi-msi_list, list) {
   int i, nvec;
   if (entry-irq == 0)
   continue;
 @@ -110,22 +112,22 @@ void default_teardown_msi_irqs(struct pci_dev *dev)
   }
  }
 
 -void __weak arch_teardown_msi_irqs(struct pci_dev *dev)
 +void __weak arch_teardown_msi_irqs(struct msi_irqs *msi)
  {
 - return default_teardown_msi_irqs(dev);
 + return default_teardown_msi_irqs(msi);
  }
 
 -static void default_restore_msi_irq(struct pci_dev *dev, int irq)
 +static void default_restore_msi_irq(struct msi_irqs *msi, int irq)
  {
   struct msi_desc *entry;
 
   entry = NULL;
 - if (dev-msix_enabled) {
 - list_for_each_entry(entry, dev-msi_list, list) {
 + if (msi-msix_enabled) {
 + list_for_each_entry(entry, msi-msi_list, list) {
   if (irq == entry-irq)
   break;
   }
 - } else if (pci_dev_msi_enabled(dev, MSI_TYPE))  {
 + } else if (msi-msi_enabled)  {
   entry = irq_get_msi_desc(irq);
   }
 
 @@ -133,20 +135,9 @@ static void default_restore_msi_irq(struct pci_dev *dev,
 int irq)
   write_msi_msg(irq, entry-msg);
  }
 
 -void __weak arch_restore_msi_irqs(struct pci_dev *dev)
 +void __weak arch_restore_msi_irqs(struct msi_irqs *msi)
  {
 - return default_restore_msi_irqs(dev);
 -}
 -
 -static void msi_set_enable(struct pci_dev *dev, int enable)
 -{
 - u16 control;
 -
 - pci_read_config_word(dev, dev-msi_cap + PCI_MSI_FLAGS, control);
 - control = ~PCI_MSI_FLAGS_ENABLE;
 - if (enable)
 - control |= PCI_MSI_FLAGS_ENABLE;
 - pci_write_config_word(dev, dev-msi_cap + PCI_MSI_FLAGS, control);
 + return default_restore_msi_irqs(msi);
  }
 
  static void msix_clear_and_set_ctrl(struct pci_dev 

RE: [RFC PATCH 10/11] PCI/MSI: Split the generic MSI code into new file

2014-08-20 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
 On Behalf Of Yijing Wang
 Sent: Saturday, July 26, 2014 8:39 AM
 To: linux-kernel@vger.kernel.org
 Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
 paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
 ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org; Basu
 Arnab-B45036; virtualizat...@lists.linux-foundation.org; Hanjun Guo; Yijing 
 Wang
 Subject: [RFC PATCH 10/11] PCI/MSI: Split the generic MSI code into new file
 
 MSI interrupt will not only used in PCI device, more
 and more Non-PCI device also want to use MSI. ARM
 GIC v3 spec says in ARM platform with GIC v3 controller,
 Non-PCI device can also be design to support MSI to
 simplify interrupt wires, for the existing Non-PCI
 device, consolidator is designed and used to translate
 legacy interrupt to MSI. So for support Non-PCI MSI
 device, generic MSI driver is needed. Split the generic
 MSI code into new location, drivers/msi/msi.c. Then
 MSI driver does not depend PCI anymore.
 
 Signed-off-by: Yijing Wang wangyij...@huawei.com
 ---
  drivers/Kconfig  |1 +
  drivers/Makefile |1 +
  drivers/msi/Kconfig  |8 +
  drivers/msi/Makefile |1 +
  drivers/msi/msi.c|  540 
 ++
  drivers/pci/Kconfig  |6 +-
  drivers/pci/msi.c|  500 ---
  include/linux/msi.h  |   31 +++-
  8 files changed, 617 insertions(+), 471 deletions(-)
  create mode 100644 drivers/msi/Kconfig
  create mode 100644 drivers/msi/Makefile
  create mode 100644 drivers/msi/msi.c
 
 diff --git a/drivers/Kconfig b/drivers/Kconfig
 index 0e87a34..4d05749 100644
 --- a/drivers/Kconfig
 +++ b/drivers/Kconfig
 @@ -176,4 +176,5 @@ source drivers/powercap/Kconfig
 
  source drivers/mcb/Kconfig
 
 +source drivers/msi/Kconfig
  endmenu
 diff --git a/drivers/Makefile b/drivers/Makefile
 index f98b50d..47ae3d1 100644
 --- a/drivers/Makefile
 +++ b/drivers/Makefile
 @@ -158,3 +158,4 @@ obj-$(CONFIG_NTB) += ntb/
  obj-$(CONFIG_FMC)+= fmc/
  obj-$(CONFIG_POWERCAP)   += powercap/
  obj-$(CONFIG_MCB)+= mcb/
 +obj-$(CONFIG_MSI)+= msi/
 diff --git a/drivers/msi/Kconfig b/drivers/msi/Kconfig
 new file mode 100644
 index 000..739bd13
 --- /dev/null
 +++ b/drivers/msi/Kconfig
 @@ -0,0 +1,8 @@
 +config MSI
 + bool Message Signaled Interrupts (MSI and MSI-X)
 + default y
 + help
 + This allows device drivers to use generic MSI(Message
 + Signaled Interrupt). Message Signaled Interrupts enable
 + a device to generate an interrupt using an inbound Memory
 + Write to a specific target address.
 diff --git a/drivers/msi/Makefile b/drivers/msi/Makefile
 new file mode 100644
 index 000..39cb026
 --- /dev/null
 +++ b/drivers/msi/Makefile
 @@ -0,0 +1 @@
 +obj-$(CONFIG_MSI) += msi.o
 diff --git a/drivers/msi/msi.c b/drivers/msi/msi.c
 new file mode 100644
 index 000..3fbd539
 --- /dev/null
 +++ b/drivers/msi/msi.c
 @@ -0,0 +1,540 @@
 +/*
 + * File: msi.c
 + * Purpose:  Message Signaled Interrupt (MSI)
 + *
 + * Copyright (C) 2014 Huawei Ltd.
 + * Copyright (C) Yijing Wang wangyij...@huawei.com
 + */
 +#include linux/err.h
 +#include linux/mm.h
 +#include linux/irq.h
 +#include linux/interrupt.h
 +#include linux/export.h
 +#include linux/ioport.h
 +#include linux/proc_fs.h
 +#include linux/msi.h
 +#include linux/smp.h
 +#include linux/errno.h
 +#include linux/io.h
 +#include linux/slab.h
 +#include linux/device.h
 +#include linux/pci.h
 +
 +/* Arch hooks */
 +
 +int __weak arch_setup_msi_irq(struct msi_irqs *msi, struct msi_desc *desc)
 +{
 + struct pci_dev *dev = msi-data;
 + struct msi_chip *chip = dev-bus-msi; //TO BE DONE: rework msi_chip to
 support Non-PCI MSI
 + int err;
 +
 + if (!chip || !chip-setup_irq)
 + return -EINVAL;
 +
 + err = chip-setup_irq(chip, dev, desc);
 + if (err  0)
 + return err;
 +
 + irq_set_chip_data(desc-irq, chip);
 + return 0;
 +}
 +
 +void __weak arch_teardown_msi_irq(unsigned int irq)
 +{
 + struct msi_chip *chip = irq_get_chip_data(irq);
 +
 + if (!chip || !chip-teardown_irq)
 + return;
 +
 + chip-teardown_irq(chip, irq);
 +}
 +
 +int __weak arch_msi_check_device(struct msi_irqs *msi, int nvec, int type)
 +{
 + struct pci_dev *dev = msi-data;
 + struct msi_chip *chip = dev-bus-msi; //TO BE DONE: rework msi_chip to
 support Non-PCI MSI
 +
 + if (!chip || !chip-check_device)
 + return 0;
 +
 + return chip-check_device(chip, dev, nvec, type);
 +}
 +
 +int __weak arch_setup_msi_irqs(struct msi_irqs *msi, int nvec, int type)
 +{
 + struct msi_desc *entry;
 + int ret;
 +
 + /*
 +  * If an architecture wants to support multiple MSI, it needs to
 +  * override 

RE: [RFC PATCH 11/11] x86/MSI: Refactor x86 MSI code

2014-08-20 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
 On Behalf Of Yijing Wang
 Sent: Saturday, July 26, 2014 8:39 AM
 To: linux-kernel@vger.kernel.org
 Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
 paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
 ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org; Basu
 Arnab-B45036; virtualizat...@lists.linux-foundation.org; Hanjun Guo; Yijing 
 Wang
 Subject: [RFC PATCH 11/11] x86/MSI: Refactor x86 MSI code

Please provide description about what this refactoring is? Also does other 
architecture also need similar refactoring ?

Thanks
-Bharat

 
 Signed-off-by: Yijing Wang wangyij...@huawei.com
 ---
  arch/x86/include/asm/io_apic.h   |2 +-
  arch/x86/include/asm/irq_remapping.h |4 +-
  arch/x86/include/asm/pci.h   |6 ++--
  arch/x86/include/asm/x86_init.h  |   10 +++---
  arch/x86/kernel/apic/io_apic.c   |   23 +++
  arch/x86/kernel/x86_init.c   |   12 
  drivers/iommu/amd_iommu.c|   16 ++
  drivers/iommu/intel_irq_remapping.c  |9 --
  drivers/iommu/irq_remapping.c|   51 -
  drivers/iommu/irq_remapping.h|6 ++--
  drivers/msi/msi.c|3 +-
  11 files changed, 72 insertions(+), 70 deletions(-)
 
 diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
 index 90f97b4..692a90f 100644
 --- a/arch/x86/include/asm/io_apic.h
 +++ b/arch/x86/include/asm/io_apic.h
 @@ -158,7 +158,7 @@ extern int native_setup_ioapic_entry(int, struct
 IO_APIC_route_entry *,
struct io_apic_irq_attr *);
  extern void eoi_ioapic_irq(unsigned int irq, struct irq_cfg *cfg);
 
 -extern void native_compose_msi_msg(struct pci_dev *pdev,
 +extern void native_compose_msi_msg(struct msi_irqs *msi,
  unsigned int irq, unsigned int dest,
  struct msi_msg *msg, u8 hpet_id);
  extern void native_eoi_ioapic_pin(int apic, int pin, int vector);
 diff --git a/arch/x86/include/asm/irq_remapping.h
 b/arch/x86/include/asm/irq_remapping.h
 index b7747c4..a10003d 100644
 --- a/arch/x86/include/asm/irq_remapping.h
 +++ b/arch/x86/include/asm/irq_remapping.h
 @@ -47,7 +47,7 @@ extern int setup_ioapic_remapped_entry(int irq,
  int vector,
  struct io_apic_irq_attr *attr);
  extern void free_remapped_irq(int irq);
 -extern void compose_remapped_msi_msg(struct pci_dev *pdev,
 +extern void compose_remapped_msi_msg(struct msi_irqs *msi,
unsigned int irq, unsigned int dest,
struct msi_msg *msg, u8 hpet_id);
  extern int setup_hpet_msi_remapped(unsigned int irq, unsigned int id);
 @@ -77,7 +77,7 @@ static inline int setup_ioapic_remapped_entry(int irq,
   return -ENODEV;
  }
  static inline void free_remapped_irq(int irq) { }
 -static inline void compose_remapped_msi_msg(struct pci_dev *pdev,
 +static inline void compose_remapped_msi_msg(struct msi_irqs *msi,
   unsigned int irq, unsigned int dest,
   struct msi_msg *msg, u8 hpet_id)
  {
 diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
 index 0892ea0..04c9ef6 100644
 --- a/arch/x86/include/asm/pci.h
 +++ b/arch/x86/include/asm/pci.h
 @@ -96,10 +96,10 @@ extern void pci_iommu_alloc(void);
  #ifdef CONFIG_PCI_MSI
  /* implemented in arch/x86/kernel/apic/io_apic. */
  struct msi_desc;
 -int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
 +int native_setup_msi_irqs(struct msi_irqs *msi, int nvec, int type);
  void native_teardown_msi_irq(unsigned int irq);
 -void native_restore_msi_irqs(struct pci_dev *dev);
 -int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
 +void native_restore_msi_irqs(struct msi_irqs *msi);
 +int setup_msi_irq(struct msi_irqs *msi, struct msi_desc *msidesc,
 unsigned int irq_base, unsigned int irq_offset);
  #else
  #define native_setup_msi_irqsNULL
 diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
 index e45e4da..8e42f17 100644
 --- a/arch/x86/include/asm/x86_init.h
 +++ b/arch/x86/include/asm/x86_init.h
 @@ -170,18 +170,18 @@ struct x86_platform_ops {
   void (*apic_post_init)(void);
  };
 
 -struct pci_dev;
 +struct msi_irqs;
  struct msi_msg;
  struct msi_desc;
 
  struct x86_msi_ops {
 - int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
 - void (*compose_msi_msg)(struct pci_dev *dev, unsigned int irq,
 + int (*setup_msi_irqs)(struct msi_irqs *msi, int nvec, int type);
 + void (*compose_msi_msg)(struct msi_irqs *msi, unsigned int irq,
   unsigned int dest, struct 

RE: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device

2014-08-20 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
 On Behalf Of Yijing Wang
 Sent: Wednesday, August 20, 2014 11:59 AM
 To: Bhushan Bharat-R65777; Basu Arnab-B45036
 Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
 paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
 ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org;
 virtualizat...@lists.linux-foundation.org; Hanjun Guo; linux-
 ker...@vger.kernel.org
 Subject: Re: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device
 
  The key difference between PCI device and Non-PCI MSI is the
  interfaces to access hardware MSI registers.
  for instance, currently, msi_chip-setup_irq() to setup MSI irq and
  configure the MSI address/data registers, so we need to provide
  device specific
  write_msi_msg() interface, then when we call msi_chip-setup_irq(),
  the device MSI registers can be configured appropriately.
 
  What if we can register/override the setup_irq() from bus-driver (not sure,
 but may be device-driver itself). Example PCI bus-driver will provide
 setup_irq() (or the part of setup_irq which set address and data in h/w) by 
 PCI
 bus, which configure address/data in h/w as per PCI standard.
 
  We in Freescale will be using MSI for the devices behind a new-bus (which is
 not PCI based), We have a separate bus driver for same. And this new bus 
 driver
 register/provide its own address/data write function which is based on that
 specific bus protocol.
 
 Hi Bharat, I'm glad to know your MSI device working mode.
 Provide the private MSI setup functions in bus-driver layer can't apply to all
 Non-PCI MSI devices, because we can not guarantee Non-PCI MSI devices are 
 always
 on a bus. The existing HPET, DMAR device both have no bus bind.

Yes, that's why I was not sure of bus-driver or device-driver model.

 I'm working on a
 new MSI setup framework, as you mentioned before, in device-driver model.
 
 I abstracted a new virtual device (called struct msi_dev), this msi_dev will
 manage all MSI info,

Will this struct msi_dev will be part of struct device?

 and a new bus named msi_bus, also introduced a new driver
 msi_driver, msi_bus is responsible for binding msi_dev and msi_driver.
 All MSI devices will be classified into different MSI device types, like
 MSI_TYPE_PCI, MSI_TYPE_HPET, MSI_TYPE_DMAR, etc..
 
 Each MSI type device should provide a private struct msi_driver. msi_driver
 should contain the type specific MSI ops functions to help setup and enable 
 MSI
 device, request MSI irq.
 
 I almost finish the first draft, and will post out next week in plan :)

Will be looking forward to next version.

Thanks
-Bharat

 
 
 Thanks!
 Yijing.
 
 
  Thanks
  -Bharat
 
 
  My patchset is just a RFC draft, I will update it later, all we want
  to do is make kernel support Non-PCI MSI devices.
 
  Thanks!
  Yijing.
 
 
 
  Thanks
  Arnab
  --
  To unsubscribe from this list: send the line unsubscribe
  linux-kernel in the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
 
  .
 
 
 
  --
  Thanks!
  Yijing
 
  --
  To unsubscribe from this list: send the line unsubscribe linux-pci
  in the body of a message to majord...@vger.kernel.org More majordomo
  info at http://vger.kernel.org/majordomo-info.html
 
  .
 
 
 
 --
 Thanks!
 Yijing
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-pci in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH 03/11] PCI/MSI: Refactor pci_dev_msi_enabled()

2014-08-19 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
> On Behalf Of Yijing Wang
> Sent: Saturday, July 26, 2014 8:39 AM
> To: linux-kernel@vger.kernel.org
> Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
> paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
> ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org; Basu
> Arnab-B45036; virtualizat...@lists.linux-foundation.org; Hanjun Guo; Yijing 
> Wang
> Subject: [RFC PATCH 03/11] PCI/MSI: Refactor pci_dev_msi_enabled()
> 
> Pci_dev_msi_enabled() is used to check whether device MSI/MSIX enabled. 
> Refactor
> this function  to suuport checking only device MSI or MSIX enabled.

s/support/support

>From code it looks like you added one more parameter to pci_dev_msi_enabled() 
>to check for a specific type, which earlier it was checking for both MSI and 
>MSIX enable. While the description is not clear to me, Am I missing something ?

Thanks
-Bharat


> 
> Signed-off-by: Yijing Wang 
> ---
>  arch/cris/arch-v32/drivers/pci/bios.c |2 +-
>  arch/frv/mb93090-mb00/pci-vdk.c   |2 +-
>  arch/ia64/pci/pci.c   |4 ++--
>  arch/powerpc/kernel/eeh_driver.c  |2 +-
>  arch/x86/pci/common.c |5 +++--
>  drivers/block/nvme-core.c |4 ++--
>  drivers/dma/ioat/dma.c|2 +-
>  drivers/firewire/ohci.c   |2 +-
>  drivers/gpu/drm/i915/i915_dma.c   |4 ++--
>  drivers/misc/mei/hw-me.c  |2 +-
>  drivers/misc/mei/hw-txe.c |2 +-
>  drivers/misc/mei/pci-me.c |4 ++--
>  drivers/misc/mei/pci-txe.c|4 ++--
>  drivers/misc/mic/host/mic_debugfs.c   |4 ++--
>  drivers/misc/mic/host/mic_intr.c  |8 
>  drivers/ntb/ntb_hw.c  |2 +-
>  drivers/pci/irq.c |4 ++--
>  drivers/pci/msi.c |   15 +--
>  drivers/pci/pci.c |6 +++---
>  drivers/pci/pcie/portdrv_core.c   |4 ++--
>  drivers/scsi/esas2r/esas2r_init.c |4 ++--
>  drivers/scsi/esas2r/esas2r_ioctl.c|4 ++--
>  drivers/scsi/hpsa.c   |4 ++--
>  drivers/staging/crystalhd/crystalhd_lnx.c |2 +-
>  drivers/xen/xen-pciback/pciback_ops.c |   12 ++--
>  include/linux/pci.h   |   12 ++--
>  virt/kvm/assigned-dev.c   |2 +-
>  27 files changed, 67 insertions(+), 55 deletions(-)
> 
> diff --git a/arch/cris/arch-v32/drivers/pci/bios.c b/arch/cris/arch-
> v32/drivers/pci/bios.c
> index 64a5fb9..d9d8332 100644
> --- a/arch/cris/arch-v32/drivers/pci/bios.c
> +++ b/arch/cris/arch-v32/drivers/pci/bios.c
> @@ -93,7 +93,7 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
>   if ((err = pcibios_enable_resources(dev, mask)) < 0)
>   return err;
> 
> - if (!dev->msi_enabled)
> + if (!pci_dev_msi_enabled(dev, MSI_TYPE))
>   pcibios_enable_irq(dev);
>   return 0;
>  }
> diff --git a/arch/frv/mb93090-mb00/pci-vdk.c b/arch/frv/mb93090-mb00/pci-vdk.c
> index efa5d65..b96c128 100644
> --- a/arch/frv/mb93090-mb00/pci-vdk.c
> +++ b/arch/frv/mb93090-mb00/pci-vdk.c
> @@ -409,7 +409,7 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
> 
>   if ((err = pci_enable_resources(dev, mask)) < 0)
>   return err;
> - if (!dev->msi_enabled)
> + if (!pci_dev_msi_enabled(dev, MSI_TYPE))
>   pcibios_enable_irq(dev);
>   return 0;
>  }
> diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c index 291a582..da8ddff
> 100644
> --- a/arch/ia64/pci/pci.c
> +++ b/arch/ia64/pci/pci.c
> @@ -568,7 +568,7 @@ pcibios_enable_device (struct pci_dev *dev, int mask)
>   if (ret < 0)
>   return ret;
> 
> - if (!dev->msi_enabled)
> + if (!pci_dev_msi_enabled(dev, MSI_TYPE))
>   return acpi_pci_irq_enable(dev);
>   return 0;
>  }
> @@ -577,7 +577,7 @@ void
>  pcibios_disable_device (struct pci_dev *dev)  {
>   BUG_ON(atomic_read(>enable_cnt));
> - if (!dev->msi_enabled)
> + if (!pci_dev_msi_enabled(dev, MSI_TYPE))
>   acpi_pci_irq_disable(dev);
>  }
> 
> diff --git a/arch/powerpc/kernel/eeh_driver.c 
> b/arch/powerpc/kernel/eeh_driver.c
> index 420da61..e3f2074 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -123,7 +123,7 @@ static void eeh_disable_irq(struct pci_dev *dev)
>* effectively disabled by the DMA Stopped state
>* when an EEH error occurs.
>*/
> - if (dev->msi_enabled || dev->msix_enabled)
> + if (pci_dev_msi_enabled(dev, MSI_TYPE | MSIX_TYPE))
>   return;
> 
>   if (!irq_has_action(dev->irq))
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c 

RE: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device

2014-08-19 Thread bharat.bhus...@freescale.com
Hi Yijing

> -Original Message-
> From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
> On Behalf Of Yijing Wang
> Sent: Monday, August 04, 2014 8:34 AM
> To: Basu Arnab-B45036
> Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
> paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
> ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org;
> virtualizat...@lists.linux-foundation.org; Hanjun Guo; linux-
> ker...@vger.kernel.org
> Subject: Re: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device
> 
> >> MSI was introduced in PCI Spec 2.2. Currently, kernel MSI driver
> >> codes are bonding with PCI device. Because MSI has a lot advantages in
> design.
> >> More and more non-PCI devices want to use MSI as their default interrupt.
> >> The existing MSI device include HPET. HPET driver provide its own MSI
> >> code to initialize and process MSI interrupts. In the latest GIC v3
> >> spec, legacy device can deliver MSI by the help of a relay device
> >> named consolidator.
> >> Consolidator can translate the legacy interrupts connected to it to
> >> MSI/MSI-X. And new non-PCI device will be designed to support MSI in
> >> future. So make the MSI driver code be generic will help the non-PCI
> >> device use MSI more simply.
> >
> > As per my understanding the GICv3 provides a service that will convert 
> > writes
> to a specified address to IRQs delivered to the core and as you mention above
> MSIs are part of the PCI spec. So I can see a strong case for non-PCI devices 
> to
> want MSI like functionality without being fully compliant with the 
> requirements
> of the MSI spec.
> 
> In GICv3, MBI is named for the service, but there is no more detailed
> information about it, only we can know is MBI is analogous to MSI, MBI devices
> must have address/data registers, but other registers like enable/mask/ctrl 
> are
> not mandatory requirement.
> I don't know whether the MBI spec will be release, but anyway I think MSI
> refactoring is make sense, there are some existing Non-PCI MSI device like 
> hpet,
> dmar.
> For simplicity, let name MSI and MBI to MSI temporarily.
> >
> > My question is do we necessarily want to rework so much of the PCI-MSI layer
> to support non PCI devices? Or will it be sufficient to create a framework to
> allow non PCI devices to hook up with a device that can convert their writes 
> to
> an IRQ to the core.
> >
> > As I understand it, the msi_chip is (almost) such a framework. The only
> problem being that it makes some PCI specific assumptions (such as PCI 
> specific
> writes from within msi_chip functions). Won't it be sufficient to make the
> msi_chip framework generic enough to be used by non-PCI devices and let each
> bus/device manage any additional requirements (such as configuration flow, bit
> definitions etc) that it places on message based interrupts?
> 
> msi_chip framework is important to support that, but I think maybe it's not
> enough, msi_chip is only responsible for IRQ allocation, teardown, etc..
> 
> The key difference between PCI device and Non-PCI MSI is the interfaces to
> access hardware MSI registers.
> for instance, currently, msi_chip->setup_irq() to setup MSI irq and configure
> the MSI address/data registers, so we need to provide device specific
> write_msi_msg() interface, then when we call msi_chip->setup_irq(), the device
> MSI registers can be configured appropriately.

What if we can register/override the setup_irq() from bus-driver (not sure, but 
may be device-driver itself). Example PCI bus-driver will provide setup_irq() 
(or the part of setup_irq which set address and data in h/w) by PCI bus, which 
configure address/data in h/w as per PCI standard. 

We in Freescale will be using MSI for the devices behind a new-bus (which is 
not PCI based), We have a separate bus driver for same. And this new bus driver 
register/provide its own address/data write function which is based on that 
specific bus protocol.

Thanks
-Bharat

> 
> My patchset is just a RFC draft, I will update it later, all we want to do is
> make kernel support Non-PCI MSI devices.
> 
> Thanks!
> Yijing.
> 
> 
> >
> > Thanks
> > Arnab
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > linux-kernel" in the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
> > .
> >
> 
> 
> --
> Thanks!
> Yijing
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in the 
> body
> of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device

2014-08-19 Thread bharat.bhus...@freescale.com
Hi Yijing

 -Original Message-
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
 On Behalf Of Yijing Wang
 Sent: Monday, August 04, 2014 8:34 AM
 To: Basu Arnab-B45036
 Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
 paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
 ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org;
 virtualizat...@lists.linux-foundation.org; Hanjun Guo; linux-
 ker...@vger.kernel.org
 Subject: Re: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device
 
  MSI was introduced in PCI Spec 2.2. Currently, kernel MSI driver
  codes are bonding with PCI device. Because MSI has a lot advantages in
 design.
  More and more non-PCI devices want to use MSI as their default interrupt.
  The existing MSI device include HPET. HPET driver provide its own MSI
  code to initialize and process MSI interrupts. In the latest GIC v3
  spec, legacy device can deliver MSI by the help of a relay device
  named consolidator.
  Consolidator can translate the legacy interrupts connected to it to
  MSI/MSI-X. And new non-PCI device will be designed to support MSI in
  future. So make the MSI driver code be generic will help the non-PCI
  device use MSI more simply.
 
  As per my understanding the GICv3 provides a service that will convert 
  writes
 to a specified address to IRQs delivered to the core and as you mention above
 MSIs are part of the PCI spec. So I can see a strong case for non-PCI devices 
 to
 want MSI like functionality without being fully compliant with the 
 requirements
 of the MSI spec.
 
 In GICv3, MBI is named for the service, but there is no more detailed
 information about it, only we can know is MBI is analogous to MSI, MBI devices
 must have address/data registers, but other registers like enable/mask/ctrl 
 are
 not mandatory requirement.
 I don't know whether the MBI spec will be release, but anyway I think MSI
 refactoring is make sense, there are some existing Non-PCI MSI device like 
 hpet,
 dmar.
 For simplicity, let name MSI and MBI to MSI temporarily.
 
  My question is do we necessarily want to rework so much of the PCI-MSI layer
 to support non PCI devices? Or will it be sufficient to create a framework to
 allow non PCI devices to hook up with a device that can convert their writes 
 to
 an IRQ to the core.
 
  As I understand it, the msi_chip is (almost) such a framework. The only
 problem being that it makes some PCI specific assumptions (such as PCI 
 specific
 writes from within msi_chip functions). Won't it be sufficient to make the
 msi_chip framework generic enough to be used by non-PCI devices and let each
 bus/device manage any additional requirements (such as configuration flow, bit
 definitions etc) that it places on message based interrupts?
 
 msi_chip framework is important to support that, but I think maybe it's not
 enough, msi_chip is only responsible for IRQ allocation, teardown, etc..
 
 The key difference between PCI device and Non-PCI MSI is the interfaces to
 access hardware MSI registers.
 for instance, currently, msi_chip-setup_irq() to setup MSI irq and configure
 the MSI address/data registers, so we need to provide device specific
 write_msi_msg() interface, then when we call msi_chip-setup_irq(), the device
 MSI registers can be configured appropriately.

What if we can register/override the setup_irq() from bus-driver (not sure, but 
may be device-driver itself). Example PCI bus-driver will provide setup_irq() 
(or the part of setup_irq which set address and data in h/w) by PCI bus, which 
configure address/data in h/w as per PCI standard. 

We in Freescale will be using MSI for the devices behind a new-bus (which is 
not PCI based), We have a separate bus driver for same. And this new bus driver 
register/provide its own address/data write function which is based on that 
specific bus protocol.

Thanks
-Bharat

 
 My patchset is just a RFC draft, I will update it later, all we want to do is
 make kernel support Non-PCI MSI devices.
 
 Thanks!
 Yijing.
 
 
 
  Thanks
  Arnab
  --
  To unsubscribe from this list: send the line unsubscribe
  linux-kernel in the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
 
  .
 
 
 
 --
 Thanks!
 Yijing
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-pci in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH 03/11] PCI/MSI: Refactor pci_dev_msi_enabled()

2014-08-19 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
 On Behalf Of Yijing Wang
 Sent: Saturday, July 26, 2014 8:39 AM
 To: linux-kernel@vger.kernel.org
 Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org;
 paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm-
 ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org; Basu
 Arnab-B45036; virtualizat...@lists.linux-foundation.org; Hanjun Guo; Yijing 
 Wang
 Subject: [RFC PATCH 03/11] PCI/MSI: Refactor pci_dev_msi_enabled()
 
 Pci_dev_msi_enabled() is used to check whether device MSI/MSIX enabled. 
 Refactor
 this function  to suuport checking only device MSI or MSIX enabled.

s/support/support

From code it looks like you added one more parameter to pci_dev_msi_enabled() 
to check for a specific type, which earlier it was checking for both MSI and 
MSIX enable. While the description is not clear to me, Am I missing something ?

Thanks
-Bharat


 
 Signed-off-by: Yijing Wang wangyij...@huawei.com
 ---
  arch/cris/arch-v32/drivers/pci/bios.c |2 +-
  arch/frv/mb93090-mb00/pci-vdk.c   |2 +-
  arch/ia64/pci/pci.c   |4 ++--
  arch/powerpc/kernel/eeh_driver.c  |2 +-
  arch/x86/pci/common.c |5 +++--
  drivers/block/nvme-core.c |4 ++--
  drivers/dma/ioat/dma.c|2 +-
  drivers/firewire/ohci.c   |2 +-
  drivers/gpu/drm/i915/i915_dma.c   |4 ++--
  drivers/misc/mei/hw-me.c  |2 +-
  drivers/misc/mei/hw-txe.c |2 +-
  drivers/misc/mei/pci-me.c |4 ++--
  drivers/misc/mei/pci-txe.c|4 ++--
  drivers/misc/mic/host/mic_debugfs.c   |4 ++--
  drivers/misc/mic/host/mic_intr.c  |8 
  drivers/ntb/ntb_hw.c  |2 +-
  drivers/pci/irq.c |4 ++--
  drivers/pci/msi.c |   15 +--
  drivers/pci/pci.c |6 +++---
  drivers/pci/pcie/portdrv_core.c   |4 ++--
  drivers/scsi/esas2r/esas2r_init.c |4 ++--
  drivers/scsi/esas2r/esas2r_ioctl.c|4 ++--
  drivers/scsi/hpsa.c   |4 ++--
  drivers/staging/crystalhd/crystalhd_lnx.c |2 +-
  drivers/xen/xen-pciback/pciback_ops.c |   12 ++--
  include/linux/pci.h   |   12 ++--
  virt/kvm/assigned-dev.c   |2 +-
  27 files changed, 67 insertions(+), 55 deletions(-)
 
 diff --git a/arch/cris/arch-v32/drivers/pci/bios.c b/arch/cris/arch-
 v32/drivers/pci/bios.c
 index 64a5fb9..d9d8332 100644
 --- a/arch/cris/arch-v32/drivers/pci/bios.c
 +++ b/arch/cris/arch-v32/drivers/pci/bios.c
 @@ -93,7 +93,7 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
   if ((err = pcibios_enable_resources(dev, mask))  0)
   return err;
 
 - if (!dev-msi_enabled)
 + if (!pci_dev_msi_enabled(dev, MSI_TYPE))
   pcibios_enable_irq(dev);
   return 0;
  }
 diff --git a/arch/frv/mb93090-mb00/pci-vdk.c b/arch/frv/mb93090-mb00/pci-vdk.c
 index efa5d65..b96c128 100644
 --- a/arch/frv/mb93090-mb00/pci-vdk.c
 +++ b/arch/frv/mb93090-mb00/pci-vdk.c
 @@ -409,7 +409,7 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
 
   if ((err = pci_enable_resources(dev, mask))  0)
   return err;
 - if (!dev-msi_enabled)
 + if (!pci_dev_msi_enabled(dev, MSI_TYPE))
   pcibios_enable_irq(dev);
   return 0;
  }
 diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c index 291a582..da8ddff
 100644
 --- a/arch/ia64/pci/pci.c
 +++ b/arch/ia64/pci/pci.c
 @@ -568,7 +568,7 @@ pcibios_enable_device (struct pci_dev *dev, int mask)
   if (ret  0)
   return ret;
 
 - if (!dev-msi_enabled)
 + if (!pci_dev_msi_enabled(dev, MSI_TYPE))
   return acpi_pci_irq_enable(dev);
   return 0;
  }
 @@ -577,7 +577,7 @@ void
  pcibios_disable_device (struct pci_dev *dev)  {
   BUG_ON(atomic_read(dev-enable_cnt));
 - if (!dev-msi_enabled)
 + if (!pci_dev_msi_enabled(dev, MSI_TYPE))
   acpi_pci_irq_disable(dev);
  }
 
 diff --git a/arch/powerpc/kernel/eeh_driver.c 
 b/arch/powerpc/kernel/eeh_driver.c
 index 420da61..e3f2074 100644
 --- a/arch/powerpc/kernel/eeh_driver.c
 +++ b/arch/powerpc/kernel/eeh_driver.c
 @@ -123,7 +123,7 @@ static void eeh_disable_irq(struct pci_dev *dev)
* effectively disabled by the DMA Stopped state
* when an EEH error occurs.
*/
 - if (dev-msi_enabled || dev-msix_enabled)
 + if (pci_dev_msi_enabled(dev, MSI_TYPE | MSIX_TYPE))
   return;
 
   if (!irq_has_action(dev-irq))
 diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c index
 059a76c..4597940 100644
 --- a/arch/x86/pci/common.c
 +++ b/arch/x86/pci/common.c
 @@ 

RE: [PATCH v2 2/2] PCI/MSI: Remove arch_msi_check_device()

2014-08-11 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
> On Behalf Of Alexander Gordeev
> Sent: Monday, August 11, 2014 5:16 PM
> To: Bjorn Helgaas
> Cc: linux-kernel@vger.kernel.org; linuxppc-...@lists.ozlabs.org; linux-
> p...@vger.kernel.org
> Subject: [PATCH v2 2/2] PCI/MSI: Remove arch_msi_check_device()
> 
> There are no archs that override arch_msi_check_device() hook. Remove it as it
> is completely redundant.

Is not we are still overriding this in powerpc (not sure I missed some patch , 
if so please point to that).

$ grep -r  arch_msi_check_device arch/powerpc/
Binary file arch/powerpc/kernel/msi.o matches
arch/powerpc/kernel/msi.c:int arch_msi_check_device(struct pci_dev* dev, int 
nvec, int type)
Binary file arch/powerpc/kernel/built-in.o matches

Thanks
-Bharat

> 
> If an arch would need to check MSI/MSI-X possibility for a device it should 
> make
> it within arch_setup_msi_irqs() hook.
> 
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Alexander Gordeev 
> ---
>  drivers/pci/msi.c   | 49 +
>  include/linux/msi.h |  3 ---
>  2 files changed, 13 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 5a40516..6c2cc41 
> 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -56,16 +56,6 @@ void __weak arch_teardown_msi_irq(unsigned int irq)
>   chip->teardown_irq(chip, irq);
>  }
> 
> -int __weak arch_msi_check_device(struct pci_dev *dev, int nvec, int type) -{
> - struct msi_chip *chip = dev->bus->msi;
> -
> - if (!chip || !chip->check_device)
> - return 0;
> -
> - return chip->check_device(chip, dev, nvec, type);
> -}
> -
>  int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)  {
>   struct msi_desc *entry;
> @@ -806,22 +796,23 @@ out_free:
>  }
> 
>  /**
> - * pci_msi_check_device - check whether MSI may be enabled on a device
> + * pci_msi_supported - check whether MSI may be enabled on a device
>   * @dev: pointer to the pci_dev data structure of MSI device function
>   * @nvec: how many MSIs have been requested ?
> - * @type: are we checking for MSI or MSI-X ?
>   *
>   * Look at global flags, the device itself, and its parent buses
>   * to determine if MSI/-X are supported for the device. If MSI/-X is
>   * supported return 0, else return an error code.
>   **/
> -static int pci_msi_check_device(struct pci_dev *dev, int nvec, int type)
> +static int pci_msi_supported(struct pci_dev *dev, int nvec)
>  {
>   struct pci_bus *bus;
> - int ret;
> 
>   /* MSI must be globally enabled and supported by the device */
> - if (!pci_msi_enable || !dev || dev->no_msi)
> + if (!pci_msi_enable)
> + return -EINVAL;
> +
> + if (!dev || dev->no_msi || dev->current_state != PCI_D0)
>   return -EINVAL;
> 
>   /*
> @@ -843,10 +834,6 @@ static int pci_msi_check_device(struct pci_dev *dev, int
> nvec, int type)
>   if (bus->bus_flags & PCI_BUS_FLAGS_NO_MSI)
>   return -EINVAL;
> 
> - ret = arch_msi_check_device(dev, nvec, type);
> - if (ret)
> - return ret;
> -
>   return 0;
>  }
> 
> @@ -949,13 +936,13 @@ int pci_enable_msix(struct pci_dev *dev, struct 
> msix_entry
> *entries, int nvec)
>   int status, nr_entries;
>   int i, j;
> 
> - if (!entries || !dev->msix_cap || dev->current_state != PCI_D0)
> - return -EINVAL;
> -
> - status = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSIX);
> + status = pci_msi_supported(dev, nvec);
>   if (status)
>   return status;
> 
> + if (!entries)
> + return -EINVAL;
> +
>   nr_entries = pci_msix_vec_count(dev);
>   if (nr_entries < 0)
>   return nr_entries;
> @@ -1062,8 +1049,9 @@ int pci_enable_msi_range(struct pci_dev *dev, int 
> minvec,
> int maxvec)
>   int nvec;
>   int rc;
> 
> - if (dev->current_state != PCI_D0)
> - return -EINVAL;
> + rc = pci_msi_supported(dev, minvec);
> + if (rc)
> + return rc;
> 
>   WARN_ON(!!dev->msi_enabled);
> 
> @@ -1086,17 +1074,6 @@ int pci_enable_msi_range(struct pci_dev *dev, int 
> minvec,
> int maxvec)
>   nvec = maxvec;
> 
>   do {
> - rc = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSI);
> - if (rc < 0) {
> - return rc;
> - } else if (rc > 0) {
> - if (rc < minvec)
> - return -ENOSPC;
> - nvec = rc;
> - }
> - } while (rc);
> -
> - do {
>   rc = msi_capability_init(dev, nvec);
>   if (rc < 0) {
>   return rc;
> diff --git a/include/linux/msi.h b/include/linux/msi.h index 8103f32..dbf7cc9
> 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ 

RE: [PATCH v2 2/2] PCI/MSI: Remove arch_msi_check_device()

2014-08-11 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
 On Behalf Of Alexander Gordeev
 Sent: Monday, August 11, 2014 5:16 PM
 To: Bjorn Helgaas
 Cc: linux-kernel@vger.kernel.org; linuxppc-...@lists.ozlabs.org; linux-
 p...@vger.kernel.org
 Subject: [PATCH v2 2/2] PCI/MSI: Remove arch_msi_check_device()
 
 There are no archs that override arch_msi_check_device() hook. Remove it as it
 is completely redundant.

Is not we are still overriding this in powerpc (not sure I missed some patch , 
if so please point to that).

$ grep -r  arch_msi_check_device arch/powerpc/
Binary file arch/powerpc/kernel/msi.o matches
arch/powerpc/kernel/msi.c:int arch_msi_check_device(struct pci_dev* dev, int 
nvec, int type)
Binary file arch/powerpc/kernel/built-in.o matches

Thanks
-Bharat

 
 If an arch would need to check MSI/MSI-X possibility for a device it should 
 make
 it within arch_setup_msi_irqs() hook.
 
 Cc: linuxppc-...@lists.ozlabs.org
 Cc: linux-...@vger.kernel.org
 Signed-off-by: Alexander Gordeev agord...@redhat.com
 ---
  drivers/pci/msi.c   | 49 +
  include/linux/msi.h |  3 ---
  2 files changed, 13 insertions(+), 39 deletions(-)
 
 diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 5a40516..6c2cc41 
 100644
 --- a/drivers/pci/msi.c
 +++ b/drivers/pci/msi.c
 @@ -56,16 +56,6 @@ void __weak arch_teardown_msi_irq(unsigned int irq)
   chip-teardown_irq(chip, irq);
  }
 
 -int __weak arch_msi_check_device(struct pci_dev *dev, int nvec, int type) -{
 - struct msi_chip *chip = dev-bus-msi;
 -
 - if (!chip || !chip-check_device)
 - return 0;
 -
 - return chip-check_device(chip, dev, nvec, type);
 -}
 -
  int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)  {
   struct msi_desc *entry;
 @@ -806,22 +796,23 @@ out_free:
  }
 
  /**
 - * pci_msi_check_device - check whether MSI may be enabled on a device
 + * pci_msi_supported - check whether MSI may be enabled on a device
   * @dev: pointer to the pci_dev data structure of MSI device function
   * @nvec: how many MSIs have been requested ?
 - * @type: are we checking for MSI or MSI-X ?
   *
   * Look at global flags, the device itself, and its parent buses
   * to determine if MSI/-X are supported for the device. If MSI/-X is
   * supported return 0, else return an error code.
   **/
 -static int pci_msi_check_device(struct pci_dev *dev, int nvec, int type)
 +static int pci_msi_supported(struct pci_dev *dev, int nvec)
  {
   struct pci_bus *bus;
 - int ret;
 
   /* MSI must be globally enabled and supported by the device */
 - if (!pci_msi_enable || !dev || dev-no_msi)
 + if (!pci_msi_enable)
 + return -EINVAL;
 +
 + if (!dev || dev-no_msi || dev-current_state != PCI_D0)
   return -EINVAL;
 
   /*
 @@ -843,10 +834,6 @@ static int pci_msi_check_device(struct pci_dev *dev, int
 nvec, int type)
   if (bus-bus_flags  PCI_BUS_FLAGS_NO_MSI)
   return -EINVAL;
 
 - ret = arch_msi_check_device(dev, nvec, type);
 - if (ret)
 - return ret;
 -
   return 0;
  }
 
 @@ -949,13 +936,13 @@ int pci_enable_msix(struct pci_dev *dev, struct 
 msix_entry
 *entries, int nvec)
   int status, nr_entries;
   int i, j;
 
 - if (!entries || !dev-msix_cap || dev-current_state != PCI_D0)
 - return -EINVAL;
 -
 - status = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSIX);
 + status = pci_msi_supported(dev, nvec);
   if (status)
   return status;
 
 + if (!entries)
 + return -EINVAL;
 +
   nr_entries = pci_msix_vec_count(dev);
   if (nr_entries  0)
   return nr_entries;
 @@ -1062,8 +1049,9 @@ int pci_enable_msi_range(struct pci_dev *dev, int 
 minvec,
 int maxvec)
   int nvec;
   int rc;
 
 - if (dev-current_state != PCI_D0)
 - return -EINVAL;
 + rc = pci_msi_supported(dev, minvec);
 + if (rc)
 + return rc;
 
   WARN_ON(!!dev-msi_enabled);
 
 @@ -1086,17 +1074,6 @@ int pci_enable_msi_range(struct pci_dev *dev, int 
 minvec,
 int maxvec)
   nvec = maxvec;
 
   do {
 - rc = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSI);
 - if (rc  0) {
 - return rc;
 - } else if (rc  0) {
 - if (rc  minvec)
 - return -ENOSPC;
 - nvec = rc;
 - }
 - } while (rc);
 -
 - do {
   rc = msi_capability_init(dev, nvec);
   if (rc  0) {
   return rc;
 diff --git a/include/linux/msi.h b/include/linux/msi.h index 8103f32..dbf7cc9
 100644
 --- a/include/linux/msi.h
 +++ b/include/linux/msi.h
 @@ -60,7 +60,6 @@ int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc
 *desc);  void arch_teardown_msi_irq(unsigned int irq);  int
 

RE: [PATCH v2] memory: Freescale CoreNet Coherency Fabric error reporting driver

2014-07-02 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Wood Scott-B07421
> Sent: Thursday, July 03, 2014 5:22 AM
> To: linuxppc-...@lists.ozlabs.org
> Cc: linux-kernel@vger.kernel.org; Bhushan Bharat-R65777
> Subject: [PATCH v2] memory: Freescale CoreNet Coherency Fabric error reporting
> driver
> 
> The CoreNet Coherency Fabric is part of the memory subsystem on
> some Freescale QorIQ chips.  It can report coherency violations (e.g.
> due to misusing memory that is mapped noncoherent) as well as
> transactions that do not hit any local access window, or which hit a
> local access window with an invalid target ID.
> 
> Signed-off-by: Scott Wood 

Reviewed-by: Bharat Bhushan 


Regards
-Bharat

> ---
> v2:
>  - s/cecadr/cecaddr/ to consistently use ccf2 names
>  - fix ERRDET_CTYPE_MASK value
>  - use the ccf2 value for CECADDRH_ADDRH (harmless on ccf1)
>  - add comment explaining why we disable detection on remove for
>ccf1 but not ccf2
> ---
>  arch/powerpc/configs/corenet32_smp_defconfig |   1 +
>  arch/powerpc/configs/corenet64_smp_defconfig |   1 +
>  drivers/memory/Kconfig   |  10 ++
>  drivers/memory/Makefile  |   1 +
>  drivers/memory/fsl-corenet-cf.c  | 251 
> +++
>  5 files changed, 264 insertions(+)
>  create mode 100644 drivers/memory/fsl-corenet-cf.c
> 
> diff --git a/arch/powerpc/configs/corenet32_smp_defconfig
> b/arch/powerpc/configs/corenet32_smp_defconfig
> index 7d0c837..6a3c58a 100644
> --- a/arch/powerpc/configs/corenet32_smp_defconfig
> +++ b/arch/powerpc/configs/corenet32_smp_defconfig
> @@ -180,3 +180,4 @@ CONFIG_CRYPTO_SHA512=y
>  CONFIG_CRYPTO_AES=y
>  # CONFIG_CRYPTO_ANSI_CPRNG is not set
>  CONFIG_CRYPTO_DEV_FSL_CAAM=y
> +CONFIG_FSL_CORENET_CF=y
> diff --git a/arch/powerpc/configs/corenet64_smp_defconfig
> b/arch/powerpc/configs/corenet64_smp_defconfig
> index 6ae07e1..4b07bad 100644
> --- a/arch/powerpc/configs/corenet64_smp_defconfig
> +++ b/arch/powerpc/configs/corenet64_smp_defconfig
> @@ -179,3 +179,4 @@ CONFIG_CRYPTO_SHA256=y
>  CONFIG_CRYPTO_SHA512=y
>  # CONFIG_CRYPTO_ANSI_CPRNG is not set
>  CONFIG_CRYPTO_DEV_FSL_CAAM=y
> +CONFIG_FSL_CORENET_CF=y
> diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
> index c59e9c9..fab81a1 100644
> --- a/drivers/memory/Kconfig
> +++ b/drivers/memory/Kconfig
> @@ -61,6 +61,16 @@ config TEGRA30_MC
> analysis, especially for IOMMU/SMMU(System Memory Management
> Unit) module.
> 
> +config FSL_CORENET_CF
> + tristate "Freescale CoreNet Error Reporting"
> + depends on FSL_SOC_BOOKE
> + help
> +   Say Y for reporting of errors from the Freescale CoreNet
> +   Coherency Fabric.  Errors reported include accesses to
> +   physical addresses that mapped by no local access window
> +   (LAW) or an invalid LAW, as well as bad cache state that
> +   represents a coherency violation.
> +
>  config FSL_IFC
>   bool
>   depends on FSL_SOC
> diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
> index 71160a2..4055c47 100644
> --- a/drivers/memory/Makefile
> +++ b/drivers/memory/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_OF)  += of_memory.o
>  endif
>  obj-$(CONFIG_TI_AEMIF)   += ti-aemif.o
>  obj-$(CONFIG_TI_EMIF)+= emif.o
> +obj-$(CONFIG_FSL_CORENET_CF) += fsl-corenet-cf.o
>  obj-$(CONFIG_FSL_IFC)+= fsl_ifc.o
>  obj-$(CONFIG_MVEBU_DEVBUS)   += mvebu-devbus.o
>  obj-$(CONFIG_TEGRA20_MC) += tegra20-mc.o
> diff --git a/drivers/memory/fsl-corenet-cf.c b/drivers/memory/fsl-corenet-cf.c
> new file mode 100644
> index 000..c9443fc
> --- /dev/null
> +++ b/drivers/memory/fsl-corenet-cf.c
> @@ -0,0 +1,251 @@
> +/*
> + * CoreNet Coherency Fabric error reporting
> + *
> + * Copyright 2014 Freescale Semiconductor Inc.
> + *
> + * This program is free software; you can redistribute  it and/or modify it
> + * under  the terms of  the GNU General  Public License as published by the
> + * Free Software Foundation;  either version 2 of the  License, or (at your
> + * option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +enum ccf_version {
> + CCF1,
> + CCF2,
> +};
> +
> +struct ccf_info {
> + enum ccf_version version;
> + int err_reg_offs;
> +};
> +
> +static const struct ccf_info ccf1_info = {
> + .version = CCF1,
> + .err_reg_offs = 0xa00,
> +};
> +
> +static const struct ccf_info ccf2_info = {
> + .version = CCF2,
> + .err_reg_offs = 0xe40,
> +};
> +
> +static const struct of_device_id ccf_matches[] = {
> + {
> + .compatible = "fsl,corenet1-cf",
> + .data = _info,
> + },
> + {
> + .compatible = "fsl,corenet2-cf",
> + .data = _info,
> + },
> + {}
> +};
> +
> +struct ccf_err_regs {
> + u32 errdet; /* 0x00 Error Detect Register */
> + /* 0x04 Error 

RE: [PATCH v2] memory: Freescale CoreNet Coherency Fabric error reporting driver

2014-07-02 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Wood Scott-B07421
 Sent: Thursday, July 03, 2014 5:22 AM
 To: linuxppc-...@lists.ozlabs.org
 Cc: linux-kernel@vger.kernel.org; Bhushan Bharat-R65777
 Subject: [PATCH v2] memory: Freescale CoreNet Coherency Fabric error reporting
 driver
 
 The CoreNet Coherency Fabric is part of the memory subsystem on
 some Freescale QorIQ chips.  It can report coherency violations (e.g.
 due to misusing memory that is mapped noncoherent) as well as
 transactions that do not hit any local access window, or which hit a
 local access window with an invalid target ID.
 
 Signed-off-by: Scott Wood scottw...@freescale.com

Reviewed-by: Bharat Bhushan bharat.bhus...@freescale.com


Regards
-Bharat

 ---
 v2:
  - s/cecadr/cecaddr/ to consistently use ccf2 names
  - fix ERRDET_CTYPE_MASK value
  - use the ccf2 value for CECADDRH_ADDRH (harmless on ccf1)
  - add comment explaining why we disable detection on remove for
ccf1 but not ccf2
 ---
  arch/powerpc/configs/corenet32_smp_defconfig |   1 +
  arch/powerpc/configs/corenet64_smp_defconfig |   1 +
  drivers/memory/Kconfig   |  10 ++
  drivers/memory/Makefile  |   1 +
  drivers/memory/fsl-corenet-cf.c  | 251 
 +++
  5 files changed, 264 insertions(+)
  create mode 100644 drivers/memory/fsl-corenet-cf.c
 
 diff --git a/arch/powerpc/configs/corenet32_smp_defconfig
 b/arch/powerpc/configs/corenet32_smp_defconfig
 index 7d0c837..6a3c58a 100644
 --- a/arch/powerpc/configs/corenet32_smp_defconfig
 +++ b/arch/powerpc/configs/corenet32_smp_defconfig
 @@ -180,3 +180,4 @@ CONFIG_CRYPTO_SHA512=y
  CONFIG_CRYPTO_AES=y
  # CONFIG_CRYPTO_ANSI_CPRNG is not set
  CONFIG_CRYPTO_DEV_FSL_CAAM=y
 +CONFIG_FSL_CORENET_CF=y
 diff --git a/arch/powerpc/configs/corenet64_smp_defconfig
 b/arch/powerpc/configs/corenet64_smp_defconfig
 index 6ae07e1..4b07bad 100644
 --- a/arch/powerpc/configs/corenet64_smp_defconfig
 +++ b/arch/powerpc/configs/corenet64_smp_defconfig
 @@ -179,3 +179,4 @@ CONFIG_CRYPTO_SHA256=y
  CONFIG_CRYPTO_SHA512=y
  # CONFIG_CRYPTO_ANSI_CPRNG is not set
  CONFIG_CRYPTO_DEV_FSL_CAAM=y
 +CONFIG_FSL_CORENET_CF=y
 diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
 index c59e9c9..fab81a1 100644
 --- a/drivers/memory/Kconfig
 +++ b/drivers/memory/Kconfig
 @@ -61,6 +61,16 @@ config TEGRA30_MC
 analysis, especially for IOMMU/SMMU(System Memory Management
 Unit) module.
 
 +config FSL_CORENET_CF
 + tristate Freescale CoreNet Error Reporting
 + depends on FSL_SOC_BOOKE
 + help
 +   Say Y for reporting of errors from the Freescale CoreNet
 +   Coherency Fabric.  Errors reported include accesses to
 +   physical addresses that mapped by no local access window
 +   (LAW) or an invalid LAW, as well as bad cache state that
 +   represents a coherency violation.
 +
  config FSL_IFC
   bool
   depends on FSL_SOC
 diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
 index 71160a2..4055c47 100644
 --- a/drivers/memory/Makefile
 +++ b/drivers/memory/Makefile
 @@ -7,6 +7,7 @@ obj-$(CONFIG_OF)  += of_memory.o
  endif
  obj-$(CONFIG_TI_AEMIF)   += ti-aemif.o
  obj-$(CONFIG_TI_EMIF)+= emif.o
 +obj-$(CONFIG_FSL_CORENET_CF) += fsl-corenet-cf.o
  obj-$(CONFIG_FSL_IFC)+= fsl_ifc.o
  obj-$(CONFIG_MVEBU_DEVBUS)   += mvebu-devbus.o
  obj-$(CONFIG_TEGRA20_MC) += tegra20-mc.o
 diff --git a/drivers/memory/fsl-corenet-cf.c b/drivers/memory/fsl-corenet-cf.c
 new file mode 100644
 index 000..c9443fc
 --- /dev/null
 +++ b/drivers/memory/fsl-corenet-cf.c
 @@ -0,0 +1,251 @@
 +/*
 + * CoreNet Coherency Fabric error reporting
 + *
 + * Copyright 2014 Freescale Semiconductor Inc.
 + *
 + * This program is free software; you can redistribute  it and/or modify it
 + * under  the terms of  the GNU General  Public License as published by the
 + * Free Software Foundation;  either version 2 of the  License, or (at your
 + * option) any later version.
 + */
 +
 +#include linux/interrupt.h
 +#include linux/io.h
 +#include linux/irq.h
 +#include linux/module.h
 +#include linux/of.h
 +#include linux/of_address.h
 +#include linux/of_device.h
 +#include linux/of_irq.h
 +#include linux/platform_device.h
 +
 +enum ccf_version {
 + CCF1,
 + CCF2,
 +};
 +
 +struct ccf_info {
 + enum ccf_version version;
 + int err_reg_offs;
 +};
 +
 +static const struct ccf_info ccf1_info = {
 + .version = CCF1,
 + .err_reg_offs = 0xa00,
 +};
 +
 +static const struct ccf_info ccf2_info = {
 + .version = CCF2,
 + .err_reg_offs = 0xe40,
 +};
 +
 +static const struct of_device_id ccf_matches[] = {
 + {
 + .compatible = fsl,corenet1-cf,
 + .data = ccf1_info,
 + },
 + {
 + .compatible = fsl,corenet2-cf,
 + .data = ccf2_info,
 + },
 + {}
 +};
 +
 +struct ccf_err_regs {
 + u32 errdet; /* 0x00 Error

RE: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error reporting driver

2014-06-30 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Wood Scott-B07421
> Sent: Tuesday, July 01, 2014 2:30 AM
> To: Bhushan Bharat-R65777
> Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error
> reporting driver
> 
> On Sun, 2014-06-29 at 23:58 -0500, Bhushan Bharat-R65777 wrote:
> >
> > > -Original Message-
> > > From: Wood Scott-B07421
> > > Sent: Wednesday, June 04, 2014 10:38 PM
> > > To: Bhushan Bharat-R65777
> > > Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
> > > ker...@vger.kernel.org
> > > Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency
> > > Fabric error reporting driver
> > >
> > > On Wed, 2014-06-04 at 12:04 -0500, Bhushan Bharat-R65777 wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Wood Scott-B07421
> > > > > Sent: Wednesday, June 04, 2014 10:12 PM
> > > > > To: Bhushan Bharat-R65777
> > > > > Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
> > > > > ker...@vger.kernel.org
> > > > > Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency
> > > > > Fabric error reporting driver
> > > > >
> > > > > On Wed, 2014-06-04 at 03:17 -0500, Bhushan Bharat-R65777 wrote:
> > > > > > > +static int ccf_remove(struct platform_device *pdev) {
> > > > > > > + struct ccf_private *ccf = dev_get_drvdata(>dev);
> > > > > > > +
> > > > > > > + switch (ccf->info->version) {
> > > > > > > + case CCF1:
> > > > > > > + iowrite32be(0, >err_regs->errdis);
> > > > > > > + break;
> > > > > > > +
> > > > > > > + case CCF2:
> > > > > > > + iowrite32be(0, >err_regs->errinten);
> > > > > >
> > > > > > Do you think it is same to disable detection bits in
> > > > > > ccf->err_regs-
> > > >errdis?
> > > > >
> > > > > Disabling the interrupt is what we're aiming for here, but ccf1
> > > > > doesn't provide a way to do that separate from disabling detection.
> > > >
> > > > What I wanted to say that do we also need to disable detection
> > > > (set ERRDET_LAE | ERRDET_CV bits in errdis) apart from clearing
> > > > errinten on
> > > > ccf2 ?
> > >
> > > I don't think we "need" to.  You could argue that we should for
> > > consistency, though I think there's value in errors continuing to be
> > > detected even without the driver (e.g. can dump the registers in a
> debugger).
> >
> > Yes this comment was for consistency. Also IIUC, the state which is left 
> > when
> the driver is removed is not default reset behavior.
> 
> How many drivers leave the hardware in pristine reset state when exiting?

I do not know :)

>  And
> you could argue that having detection off by default is poor hardware design
> (enabling interrupts is another matter of course).

Ok, then can you please add a comment in _remove() function describing why 
detection is still enabled.

Thanks
-Bharat

> 
> > If we want errors to be detected then should not we have a sysfs interface?
> 
> That may be useful but it's beyond the scope of what I'm doing with this 
> patch.
> We currently don't log machine checks anywhere but via printk either.
> 
> BTW, I thought I had sent v2 of this, but I don't see it anywhere...
> I'll respin soon.
> 
> -Scott
> 

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error reporting driver

2014-06-30 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, July 01, 2014 2:30 AM
 To: Bhushan Bharat-R65777
 Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org
 Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error
 reporting driver
 
 On Sun, 2014-06-29 at 23:58 -0500, Bhushan Bharat-R65777 wrote:
 
   -Original Message-
   From: Wood Scott-B07421
   Sent: Wednesday, June 04, 2014 10:38 PM
   To: Bhushan Bharat-R65777
   Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
   ker...@vger.kernel.org
   Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency
   Fabric error reporting driver
  
   On Wed, 2014-06-04 at 12:04 -0500, Bhushan Bharat-R65777 wrote:
   
 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, June 04, 2014 10:12 PM
 To: Bhushan Bharat-R65777
 Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org
 Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency
 Fabric error reporting driver

 On Wed, 2014-06-04 at 03:17 -0500, Bhushan Bharat-R65777 wrote:
   +static int ccf_remove(struct platform_device *pdev) {
   + struct ccf_private *ccf = dev_get_drvdata(pdev-dev);
   +
   + switch (ccf-info-version) {
   + case CCF1:
   + iowrite32be(0, ccf-err_regs-errdis);
   + break;
   +
   + case CCF2:
   + iowrite32be(0, ccf-err_regs-errinten);
 
  Do you think it is same to disable detection bits in
  ccf-err_regs-
   errdis?

 Disabling the interrupt is what we're aiming for here, but ccf1
 doesn't provide a way to do that separate from disabling detection.
   
What I wanted to say that do we also need to disable detection
(set ERRDET_LAE | ERRDET_CV bits in errdis) apart from clearing
errinten on
ccf2 ?
  
   I don't think we need to.  You could argue that we should for
   consistency, though I think there's value in errors continuing to be
   detected even without the driver (e.g. can dump the registers in a
 debugger).
 
  Yes this comment was for consistency. Also IIUC, the state which is left 
  when
 the driver is removed is not default reset behavior.
 
 How many drivers leave the hardware in pristine reset state when exiting?

I do not know :)

  And
 you could argue that having detection off by default is poor hardware design
 (enabling interrupts is another matter of course).

Ok, then can you please add a comment in _remove() function describing why 
detection is still enabled.

Thanks
-Bharat

 
  If we want errors to be detected then should not we have a sysfs interface?
 
 That may be useful but it's beyond the scope of what I'm doing with this 
 patch.
 We currently don't log machine checks anywhere but via printk either.
 
 BTW, I thought I had sent v2 of this, but I don't see it anywhere...
 I'll respin soon.
 
 -Scott
 

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error reporting driver

2014-06-29 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Wood Scott-B07421
> Sent: Wednesday, June 04, 2014 10:38 PM
> To: Bhushan Bharat-R65777
> Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error
> reporting driver
> 
> On Wed, 2014-06-04 at 12:04 -0500, Bhushan Bharat-R65777 wrote:
> >
> > > -Original Message-
> > > From: Wood Scott-B07421
> > > Sent: Wednesday, June 04, 2014 10:12 PM
> > > To: Bhushan Bharat-R65777
> > > Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
> > > ker...@vger.kernel.org
> > > Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency
> > > Fabric error reporting driver
> > >
> > > On Wed, 2014-06-04 at 03:17 -0500, Bhushan Bharat-R65777 wrote:
> > > > > +static int ccf_remove(struct platform_device *pdev) {
> > > > > + struct ccf_private *ccf = dev_get_drvdata(>dev);
> > > > > +
> > > > > + switch (ccf->info->version) {
> > > > > + case CCF1:
> > > > > + iowrite32be(0, >err_regs->errdis);
> > > > > + break;
> > > > > +
> > > > > + case CCF2:
> > > > > + iowrite32be(0, >err_regs->errinten);
> > > >
> > > > Do you think it is same to disable detection bits in ccf->err_regs-
> >errdis?
> > >
> > > Disabling the interrupt is what we're aiming for here, but ccf1
> > > doesn't provide a way to do that separate from disabling detection.
> >
> > What I wanted to say that do we also need to disable detection (set
> > ERRDET_LAE | ERRDET_CV bits in errdis) apart from clearing errinten on
> > ccf2 ?
> 
> I don't think we "need" to.  You could argue that we should for consistency,
> though I think there's value in errors continuing to be detected even without
> the driver (e.g. can dump the registers in a debugger).

Yes this comment was for consistency. Also IIUC, the state which is left when 
the driver is removed is not default reset behavior.
If we want errors to be detected then should not we have a sysfs interface?

Thanks
-Bharat

> 
> -Scott
> 



RE: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error reporting driver

2014-06-29 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, June 04, 2014 10:38 PM
 To: Bhushan Bharat-R65777
 Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org
 Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error
 reporting driver
 
 On Wed, 2014-06-04 at 12:04 -0500, Bhushan Bharat-R65777 wrote:
 
   -Original Message-
   From: Wood Scott-B07421
   Sent: Wednesday, June 04, 2014 10:12 PM
   To: Bhushan Bharat-R65777
   Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
   ker...@vger.kernel.org
   Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency
   Fabric error reporting driver
  
   On Wed, 2014-06-04 at 03:17 -0500, Bhushan Bharat-R65777 wrote:
 +static int ccf_remove(struct platform_device *pdev) {
 + struct ccf_private *ccf = dev_get_drvdata(pdev-dev);
 +
 + switch (ccf-info-version) {
 + case CCF1:
 + iowrite32be(0, ccf-err_regs-errdis);
 + break;
 +
 + case CCF2:
 + iowrite32be(0, ccf-err_regs-errinten);
   
Do you think it is same to disable detection bits in ccf-err_regs-
 errdis?
  
   Disabling the interrupt is what we're aiming for here, but ccf1
   doesn't provide a way to do that separate from disabling detection.
 
  What I wanted to say that do we also need to disable detection (set
  ERRDET_LAE | ERRDET_CV bits in errdis) apart from clearing errinten on
  ccf2 ?
 
 I don't think we need to.  You could argue that we should for consistency,
 though I think there's value in errors continuing to be detected even without
 the driver (e.g. can dump the registers in a debugger).

Yes this comment was for consistency. Also IIUC, the state which is left when 
the driver is removed is not default reset behavior.
If we want errors to be detected then should not we have a sysfs interface?

Thanks
-Bharat

 
 -Scott
 



RE: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Linuxppc-dev [mailto:linuxppc-dev-
> bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey
> Kardashevskiy
> Sent: Thursday, June 19, 2014 9:18 AM
> To: Alex Williamson
> Cc: k...@vger.kernel.org; Nikunj A Dadhania; linux-kernel@vger.kernel.org;
> Alexander Graf; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH] vfio: Fix endianness handling for emulated BARs
> 
> On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
> > On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
> >> On 06/19/2014 04:35 AM, Alex Williamson wrote:
> >>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>  VFIO exposes BARs to user space as a byte stream so userspace can
>  read it using pread()/pwrite(). Since this is a byte stream, VFIO
>  should not do byte swapping and simply return values as it gets
>  them from PCI device.
> 
>  Instead, the existing code assumes that byte stream in read/write
>  is little-endian and it fixes endianness for values which it passes
>  to ioreadXX/iowriteXX helpers. This works for little-endian as PCI
>  is little endian and le32_to_cpu/... are stubs.
> >>>
> >>> vfio read32:
> >>>
> >>> val = cpu_to_le32(ioread32(io + off));
> >>>
> >>> Where the typical x86 case, ioread32 is:
> >>>
> >>> #define ioread32(addr)  readl(addr)
> >>>
> >>> and readl is:
> >>>
> >>> __le32_to_cpu(__raw_readl(addr));
> >>>
> >>> So we do canceling byte swaps, which are both nops on x86, and end
> >>> up returning device endian, which we assume is little endian.
> >>>
> >>> vfio write32 is similar:
> >>>
> >>> iowrite32(le32_to_cpu(val), io + off);
> >>>
> >>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
> >>> out, so input data is device endian, which is assumed little.
> >>>
>  This also works for big endian but rather by an accident: it reads
>  4 bytes from the stream (@val is big endian), converts to CPU
>  format (which should be big endian) as it was little endian (@val
>  becomes actually little
>  endian) and calls iowrite32() which does not do swapping on big
>  endian system.
> >>>
> >>> Really?
> >>>
> >>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
> >>> writel(), which seems to use the generic implementation, which does
> >>> include a cpu_to_le32.
> >>
> >>
> >> Ouch, wrong comment. iowrite32() does swapping. My bad.
> >>
> >>
> >>>
> >>> I also see other big endian archs like parisc doing cpu_to_le32 on
> >>> iowrite32, so I don't think this statement is true.  I imagine it's
> >>> probably working for you because the swap cancel.
> >>>
>  This removes byte swapping and makes use ioread32be/iowrite32be
>  (and 16bit versions) on big-endian systems. The "be" helpers take
>  native endian values and do swapping at the moment of writing to a
>  PCI register using one of "store byte-reversed" instructions.
> >>>
> >>> So now you want iowrite32() on little endian and iowrite32be() on
> >>> big endian, the former does a cpu_to_le32 (which is a nop on little
> >>> endian) and the latter does a cpu_to_be32 (which is a nop on big 
> >>> endian)...
> >>> should we just be using __raw_writel() on both?
> >>
> >>
> >> We can do that too. The beauty of iowrite32be on ppc64 is that it
> >> does not swap and write separately, it is implemented via the "Store
> >> Word Byte-Reverse Indexed X-form" single instruction.
> >>
> >> And some archs (do not know which ones) may add memory barriers in
> >> their implementations of ioread/iowrite. __raw_writel is too raw :)
> >>
> >>>  There doesn't actually
> >>> seem to be any change in behavior here, it just eliminates
> >>> back-to-back byte swaps, which are a nop on x86, but not power, right?
> >>
> >> Exactly. No dependency for QEMU.
> >
> > How about that:
> > ===
> >
> > VFIO exposes BARs to user space as a byte stream so userspace can read
> > it using pread()/pwrite(). Since this is a byte stream, VFIO should
> > not do byte swapping and simply return values as it gets them from PCI
> > device.
> >
> > Instead, the existing code assumes that byte stream in read/write is
> > little-endian and it fixes endianness for values which it passes to
> > ioreadXX/iowriteXX helpers in native format. The IO helpers do
> > swapping again. Since both byte swaps are nops on little-endian host, this
> works.
> >
> > This also works for big endian but rather by an accident: it reads 4
> > bytes from the stream (@val is big endian), converts to CPU format
> > (which should be big endian) as it was little endian (and @val becomes
> > actually little
> > endian) and calls iowrite32() which does swapping on big endian system
> > again. So byte swap gets cancelled, __raw_writel() receives a native
> > value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) =
> > v; just does the right thing.
> 
> I am wrong here, sorry. This is what happens when you watch soccer between 2am

RE: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey
 Kardashevskiy
 Sent: Thursday, June 19, 2014 9:18 AM
 To: Alex Williamson
 Cc: k...@vger.kernel.org; Nikunj A Dadhania; linux-kernel@vger.kernel.org;
 Alexander Graf; linuxppc-...@lists.ozlabs.org
 Subject: Re: [PATCH] vfio: Fix endianness handling for emulated BARs
 
 On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
  On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
  On 06/19/2014 04:35 AM, Alex Williamson wrote:
  On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
  VFIO exposes BARs to user space as a byte stream so userspace can
  read it using pread()/pwrite(). Since this is a byte stream, VFIO
  should not do byte swapping and simply return values as it gets
  them from PCI device.
 
  Instead, the existing code assumes that byte stream in read/write
  is little-endian and it fixes endianness for values which it passes
  to ioreadXX/iowriteXX helpers. This works for little-endian as PCI
  is little endian and le32_to_cpu/... are stubs.
 
  vfio read32:
 
  val = cpu_to_le32(ioread32(io + off));
 
  Where the typical x86 case, ioread32 is:
 
  #define ioread32(addr)  readl(addr)
 
  and readl is:
 
  __le32_to_cpu(__raw_readl(addr));
 
  So we do canceling byte swaps, which are both nops on x86, and end
  up returning device endian, which we assume is little endian.
 
  vfio write32 is similar:
 
  iowrite32(le32_to_cpu(val), io + off);
 
  The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
  out, so input data is device endian, which is assumed little.
 
  This also works for big endian but rather by an accident: it reads
  4 bytes from the stream (@val is big endian), converts to CPU
  format (which should be big endian) as it was little endian (@val
  becomes actually little
  endian) and calls iowrite32() which does not do swapping on big
  endian system.
 
  Really?
 
  In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
  writel(), which seems to use the generic implementation, which does
  include a cpu_to_le32.
 
 
  Ouch, wrong comment. iowrite32() does swapping. My bad.
 
 
 
  I also see other big endian archs like parisc doing cpu_to_le32 on
  iowrite32, so I don't think this statement is true.  I imagine it's
  probably working for you because the swap cancel.
 
  This removes byte swapping and makes use ioread32be/iowrite32be
  (and 16bit versions) on big-endian systems. The be helpers take
  native endian values and do swapping at the moment of writing to a
  PCI register using one of store byte-reversed instructions.
 
  So now you want iowrite32() on little endian and iowrite32be() on
  big endian, the former does a cpu_to_le32 (which is a nop on little
  endian) and the latter does a cpu_to_be32 (which is a nop on big 
  endian)...
  should we just be using __raw_writel() on both?
 
 
  We can do that too. The beauty of iowrite32be on ppc64 is that it
  does not swap and write separately, it is implemented via the Store
  Word Byte-Reverse Indexed X-form single instruction.
 
  And some archs (do not know which ones) may add memory barriers in
  their implementations of ioread/iowrite. __raw_writel is too raw :)
 
   There doesn't actually
  seem to be any change in behavior here, it just eliminates
  back-to-back byte swaps, which are a nop on x86, but not power, right?
 
  Exactly. No dependency for QEMU.
 
  How about that:
  ===
 
  VFIO exposes BARs to user space as a byte stream so userspace can read
  it using pread()/pwrite(). Since this is a byte stream, VFIO should
  not do byte swapping and simply return values as it gets them from PCI
  device.
 
  Instead, the existing code assumes that byte stream in read/write is
  little-endian and it fixes endianness for values which it passes to
  ioreadXX/iowriteXX helpers in native format. The IO helpers do
  swapping again. Since both byte swaps are nops on little-endian host, this
 works.
 
  This also works for big endian but rather by an accident: it reads 4
  bytes from the stream (@val is big endian), converts to CPU format
  (which should be big endian) as it was little endian (and @val becomes
  actually little
  endian) and calls iowrite32() which does swapping on big endian system
  again. So byte swap gets cancelled, __raw_writel() receives a native
  value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) =
  v; just does the right thing.
 
 I am wrong here, sorry. This is what happens when you watch soccer between 2am
 and 4am :)
 
 
 
  This removes byte swaps and makes use of ioread32be/iowrite32be (and
  16bit versions) which do explicit byte swapping at the moment of write
  to a PCI register. PPC64 uses a special Store Word Byte-Reverse
  Indexed X-form instruction which does swap and store.
 
 No swapping is done here if we use ioread32be as it calls in_be32 and that
 animal does lwz which is 

RE: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error reporting driver

2014-06-04 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Wood Scott-B07421
> Sent: Wednesday, June 04, 2014 10:12 PM
> To: Bhushan Bharat-R65777
> Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error
> reporting driver
> 
> On Wed, 2014-06-04 at 03:17 -0500, Bhushan Bharat-R65777 wrote:
> > > +struct ccf_err_regs {
> > > + u32 errdet; /* 0x00 Error Detect Register */
> > > + /* 0x04 Error Enable (ccf1)/Disable (ccf2) Register */
> > > + u32 errdis;
> > > + /* 0x08 Error Interrupt Enable Register (ccf2 only) */
> > > + u32 errinten;
> > > + u32 cecar;  /* 0x0c Error Capture Attribute Register */
> > > + u32 cecadrh;/* 0x10 Error Capture Address High */
> >
> > s/cecadrh/cecaddrh/g
> > This way we will be consistent with Reference manual.
> 
> It's "cecadrh" in ccf1 and "cecaddrh" in ccf2.  I suppose I should use the
> latter since "errdet/errdis/errinten" are the ccf2 names.
> 
> > > + u32 cecadrl;/* 0x14 Error Capture Address Low */
> >
> > s/cecadrl/cecaddrl/g
> >
> > > + u32 cecar2; /* 0x18 Error Capture Attribute Register 2 */
> > > +};
> > > +
> > > +/* LAE/CV also valid for errdis and errinten */
> > > +#define ERRDET_LAE   (1 << 0)  /* Local Access Error */
> > > +#define ERRDET_CV(1 << 1)  /* Coherency Violation */
> > > +#define ERRDET_CTYPE_SHIFT   26/* Capture Type (ccf2 only) */
> > > +#define ERRDET_CTYPE_MASK(0x3f << ERRDET_CTYPE_SHIFT)
> >
> > Should not this be (0x1f << ERRDET_CTYPE_SHIFT)
> 
> Yes, thanks for catching that.
> 
> > > +#define ERRDET_CAP   (1 << 31) /* Capture Valid (ccf2 only) 
> > > */
> > > +
> > > +#define CECAR_VAL(1 << 0)  /* Valid (ccf1 only) */
> > > +#define CECAR_UVT(1 << 15) /* Unavailable target ID 
> > > (ccf1) */
> > > +#define CECAR_SRCID_SHIFT_CCF1   24
> > > +#define CECAR_SRCID_MASK_CCF1(0xff << CECAR_SRCID_SHIFT_CCF1)
> > > +#define CECAR_SRCID_SHIFT_CCF2   18
> > > +#define CECAR_SRCID_MASK_CCF2(0xff << CECAR_SRCID_SHIFT_CCF2)
> > > +
> > > +#define CECADRH_ADDRH0xf
> >
> > On ccf2 this id 0xff.
> 
> OK.  I think we can get away with using 0xff on both.
> 
> > > +static int ccf_remove(struct platform_device *pdev) {
> > > + struct ccf_private *ccf = dev_get_drvdata(>dev);
> > > +
> > > + switch (ccf->info->version) {
> > > + case CCF1:
> > > + iowrite32be(0, >err_regs->errdis);
> > > + break;
> > > +
> > > + case CCF2:
> > > + iowrite32be(0, >err_regs->errinten);
> >
> > Do you think it is same to disable detection bits in ccf->err_regs->errdis?
> 
> Disabling the interrupt is what we're aiming for here, but ccf1 doesn't 
> provide
> a way to do that separate from disabling detection.

What I wanted to say that do we also need to disable detection (set ERRDET_LAE 
| ERRDET_CV bits in errdis) apart from clearing errinten on ccf2 ?

Thanks
-Bharat

> 
> -Scott
> 



RE: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error reporting driver

2014-06-04 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Linuxppc-dev [mailto:linuxppc-dev-
> bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Scott Wood
> Sent: Saturday, May 31, 2014 3:58 AM
> To: Greg Kroah-Hartman
> Cc: linuxppc-...@lists.ozlabs.org; linux-kernel@vger.kernel.org
> Subject: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error
> reporting driver
> 
> The CoreNet Coherency Fabric is part of the memory subsystem on some Freescale
> QorIQ chips.  It can report coherency violations (e.g.
> due to misusing memory that is mapped noncoherent) as well as transactions 
> that
> do not hit any local access window, or which hit a local access window with an
> invalid target ID.
> 
> Signed-off-by: Scott Wood 
> ---
> Resending to the proper list addresses -- sorry for the duplicate.
> 
>  arch/powerpc/configs/corenet32_smp_defconfig |   1 +
>  arch/powerpc/configs/corenet64_smp_defconfig |   1 +
>  drivers/memory/Kconfig   |  10 ++
>  drivers/memory/Makefile  |   1 +
>  drivers/memory/fsl-corenet-cf.c  | 246 
> +++
>  5 files changed, 259 insertions(+)
>  create mode 100644 drivers/memory/fsl-corenet-cf.c
> 
> diff --git a/arch/powerpc/configs/corenet32_smp_defconfig
> b/arch/powerpc/configs/corenet32_smp_defconfig
> index c19ff05..0c99d7e 100644
> --- a/arch/powerpc/configs/corenet32_smp_defconfig
> +++ b/arch/powerpc/configs/corenet32_smp_defconfig
> @@ -179,3 +179,4 @@ CONFIG_CRYPTO_SHA512=y  CONFIG_CRYPTO_AES=y  #
> CONFIG_CRYPTO_ANSI_CPRNG is not set  CONFIG_CRYPTO_DEV_FSL_CAAM=y
> +CONFIG_FSL_CORENET_CF=y
> diff --git a/arch/powerpc/configs/corenet64_smp_defconfig
> b/arch/powerpc/configs/corenet64_smp_defconfig
> index 5c7fa19..8fb616d 100644
> --- a/arch/powerpc/configs/corenet64_smp_defconfig
> +++ b/arch/powerpc/configs/corenet64_smp_defconfig
> @@ -175,3 +175,4 @@ CONFIG_CRYPTO_SHA256=y  CONFIG_CRYPTO_SHA512=y  #
> CONFIG_CRYPTO_ANSI_CPRNG is not set  CONFIG_CRYPTO_DEV_FSL_CAAM=y
> +CONFIG_FSL_CORENET_CF=y
> diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig index
> c59e9c9..fab81a1 100644
> --- a/drivers/memory/Kconfig
> +++ b/drivers/memory/Kconfig
> @@ -61,6 +61,16 @@ config TEGRA30_MC
> analysis, especially for IOMMU/SMMU(System Memory Management
> Unit) module.
> 
> +config FSL_CORENET_CF
> + tristate "Freescale CoreNet Error Reporting"
> + depends on FSL_SOC_BOOKE
> + help
> +   Say Y for reporting of errors from the Freescale CoreNet
> +   Coherency Fabric.  Errors reported include accesses to
> +   physical addresses that mapped by no local access window
> +   (LAW) or an invalid LAW, as well as bad cache state that
> +   represents a coherency violation.
> +
>  config FSL_IFC
>   bool
>   depends on FSL_SOC
> diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile index
> 71160a2..4055c47 100644
> --- a/drivers/memory/Makefile
> +++ b/drivers/memory/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_OF)  += of_memory.o
>  endif
>  obj-$(CONFIG_TI_AEMIF)   += ti-aemif.o
>  obj-$(CONFIG_TI_EMIF)+= emif.o
> +obj-$(CONFIG_FSL_CORENET_CF) += fsl-corenet-cf.o
>  obj-$(CONFIG_FSL_IFC)+= fsl_ifc.o
>  obj-$(CONFIG_MVEBU_DEVBUS)   += mvebu-devbus.o
>  obj-$(CONFIG_TEGRA20_MC) += tegra20-mc.o
> diff --git a/drivers/memory/fsl-corenet-cf.c b/drivers/memory/fsl-corenet-cf.c
> new file mode 100644 index 000..a57a614
> --- /dev/null
> +++ b/drivers/memory/fsl-corenet-cf.c
> @@ -0,0 +1,246 @@
> +/*
> + * CoreNet Coherency Fabric error reporting
> + *
> + * Copyright 2014 Freescale Semiconductor Inc.
> + *
> + * This program is free software; you can redistribute  it and/or
> +modify it
> + * under  the terms of  the GNU General  Public License as published by
> +the
> + * Free Software Foundation;  either version 2 of the  License, or (at
> +your
> + * option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +enum ccf_version {
> + CCF1,
> + CCF2,
> +};
> +
> +struct ccf_info {
> + enum ccf_version version;
> + int err_reg_offs;
> +};
> +
> +static const struct ccf_info ccf1_info = {
> + .version = CCF1,
> + .err_reg_offs = 0xa00,
> +};
> +
> +static const struct ccf_info ccf2_info = {
> + .version = CCF2,
> + .err_reg_offs = 0xe40,
> +};
> +
> +static const struct of_device_id ccf_matches[] = {
> + {
> + .compatible = "fsl,corenet1-cf",
> + .data = _info,
> + },
> + {
> + .compatible = "fsl,corenet2-cf",
> + .data = _info,
> + },
> + {}
> +};
> +
> +struct ccf_err_regs {
> + u32 errdet; /* 0x00 Error Detect Register */
> + /* 0x04 Error Enable (ccf1)/Disable (ccf2) Register */
> + u32 errdis;
> + /* 0x08 Error Interrupt Enable Register (ccf2 only) */
> +  

RE: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error reporting driver

2014-06-04 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Scott Wood
 Sent: Saturday, May 31, 2014 3:58 AM
 To: Greg Kroah-Hartman
 Cc: linuxppc-...@lists.ozlabs.org; linux-kernel@vger.kernel.org
 Subject: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error
 reporting driver
 
 The CoreNet Coherency Fabric is part of the memory subsystem on some Freescale
 QorIQ chips.  It can report coherency violations (e.g.
 due to misusing memory that is mapped noncoherent) as well as transactions 
 that
 do not hit any local access window, or which hit a local access window with an
 invalid target ID.
 
 Signed-off-by: Scott Wood scottw...@freescale.com
 ---
 Resending to the proper list addresses -- sorry for the duplicate.
 
  arch/powerpc/configs/corenet32_smp_defconfig |   1 +
  arch/powerpc/configs/corenet64_smp_defconfig |   1 +
  drivers/memory/Kconfig   |  10 ++
  drivers/memory/Makefile  |   1 +
  drivers/memory/fsl-corenet-cf.c  | 246 
 +++
  5 files changed, 259 insertions(+)
  create mode 100644 drivers/memory/fsl-corenet-cf.c
 
 diff --git a/arch/powerpc/configs/corenet32_smp_defconfig
 b/arch/powerpc/configs/corenet32_smp_defconfig
 index c19ff05..0c99d7e 100644
 --- a/arch/powerpc/configs/corenet32_smp_defconfig
 +++ b/arch/powerpc/configs/corenet32_smp_defconfig
 @@ -179,3 +179,4 @@ CONFIG_CRYPTO_SHA512=y  CONFIG_CRYPTO_AES=y  #
 CONFIG_CRYPTO_ANSI_CPRNG is not set  CONFIG_CRYPTO_DEV_FSL_CAAM=y
 +CONFIG_FSL_CORENET_CF=y
 diff --git a/arch/powerpc/configs/corenet64_smp_defconfig
 b/arch/powerpc/configs/corenet64_smp_defconfig
 index 5c7fa19..8fb616d 100644
 --- a/arch/powerpc/configs/corenet64_smp_defconfig
 +++ b/arch/powerpc/configs/corenet64_smp_defconfig
 @@ -175,3 +175,4 @@ CONFIG_CRYPTO_SHA256=y  CONFIG_CRYPTO_SHA512=y  #
 CONFIG_CRYPTO_ANSI_CPRNG is not set  CONFIG_CRYPTO_DEV_FSL_CAAM=y
 +CONFIG_FSL_CORENET_CF=y
 diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig index
 c59e9c9..fab81a1 100644
 --- a/drivers/memory/Kconfig
 +++ b/drivers/memory/Kconfig
 @@ -61,6 +61,16 @@ config TEGRA30_MC
 analysis, especially for IOMMU/SMMU(System Memory Management
 Unit) module.
 
 +config FSL_CORENET_CF
 + tristate Freescale CoreNet Error Reporting
 + depends on FSL_SOC_BOOKE
 + help
 +   Say Y for reporting of errors from the Freescale CoreNet
 +   Coherency Fabric.  Errors reported include accesses to
 +   physical addresses that mapped by no local access window
 +   (LAW) or an invalid LAW, as well as bad cache state that
 +   represents a coherency violation.
 +
  config FSL_IFC
   bool
   depends on FSL_SOC
 diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile index
 71160a2..4055c47 100644
 --- a/drivers/memory/Makefile
 +++ b/drivers/memory/Makefile
 @@ -7,6 +7,7 @@ obj-$(CONFIG_OF)  += of_memory.o
  endif
  obj-$(CONFIG_TI_AEMIF)   += ti-aemif.o
  obj-$(CONFIG_TI_EMIF)+= emif.o
 +obj-$(CONFIG_FSL_CORENET_CF) += fsl-corenet-cf.o
  obj-$(CONFIG_FSL_IFC)+= fsl_ifc.o
  obj-$(CONFIG_MVEBU_DEVBUS)   += mvebu-devbus.o
  obj-$(CONFIG_TEGRA20_MC) += tegra20-mc.o
 diff --git a/drivers/memory/fsl-corenet-cf.c b/drivers/memory/fsl-corenet-cf.c
 new file mode 100644 index 000..a57a614
 --- /dev/null
 +++ b/drivers/memory/fsl-corenet-cf.c
 @@ -0,0 +1,246 @@
 +/*
 + * CoreNet Coherency Fabric error reporting
 + *
 + * Copyright 2014 Freescale Semiconductor Inc.
 + *
 + * This program is free software; you can redistribute  it and/or
 +modify it
 + * under  the terms of  the GNU General  Public License as published by
 +the
 + * Free Software Foundation;  either version 2 of the  License, or (at
 +your
 + * option) any later version.
 + */
 +
 +#include linux/interrupt.h
 +#include linux/io.h
 +#include linux/irq.h
 +#include linux/module.h
 +#include linux/of.h
 +#include linux/of_address.h
 +#include linux/of_device.h
 +#include linux/of_irq.h
 +#include linux/platform_device.h
 +
 +enum ccf_version {
 + CCF1,
 + CCF2,
 +};
 +
 +struct ccf_info {
 + enum ccf_version version;
 + int err_reg_offs;
 +};
 +
 +static const struct ccf_info ccf1_info = {
 + .version = CCF1,
 + .err_reg_offs = 0xa00,
 +};
 +
 +static const struct ccf_info ccf2_info = {
 + .version = CCF2,
 + .err_reg_offs = 0xe40,
 +};
 +
 +static const struct of_device_id ccf_matches[] = {
 + {
 + .compatible = fsl,corenet1-cf,
 + .data = ccf1_info,
 + },
 + {
 + .compatible = fsl,corenet2-cf,
 + .data = ccf2_info,
 + },
 + {}
 +};
 +
 +struct ccf_err_regs {
 + u32 errdet; /* 0x00 Error Detect Register */
 + /* 0x04 Error Enable (ccf1)/Disable (ccf2) Register */
 + u32 errdis;
 + /* 0x08 Error Interrupt Enable Register 

RE: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error reporting driver

2014-06-04 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, June 04, 2014 10:12 PM
 To: Bhushan Bharat-R65777
 Cc: Greg Kroah-Hartman; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org
 Subject: Re: [RESEND PATCH] memory: Freescale CoreNet Coherency Fabric error
 reporting driver
 
 On Wed, 2014-06-04 at 03:17 -0500, Bhushan Bharat-R65777 wrote:
   +struct ccf_err_regs {
   + u32 errdet; /* 0x00 Error Detect Register */
   + /* 0x04 Error Enable (ccf1)/Disable (ccf2) Register */
   + u32 errdis;
   + /* 0x08 Error Interrupt Enable Register (ccf2 only) */
   + u32 errinten;
   + u32 cecar;  /* 0x0c Error Capture Attribute Register */
   + u32 cecadrh;/* 0x10 Error Capture Address High */
 
  s/cecadrh/cecaddrh/g
  This way we will be consistent with Reference manual.
 
 It's cecadrh in ccf1 and cecaddrh in ccf2.  I suppose I should use the
 latter since errdet/errdis/errinten are the ccf2 names.
 
   + u32 cecadrl;/* 0x14 Error Capture Address Low */
 
  s/cecadrl/cecaddrl/g
 
   + u32 cecar2; /* 0x18 Error Capture Attribute Register 2 */
   +};
   +
   +/* LAE/CV also valid for errdis and errinten */
   +#define ERRDET_LAE   (1  0)  /* Local Access Error */
   +#define ERRDET_CV(1  1)  /* Coherency Violation */
   +#define ERRDET_CTYPE_SHIFT   26/* Capture Type (ccf2 only) */
   +#define ERRDET_CTYPE_MASK(0x3f  ERRDET_CTYPE_SHIFT)
 
  Should not this be (0x1f  ERRDET_CTYPE_SHIFT)
 
 Yes, thanks for catching that.
 
   +#define ERRDET_CAP   (1  31) /* Capture Valid (ccf2 only) 
   */
   +
   +#define CECAR_VAL(1  0)  /* Valid (ccf1 only) */
   +#define CECAR_UVT(1  15) /* Unavailable target ID 
   (ccf1) */
   +#define CECAR_SRCID_SHIFT_CCF1   24
   +#define CECAR_SRCID_MASK_CCF1(0xff  CECAR_SRCID_SHIFT_CCF1)
   +#define CECAR_SRCID_SHIFT_CCF2   18
   +#define CECAR_SRCID_MASK_CCF2(0xff  CECAR_SRCID_SHIFT_CCF2)
   +
   +#define CECADRH_ADDRH0xf
 
  On ccf2 this id 0xff.
 
 OK.  I think we can get away with using 0xff on both.
 
   +static int ccf_remove(struct platform_device *pdev) {
   + struct ccf_private *ccf = dev_get_drvdata(pdev-dev);
   +
   + switch (ccf-info-version) {
   + case CCF1:
   + iowrite32be(0, ccf-err_regs-errdis);
   + break;
   +
   + case CCF2:
   + iowrite32be(0, ccf-err_regs-errinten);
 
  Do you think it is same to disable detection bits in ccf-err_regs-errdis?
 
 Disabling the interrupt is what we're aiming for here, but ccf1 doesn't 
 provide
 a way to do that separate from disabling detection.

What I wanted to say that do we also need to disable detection (set ERRDET_LAE 
| ERRDET_CV bits in errdis) apart from clearing errinten on ccf2 ?

Thanks
-Bharat

 
 -Scott
 



RE: [PATCH] iommu: fsl_pamu.c: Fix for possible null pointer dereference

2014-05-20 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
> boun...@lists.linux-foundation.org] On Behalf Of Rickard Strandqvist
> Sent: Saturday, May 17, 2014 10:47 PM
> To: Joerg Roedel; Grant Likely
> Cc: devicet...@vger.kernel.org; io...@lists.linux-foundation.org; Rob Herring;
> linux-kernel@vger.kernel.org; Rickard Strandqvist
> Subject: [PATCH] iommu: fsl_pamu.c: Fix for possible null pointer dereference
> 
> There is otherwise a risk of a possible null pointer dereference.

You are right. 

> 
> Was largely found by using a static code analysis program called cppcheck.
> 
> Signed-off-by: Rickard Strandqvist 

Reviewed-by: Bharat Bhushan 


Thanks
-Bharat

> ---
>  drivers/iommu/fsl_pamu.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c index
> cba0498..b99dd88 100644
> --- a/drivers/iommu/fsl_pamu.c
> +++ b/drivers/iommu/fsl_pamu.c
> @@ -592,8 +592,7 @@ found_cpu_node:
>   /* advance to next node in cache hierarchy */
>   node = of_find_node_by_phandle(*prop);
>   if (!node) {
> - pr_debug("Invalid node for cache hierarchy %s\n",
> - node->full_name);
> + pr_debug("Invalid node for cache hierarchy\n");
>   return ~(u32)0;
>   }
>   }
> --
> 1.7.10.4
> 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] iommu: fsl_pamu.c: Fix for possible null pointer dereference

2014-05-20 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
 boun...@lists.linux-foundation.org] On Behalf Of Rickard Strandqvist
 Sent: Saturday, May 17, 2014 10:47 PM
 To: Joerg Roedel; Grant Likely
 Cc: devicet...@vger.kernel.org; io...@lists.linux-foundation.org; Rob Herring;
 linux-kernel@vger.kernel.org; Rickard Strandqvist
 Subject: [PATCH] iommu: fsl_pamu.c: Fix for possible null pointer dereference
 
 There is otherwise a risk of a possible null pointer dereference.

You are right. 

 
 Was largely found by using a static code analysis program called cppcheck.
 
 Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se

Reviewed-by: Bharat Bhushan bharat.bhus...@freescale.com


Thanks
-Bharat

 ---
  drivers/iommu/fsl_pamu.c |3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)
 
 diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c index
 cba0498..b99dd88 100644
 --- a/drivers/iommu/fsl_pamu.c
 +++ b/drivers/iommu/fsl_pamu.c
 @@ -592,8 +592,7 @@ found_cpu_node:
   /* advance to next node in cache hierarchy */
   node = of_find_node_by_phandle(*prop);
   if (!node) {
 - pr_debug(Invalid node for cache hierarchy %s\n,
 - node-full_name);
 + pr_debug(Invalid node for cache hierarchy\n);
   return ~(u32)0;
   }
   }
 --
 1.7.10.4
 
 ___
 iommu mailing list
 io...@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/iommu
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support

2014-01-21 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, January 21, 2014 7:06 PM
> To: Bhushan Bharat-R65777
> Cc: Sethi Varun-B16395; io...@lists.linux-foundation.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support
> 
> On Tue, 2014-01-21 at 07:27 +, bharat.bhus...@freescale.com wrote:
> > > + domain->domain = iommu_domain_alloc(domain->bus);
> > > + if (!domain->domain) {
> > > + ret = -EIO;
> > > + goto out_free;
> > > + }
> > > +
> > > + ret = iommu_attach_group(domain->domain, iommu_group);
> > > + if (ret)
> > > + goto out_domain;
> > > +
> > > + INIT_LIST_HEAD(>group_list);
> > > + list_add(>next, >group_list);
> > > +
> > > + if (!allow_unsafe_interrupts &&
> > > + !iommu_domain_has_cap(domain->domain, IOMMU_CAP_INTR_REMAP)) {
> > > + pr_warn("%s: No interrupt remapping support.  Use the module
> > > +param
> > > \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this
> > > platform\n",
> > > +__func__);
> > > + ret = -EPERM;
> > > + goto out_detach;
> > > + }
> > > +
> > > + if (iommu_domain_has_cap(domain->domain, IOMMU_CAP_CACHE_COHERENCY))
> > > + domain->prot |= IOMMU_CACHE;
> > > +
> > > + /* Try to match an existing compatible domain. */
> > > + list_for_each_entry(d, >domain_list, next) {
> > > + if (d->bus == domain->bus && d->prot == domain->prot) {
> >
> > Are not we talking about allowing a domain to support different bus type if
> domain/iommu-group properties matches.
> 
> This is where I was suggesting to Varun that we could test iommu_ops instead 
> of
> bus_type.
> 
> > > + iommu_detach_group(domain->domain, iommu_group);
> > > + if (!iommu_attach_group(d->domain, iommu_group)) {
> > > + list_add(>next, >group_list);
> > > + iommu_domain_free(domain->domain);
> > > + kfree(domain);
> > > + mutex_unlock(>lock);
> > > + return 0;
> > > + }
> > > +
> > > + ret = iommu_attach_group(domain->domain, iommu_group);
> > > + if (ret)
> > > + goto out_domain;
> > > + }
> > > + }
> > > +
> > > + /* replay mappings on new domains */
> > > + ret = vfio_iommu_replay(iommu, domain);
> >
> > IIUC; We created a new domain because an already existing domain does
> > not have same attribute; say domain->prot; But in vfio_iommu_replay() we 
> > pick
> up any domain, first domain, and create mapping accordingly.
> > Should not we use attributes of this domain otherwise we may get "reserved 
> > bit
> faults"?
> 
> We use an existing domain to get the iova to physical mappings, should those 
> not
> be consistent regardless of the domain we pick?  We're not using any of the 
> low
> level attributes that could cause something like a reserved bit fault.  
> Thanks,

You are right, we use dma->addr etc from any domain and but uses "prot" from 
the domain passed to replay function().
So effectively the only difference (from dma mapping perspective) between 
domains in a container is "prot"

Thanks
-Bharat
> 
> Alex
> 
> 



RE: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support

2014-01-21 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Tuesday, January 21, 2014 7:06 PM
 To: Bhushan Bharat-R65777
 Cc: Sethi Varun-B16395; io...@lists.linux-foundation.org; linux-
 ker...@vger.kernel.org
 Subject: Re: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support
 
 On Tue, 2014-01-21 at 07:27 +, bharat.bhus...@freescale.com wrote:
   + domain-domain = iommu_domain_alloc(domain-bus);
   + if (!domain-domain) {
   + ret = -EIO;
   + goto out_free;
   + }
   +
   + ret = iommu_attach_group(domain-domain, iommu_group);
   + if (ret)
   + goto out_domain;
   +
   + INIT_LIST_HEAD(domain-group_list);
   + list_add(group-next, domain-group_list);
   +
   + if (!allow_unsafe_interrupts 
   + !iommu_domain_has_cap(domain-domain, IOMMU_CAP_INTR_REMAP)) {
   + pr_warn(%s: No interrupt remapping support.  Use the module
   +param
   \allow_unsafe_interrupts\ to enable VFIO IOMMU support on this
   platform\n,
   +__func__);
   + ret = -EPERM;
   + goto out_detach;
   + }
   +
   + if (iommu_domain_has_cap(domain-domain, IOMMU_CAP_CACHE_COHERENCY))
   + domain-prot |= IOMMU_CACHE;
   +
   + /* Try to match an existing compatible domain. */
   + list_for_each_entry(d, iommu-domain_list, next) {
   + if (d-bus == domain-bus  d-prot == domain-prot) {
 
  Are not we talking about allowing a domain to support different bus type if
 domain/iommu-group properties matches.
 
 This is where I was suggesting to Varun that we could test iommu_ops instead 
 of
 bus_type.
 
   + iommu_detach_group(domain-domain, iommu_group);
   + if (!iommu_attach_group(d-domain, iommu_group)) {
   + list_add(group-next, d-group_list);
   + iommu_domain_free(domain-domain);
   + kfree(domain);
   + mutex_unlock(iommu-lock);
   + return 0;
   + }
   +
   + ret = iommu_attach_group(domain-domain, iommu_group);
   + if (ret)
   + goto out_domain;
   + }
   + }
   +
   + /* replay mappings on new domains */
   + ret = vfio_iommu_replay(iommu, domain);
 
  IIUC; We created a new domain because an already existing domain does
  not have same attribute; say domain-prot; But in vfio_iommu_replay() we 
  pick
 up any domain, first domain, and create mapping accordingly.
  Should not we use attributes of this domain otherwise we may get reserved 
  bit
 faults?
 
 We use an existing domain to get the iova to physical mappings, should those 
 not
 be consistent regardless of the domain we pick?  We're not using any of the 
 low
 level attributes that could cause something like a reserved bit fault.  
 Thanks,

You are right, we use dma-addr etc from any domain and but uses prot from 
the domain passed to replay function().
So effectively the only difference (from dma mapping perspective) between 
domains in a container is prot

Thanks
-Bharat
 
 Alex
 
 



RE: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support

2014-01-20 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
> boun...@lists.linux-foundation.org] On Behalf Of Alex Williamson
> Sent: Saturday, January 18, 2014 2:06 AM
> To: Sethi Varun-B16395
> Cc: io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support
> 
> RFC: This is not complete but I want to share with Varun the
> dirrection I'm thinking about.  In particular, I'm really not
> sure if we want to introduce a "v2" interface version with
> slightly different unmap semantics.  QEMU doesn't care about
> the difference, but other users might.  Be warned, I'm not even
> sure if this code works at the moment.  Thanks,
> 
> Alex
> 
> 
> We currently have a problem that we cannot support advanced features
> of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee
> that those features will be supported by all of the hardware units
> involved with the domain over its lifetime.  For instance, the Intel
> VT-d architecture does not require that all DRHDs support snoop
> control.  If we create a domain based on a device behind a DRHD that
> does support snoop control and enable SNP support via the IOMMU_CACHE
> mapping option, we cannot then add a device behind a DRHD which does
> not support snoop control or we'll get reserved bit faults from the
> SNP bit in the pagetables.  To add to the complexity, we can't know
> the properties of a domain until a device is attached.
> 
> We could pass this problem off to userspace and require that a
> separate vfio container be used, but we don't know how to handle page
> accounting in that case.  How do we know that a page pinned in one
> container is the same page as a different container and avoid double
> billing the user for the page.
> 
> The solution is therefore to support multiple IOMMU domains per
> container.  In the majority of cases, only one domain will be required
> since hardware is typically consistent within a system.  However, this
> provides us the ability to validate compatibility of domains and
> support mixed environments where page table flags can be different
> between domains.
> 
> To do this, our DMA tracking needs to change.  We currently try to
> coalesce user mappings into as few tracking entries as possible.  The
> problem then becomes that we lose granularity of user mappings.  We've
> never guaranteed that a user is able to unmap at a finer granularity
> than the original mapping, but we must honor the granularity of the
> original mapping.  This coalescing code is therefore removed, allowing
> only unmaps covering complete maps.  The change in accounting is
> fairly small here, a typical QEMU VM will start out with roughly a
> dozen entries, so it's arguable if this coalescing was ever needed.
> 
> We also move IOMMU domain creation to the point where a group is
> attached to the container.  An interesting side-effect of this is that
> we now have access to the device at the time of domain creation and
> can probe the devices within the group to determine the bus_type.
> This finally makes vfio_iommu_type1 completely device/bus agnostic.
> In fact, each IOMMU domain can host devices on different buses managed
> by different physical IOMMUs, and present a single DMA mapping
> interface to the user.  When a new domain is created, mappings are
> replayed to bring the IOMMU pagetables up to the state of the current
> container.  And of course, DMA mapping and unmapping automatically
> traverse all of the configured IOMMU domains.
> 
> Signed-off-by: Alex Williamson 
> ---
> 
>  drivers/vfio/vfio_iommu_type1.c |  631 
> ---
>  include/uapi/linux/vfio.h   |1
>  2 files changed, 329 insertions(+), 303 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 4fb7a8f..983aae5 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -30,7 +30,6 @@
>  #include 
>  #include 
>  #include 
> -#include/* pci_bus_type */
>  #include 
>  #include 
>  #include 
> @@ -55,11 +54,18 @@ MODULE_PARM_DESC(disable_hugepages,
>"Disable VFIO IOMMU support for IOMMU hugepages.");
> 
>  struct vfio_iommu {
> - struct iommu_domain *domain;
> + struct list_headdomain_list;
>   struct mutexlock;
>   struct rb_root  dma_list;
> + bool v2;
> +};
> +
> +struct vfio_domain {
> + struct iommu_domain *domain;
> + struct bus_type *bus;
> + struct list_headnext;
>   struct list_headgroup_list;
> - boolcache;
> + int prot;   /* IOMMU_CACHE */
>  };
> 
>  struct vfio_dma {
> @@ -99,7 +105,7 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu
> *iommu,
>   return NULL;
>  }
> 
> -static void vfio_insert_dma(struct vfio_iommu *iommu, struct 

RE: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support

2014-01-20 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
 boun...@lists.linux-foundation.org] On Behalf Of Alex Williamson
 Sent: Saturday, January 18, 2014 2:06 AM
 To: Sethi Varun-B16395
 Cc: io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org
 Subject: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support
 
 RFC: This is not complete but I want to share with Varun the
 dirrection I'm thinking about.  In particular, I'm really not
 sure if we want to introduce a v2 interface version with
 slightly different unmap semantics.  QEMU doesn't care about
 the difference, but other users might.  Be warned, I'm not even
 sure if this code works at the moment.  Thanks,
 
 Alex
 
 
 We currently have a problem that we cannot support advanced features
 of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee
 that those features will be supported by all of the hardware units
 involved with the domain over its lifetime.  For instance, the Intel
 VT-d architecture does not require that all DRHDs support snoop
 control.  If we create a domain based on a device behind a DRHD that
 does support snoop control and enable SNP support via the IOMMU_CACHE
 mapping option, we cannot then add a device behind a DRHD which does
 not support snoop control or we'll get reserved bit faults from the
 SNP bit in the pagetables.  To add to the complexity, we can't know
 the properties of a domain until a device is attached.
 
 We could pass this problem off to userspace and require that a
 separate vfio container be used, but we don't know how to handle page
 accounting in that case.  How do we know that a page pinned in one
 container is the same page as a different container and avoid double
 billing the user for the page.
 
 The solution is therefore to support multiple IOMMU domains per
 container.  In the majority of cases, only one domain will be required
 since hardware is typically consistent within a system.  However, this
 provides us the ability to validate compatibility of domains and
 support mixed environments where page table flags can be different
 between domains.
 
 To do this, our DMA tracking needs to change.  We currently try to
 coalesce user mappings into as few tracking entries as possible.  The
 problem then becomes that we lose granularity of user mappings.  We've
 never guaranteed that a user is able to unmap at a finer granularity
 than the original mapping, but we must honor the granularity of the
 original mapping.  This coalescing code is therefore removed, allowing
 only unmaps covering complete maps.  The change in accounting is
 fairly small here, a typical QEMU VM will start out with roughly a
 dozen entries, so it's arguable if this coalescing was ever needed.
 
 We also move IOMMU domain creation to the point where a group is
 attached to the container.  An interesting side-effect of this is that
 we now have access to the device at the time of domain creation and
 can probe the devices within the group to determine the bus_type.
 This finally makes vfio_iommu_type1 completely device/bus agnostic.
 In fact, each IOMMU domain can host devices on different buses managed
 by different physical IOMMUs, and present a single DMA mapping
 interface to the user.  When a new domain is created, mappings are
 replayed to bring the IOMMU pagetables up to the state of the current
 container.  And of course, DMA mapping and unmapping automatically
 traverse all of the configured IOMMU domains.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
  drivers/vfio/vfio_iommu_type1.c |  631 
 ---
  include/uapi/linux/vfio.h   |1
  2 files changed, 329 insertions(+), 303 deletions(-)
 
 diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
 index 4fb7a8f..983aae5 100644
 --- a/drivers/vfio/vfio_iommu_type1.c
 +++ b/drivers/vfio/vfio_iommu_type1.c
 @@ -30,7 +30,6 @@
  #include linux/iommu.h
  #include linux/module.h
  #include linux/mm.h
 -#include linux/pci.h   /* pci_bus_type */
  #include linux/rbtree.h
  #include linux/sched.h
  #include linux/slab.h
 @@ -55,11 +54,18 @@ MODULE_PARM_DESC(disable_hugepages,
Disable VFIO IOMMU support for IOMMU hugepages.);
 
  struct vfio_iommu {
 - struct iommu_domain *domain;
 + struct list_headdomain_list;
   struct mutexlock;
   struct rb_root  dma_list;
 + bool v2;
 +};
 +
 +struct vfio_domain {
 + struct iommu_domain *domain;
 + struct bus_type *bus;
 + struct list_headnext;
   struct list_headgroup_list;
 - boolcache;
 + int prot;   /* IOMMU_CACHE */
  };
 
  struct vfio_dma {
 @@ -99,7 +105,7 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu
 *iommu,
   return NULL;
  }
 
 -static void vfio_insert_dma(struct vfio_iommu *iommu, struct vfio_dma 

RE: [PATCH v11] PPC: POWERNV: move iommu_add_device earlier

2013-12-13 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru]
> Sent: Thursday, December 12, 2013 1:24 PM
> To: linuxppc-...@lists.ozlabs.org
> Cc: Alexey Kardashevskiy; Benjamin Herrenschmidt; Bhushan Bharat-R65777; Alex
> Graf; linux-kernel@vger.kernel.org
> Subject: [PATCH v11] PPC: POWERNV: move iommu_add_device earlier
> 
> The current implementation of IOMMU on sPAPR does not use iommu_ops
> and therefore does not call IOMMU API's bus_set_iommu() which
> 1) sets iommu_ops for a bus
> 2) registers a bus notifier
> Instead, PCI devices are added to IOMMU groups from
> subsys_initcall_sync(tce_iommu_init) which does basically the same
> thing without using iommu_ops callbacks.
> 
> However Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158)
> implements iommu_ops and when tce_iommu_init is called, every PCI device
> is already added to some group so there is a conflict.
> 
> This patch does 2 things:
> 1. removes the loop in which PCI devices were added to groups and
> adds explicit iommu_add_device() calls to add devices as soon as they get
> the iommu_table pointer assigned to them.
> 2. moves a bus notifier to powernv code in order to avoid conflict with
> the notifier from Freescale driver.
> 
> iommu_add_device() and iommu_del_device() are public now.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> Changes:
> v11:
> * rebased on upstream
> 
> v10:
> * fixed linker error when IOMMU_API is not enabled
> 
> v9:
> * removed "KVM" from the subject as it is not really a KVM patch so
> PPC mainainter (hi Ben!) can review/include it into his tree
> 
> v8:
> * added the check for iommu_group!=NULL before removing device from a group
> as suggested by Wei Yang 
> 
> v2:
> * added a helper - set_iommu_table_base_and_group - which does
> set_iommu_table_base() and iommu_add_device()
> ---
>  arch/powerpc/include/asm/iommu.h| 26 
>  arch/powerpc/kernel/iommu.c | 11 --
>  arch/powerpc/platforms/powernv/pci-ioda.c   |  8 
>  arch/powerpc/platforms/powernv/pci-p5ioc2.c |  2 +-
>  arch/powerpc/platforms/powernv/pci.c| 31 
> -
>  arch/powerpc/platforms/pseries/iommu.c  |  8 +---
>  6 files changed, 70 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h 
> b/arch/powerpc/include/asm/iommu.h
> index c34656a..774fa27 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -101,8 +101,34 @@ extern void iommu_free_table(struct iommu_table *tbl, 
> const
> char *node_name);
>   */
>  extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
>   int nid);
> +#ifdef CONFIG_IOMMU_API
>  extern void iommu_register_group(struct iommu_table *tbl,
>int pci_domain_number, unsigned long pe_num);
> +extern int iommu_add_device(struct device *dev);
> +extern void iommu_del_device(struct device *dev);
> +#else
> +static inline void iommu_register_group(struct iommu_table *tbl,
> + int pci_domain_number,
> + unsigned long pe_num)
> +{
> +}
> +
> +static inline int iommu_add_device(struct device *dev)
> +{
> + return 0;
> +}
> +
> +static inline void iommu_del_device(struct device *dev)
> +{
> +}
> +#endif /* !CONFIG_IOMMU_API */
> +
> +static inline void set_iommu_table_base_and_group(struct device *dev,
> +   void *base)
> +{
> + set_iommu_table_base(dev, base);
> + iommu_add_device(dev);
> +}
> 
>  extern int iommu_map_sg(struct device *dev, struct iommu_table *tbl,
>   struct scatterlist *sglist, int nelems,
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 572bb5b..818a092 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1105,7 +1105,7 @@ void iommu_release_ownership(struct iommu_table *tbl)
>  }
>  EXPORT_SYMBOL_GPL(iommu_release_ownership);
> 
> -static int iommu_add_device(struct device *dev)
> +int iommu_add_device(struct device *dev)
>  {
>   struct iommu_table *tbl;
>   int ret = 0;
> @@ -1134,11 +1134,13 @@ static int iommu_add_device(struct device *dev)
> 
>   return ret;
>  }
> +EXPORT_SYMBOL_GPL(iommu_add_device);
> 
> -static void iommu_del_device(struct device *dev)
> +void iommu_del_device(struct device *dev)
>  {
>   iommu_group_remove_device(dev);
>  }
> +EXPORT_SYMBOL_GPL(iommu_del_device);
> 
>  static int iommu_bus_notifier(struct notifier_block *nb,
> unsigned long action, void *data)
> @@ -1162,13 +1164,8 @@ static struct notifier_block tce_iommu_bus_nb = {
> 
>  static int __init tce_iommu_init(void)
>  {
> - struct pci_dev *pdev = NULL;
> -
>   BUILD_BUG_ON(PAGE_SIZE < IOMMU_PAGE_SIZE);
> 
> - for_each_pci_dev(pdev)
> - 

RE: [1/3] powerpc/vfio: Enable on POWERNV platform

2013-12-13 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Wood Scott-B07421
> Sent: Saturday, December 14, 2013 2:33 AM
> To: Alexey Kardashevskiy
> Cc: linuxppc-...@lists.ozlabs.org; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Alex Williamson; Paul Mackerras; David Gibson; Sethi
> Varun-B16395; Bhushan Bharat-R65777
> Subject: Re: [1/3] powerpc/vfio: Enable on POWERNV platform
> 
> On Fri, 2013-12-13 at 14:02 +1100, Alexey Kardashevskiy wrote:
> > On 12/13/2013 10:35 AM, Scott Wood wrote:
> > > On Tue, May 21, 2013 at 01:33:09PM +1000, Alexey Kardashevskiy wrote:
> > >> +static int iommu_add_device(struct device *dev) {
> > >> +struct iommu_table *tbl;
> > >> +int ret = 0;
> > >> +
> > >> +if (WARN_ON(dev->iommu_group)) {
> > >> +pr_warn("iommu_tce: device %s is already in iommu group 
> > >> %d,
> skipping\n",
> > >> +dev_name(dev),
> > >> +iommu_group_id(dev->iommu_group));
> > >> +return -EBUSY;
> > >> +}
> > > [snip]
> > >> +static int __init tce_iommu_init(void) {
> > >> +struct pci_dev *pdev = NULL;
> > >> +
> > >> +BUILD_BUG_ON(PAGE_SIZE < IOMMU_PAGE_SIZE);
> > >> +
> > >> +for_each_pci_dev(pdev)
> > >> +iommu_add_device(>dev);
> > >> +
> > >> +bus_register_notifier(_bus_type, _iommu_bus_nb);
> > >> +return 0;
> > >> +}
> > >> +
> > >> +subsys_initcall_sync(tce_iommu_init);
> > >
> > > This is missing a check to see whether the appropriate hardware is
> > > present.  This file should also be renamed to something less
> > > generic, and depend on a kconfig symbol more specific than CONFIG_PPC64.
> > >
> > > When this is combined with CONFIG_FSL_PAMU on hardware with a PAMU,
> > > I get a bunch of those "WARN_ON(dev->iommu_group)" dumps because
> > > PAMU already got to them.  Presumably without PAMU it silently (or
> > > with just pr_debug) bails out at some other point.
> >
> >
> > I posted (yet again) yesterday "[PATCH v11] PPC: POWERNV: move
> > iommu_add_device earlier" which should fix this. And Bharat asked many
> > times for this to get accepted :)
> 
> I still get the WARN_ONs even with that patch.  You're still registering the 
> bus
> notifier unconditionally.

I have not tried v11 but tested V9 version of that patch. And yes, in that 
version the bus notifier was not registered unconditionally in kernel/iommu.c .

Thanks
-Bharat

> 
> -Scott
> 



RE: [1/3] powerpc/vfio: Enable on POWERNV platform

2013-12-13 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Wood Scott-B07421
 Sent: Saturday, December 14, 2013 2:33 AM
 To: Alexey Kardashevskiy
 Cc: linuxppc-...@lists.ozlabs.org; k...@vger.kernel.org; linux-
 ker...@vger.kernel.org; Alex Williamson; Paul Mackerras; David Gibson; Sethi
 Varun-B16395; Bhushan Bharat-R65777
 Subject: Re: [1/3] powerpc/vfio: Enable on POWERNV platform
 
 On Fri, 2013-12-13 at 14:02 +1100, Alexey Kardashevskiy wrote:
  On 12/13/2013 10:35 AM, Scott Wood wrote:
   On Tue, May 21, 2013 at 01:33:09PM +1000, Alexey Kardashevskiy wrote:
   +static int iommu_add_device(struct device *dev) {
   +struct iommu_table *tbl;
   +int ret = 0;
   +
   +if (WARN_ON(dev-iommu_group)) {
   +pr_warn(iommu_tce: device %s is already in iommu group 
   %d,
 skipping\n,
   +dev_name(dev),
   +iommu_group_id(dev-iommu_group));
   +return -EBUSY;
   +}
   [snip]
   +static int __init tce_iommu_init(void) {
   +struct pci_dev *pdev = NULL;
   +
   +BUILD_BUG_ON(PAGE_SIZE  IOMMU_PAGE_SIZE);
   +
   +for_each_pci_dev(pdev)
   +iommu_add_device(pdev-dev);
   +
   +bus_register_notifier(pci_bus_type, tce_iommu_bus_nb);
   +return 0;
   +}
   +
   +subsys_initcall_sync(tce_iommu_init);
  
   This is missing a check to see whether the appropriate hardware is
   present.  This file should also be renamed to something less
   generic, and depend on a kconfig symbol more specific than CONFIG_PPC64.
  
   When this is combined with CONFIG_FSL_PAMU on hardware with a PAMU,
   I get a bunch of those WARN_ON(dev-iommu_group) dumps because
   PAMU already got to them.  Presumably without PAMU it silently (or
   with just pr_debug) bails out at some other point.
 
 
  I posted (yet again) yesterday [PATCH v11] PPC: POWERNV: move
  iommu_add_device earlier which should fix this. And Bharat asked many
  times for this to get accepted :)
 
 I still get the WARN_ONs even with that patch.  You're still registering the 
 bus
 notifier unconditionally.

I have not tried v11 but tested V9 version of that patch. And yes, in that 
version the bus notifier was not registered unconditionally in kernel/iommu.c .

Thanks
-Bharat

 
 -Scott
 



RE: [PATCH v11] PPC: POWERNV: move iommu_add_device earlier

2013-12-13 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru]
 Sent: Thursday, December 12, 2013 1:24 PM
 To: linuxppc-...@lists.ozlabs.org
 Cc: Alexey Kardashevskiy; Benjamin Herrenschmidt; Bhushan Bharat-R65777; Alex
 Graf; linux-kernel@vger.kernel.org
 Subject: [PATCH v11] PPC: POWERNV: move iommu_add_device earlier
 
 The current implementation of IOMMU on sPAPR does not use iommu_ops
 and therefore does not call IOMMU API's bus_set_iommu() which
 1) sets iommu_ops for a bus
 2) registers a bus notifier
 Instead, PCI devices are added to IOMMU groups from
 subsys_initcall_sync(tce_iommu_init) which does basically the same
 thing without using iommu_ops callbacks.
 
 However Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158)
 implements iommu_ops and when tce_iommu_init is called, every PCI device
 is already added to some group so there is a conflict.
 
 This patch does 2 things:
 1. removes the loop in which PCI devices were added to groups and
 adds explicit iommu_add_device() calls to add devices as soon as they get
 the iommu_table pointer assigned to them.
 2. moves a bus notifier to powernv code in order to avoid conflict with
 the notifier from Freescale driver.
 
 iommu_add_device() and iommu_del_device() are public now.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 Changes:
 v11:
 * rebased on upstream
 
 v10:
 * fixed linker error when IOMMU_API is not enabled
 
 v9:
 * removed KVM from the subject as it is not really a KVM patch so
 PPC mainainter (hi Ben!) can review/include it into his tree
 
 v8:
 * added the check for iommu_group!=NULL before removing device from a group
 as suggested by Wei Yang weiy...@linux.vnet.ibm.com
 
 v2:
 * added a helper - set_iommu_table_base_and_group - which does
 set_iommu_table_base() and iommu_add_device()
 ---
  arch/powerpc/include/asm/iommu.h| 26 
  arch/powerpc/kernel/iommu.c | 11 --
  arch/powerpc/platforms/powernv/pci-ioda.c   |  8 
  arch/powerpc/platforms/powernv/pci-p5ioc2.c |  2 +-
  arch/powerpc/platforms/powernv/pci.c| 31 
 -
  arch/powerpc/platforms/pseries/iommu.c  |  8 +---
  6 files changed, 70 insertions(+), 16 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index c34656a..774fa27 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -101,8 +101,34 @@ extern void iommu_free_table(struct iommu_table *tbl, 
 const
 char *node_name);
   */
  extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
   int nid);
 +#ifdef CONFIG_IOMMU_API
  extern void iommu_register_group(struct iommu_table *tbl,
int pci_domain_number, unsigned long pe_num);
 +extern int iommu_add_device(struct device *dev);
 +extern void iommu_del_device(struct device *dev);
 +#else
 +static inline void iommu_register_group(struct iommu_table *tbl,
 + int pci_domain_number,
 + unsigned long pe_num)
 +{
 +}
 +
 +static inline int iommu_add_device(struct device *dev)
 +{
 + return 0;
 +}
 +
 +static inline void iommu_del_device(struct device *dev)
 +{
 +}
 +#endif /* !CONFIG_IOMMU_API */
 +
 +static inline void set_iommu_table_base_and_group(struct device *dev,
 +   void *base)
 +{
 + set_iommu_table_base(dev, base);
 + iommu_add_device(dev);
 +}
 
  extern int iommu_map_sg(struct device *dev, struct iommu_table *tbl,
   struct scatterlist *sglist, int nelems,
 diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
 index 572bb5b..818a092 100644
 --- a/arch/powerpc/kernel/iommu.c
 +++ b/arch/powerpc/kernel/iommu.c
 @@ -1105,7 +1105,7 @@ void iommu_release_ownership(struct iommu_table *tbl)
  }
  EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
 -static int iommu_add_device(struct device *dev)
 +int iommu_add_device(struct device *dev)
  {
   struct iommu_table *tbl;
   int ret = 0;
 @@ -1134,11 +1134,13 @@ static int iommu_add_device(struct device *dev)
 
   return ret;
  }
 +EXPORT_SYMBOL_GPL(iommu_add_device);
 
 -static void iommu_del_device(struct device *dev)
 +void iommu_del_device(struct device *dev)
  {
   iommu_group_remove_device(dev);
  }
 +EXPORT_SYMBOL_GPL(iommu_del_device);
 
  static int iommu_bus_notifier(struct notifier_block *nb,
 unsigned long action, void *data)
 @@ -1162,13 +1164,8 @@ static struct notifier_block tce_iommu_bus_nb = {
 
  static int __init tce_iommu_init(void)
  {
 - struct pci_dev *pdev = NULL;
 -
   BUILD_BUG_ON(PAGE_SIZE  IOMMU_PAGE_SIZE);
 
 - for_each_pci_dev(pdev)
 - iommu_add_device(pdev-dev);
 -
   bus_register_notifier(pci_bus_type, tce_iommu_bus_nb);
   return 0;
  }

Do we 

RE: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)

2013-12-10 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, December 10, 2013 11:23 AM
> To: Bhushan Bharat-R65777
> Cc: Wood Scott-B07421; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart-
> B08248; io...@lists.linux-foundation.org; bhelg...@google.com; linuxppc-
> d...@lists.ozlabs.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
> 
> On Tue, 2013-12-10 at 05:37 +, bharat.bhus...@freescale.com wrote:
> >
> > > -Original Message-
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Saturday, December 07, 2013 1:00 AM
> > > To: Wood Scott-B07421
> > > Cc: Bhushan Bharat-R65777; linux-...@vger.kernel.org; ag...@suse.de;
> > > Yoder Stuart-B08248; io...@lists.linux-foundation.org;
> > > bhelg...@google.com; linuxppc- d...@lists.ozlabs.org;
> > > linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale
> > > IOMMU (PAMU)
> > >
> > > On Fri, 2013-12-06 at 12:59 -0600, Scott Wood wrote:
> > > > On Thu, 2013-12-05 at 22:11 -0600, Bharat Bhushan wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Wood Scott-B07421
> > > > > > Sent: Friday, December 06, 2013 5:52 AM
> > > > > > To: Bhushan Bharat-R65777
> > > > > > Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de;
> > > > > > Yoder Stuart- B08248; io...@lists.linux-foundation.org;
> > > > > > bhelg...@google.com; linuxppc- d...@lists.ozlabs.org;
> > > > > > linux-kernel@vger.kernel.org
> > > > > > Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for
> > > > > > Freescale IOMMU (PAMU)
> > > > > >
> > > > > > On Thu, 2013-11-28 at 03:19 -0600, Bharat Bhushan wrote:
> > > > > > >
> > > > > > > > -Original Message-
> > > > > > > > From: Bhushan Bharat-R65777
> > > > > > > > Sent: Wednesday, November 27, 2013 9:39 PM
> > > > > > > > To: 'Alex Williamson'
> > > > > > > > Cc: Wood Scott-B07421; linux-...@vger.kernel.org;
> > > > > > > > ag...@suse.de; Yoder Stuart- B08248;
> > > > > > > > io...@lists.linux-foundation.org; bhelg...@google.com;
> > > > > > > > linuxppc- d...@lists.ozlabs.org;
> > > > > > > > linux-kernel@vger.kernel.org
> > > > > > > > Subject: RE: [PATCH 0/9 v2] vfio-pci: add support for
> > > > > > > > Freescale IOMMU (PAMU)
> > > > > > > >
> > > > > > > > If we just provide the size of MSI bank to userspace then
> > > > > > > > userspace cannot do anything wrong.
> > > > > > >
> > > > > > > So userspace does not know address, so it cannot mmap and
> > > > > > > cause any
> > > > > > interference by directly reading/writing.
> > > > > >
> > > > > > That's security through obscurity...  Couldn't the malicious
> > > > > > user find out the address via other means, such as
> > > > > > experimentation on another system over which they have full
> > > > > > control?  What would happen if the user reads from their
> > > > > > device's PCI config space?  Or gets the information via some
> > > > > > back door in the PCI device they own?  Or pokes throughout the
> > > > > > address space looking for something that
> > > generates an interrupt to its own device?
> > > > >
> > > > > So how to solve this problem, Any suggestion ?
> > > > >
> > > > > We have to map one window in PAMU for MSIs and a malicious user
> > > > > can ask its device to do DMA to MSI window region with any pair
> > > > > of address and data, which can lead to unexpected MSIs in system?
> > > >
> > > > I don't think there are any solutions other than to limit each
> > > > bank to one user, unless the admin turns some knob that says
> > > > they're OK with the partial loss of isolation.
> > >
> > > Even if the admin does opt-in to an allow_unsafe_interrupts options,
> > > it should still be reasonably difficult for one guest to inte

RE: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)

2013-12-10 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Tuesday, December 10, 2013 11:23 AM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart-
 B08248; io...@lists.linux-foundation.org; bhelg...@google.com; linuxppc-
 d...@lists.ozlabs.org; linux-kernel@vger.kernel.org
 Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
 
 On Tue, 2013-12-10 at 05:37 +, bharat.bhus...@freescale.com wrote:
 
   -Original Message-
   From: Alex Williamson [mailto:alex.william...@redhat.com]
   Sent: Saturday, December 07, 2013 1:00 AM
   To: Wood Scott-B07421
   Cc: Bhushan Bharat-R65777; linux-...@vger.kernel.org; ag...@suse.de;
   Yoder Stuart-B08248; io...@lists.linux-foundation.org;
   bhelg...@google.com; linuxppc- d...@lists.ozlabs.org;
   linux-kernel@vger.kernel.org
   Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale
   IOMMU (PAMU)
  
   On Fri, 2013-12-06 at 12:59 -0600, Scott Wood wrote:
On Thu, 2013-12-05 at 22:11 -0600, Bharat Bhushan wrote:

  -Original Message-
  From: Wood Scott-B07421
  Sent: Friday, December 06, 2013 5:52 AM
  To: Bhushan Bharat-R65777
  Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de;
  Yoder Stuart- B08248; io...@lists.linux-foundation.org;
  bhelg...@google.com; linuxppc- d...@lists.ozlabs.org;
  linux-kernel@vger.kernel.org
  Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for
  Freescale IOMMU (PAMU)
 
  On Thu, 2013-11-28 at 03:19 -0600, Bharat Bhushan wrote:
  
-Original Message-
From: Bhushan Bharat-R65777
Sent: Wednesday, November 27, 2013 9:39 PM
To: 'Alex Williamson'
Cc: Wood Scott-B07421; linux-...@vger.kernel.org;
ag...@suse.de; Yoder Stuart- B08248;
io...@lists.linux-foundation.org; bhelg...@google.com;
linuxppc- d...@lists.ozlabs.org;
linux-kernel@vger.kernel.org
Subject: RE: [PATCH 0/9 v2] vfio-pci: add support for
Freescale IOMMU (PAMU)
   
If we just provide the size of MSI bank to userspace then
userspace cannot do anything wrong.
  
   So userspace does not know address, so it cannot mmap and
   cause any
  interference by directly reading/writing.
 
  That's security through obscurity...  Couldn't the malicious
  user find out the address via other means, such as
  experimentation on another system over which they have full
  control?  What would happen if the user reads from their
  device's PCI config space?  Or gets the information via some
  back door in the PCI device they own?  Or pokes throughout the
  address space looking for something that
   generates an interrupt to its own device?

 So how to solve this problem, Any suggestion ?

 We have to map one window in PAMU for MSIs and a malicious user
 can ask its device to do DMA to MSI window region with any pair
 of address and data, which can lead to unexpected MSIs in system?
   
I don't think there are any solutions other than to limit each
bank to one user, unless the admin turns some knob that says
they're OK with the partial loss of isolation.
  
   Even if the admin does opt-in to an allow_unsafe_interrupts options,
   it should still be reasonably difficult for one guest to interfere
   with the other.  I don't think we want to rely on the blind luck of
   making the full MSI bank accessible to multiple guests and hoping they 
   don't
 step on each other.
 
  Not sure how to solve in this case (sharing MSI page)
 
That probably means that vfio needs to manage the space rather than the
 guest.
 
  What you mean by  vfio needs to manage the space rather than the guest?
 
 I mean there needs to be some kernel component managing the contents of the 
 MSI
 page rather than just handing it out to the user and hoping for the best.  The
 user API also needs to remain the same whether the user has the MSI page
 exclusively or it's shared with others (kernel or users).  Thanks,

We have limited number of MSI banks, so we cannot provide explicit MSI bank to 
each VMs.
Below is the summary of msi allocation/ownership model I am thinking of:

Option-1: User-space aware of MSI banks
= 
1 ) Userspace will make GET_MSI_REGION(request number of MSI banks)
- VFIO will allocate requested number of MSI bank;
- If allocation succeed then return number of banks
- If allocation fails then check opt-in flag set by administrator 
(allow_unsafe_interrupts);
 allow_unsafe_interrupts  == 0; Not allowed to share; return FAIL 
(-ENODEV)
 else share MSI bank of another VM.

2) Userspace will adjust geometry size as per number of banks and calls 
SET_GEOMETRY

3) Userspace will do DMA_MAP for its memory

4) Userspace will do MSI_MAP for number of banks

RE: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)

2013-12-09 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Saturday, December 07, 2013 1:00 AM
> To: Wood Scott-B07421
> Cc: Bhushan Bharat-R65777; linux-...@vger.kernel.org; ag...@suse.de; Yoder
> Stuart-B08248; io...@lists.linux-foundation.org; bhelg...@google.com; 
> linuxppc-
> d...@lists.ozlabs.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
> 
> On Fri, 2013-12-06 at 12:59 -0600, Scott Wood wrote:
> > On Thu, 2013-12-05 at 22:11 -0600, Bharat Bhushan wrote:
> > >
> > > > -Original Message-
> > > > From: Wood Scott-B07421
> > > > Sent: Friday, December 06, 2013 5:52 AM
> > > > To: Bhushan Bharat-R65777
> > > > Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de;
> > > > Yoder Stuart- B08248; io...@lists.linux-foundation.org;
> > > > bhelg...@google.com; linuxppc- d...@lists.ozlabs.org;
> > > > linux-kernel@vger.kernel.org
> > > > Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale
> > > > IOMMU (PAMU)
> > > >
> > > > On Thu, 2013-11-28 at 03:19 -0600, Bharat Bhushan wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Bhushan Bharat-R65777
> > > > > > Sent: Wednesday, November 27, 2013 9:39 PM
> > > > > > To: 'Alex Williamson'
> > > > > > Cc: Wood Scott-B07421; linux-...@vger.kernel.org;
> > > > > > ag...@suse.de; Yoder Stuart- B08248;
> > > > > > io...@lists.linux-foundation.org; bhelg...@google.com;
> > > > > > linuxppc- d...@lists.ozlabs.org; linux-kernel@vger.kernel.org
> > > > > > Subject: RE: [PATCH 0/9 v2] vfio-pci: add support for
> > > > > > Freescale IOMMU (PAMU)
> > > > > >
> > > > > > If we just provide the size of MSI bank to userspace then
> > > > > > userspace cannot do anything wrong.
> > > > >
> > > > > So userspace does not know address, so it cannot mmap and cause
> > > > > any
> > > > interference by directly reading/writing.
> > > >
> > > > That's security through obscurity...  Couldn't the malicious user
> > > > find out the address via other means, such as experimentation on
> > > > another system over which they have full control?  What would
> > > > happen if the user reads from their device's PCI config space?  Or
> > > > gets the information via some back door in the PCI device they
> > > > own?  Or pokes throughout the address space looking for something that
> generates an interrupt to its own device?
> > >
> > > So how to solve this problem, Any suggestion ?
> > >
> > > We have to map one window in PAMU for MSIs and a malicious user can
> > > ask its device to do DMA to MSI window region with any pair of
> > > address and data, which can lead to unexpected MSIs in system?
> >
> > I don't think there are any solutions other than to limit each bank to
> > one user, unless the admin turns some knob that says they're OK with
> > the partial loss of isolation.
> 
> Even if the admin does opt-in to an allow_unsafe_interrupts options, it should
> still be reasonably difficult for one guest to interfere with the other.  I
> don't think we want to rely on the blind luck of making the full MSI bank
> accessible to multiple guests and hoping they don't step on each other.

Not sure how to solve in this case (sharing MSI page)

>  That probably means that vfio needs to manage the space rather than the 
> guest.

What you mean by " vfio needs to manage the space rather than the guest"?

Thanks
-Bharat

> Thanks,
> 
> Alex
> 

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)

2013-12-09 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Saturday, December 07, 2013 1:00 AM
 To: Wood Scott-B07421
 Cc: Bhushan Bharat-R65777; linux-...@vger.kernel.org; ag...@suse.de; Yoder
 Stuart-B08248; io...@lists.linux-foundation.org; bhelg...@google.com; 
 linuxppc-
 d...@lists.ozlabs.org; linux-kernel@vger.kernel.org
 Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
 
 On Fri, 2013-12-06 at 12:59 -0600, Scott Wood wrote:
  On Thu, 2013-12-05 at 22:11 -0600, Bharat Bhushan wrote:
  
-Original Message-
From: Wood Scott-B07421
Sent: Friday, December 06, 2013 5:52 AM
To: Bhushan Bharat-R65777
Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de;
Yoder Stuart- B08248; io...@lists.linux-foundation.org;
bhelg...@google.com; linuxppc- d...@lists.ozlabs.org;
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale
IOMMU (PAMU)
   
On Thu, 2013-11-28 at 03:19 -0600, Bharat Bhushan wrote:

  -Original Message-
  From: Bhushan Bharat-R65777
  Sent: Wednesday, November 27, 2013 9:39 PM
  To: 'Alex Williamson'
  Cc: Wood Scott-B07421; linux-...@vger.kernel.org;
  ag...@suse.de; Yoder Stuart- B08248;
  io...@lists.linux-foundation.org; bhelg...@google.com;
  linuxppc- d...@lists.ozlabs.org; linux-kernel@vger.kernel.org
  Subject: RE: [PATCH 0/9 v2] vfio-pci: add support for
  Freescale IOMMU (PAMU)
 
  If we just provide the size of MSI bank to userspace then
  userspace cannot do anything wrong.

 So userspace does not know address, so it cannot mmap and cause
 any
interference by directly reading/writing.
   
That's security through obscurity...  Couldn't the malicious user
find out the address via other means, such as experimentation on
another system over which they have full control?  What would
happen if the user reads from their device's PCI config space?  Or
gets the information via some back door in the PCI device they
own?  Or pokes throughout the address space looking for something that
 generates an interrupt to its own device?
  
   So how to solve this problem, Any suggestion ?
  
   We have to map one window in PAMU for MSIs and a malicious user can
   ask its device to do DMA to MSI window region with any pair of
   address and data, which can lead to unexpected MSIs in system?
 
  I don't think there are any solutions other than to limit each bank to
  one user, unless the admin turns some knob that says they're OK with
  the partial loss of isolation.
 
 Even if the admin does opt-in to an allow_unsafe_interrupts options, it should
 still be reasonably difficult for one guest to interfere with the other.  I
 don't think we want to rely on the blind luck of making the full MSI bank
 accessible to multiple guests and hoping they don't step on each other.

Not sure how to solve in this case (sharing MSI page)

  That probably means that vfio needs to manage the space rather than the 
 guest.

What you mean by  vfio needs to manage the space rather than the guest?

Thanks
-Bharat

 Thanks,
 
 Alex
 

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i