Re: [RFC PATCH V1 6/6] platform: mtk-isp: Add Mediatek DIP driver

2019-05-28 Thread Tomasz Figa
On Thu, May 23, 2019 at 10:46 PM Frederic Chen wrote:
>
> Dear Tomasz,
>
> Thank you for your comments.
>
>
> On Wed, 2019-05-22 at 19:25 +0900, Tomasz Figa wrote:
> > Hi Frederic,
> >
> > On Wed, May 22, 2019 at 03:14:15AM +0800, Frederic Chen wrote:
> > > Dear Tomasz,
> > >
> > > I appreciate your comment. It is very helpful for us.
> > >
> >
> > You're welcome. Thanks for replying to all the comments. I'll skip those
> > resolved in my reply to keep the message shorter.
> >
> > >
> > > On Thu, 2019-05-09 at 18:48 +0900, Tomasz Figa wrote:
> > > > Hi Frederic,
> > > >
> > > > On Wed, Apr 17, 2019 at 7:45 PM Frederic Chen wrote:
[snip]
> > > > Also a general note - a work can be queued only once. This means that
> > > > the current code races when two dip_works are queued in quick
> > > > succession (or even at the same time from different threads).
> > > >
> > > > I can think of two potential options for fixing this:
> > > >
> > > > 1) Loop in the work function until there is nothing to queue to the
> > > >    hardware anymore - but this needs tricky synchronization, because
> > > >    there is still a short time at the end of the work function when a
> > > >    new dip_work could be added.
> > > >
> > > > 2) Change this to a kthread that just keeps running in a loop waiting
> > > >    for some available dip_work to show up and then sending it to the
> > > >    firmware. This should be simpler, as the kthread shouldn't have a
> > > >    chance to miss any dip_work queued.
> > > >
> > > > I'm personally in favor of option 2, as it should simplify the
> > > > synchronization.
> > > >
> > >
> > > I would like to re-design this part with a kthread in the next patch.
> >
> > Actually I missed another option. We could have one work_struct per
> > request and then we could keep using a workqueue. Perhaps that could be
> > simpler than a kthread.
> >
> > Actually, a similar approach could be used for dip_runner_func.
> > Instead of having a kthread looping, we could just have another
> > workqueue and one dip_runner_work per request. Then we wouldn't need
> > to do the waiting loop ourselves anymore.
> >
> > Does it make sense?
>
> Yes, it makes sense. Let me summarize the modification to the flow.
>
> First, we will have the following work_structs in mtk_dip_request.
>
> struct mtk_dip_request {
> struct media_request request;
> //...
> /* Prepare DIP part hardware configuration */
> struct mtk_dip_hw_submit_work submit_work;
> /* Replace dip_running thread jobs */
> struct mtk_dip_hw_composing_work composing_work;
> /* Only for composing error handling */
> struct mtk_dip_hw_mdpcb_timeout_work timeout_work;
> };
>
> Second, the overall flow of handling each request is:
>
> 1. mtk_dip_hw_enqueue calls queue_work() to put submit_work into its
>workqueue
> 2. submit_work sends IMG_IPI_FRAME command to SCP to prepare DIP
>hardware configuration
> 3. dip_scp_handler receives the IMG_IPI_FRAME result from SCP
> 4. dip_scp_handler calls queue_work() to put composing_work (instead
>of original dip_running thread jobs) into its workqueue
> 5. composing_work calls dip_mdp_cmdq_send() to finish the mdp part tasks
> 6. dip_mdp_cb_func(), triggered by the MDP driver, calls vb2_buffer_done to
>return the buffer (no workqueue required here)
>

Sounds good to me, but then simply making the workqueues freezable no
longer solves the suspend/resume problem, because the work functions
wouldn't wait for the firmware/hardware completion anymore. That's also
okay, but in this case we need to add some code to the suspend path to
wait for any pending operations to complete.

Best regards,
Tomasz


Re: [PATCH v5 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs

2019-05-28 Thread Lu Baolu

Hi,

On 5/28/19 7:50 PM, Eric Auger wrote:

Now that we have a new IOMMU_RESV_DIRECT_RELAXABLE reserved memory
region type, let's report USB and GFX RMRRs as relaxable ones.

We introduce a new device_rmrr_is_relaxable() helper to check
whether an RMRR belongs to the relaxable category.

This allows finer-grained reporting of reserved memory regions at the
IOMMU API level. This will be usable by VFIO to define the usable
IOVA range and to detect potential conflicts between the guest
physical address space and host reserved regions.

Signed-off-by: Eric Auger 

---

v3 -> v4:
- introduce device_rmrr_is_relaxable and reshuffle the comments
---
  drivers/iommu/intel-iommu.c | 55 +++--
  1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9302351818ab..01c82f848470 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2920,6 +2920,36 @@ static bool device_has_rmrr(struct device *dev)
return false;
  }
  
+/*

+ * device_rmrr_is_relaxable - Test whether the RMRR of this device
+ * is relaxable (i.e. it is allowed not to be enforced under some conditions)
+ *
+ * @dev: device handle
+ *
+ * We assume that PCI USB devices with RMRRs have them largely
+ * for historical reasons and that the RMRR space is not actively used post
+ * boot.  This exclusion may change if vendors begin to abuse it.
+ *
+ * The same exception is made for graphics devices, with the requirement that
+ * any use of the RMRR regions will be torn down before assigning the device
+ * to a guest.
+ *
+ * Return: true if the RMRR is relaxable
+ */
+static bool device_rmrr_is_relaxable(struct device *dev)
+{
+   struct pci_dev *pdev;
+
+   if (!dev_is_pci(dev))
+   return false;
+
+   pdev = to_pci_dev(dev);
+   if (IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))
+   return true;
+   else
+   return false;
+}


I know this is only code refactoring. But strictly speaking, the RMRR of
any USB host device is ignorable only if quirk_usb_early_handoff() has
been called. There, control of the USB host controller is handed over
from BIOS to OS and the corresponding SMIs are disabled.

This function is registered in drivers/usb/host/pci-quirks.c

DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_ANY_ID, PCI_ANY_ID,
PCI_CLASS_SERIAL_USB, 8, quirk_usb_early_handoff);

and is only compiled if CONFIG_USB_PCI is enabled.

Hence, it's safer to say:

+#ifdef CONFIG_USB_PCI
+   if (IS_USB_DEVICE(pdev))
+   return true;
+#endif /* CONFIG_USB_PCI */

I am okay if we keep this untouched and make this change in a
separate patch.


+
  /*
   * There are a couple cases where we need to restrict the functionality of
   * devices associated with RMRRs.  The first is when evaluating a device for
@@ -2934,25 +2964,16 @@ static bool device_has_rmrr(struct device *dev)
   * We therefore prevent devices associated with an RMRR from participating in
   * the IOMMU API, which eliminates them from device assignment.
   *
- * In both cases we assume that PCI USB devices with RMRRs have them largely
- * for historical reasons and that the RMRR space is not actively used post
- * boot.  This exclusion may change if vendors begin to abuse it.
- *
- * The same exception is made for graphics devices, with the requirement that
- * any use of the RMRR regions will be torn down before assigning the device
- * to a guest.
+ * In both cases, devices which have relaxable RMRRs are exempt from this
+ * restriction. See the device_rmrr_is_relaxable comment.
   */
  static bool device_is_rmrr_locked(struct device *dev)
  {
if (!device_has_rmrr(dev))
return false;
  
-	if (dev_is_pci(dev)) {

-   struct pci_dev *pdev = to_pci_dev(dev);
-
-   if (IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))
-   return false;
-   }
+   if (device_rmrr_is_relaxable(dev))
+   return false;
  
  	return true;

  }
@@ -5494,6 +5515,7 @@ static void intel_iommu_get_resv_regions(struct device *device,
for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
  i, i_dev) {
struct iommu_resv_region *resv;
+   enum iommu_resv_type type;
size_t length;
  
  			if (i_dev != device &&

@@ -5501,9 +5523,12 @@ static void intel_iommu_get_resv_regions(struct device *device,
continue;
  
  			length = rmrr->end_address - rmrr->base_address + 1;

+
+   type = device_rmrr_is_relaxable(device) ?
+   IOMMU_RESV_DIRECT_RELAXABLE : IOMMU_RESV_DIRECT;
+
resv = iommu_alloc_resv_region(rmrr->base_address,
-  length, prot,
- 

Re: [PATCH v5 3/7] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper

2019-05-28 Thread Lu Baolu

Hi,

On 5/28/19 7:50 PM, Eric Auger wrote:

Several call sites are about to check whether a device belongs
to the PCI sub-hierarchy of a candidate PCI-PCI bridge.
Introduce a helper to perform that check.



This looks good to me.

Reviewed-by: Lu Baolu 

Best regards,
Baolu



Signed-off-by: Eric Auger 
---
  drivers/iommu/intel-iommu.c | 37 +
  1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5ec8b5bd308f..879f11c82b05 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -736,12 +736,39 @@ static int iommu_dummy(struct device *dev)
return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
  }
  
+/* is_downstream_to_pci_bridge - test if a device belongs to the

+ * PCI sub-hierarchy of a candidate PCI-PCI bridge
+ *
+ * @dev: candidate PCI device belonging to @bridge PCI sub-hierarchy
+ * @bridge: the candidate PCI-PCI bridge
+ *
+ * Return: true if @dev belongs to @bridge PCI sub-hierarchy
+ */
+static bool
+is_downstream_to_pci_bridge(struct device *dev, struct device *bridge)
+{
+   struct pci_dev *pdev, *pbridge;
+
+   if (!dev_is_pci(dev) || !dev_is_pci(bridge))
+   return false;
+
+   pdev = to_pci_dev(dev);
+   pbridge = to_pci_dev(bridge);
+
+   if (pbridge->subordinate &&
+   pbridge->subordinate->number <= pdev->bus->number &&
+   pbridge->subordinate->busn_res.end >= pdev->bus->number)
+   return true;
+
+   return false;
+}
+
  static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
  {
struct dmar_drhd_unit *drhd = NULL;
struct intel_iommu *iommu;
struct device *tmp;
-   struct pci_dev *ptmp, *pdev = NULL;
+   struct pci_dev *pdev = NULL;
u16 segment = 0;
int i;
  
@@ -787,13 +814,7 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)

goto out;
}
  
-			if (!pdev || !dev_is_pci(tmp))

-   continue;
-
-   ptmp = to_pci_dev(tmp);
-   if (ptmp->subordinate &&
-   ptmp->subordinate->number <= pdev->bus->number &&
-   ptmp->subordinate->busn_res.end >= pdev->bus->number)
+   if (is_downstream_to_pci_bridge(dev, tmp))
goto got_pdev;
}
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 5/7] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions

2019-05-28 Thread Lu Baolu

Hi,

On 5/28/19 7:50 PM, Eric Auger wrote:

In case the RMRR device scope is a PCI-PCI bridge, let's check
whether the device belongs to its PCI sub-hierarchy.



This looks good to me.

Reviewed-by: Lu Baolu 

Best regards,
Baolu



Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks")

Signed-off-by: Eric Auger 
---
  drivers/iommu/intel-iommu.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 35508687f178..9302351818ab 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5496,7 +5496,8 @@ static void intel_iommu_get_resv_regions(struct device *device,
struct iommu_resv_region *resv;
size_t length;
  
-			if (i_dev != device)

+   if (i_dev != device &&
+   !is_downstream_to_pci_bridge(device, i_dev))
continue;
  
  			length = rmrr->end_address - rmrr->base_address + 1;




Re: [PATCH v5 4/7] iommu/vt-d: Handle RMRR with PCI bridge device scopes

2019-05-28 Thread Lu Baolu

Hi,

On 5/28/19 7:50 PM, Eric Auger wrote:

When reading the VT-d specification, and especially the
Reserved Memory Region Reporting Structure chapter,
it is not obvious that a device scope element cannot be a
PCI-PCI bridge, in which case all downstream ports are
likely to access the reserved memory region. Let's handle
this case in device_has_rmrr.



This looks good to me.

Reviewed-by: Lu Baolu 

Best regards,
Baolu



Fixes: ea2447f700ca ("intel-iommu: Prevent devices with RMRRs from being placed into SI Domain")

Signed-off-by: Eric Auger 

---

v1 -> v2:
- is_downstream_to_pci_bridge helper introduced in a separate patch
---
  drivers/iommu/intel-iommu.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 879f11c82b05..35508687f178 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2910,7 +2910,8 @@ static bool device_has_rmrr(struct device *dev)
 */
for_each_active_dev_scope(rmrr->devices,
  rmrr->devices_cnt, i, tmp)
-   if (tmp == dev) {
+   if (tmp == dev ||
+   is_downstream_to_pci_bridge(dev, tmp)) {
rcu_read_unlock();
return true;
}



Re: [PATCH v5 2/7] iommu/vt-d: Duplicate iommu_resv_region objects per device list

2019-05-28 Thread Lu Baolu

Hi Eric,

On 5/28/19 7:50 PM, Eric Auger wrote:

intel_iommu_get_resv_regions() aims to return the list of
reserved regions accessible by a given @device. However, several
devices can access the same reserved memory region, and when
building the list it is not safe to use a single iommu_resv_region
object, whose container is the RMRR. This iommu_resv_region must
be duplicated per device reserved region list.

Let's remove the struct iommu_resv_region from the RMRR unit
and allocate the iommu_resv_region directly in
intel_iommu_get_resv_regions(). We hold the dmar_global_lock instead
of the rcu-lock to allow sleeping.

Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks")
Signed-off-by: Eric Auger 

---

v4 -> v5
- replace rcu-lock by the dmar_global_lock
---
  drivers/iommu/intel-iommu.c | 34 +-
  1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a209199f3af6..5ec8b5bd308f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -322,7 +322,6 @@ struct dmar_rmrr_unit {
u64 end_address;/* reserved end address */
struct dmar_dev_scope *devices; /* target devices */
int devices_cnt;/* target device count */
-   struct iommu_resv_region *resv; /* reserved region handle */
  };
  
  struct dmar_atsr_unit {

@@ -4205,7 +4204,6 @@ static inline void init_iommu_pm_ops(void) {}
  int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
  {
struct acpi_dmar_reserved_memory *rmrr;
-   int prot = DMA_PTE_READ|DMA_PTE_WRITE;
struct dmar_rmrr_unit *rmrru;
size_t length;
  
@@ -4219,22 +4217,16 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)

rmrru->end_address = rmrr->end_address;
  
  	length = rmrr->end_address - rmrr->base_address + 1;

-   rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot,
- IOMMU_RESV_DIRECT);
-   if (!rmrru->resv)
-   goto free_rmrru;
  
  	rmrru->devices = dmar_alloc_dev_scope((void *)(rmrr + 1),

((void *)rmrr) + rmrr->header.length,
  &rmrru->devices_cnt);
if (rmrru->devices_cnt && rmrru->devices == NULL)
-   goto free_all;
+   goto free_rmrru;
  
   	list_add(&rmrru->list, &dmar_rmrr_units);
  
  	return 0;

-free_all:
-   kfree(rmrru->resv);
  free_rmrru:
kfree(rmrru);
  out:
@@ -4452,7 +4444,6 @@ static void intel_iommu_free_dmars(void)
list_for_each_entry_safe(rmrru, rmrr_n, _rmrr_units, list) {
list_del(&rmrru->list);
dmar_free_dev_scope(&rmrru->devices, &rmrru->devices_cnt);
-   kfree(rmrru->resv);
kfree(rmrru);
}
  
@@ -5470,22 +5461,33 @@ static void intel_iommu_remove_device(struct device *dev)

  static void intel_iommu_get_resv_regions(struct device *device,
 struct list_head *head)
  {
+   int prot = DMA_PTE_READ|DMA_PTE_WRITE;


I know this is moved from above. How about adding spaces around the '|'?


struct iommu_resv_region *reg;
struct dmar_rmrr_unit *rmrr;
struct device *i_dev;
int i;
  
-	rcu_read_lock();

+   down_write(&dmar_global_lock);


Just out of curiosity, why not down_read()? We don't change the rmrr
list here, right?


for_each_rmrr_units(rmrr) {
for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
  i, i_dev) {
+   struct iommu_resv_region *resv;
+   size_t length;
+
if (i_dev != device)
continue;
  
-			list_add_tail(&rmrr->resv->list, head);

+   length = rmrr->end_address - rmrr->base_address + 1;
+   resv = iommu_alloc_resv_region(rmrr->base_address,
+  length, prot,
+  IOMMU_RESV_DIRECT);
+   if (!resv)
+   break;
+
+   list_add_tail(&resv->list, head);
}
}
-   rcu_read_unlock();
+   up_write(&dmar_global_lock);
  
  	reg = iommu_alloc_resv_region(IOAPIC_RANGE_START,

  IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1,
@@ -5500,10 +5502,8 @@ static void intel_iommu_put_resv_regions(struct device *dev,
  {
struct iommu_resv_region *entry, *next;
  
-	list_for_each_entry_safe(entry, next, head, list) {

-   if (entry->type == IOMMU_RESV_MSI)
-   kfree(entry);
-   }
+   list_for_each_entry_safe(entry, next, head, list)
+   kfree(entry);
  }
  
  int intel_iommu_enable_pasid(struct intel_iommu 

Re: [PATCH v3 2/3] vfio: zpci: defining the VFIO headers

2019-05-28 Thread Alex Williamson
On Thu, 23 May 2019 18:24:33 +0200
Cornelia Huck  wrote:

> On Thu, 23 May 2019 14:25:25 +0200
> Pierre Morel  wrote:
> 
> > We define a new device region in vfio.h to be able to
> > get the ZPCI CLP information by reading this region from
> > userland.
> > 
> > We create a new file, vfio_zdev.h to define the structure
> > of the new region we defined in vfio.h
> > 
> > Signed-off-by: Pierre Morel 
> > ---
> >  include/uapi/linux/vfio.h  |  4 
> >  include/uapi/linux/vfio_zdev.h | 34 ++
> >  2 files changed, 38 insertions(+)
> >  create mode 100644 include/uapi/linux/vfio_zdev.h
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 8f10748..56595b8 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -306,6 +306,10 @@ struct vfio_region_info_cap_type {
> >  #define VFIO_REGION_TYPE_GFX(1)
> >  #define VFIO_REGION_SUBTYPE_GFX_EDID(1)
> >  
> > +/* IBM Subtypes */
> > +#define VFIO_REGION_TYPE_IBM_ZDEV  (1)
> > +#define VFIO_REGION_SUBTYPE_ZDEV_CLP   (1)  
> 
> I'm afraid that confuses me a bit. You want to add the region to every
> vfio-pci device when we're running under s390, right? So this does not
> depend on the device type of the actual device (which may or may not be
> from IBM), but only on the architecture?

FWIW, I don't really have a strong opinion here but I welcome the
discussion.  It seems fair to me that a PCI vendor type could be used
for either a device with that vendor ID or by the vendor of the
platform.  We've got a lot of address space if we want to use
VFIO_REGION_TYPE_IBM_ZDEV rather than a PCI vendor type (so long as
it's updated not to conflict with the GFX type).  Thanks,

Alex
 
> (Generally speaking, I think using regions for this makes sense,
> though.)
> 
> > +
> >  /**
> >   * struct vfio_region_gfx_edid - EDID region layout.
> >   *
> > diff --git a/include/uapi/linux/vfio_zdev.h
> > b/include/uapi/linux/vfio_zdev.h new file mode 100644
> > index 000..84b1a82
> > --- /dev/null
> > +++ b/include/uapi/linux/vfio_zdev.h
> > @@ -0,0 +1,34 @@
> > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > +/*
> > + * Region definition for ZPCI devices
> > + *
> > + * Copyright IBM Corp. 2019
> > + *
> > + * Author(s): Pierre Morel 
> > + */
> > +
> > +#ifndef _VFIO_ZDEV_H_
> > +#define _VFIO_ZDEV_H_
> > +
> > +#include 
> > +
> > +/**
> > + * struct vfio_region_zpci_info - ZPCI information.
> > + *
> > + */
> > +struct vfio_region_zpci_info {
> > +   __u64 dasm;
> > +   __u64 start_dma;
> > +   __u64 end_dma;
> > +   __u64 msi_addr;
> > +   __u64 flags;
> > +   __u16 pchid;
> > +   __u16 mui;
> > +   __u16 noi;
> > +   __u8 gid;
> > +   __u8 version;
> > +#define VFIO_PCI_ZDEV_FLAGS_REFRESH 1
> > +   __u8 util_str[CLP_UTIL_STR_LEN];
> > +} __packed;
> > +
> > +#endif  
> 



Re: [PATCH v3 2/3] vfio: zpci: defining the VFIO headers

2019-05-28 Thread Alex Williamson
On Tue, 28 May 2019 15:43:42 -0600
Alex Williamson  wrote:

> On Thu, 23 May 2019 14:25:25 +0200
> Pierre Morel  wrote:
> 
> > We define a new device region in vfio.h to be able to
> > get the ZPCI CLP information by reading this region from
> > userland.
> > 
> > We create a new file, vfio_zdev.h to define the structure
> > of the new region we defined in vfio.h
> > 
> > Signed-off-by: Pierre Morel 
> > ---
> >  include/uapi/linux/vfio.h  |  4 
> >  include/uapi/linux/vfio_zdev.h | 34 ++
> >  2 files changed, 38 insertions(+)
> >  create mode 100644 include/uapi/linux/vfio_zdev.h
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 8f10748..56595b8 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -306,6 +306,10 @@ struct vfio_region_info_cap_type {
> >  #define VFIO_REGION_TYPE_GFX(1)
> >  #define VFIO_REGION_SUBTYPE_GFX_EDID(1)
> >  
> > +/* IBM Subtypes */
> > +#define VFIO_REGION_TYPE_IBM_ZDEV  (1)  
> 
> This one defines (but never uses) a conflicting region type to GFX
> above.
> 
> > +#define VFIO_REGION_SUBTYPE_ZDEV_CLP   (1)  
> 
> If we're using a PCI vendor type, which the next patch indicates we
> are, this is the one you need.  But please also specify it as a PCI
> vendor sub-type with the hex PCI vendor ID, and perhaps group it with
> the Intel vendor sub-types.

BTW, we've already started a set of IBM sub-types:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux/vfio.h?id=7f92891778dff62303c070ac81de7b7d80de331a

> > +
> >  /**
> >   * struct vfio_region_gfx_edid - EDID region layout.
> >   *
> > diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
> > new file mode 100644
> > index 000..84b1a82
> > --- /dev/null
> > +++ b/include/uapi/linux/vfio_zdev.h
> > @@ -0,0 +1,34 @@
> > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > +/*
> > + * Region definition for ZPCI devices
> > + *
> > + * Copyright IBM Corp. 2019
> > + *
> > + * Author(s): Pierre Morel 
> > + */
> > +
> > +#ifndef _VFIO_ZDEV_H_
> > +#define _VFIO_ZDEV_H_
> > +
> > +#include 
> > +
> > +/**
> > + * struct vfio_region_zpci_info - ZPCI information.
> > + *
> > + */
> > +struct vfio_region_zpci_info {
> > +   __u64 dasm;
> > +   __u64 start_dma;
> > +   __u64 end_dma;
> > +   __u64 msi_addr;
> > +   __u64 flags;
> > +   __u16 pchid;
> > +   __u16 mui;
> > +   __u16 noi;
> > +   __u8 gid;
> > +   __u8 version;
> > +#define VFIO_PCI_ZDEV_FLAGS_REFRESH 1
> > +   __u8 util_str[CLP_UTIL_STR_LEN];  
> 
> I don't see where CLP_UTIL_STR_LEN is defined in a uapi consumable
> header.  Should this simply be [] where the string length is implied by
> the remaining region size?  QEMU hard-codes it, but that doesn't
> validate the vfio uapi.
> 
> > +} __packed;
> > +
> > +#endif  
> 



Re: [PATCH v3 2/3] vfio: zpci: defining the VFIO headers

2019-05-28 Thread Alex Williamson
On Thu, 23 May 2019 14:25:25 +0200
Pierre Morel  wrote:

> We define a new device region in vfio.h to be able to
> get the ZPCI CLP information by reading this region from
> userland.
> 
> We create a new file, vfio_zdev.h to define the structure
> of the new region we defined in vfio.h
> 
> Signed-off-by: Pierre Morel 
> ---
>  include/uapi/linux/vfio.h  |  4 
>  include/uapi/linux/vfio_zdev.h | 34 ++
>  2 files changed, 38 insertions(+)
>  create mode 100644 include/uapi/linux/vfio_zdev.h
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8f10748..56595b8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -306,6 +306,10 @@ struct vfio_region_info_cap_type {
>  #define VFIO_REGION_TYPE_GFX(1)
>  #define VFIO_REGION_SUBTYPE_GFX_EDID(1)
>  
> +/* IBM Subtypes */
> +#define VFIO_REGION_TYPE_IBM_ZDEV(1)

This one defines (but never uses) a conflicting region type to GFX
above.

> +#define VFIO_REGION_SUBTYPE_ZDEV_CLP (1)

If we're using a PCI vendor type, which the next patch indicates we
are, this is the one you need.  But please also specify it as a PCI
vendor sub-type with the hex PCI vendor ID, and perhaps group it with
the Intel vendor sub-types.

> +
>  /**
>   * struct vfio_region_gfx_edid - EDID region layout.
>   *
> diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
> new file mode 100644
> index 000..84b1a82
> --- /dev/null
> +++ b/include/uapi/linux/vfio_zdev.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Region definition for ZPCI devices
> + *
> + * Copyright IBM Corp. 2019
> + *
> + * Author(s): Pierre Morel 
> + */
> +
> +#ifndef _VFIO_ZDEV_H_
> +#define _VFIO_ZDEV_H_
> +
> +#include 
> +
> +/**
> + * struct vfio_region_zpci_info - ZPCI information.
> + *
> + */
> +struct vfio_region_zpci_info {
> + __u64 dasm;
> + __u64 start_dma;
> + __u64 end_dma;
> + __u64 msi_addr;
> + __u64 flags;
> + __u16 pchid;
> + __u16 mui;
> + __u16 noi;
> + __u8 gid;
> + __u8 version;
> +#define VFIO_PCI_ZDEV_FLAGS_REFRESH 1
> + __u8 util_str[CLP_UTIL_STR_LEN];

I don't see where CLP_UTIL_STR_LEN is defined in a uapi consumable
header.  Should this simply be [] where the string length is implied by
the remaining region size?  QEMU hard-codes it, but that doesn't
validate the vfio uapi.

> +} __packed;
> +
> +#endif



Re: [PATCH v3 3/3] vfio: pci: Using a device region to retrieve zPCI information

2019-05-28 Thread Alex Williamson
On Thu, 23 May 2019 14:25:26 +0200
Pierre Morel  wrote:

> We define a new configuration entry for VFIO/PCI, VFIO_PCI_ZDEV
> 
> When the VFIO_PCI_ZDEV feature is configured we initialize
> a new device region, VFIO_REGION_SUBTYPE_ZDEV_CLP, to hold
> the information from the ZPCI device the userland needs to
> give to a guest driving the zPCI function.
> 
> Signed-off-by: Pierre Morel 
> ---
>  drivers/vfio/pci/Kconfig|  7 
>  drivers/vfio/pci/Makefile   |  1 +
>  drivers/vfio/pci/vfio_pci.c |  9 
>  drivers/vfio/pci/vfio_pci_private.h | 10 +
>  drivers/vfio/pci/vfio_pci_zdev.c| 83 
> +
>  5 files changed, 110 insertions(+)
>  create mode 100644 drivers/vfio/pci/vfio_pci_zdev.c
> 
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index d0f8e4f..9c1181c 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -44,3 +44,10 @@ config VFIO_PCI_NVLINK2
>   depends on VFIO_PCI && PPC_POWERNV
>   help
> VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
> +
> +config VFIO_PCI_ZDEV
> + tristate "VFIO PCI Generic for ZPCI devices"

Shouldn't this be 'bool'?

> + depends on VFIO_PCI && S390
> + default y
> + help
> +   VFIO PCI support for S390 Z-PCI devices
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 9662c06..fd53819 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -2,5 +2,6 @@
>  vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
>  vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
>  vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
> +vfio-pci-$(CONFIG_VFIO_PCI_ZDEV) += vfio_pci_zdev.o
>  
>  obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 3fa20e9..b6087d6 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -362,6 +362,15 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
>   }
>   }
>  
> + if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV)) {
> + ret = vfio_pci_zdev_init(vdev);
> + if (ret) {
> + dev_warn(&vdev->pdev->dev,
> +  "Failed to setup ZDEV regions\n");
> + goto disable_exit;
> + }
> + }
> +
>   vfio_pci_probe_mmaps(vdev);
>  
>   return 0;
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index 1812cf2..db73cdf 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -189,4 +189,14 @@ static inline int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev)
>   return -ENODEV;
>  }
>  #endif
> +
> +#ifdef(IS_ENABLED_VFIO_PCI_ZDEV)

I thought this might be some clever new macro, but is it just a typo?
Seems it should just be

#ifdef CONFIG_VFIO_PCI_ZDEV

> +extern int vfio_pci_zdev_init(struct vfio_pci_device *vdev);
> +#else
> +static inline int vfio_pci_zdev_init(struct vfio_pci_device *vdev)
> +{
> + return -ENODEV;
> +}
> +#endif
> +
>  #endif /* VFIO_PCI_PRIVATE_H */
> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
> new file mode 100644
> index 000..230a4e4
> --- /dev/null
> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
> @@ -0,0 +1,83 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * VFIO ZPCI devices support
> + *
> + * Copyright (C) IBM Corp. 2019.  All rights reserved.
> + *   Author: Pierre Morel 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "vfio_pci_private.h"
> +
> +static size_t vfio_pci_zdev_rw(struct vfio_pci_device *vdev,
> +char __user *buf, size_t count, loff_t *ppos,
> +bool iswrite)
> +{
> + struct vfio_region_zpci_info *region;
> + struct zpci_dev *zdev;
> + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +
> + if (!vdev->pdev->bus)
> + return -ENODEV;
> +
> + zdev = vdev->pdev->bus->sysdata;
> + if (!zdev)
> + return -ENODEV;
> +
> + if ((*ppos & VFIO_PCI_OFFSET_MASK) || (count != sizeof(*region)))
> + return -EINVAL;

Why?  This sort of restriction would need to be documented in the ABI.

> +
> + region = vdev->region[index - VFIO_PCI_NUM_REGIONS].data;
> + region->dasm = zdev->dma_mask;
> + region->start_dma = zdev->start_dma;
> + region->end_dma = zdev->end_dma;
> + region->msi_addr = zdev->msi_addr;
> + region->flags = VFIO_PCI_ZDEV_FLAGS_REFRESH;
> + region->gid = zdev->pfgid;
> + region->mui = zdev->fmb_update;
> + region->noi = zdev->max_msi;
> +  

[PATCH] dma: replace single-char identifiers in macros

2019-05-28 Thread Qian Cai
There are a few macros in the DMA code whose single-character parameter
names make the code hard to read. Replace them with meaningful names.

Signed-off-by: Qian Cai 
---
 include/linux/dma-mapping.h | 24 
 include/linux/dmar.h| 14 --
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6309a721394b..2f0151dcfa4e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -602,14 +602,22 @@ static inline void dma_sync_single_range_for_device(struct device *dev,
return dma_sync_single_for_device(dev, addr + offset, size, dir);
 }
 
-#define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, 0)
-#define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0)
-#define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, 0)
-#define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0)
-#define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
-#define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
-#define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
-#define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
+#define dma_map_single(dev, ptr, size, direction)  \
+   dma_map_single_attrs(dev, ptr, size, direction, 0)
+#define dma_unmap_single(dev, addr, size, direction)   \
+   dma_unmap_single_attrs(dev, addr, size, direction, 0)
+#define dma_map_sg(dev, sg, mapped_ents, direction)\
+   dma_map_sg_attrs(dev, sg, mapped_ents, direction, 0)
+#define dma_unmap_sg(dev, sg, mapped_ents, direction)  \
+   dma_unmap_sg_attrs(dev, sg, mapped_ents, direction, 0)
+#define dma_map_page(dev, page, offset, size, direction)   \
+   dma_map_page_attrs(dev, page, offset, size, direction, 0)
+#define dma_unmap_page(dev, addr, size, direction) \
+   dma_unmap_page_attrs(dev, addr, size, direction, 0)
+#define dma_get_sgtable(dev, sgt, cpu_addr, dma_addr, size)\
+   dma_get_sgtable_attrs(dev, sgt, cpu_addr, dma_addr, size, 0)
+#define dma_mmap_coherent(dev, vma, cpu_addr, dma_addr, size)  \
+   dma_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, 0)
 
 extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index f8af1d770520..eb634912f475 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -104,12 +104,14 @@ static inline bool dmar_rcu_check(void)
 
#define dmar_rcu_dereference(p) rcu_dereference_check((p), dmar_rcu_check())
 
-#define for_each_dev_scope(a, c, p, d)  \
-   for ((p) = 0; ((d) = (p) < (c) ? dmar_rcu_dereference((a)[(p)].dev) : \
-   NULL, (p) < (c)); (p)++)
-
-#define for_each_active_dev_scope(a, c, p, d)   \
-   for_each_dev_scope((a), (c), (p), (d))  if (!(d)) { continue; } else
+#define for_each_dev_scope(devs, cnt, i, tmp)  \
+   for ((i) = 0; ((tmp) = (i) < (cnt) ?\
+   dmar_rcu_dereference((devs)[(i)].dev) : NULL, (i) < (cnt)); \
+   (i)++)
+
+#define for_each_active_dev_scope(devs, cnt, i, tmp)   \
+   for_each_dev_scope((devs), (cnt), (i), (tmp))   \
+   if (!(tmp)) { continue; } else
 
 extern int dmar_table_init(void);
 extern int dmar_dev_scope_init(void);
-- 
1.8.3.1



[PATCH 3/3] iommu/amd: only free resources once on init error

2019-05-28 Thread Kevin Mitchell
When amd_iommu=off was specified on the command line, free_X_resources
functions were called immediately after early_amd_iommu_init. They were
then called again when amd_iommu_init also failed (as expected).

Instead, call them only once: at the end of state_next() whenever
there's an error. These functions should be safe to call any time and
any number of times. However, since state_next is never called again in
an error state, the cleanup will only ever be run once.

This also ensures that cleanup code is run as soon as possible after an
error is detected rather than waiting for amd_iommu_init() to be called.

Signed-off-by: Kevin Mitchell 
---
 drivers/iommu/amd_iommu_init.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5f3df5ae6ba8..24fc060fe596 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2638,8 +2638,6 @@ static int __init state_next(void)
init_state = ret ? IOMMU_INIT_ERROR : IOMMU_ACPI_FINISHED;
if (init_state == IOMMU_ACPI_FINISHED && amd_iommu_disabled) {
pr_info("AMD IOMMU disabled on kernel command-line\n");
-   free_dma_resources();
-   free_iommu_resources();
init_state = IOMMU_CMDLINE_DISABLED;
ret = -EINVAL;
}
@@ -2680,6 +2678,19 @@ static int __init state_next(void)
BUG();
}
 
+   if (ret) {
+   free_dma_resources();
+   if (!irq_remapping_enabled) {
+   disable_iommus();
+   free_iommu_resources();
+   } else {
+   struct amd_iommu *iommu;
+
+   uninit_device_table_dma();
+   for_each_iommu(iommu)
+   iommu_flush_all_caches(iommu);
+   }
+   }
return ret;
 }
 
@@ -2753,18 +2764,6 @@ static int __init amd_iommu_init(void)
int ret;
 
ret = iommu_go_to_state(IOMMU_INITIALIZED);
-   if (ret) {
-   free_dma_resources();
-   if (!irq_remapping_enabled) {
-   disable_iommus();
-   free_iommu_resources();
-   } else {
-   uninit_device_table_dma();
-   for_each_iommu(iommu)
-   iommu_flush_all_caches(iommu);
-   }
-   }
-
 #ifdef CONFIG_GART_IOMMU
	if (ret && list_empty(&amd_iommu_list)) {
/*
-- 
2.20.1



[PATCH 0/3] handle init errors more gracefully in amd_iommu

2019-05-28 Thread Kevin Mitchell
This series makes error handling more robust in the amd_iommu init
code. It was initially motivated by problematic firmware that does not
set up the physical address of the iommu. This led to a NULL dereference
panic when iommu_disable was called during cleanup.

While the first patch is sufficient to avoid the panic, the subsequent
two move the cleanup closer to the actual error and avoid calling the
cleanup code twice when amd_iommu=off is specified on the command line.

I have tested this series on a variety of AMD CPUs with firmware
exhibiting the issue. I have additionally tested on platforms where the
firmware has been fixed. I tried both with and without amd_iommu=off. I
have also tested on older CPUs where no IOMMU is detected and even one
where the GART driver ends up running.

Thanks,

Kevin

Kevin Mitchell (3):
  iommu/amd: make iommu_disable safer
  iommu/amd: move gart fallback to amd_iommu_init
  iommu/amd: only free resources once on init error

 drivers/iommu/amd_iommu_init.c | 45 ++
 1 file changed, 24 insertions(+), 21 deletions(-)

-- 
2.20.1



[PATCH 2/3] iommu/amd: move gart fallback to amd_iommu_init

2019-05-28 Thread Kevin Mitchell
The fallback to the GART driver in the case where amd_iommu doesn't
work was executed in a function called free_iommu_resources, which
didn't really make sense. It was even being called twice if
amd_iommu=off was specified on the command line.

The only complication is that it needs to be verified that amd_iommu has
fully relinquished control by calling free_iommu_resources and emptying
the amd_iommu_list.

Signed-off-by: Kevin Mitchell 
---
 drivers/iommu/amd_iommu_init.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 3798d7303c99..5f3df5ae6ba8 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2345,15 +2345,6 @@ static void __init free_iommu_resources(void)
amd_iommu_dev_table = NULL;
 
free_iommu_all();
-
-#ifdef CONFIG_GART_IOMMU
-   /*
-* We failed to initialize the AMD IOMMU - try fallback to GART
-* if possible.
-*/
-   gart_iommu_init();
-
-#endif
 }
 
 /* SB IOAPIC is always on this device in AMD systems */
@@ -2774,6 +2765,16 @@ static int __init amd_iommu_init(void)
}
}
 
+#ifdef CONFIG_GART_IOMMU
+   if (ret && list_empty(&amd_iommu_list)) {
+   /*
+* We failed to initialize the AMD IOMMU - try fallback
+* to GART if possible.
+*/
+   gart_iommu_init();
+   }
+#endif
+
for_each_iommu(iommu)
amd_iommu_debugfs_setup(iommu);
 
-- 
2.20.1



[PATCH 1/3] iommu/amd: make iommu_disable safer

2019-05-28 Thread Kevin Mitchell
Make it safe to call iommu_disable during early init error conditions
before mmio_base is set, but after the struct amd_iommu has been added
to the amd_iommu_list. For example, this happens if firmware fails to
fill in mmio_phys in the ACPI table leading to a NULL pointer
dereference in iommu_feature_disable.

Signed-off-by: Kevin Mitchell 
---
 drivers/iommu/amd_iommu_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index f773792d77fd..3798d7303c99 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -424,6 +424,9 @@ static void iommu_enable(struct amd_iommu *iommu)
 
 static void iommu_disable(struct amd_iommu *iommu)
 {
+   if (!iommu->mmio_base)
+   return;
+
/* Disable command buffer */
iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
 
-- 
2.20.1



Re: [PATCH] iommu/amd: print out "tag" in INVALID_PPR_REQUEST

2019-05-28 Thread jroe...@suse.de
On Tue, May 14, 2019 at 10:55:46AM -0400, Qian Cai wrote:
> Jroedel, I am wondering what the plan for 41e59a41fc5d1 (iommu tree) or this
> patch to be pushed to the linux-next or mainline...

Looks like I applied that patch directly to the master branch, which is
not what goes upstream. I cherry-picked it to x86/amd, so it will go
upstream for v5.3.


Regards,

Joerg


Re: [PATCH v7 1/1] iommu: enhance IOMMU dma mode build options

2019-05-28 Thread Leizhen (ThunderTown)



On 2019/5/27 22:21, Joerg Roedel wrote:
> Hi Zhen Lei,
> 
> On Mon, May 20, 2019 at 09:59:47PM +0800, Zhen Lei wrote:
>>  arch/ia64/kernel/pci-dma.c|  2 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c |  3 ++-
>>  arch/s390/pci/pci_dma.c   |  2 +-
>>  arch/x86/kernel/pci-dma.c |  7 ++---
>>  drivers/iommu/Kconfig | 44 ++-
>>  drivers/iommu/amd_iommu_init.c|  3 ++-
>>  drivers/iommu/intel-iommu.c   |  2 +-
>>  drivers/iommu/iommu.c |  3 ++-
>>  8 files changed, 48 insertions(+), 18 deletions(-)
> 
> This needs Acks from the arch maintainers of ia64, powerpc, s390 and
> x86, at least.
> 
> It is easier for them if you split it up into the Kconfig change and
> separete patches per arch and per iommu driver. Then collect the Acks on
> the individual patches.

OK, thanks. I will do it tomorrow.

> 
> Thanks,
> 
>   Joerg
> 
> .
> 

-- 
Thanks!
Best Regards



Re: [PATCH v4 3/8] iommu/vt-d: Duplicate iommu_resv_region objects per device list

2019-05-28 Thread Auger Eric
Hi Joerg,

On 5/27/19 5:23 PM, Joerg Roedel wrote:
> On Mon, May 27, 2019 at 10:55:36AM +0200, Eric Auger wrote:
>> -list_add_tail(&rmrr->resv->list, head);
>> +length = rmrr->end_address - rmrr->base_address + 1;
>> +resv = iommu_alloc_resv_region(rmrr->base_address,
>> +   length, prot,
>> +   IOMMU_RESV_DIRECT,
>> +   GFP_ATOMIC);
>> +if (!resv)
>> +break;
>> +
>> +list_add_tail(&resv->list, head);
> 
> Okay, so this happens in a rcu_read_locked section and must be atomic,
> but I don't like this extra parameter to iommu_alloc_resv_region().
> 
> How about replacing the rcu-lock with the dmar_global_lock, which
> protects against changes to the global rmrr list? This will make this
> loop preemptible and taking the global lock is okay because this
> function is in no way performance relevant.

After studying in more details the for_each_active_dev_scope macro and
rcu_dereference_check it looks OK to me. I respinned accordingly.

Thanks

Eric
> 
> Regards,
> 
>   Joerg
> 


[PATCH v5 4/7] iommu/vt-d: Handle RMRR with PCI bridge device scopes

2019-05-28 Thread Eric Auger
When reading the VT-d specification, and especially the
Reserved Memory Region Reporting Structure chapter,
it is not obvious that a device scope element cannot be a
PCI-PCI bridge, in which case all downstream ports are
likely to access the reserved memory region. Let's handle
this case in device_has_rmrr.

Fixes: ea2447f700ca ("intel-iommu: Prevent devices with RMRRs from being placed into SI Domain")

Signed-off-by: Eric Auger 

---

v1 -> v2:
- is_downstream_to_pci_bridge helper introduced in a separate patch
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 879f11c82b05..35508687f178 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2910,7 +2910,8 @@ static bool device_has_rmrr(struct device *dev)
 */
for_each_active_dev_scope(rmrr->devices,
  rmrr->devices_cnt, i, tmp)
-   if (tmp == dev) {
+   if (tmp == dev ||
+   is_downstream_to_pci_bridge(dev, tmp)) {
rcu_read_unlock();
return true;
}
-- 
2.20.1



[PATCH v5 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs

2019-05-28 Thread Eric Auger
Now we have a new IOMMU_RESV_DIRECT_RELAXABLE reserved memory
region type, let's report USB and GFX RMRRs as relaxable ones.

We introduce a new device_rmrr_is_relaxable() helper to check
whether the rmrr belongs to the relaxable category.

This allows finer reporting of reserved memory regions at
the IOMMU API level. This will be exploitable by VFIO to
define the usable IOVA range and detect potential conflicts
between the guest physical address space and host reserved
regions.

Signed-off-by: Eric Auger 

---

v3 -> v4:
- introduce device_rmrr_is_relaxable and reshuffle the comments
---
 drivers/iommu/intel-iommu.c | 55 +++--
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9302351818ab..01c82f848470 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2920,6 +2920,36 @@ static bool device_has_rmrr(struct device *dev)
return false;
 }
 
+/*
+ * device_rmrr_is_relaxable - Test whether the RMRR of this device
+ * is relaxable (ie. is allowed to be not enforced under some conditions)
+ *
+ * @dev: device handle
+ *
+ * We assume that PCI USB devices with RMRRs have them largely
+ * for historical reasons and that the RMRR space is not actively used post
+ * boot.  This exclusion may change if vendors begin to abuse it.
+ *
+ * The same exception is made for graphics devices, with the requirement that
+ * any use of the RMRR regions will be torn down before assigning the device
+ * to a guest.
+ *
+ * Return: true if the RMRR is relaxable
+ */
+static bool device_rmrr_is_relaxable(struct device *dev)
+{
+   struct pci_dev *pdev;
+
+   if (!dev_is_pci(dev))
+   return false;
+
+   pdev = to_pci_dev(dev);
+   if (IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))
+   return true;
+   else
+   return false;
+}
+
 /*
  * There are a couple cases where we need to restrict the functionality of
  * devices associated with RMRRs.  The first is when evaluating a device for
@@ -2934,25 +2964,16 @@ static bool device_has_rmrr(struct device *dev)
  * We therefore prevent devices associated with an RMRR from participating in
  * the IOMMU API, which eliminates them from device assignment.
  *
- * In both cases we assume that PCI USB devices with RMRRs have them largely
- * for historical reasons and that the RMRR space is not actively used post
- * boot.  This exclusion may change if vendors begin to abuse it.
- *
- * The same exception is made for graphics devices, with the requirement that
- * any use of the RMRR regions will be torn down before assigning the device
- * to a guest.
+ * In both cases, devices which have relaxable RMRRs are not concerned by this
+ * restriction. See device_rmrr_is_relaxable comment.
  */
 static bool device_is_rmrr_locked(struct device *dev)
 {
if (!device_has_rmrr(dev))
return false;
 
-   if (dev_is_pci(dev)) {
-   struct pci_dev *pdev = to_pci_dev(dev);
-
-   if (IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))
-   return false;
-   }
+   if (device_rmrr_is_relaxable(dev))
+   return false;
 
return true;
 }
@@ -5494,6 +5515,7 @@ static void intel_iommu_get_resv_regions(struct device *device,
for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
  i, i_dev) {
struct iommu_resv_region *resv;
+   enum iommu_resv_type type;
size_t length;
 
if (i_dev != device &&
@@ -5501,9 +5523,12 @@ static void intel_iommu_get_resv_regions(struct device *device,
continue;
 
length = rmrr->end_address - rmrr->base_address + 1;
+
+   type = device_rmrr_is_relaxable(device) ?
+   IOMMU_RESV_DIRECT_RELAXABLE : IOMMU_RESV_DIRECT;
+
resv = iommu_alloc_resv_region(rmrr->base_address,
-  length, prot,
-  IOMMU_RESV_DIRECT);
+  length, prot, type);
if (!resv)
break;
 
-- 
2.20.1



[PATCH v5 3/7] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper

2019-05-28 Thread Eric Auger
Several call sites are about to check whether a device belongs
to the PCI sub-hierarchy of a candidate PCI-PCI bridge.
Introduce a helper to perform that check.

Signed-off-by: Eric Auger 
---
 drivers/iommu/intel-iommu.c | 37 +
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5ec8b5bd308f..879f11c82b05 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -736,12 +736,39 @@ static int iommu_dummy(struct device *dev)
return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
 }
 
+/* is_downstream_to_pci_bridge - test if a device belongs to the
+ * PCI sub-hierarchy of a candidate PCI-PCI bridge
+ *
+ * @dev: candidate PCI device belonging to @bridge PCI sub-hierarchy
+ * @bridge: the candidate PCI-PCI bridge
+ *
+ * Return: true if @dev belongs to @bridge PCI sub-hierarchy
+ */
+static bool
+is_downstream_to_pci_bridge(struct device *dev, struct device *bridge)
+{
+   struct pci_dev *pdev, *pbridge;
+
+   if (!dev_is_pci(dev) || !dev_is_pci(bridge))
+   return false;
+
+   pdev = to_pci_dev(dev);
+   pbridge = to_pci_dev(bridge);
+
+   if (pbridge->subordinate &&
+   pbridge->subordinate->number <= pdev->bus->number &&
+   pbridge->subordinate->busn_res.end >= pdev->bus->number)
+   return true;
+
+   return false;
+}
+
static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 {
struct dmar_drhd_unit *drhd = NULL;
struct intel_iommu *iommu;
struct device *tmp;
-   struct pci_dev *ptmp, *pdev = NULL;
+   struct pci_dev *pdev = NULL;
u16 segment = 0;
int i;
 
@@ -787,13 +814,7 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf
goto out;
}
 
-   if (!pdev || !dev_is_pci(tmp))
-   continue;
-
-   ptmp = to_pci_dev(tmp);
-   if (ptmp->subordinate &&
-   ptmp->subordinate->number <= pdev->bus->number &&
-   ptmp->subordinate->busn_res.end >= pdev->bus->number)
+   if (is_downstream_to_pci_bridge(dev, tmp))
goto got_pdev;
}
 
-- 
2.20.1



[PATCH v5 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions

2019-05-28 Thread Eric Auger
Introduce a new type of reserved region. This corresponds
to directly mapped regions which are known to be relaxable
under some specific conditions, such as the device assignment
use case. Well-known examples are those used by USB controllers
providing PS/2 keyboard emulation for pre-boot BIOS and
early BOOT, or RMRRs associated with an IGD working in legacy mode.

Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs
from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow
RMRR on graphics devices too"), those regions are currently
considered "safe" with respect to device assignment use case
which requires a non direct mapping at IOMMU physical level
(RAM GPA -> HPA mapping).

Those RMRRs currently exist and sometimes a device attempts
to access them, but this has not been considered
an issue until now.

However, at the moment, iommu_get_group_resv_regions() is
not able to differentiate between directly mapped
regions: those which must be absolutely enforced and those,
like the above, which are known to be relaxable.

This is a blocker for reporting severe conflicts between
non relaxable RMRRs (like MSI doorbells) and guest GPA space.

With this new reserved region type we will be able to use
iommu_get_group_resv_regions() to enumerate the IOVA space
that is usable through the IOMMU API without introducing
regressions with respect to existing device assignment
use cases (USB and IGD).

Signed-off-by: Eric Auger 

---

v3 -> v4:
- expose the relaxable regions as "direct-relaxable" in the sysfs
- update ABI documentation

v2 -> v3:
- fix direct type check
---
 Documentation/ABI/testing/sysfs-kernel-iommu_groups |  9 +
 drivers/iommu/iommu.c   | 12 +++-
 include/linux/iommu.h   |  6 ++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
index 35c64e00b35c..017f5bc3920c 100644
--- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
+++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
@@ -24,3 +24,12 @@ Description:/sys/kernel/iommu_groups/reserved_regions list IOVA
region is described on a single line: the 1st field is
the base IOVA, the second is the end IOVA and the third
field describes the type of the region.
+
+What:  /sys/kernel/iommu_groups/reserved_regions
+Date:  June 2019
+KernelVersion:  v5.3
+Contact:   Eric Auger 
+Description:In case an RMRR is used only by graphics or USB devices
+   it is now exposed as "direct-relaxable" instead of "direct".
+   In device assignment use case, for instance, those RMRR
+   are considered to be relaxable and safe.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f961f71e4ff8..276eae9822f2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -73,10 +73,11 @@ struct iommu_group_attribute {
 };
 
 static const char * const iommu_group_resv_type_string[] = {
-   [IOMMU_RESV_DIRECT] = "direct",
-   [IOMMU_RESV_RESERVED]   = "reserved",
-   [IOMMU_RESV_MSI]= "msi",
-   [IOMMU_RESV_SW_MSI] = "msi",
+   [IOMMU_RESV_DIRECT] = "direct",
+   [IOMMU_RESV_DIRECT_RELAXABLE]   = "direct-relaxable",
+   [IOMMU_RESV_RESERVED]   = "reserved",
+   [IOMMU_RESV_MSI]= "msi",
+   [IOMMU_RESV_SW_MSI] = "msi",
 };
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
@@ -575,7 +576,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
start = ALIGN(entry->start, pg_size);
end   = ALIGN(entry->start + entry->length, pg_size);
 
-   if (entry->type != IOMMU_RESV_DIRECT)
+   if (entry->type != IOMMU_RESV_DIRECT &&
+   entry->type != IOMMU_RESV_DIRECT_RELAXABLE)
continue;
 
for (addr = start; addr < end; addr += pg_size) {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a815cf6f6f47..d7d1c8de9bbc 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -135,6 +135,12 @@ enum iommu_attr {
 enum iommu_resv_type {
/* Memory regions which must be mapped 1:1 at all times */
IOMMU_RESV_DIRECT,
+   /*
+* Memory regions which are advertised to be 1:1 but are
+* commonly considered relaxable in some conditions,
+* for instance in device assignment use case (USB, Graphics)
+*/
+   IOMMU_RESV_DIRECT_RELAXABLE,
/* Arbitrary "never map this or give it to a device" address ranges */
IOMMU_RESV_RESERVED,
/* Hardware MSI region (untranslated) */
-- 
2.20.1



[PATCH v5 5/7] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions

2019-05-28 Thread Eric Auger
In the case the RMRR device scope is a PCI-PCI bridge, let's check
the device belongs to the PCI sub-hierarchy.

Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks")

Signed-off-by: Eric Auger 
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 35508687f178..9302351818ab 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5496,7 +5496,8 @@ static void intel_iommu_get_resv_regions(struct device *device,
struct iommu_resv_region *resv;
size_t length;
 
-   if (i_dev != device)
+   if (i_dev != device &&
+   !is_downstream_to_pci_bridge(device, i_dev))
continue;
 
length = rmrr->end_address - rmrr->base_address + 1;
-- 
2.20.1



[PATCH v5 2/7] iommu/vt-d: Duplicate iommu_resv_region objects per device list

2019-05-28 Thread Eric Auger
intel_iommu_get_resv_regions() aims to return the list of
reserved regions accessible by a given @device. However several
devices can access the same reserved memory region and when
building the list it is not safe to use a single iommu_resv_region
object, whose container is the RMRR. This iommu_resv_region must
be duplicated per device reserved region list.

Let's remove the struct iommu_resv_region from the RMRR unit
and allocate the iommu_resv_region directly in
intel_iommu_get_resv_regions(). We hold the dmar_global_lock instead
of the rcu-lock to allow sleeping.

Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks")
Signed-off-by: Eric Auger 

---

v4 -> v5
- replace rcu-lock by the dmar_global_lock
---
 drivers/iommu/intel-iommu.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a209199f3af6..5ec8b5bd308f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -322,7 +322,6 @@ struct dmar_rmrr_unit {
u64 end_address;/* reserved end address */
struct dmar_dev_scope *devices; /* target devices */
int devices_cnt;/* target device count */
-   struct iommu_resv_region *resv; /* reserved region handle */
 };
 
 struct dmar_atsr_unit {
@@ -4205,7 +4204,6 @@ static inline void init_iommu_pm_ops(void) {}
 int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
 {
struct acpi_dmar_reserved_memory *rmrr;
-   int prot = DMA_PTE_READ|DMA_PTE_WRITE;
struct dmar_rmrr_unit *rmrru;
size_t length;
 
@@ -4219,22 +4217,16 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
rmrru->end_address = rmrr->end_address;
 
length = rmrr->end_address - rmrr->base_address + 1;
-   rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot,
- IOMMU_RESV_DIRECT);
-   if (!rmrru->resv)
-   goto free_rmrru;
 
rmrru->devices = dmar_alloc_dev_scope((void *)(rmrr + 1),
((void *)rmrr) + rmrr->header.length,
	&rmrru->devices_cnt);
if (rmrru->devices_cnt && rmrru->devices == NULL)
-   goto free_all;
+   goto free_rmrru;
 
	list_add(&rmrru->list, &dmar_rmrr_units);
 
return 0;
-free_all:
-   kfree(rmrru->resv);
 free_rmrru:
kfree(rmrru);
 out:
@@ -4452,7 +4444,6 @@ static void intel_iommu_free_dmars(void)
	list_for_each_entry_safe(rmrru, rmrr_n, &dmar_rmrr_units, list) {
list_del(>list);
		dmar_free_dev_scope(&rmrru->devices, &rmrru->devices_cnt);
-   kfree(rmrru->resv);
kfree(rmrru);
}
 
@@ -5470,22 +5461,33 @@ static void intel_iommu_remove_device(struct device *dev)
 static void intel_iommu_get_resv_regions(struct device *device,
 struct list_head *head)
 {
+   int prot = DMA_PTE_READ|DMA_PTE_WRITE;
struct iommu_resv_region *reg;
struct dmar_rmrr_unit *rmrr;
struct device *i_dev;
int i;
 
-   rcu_read_lock();
+   down_write(&dmar_global_lock);
for_each_rmrr_units(rmrr) {
for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
  i, i_dev) {
+   struct iommu_resv_region *resv;
+   size_t length;
+
if (i_dev != device)
continue;
 
-   list_add_tail(&rmrr->resv->list, head);
+   length = rmrr->end_address - rmrr->base_address + 1;
+   resv = iommu_alloc_resv_region(rmrr->base_address,
+  length, prot,
+  IOMMU_RESV_DIRECT);
+   if (!resv)
+   break;
+
+   list_add_tail(&resv->list, head);
}
}
-   rcu_read_unlock();
+   up_write(&dmar_global_lock);
 
reg = iommu_alloc_resv_region(IOAPIC_RANGE_START,
  IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1,
@@ -5500,10 +5502,8 @@ static void intel_iommu_put_resv_regions(struct device *dev,
 {
struct iommu_resv_region *entry, *next;
 
-   list_for_each_entry_safe(entry, next, head, list) {
-   if (entry->type == IOMMU_RESV_MSI)
-   kfree(entry);
-   }
+   list_for_each_entry_safe(entry, next, head, list)
+   kfree(entry);
 }
 
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev)
-- 
2.20.1



[PATCH v5 1/7] iommu: Fix a leak in iommu_insert_resv_region

2019-05-28 Thread Eric Auger
In case we expand an existing region, we unlink
the latter and insert the larger one. In
that case we should free the original region after
the insertion. Also we can immediately return.

Fixes: 6c65fb318e8b ("iommu: iommu_get_group_resv_regions")

Signed-off-by: Eric Auger 
---
 drivers/iommu/iommu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 67ee6623f9b2..f961f71e4ff8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -237,18 +237,21 @@ static int iommu_insert_resv_region(struct iommu_resv_region *new,
pos = pos->next;
} else if ((start >= a) && (end <= b)) {
if (new->type == type)
-   goto done;
+   return 0;
else
pos = pos->next;
} else {
if (new->type == type) {
phys_addr_t new_start = min(a, start);
phys_addr_t new_end = max(b, end);
+   int ret;
 
			list_del(&entry->list);
entry->start = new_start;
entry->length = new_end - new_start + 1;
-   iommu_insert_resv_region(entry, regions);
+   ret = iommu_insert_resv_region(entry, regions);
+   kfree(entry);
+   return ret;
} else {
pos = pos->next;
}
@@ -261,7 +264,6 @@ static int iommu_insert_resv_region(struct iommu_resv_region *new,
return -ENOMEM;
 
	list_add_tail(&region->list, pos);
-done:
return 0;
 }
 
-- 
2.20.1



[PATCH v5 0/7] RMRR related fixes and enhancements

2019-05-28 Thread Eric Auger
Currently the Intel reserved region is attached to the
RMRR unit, and when building the list of RMRRs seen by a device
we link this unique reserved region without taking care of
potential multiple usage of this reserved region by several devices.

Also while reading the vtd spec it is unclear to me whether
the RMRR device scope referenced by an RMRR ACPI struct could
be a PCI-PCI bridge, in which case I think we also need to
check the device belongs to the PCI sub-hierarchy of the device
referenced in the scope. This would be true for device_has_rmrr()
and intel_iommu_get_resv_regions().

Last, the VFIO subsystem would need to compute the usable IOVA range
by querying the iommu_get_group_resv_regions() API. This would allow,
for instance, to report potential conflicts between the guest physical
address space and host reserved regions.

However iommu_get_group_resv_regions() currently fails to differentiate
RMRRs that are known safe for device assignment and RMRRs that must be
enforced. So we introduce a new reserved memory region type (relaxable),
reported when associated to an USB or GFX device. The last 2 patches aim
at unblocking [1] which is stuck since 4.18.

[1-5] are fixes
[6-7] are enhancements

The two parts can be considered separately if needed.

References:
[1] [PATCH v6 0/7] vfio/type1: Add support for valid iova list management
https://patchwork.kernel.org/patch/10425309/

Branch: This series is available at:
https://github.com/eauger/linux/tree/v5.2-rc2-rmrr-v5

History:

v4 -> v5:
- remove iommu: Pass a GFP flag parameter to iommu_alloc_resv_region()
- use dmar_global_lock instead of rcu-lock in intel_iommu_get_resv_regions

v3 -> v4:
- added "iommu: Fix a leak in iommu_insert_resv_region"
- introduced device_rmrr_is_relaxable and fixed to_pci_dev call
  without checking dev_is_pci
- Although Robin suggested hiding direct relaxable regions behind
  direct ones, I think this would lead to a very complex
  implementation of iommu_insert_resv_region, while in general the
  relaxable regions are going to be ignored by the caller. By the
  way, I found a leak in this function, hence the new first patch

v2 -> v3:
s/||/&& in iommu_group_create_direct_mappings

v1 -> v2:
- introduce is_downstream_to_pci_bridge() in a separate patch, change param
  names and add kerneldoc comment
- add 6,7


Eric Auger (7):
  iommu: Fix a leak in iommu_insert_resv_region
  iommu/vt-d: Duplicate iommu_resv_region objects per device list
  iommu/vt-d: Introduce is_downstream_to_pci_bridge helper
  iommu/vt-d: Handle RMRR with PCI bridge device scopes
  iommu/vt-d: Handle PCI bridge RMRR device scopes in
intel_iommu_get_resv_regions
  iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  iommu/vt-d: Differentiate relaxable and non relaxable RMRRs

 .../ABI/testing/sysfs-kernel-iommu_groups |   9 ++
 drivers/iommu/intel-iommu.c   | 128 --
 drivers/iommu/iommu.c |  20 +--
 include/linux/iommu.h |   6 +
 4 files changed, 115 insertions(+), 48 deletions(-)

-- 
2.20.1



Re: [PATCH v7 0/7] Add virtio-iommu driver

2019-05-28 Thread Jean-Philippe Brucker
On 27/05/2019 16:15, Michael S. Tsirkin wrote:
> On Mon, May 27, 2019 at 11:26:04AM +0200, Joerg Roedel wrote:
>> On Sun, May 12, 2019 at 12:31:59PM -0400, Michael S. Tsirkin wrote:
>>> OK this has been in next for a while.
>>>
>>> Last time IOMMU maintainers objected. Are objections
>>> still in force?
>>>
>>> If not could we get acks please?
>>
>> No objections against the code, I only hesitated because the Spec was
>> not yet official.
>>
>> So for the code:
>>
>>  Acked-by: Joerg Roedel 
> 
> Last spec patch had a bunch of comments not yet addressed.
> But I do not remember whether comments are just about wording
> or about the host/guest interface as well.
> Jean-Philippe could you remind me please?

It's mostly wording, but there is a small change in the config space
layout and two new feature bits. I'll send a new version of the driver
when possible.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH, resend] ARM: dma-mapping: don't use the atomic pool for DMA_ATTR_NO_KERNEL_MAPPING

2019-05-28 Thread Christoph Hellwig
DMA allocations with DMA_ATTR_NO_KERNEL_MAPPING do not return a kernel
virtual address for use in the driver, but are expected to be used entirely
for userspace mappings and/or device private memory.

Because of that we don't need to remap them as uncached, and thus don't need
the atomic pool for non-blocking allocations.  Note that using the
DMA allocator with DMA_ATTR_NO_KERNEL_MAPPING from non-blocking context
on a non-coherent device is actually broken without this patch as well, as
we feed the address passed to dma_free_attrs directly to the genpool
allocator, but for DMA_ATTR_NO_KERNEL_MAPPING allocations it actually
contains the address of the first page pointer.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0a75058c11f3..30e891f54d36 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -724,6 +724,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 gfp_t gfp, pgprot_t prot, bool is_coherent,
 unsigned long attrs, const void *caller)
 {
+   bool want_vaddr = !(attrs & DMA_ATTR_NO_KERNEL_MAPPING);
u64 mask = get_coherent_dma_mask(dev);
struct page *page = NULL;
void *addr;
@@ -735,7 +736,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
.gfp = gfp,
.prot = prot,
.caller = caller,
-   .want_vaddr = ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0),
+   .want_vaddr = want_vaddr,
.coherent_flag = is_coherent ? COHERENT : NORMAL,
};
 
@@ -773,14 +774,14 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
allowblock = gfpflags_allow_blocking(gfp);
cma = allowblock ? dev_get_cma_area(dev) : false;
 
-   if (cma)
+   if (!allowblock && !is_coherent && want_vaddr)
+   buf->allocator = &pool_allocator;
+   else if (cma)
buf->allocator = &cma_allocator;
else if (is_coherent)
buf->allocator = &simple_allocator;
-   else if (allowblock)
-   buf->allocator = &remap_allocator;
else
-   buf->allocator = &pool_allocator;
+   buf->allocator = &remap_allocator;
 
addr = buf->allocator->alloc(buf, &page);
 
-- 
2.20.1



Re: [PATCH v3 0/2] Optimize dma_*_from_contiguous calls

2019-05-28 Thread Christoph Hellwig
Thanks,

applied to dma-mapping for-next.

Can you also send a conversion of drivers/iommu/dma-iommu.c to your
new helpers against this tree?

http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next