Re: [RFC PATCH v3 1/2] iommu: Add capability IOMMU_CAP_VIOMMU

2021-01-14 Thread Leon Romanovsky
On Fri, Jan 15, 2021 at 07:49:47AM +0800, Lu Baolu wrote:
> Hi Leon,
>
> On 1/14/21 9:26 PM, Leon Romanovsky wrote:
> > On Thu, Jan 14, 2021 at 09:30:02AM +0800, Lu Baolu wrote:
> > > Some vendor IOMMU drivers are able to declare that they are running in a
> > > VM context. This is very valuable for features that should only be
> > > supported on bare metal. Add a capability bit so that it can be used.
> >
> > And how is it used? Who and how will set it?
>
> Use the existing iommu_capable(). I should add more descriptions about
> who and how to use it.

I want to see the code that sets this capability.

Thanks
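
For reference, a consumer of this capability would query it through the
existing iommu_capable() interface that Baolu mentions above. A minimal
sketch, assuming the IOMMU_CAP_VIOMMU value added by the patch and the
iommu_capable() signature of this kernel generation (the helper name below
is made up for illustration):

static bool want_bare_metal_only_feature(struct device *dev)
{
	/*
	 * The capability is reported per bus; true means at least one
	 * IOMMU serving this bus declares it is running inside a VM.
	 */
	if (iommu_capable(dev->bus, IOMMU_CAP_VIOMMU))
		return false;	/* virtualized IOMMU: keep the feature off */

	return true;		/* bare metal, or capability not reported */
}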


Re: [PATCH v6 06/33] of/device: Move dma_range_map before of_iommu_configure

2021-01-14 Thread Yong Wu
On Thu, 2021-01-14 at 13:27 -0600, Rob Herring wrote:
> On Mon, Jan 11, 2021 at 07:18:47PM +0800, Yong Wu wrote:
> > "dev->dma_range_map" contains the devices' dma_ranges information.
> > This patch moves dma_range_map before of_iommu_configure. The iommu
> > driver may need to know the dma_address requirements of its iommu
> > consumer devices.
> > 
> > CC: Rob Herring 
> > CC: Frank Rowand 
> > Signed-off-by: Yong Wu 
> > ---
> >  drivers/of/device.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > index aedfaaafd3e7..1d84636149df 100644
> > --- a/drivers/of/device.c
> > +++ b/drivers/of/device.c
> > @@ -170,9 +170,11 @@ int of_dma_configure_id(struct device *dev, struct 
> > device_node *np,
> > dev_dbg(dev, "device is%sdma coherent\n",
> > coherent ? " " : " not ");
> >  
> > +   dev->dma_range_map = map;
> > iommu = of_iommu_configure(dev, np, id);
> > if (PTR_ERR(iommu) == -EPROBE_DEFER) {
> > kfree(map);
> > +   dev->dma_range_map = NULL;
> 
> Not really going to matter, but you should probably clear dma_range_map 
> before what it points to is freed.
> 
> With that,
> 
> Reviewed-by: Rob Herring 

Thanks for the review. I will move it before "kfree(map)" in next
version.
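
For clarity, the reordering agreed on above simply clears the pointer before
freeing what it points to. A sketch of the intended error path (not the
actual respin):

	dev->dma_range_map = map;
	iommu = of_iommu_configure(dev, np, id);
	if (PTR_ERR(iommu) == -EPROBE_DEFER) {
		/* Drop the stale pointer before freeing its target. */
		dev->dma_range_map = NULL;
		kfree(map);
		return -EPROBE_DEFER;
	}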

> 
> > return -EPROBE_DEFER;
> > }
> >  
> > @@ -181,7 +183,6 @@ int of_dma_configure_id(struct device *dev, struct 
> > device_node *np,
> >  
> > arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
> >  
> > -   dev->dma_range_map = map;
> > return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(of_dma_configure_id);
> > -- 
> > 2.18.0
> > 



Re: [RFC PATCH v3 1/2] iommu: Add capability IOMMU_CAP_VIOMMU

2021-01-14 Thread Jason Wang


On 2021/1/14 9:30 AM, Lu Baolu wrote:

Some vendor IOMMU drivers are able to declare that they are running in a VM
context. This is very valuable for features that should only be supported
on bare metal. Add a capability bit so that it can be used.

Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel/iommu.c  | 20 
  drivers/iommu/virtio-iommu.c |  9 +
  include/linux/iommu.h|  1 +
  3 files changed, 30 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cb205a04fe4c..8eb022d0e8aa 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5738,12 +5738,32 @@ static inline bool nested_mode_support(void)
return ret;
  }
  
+static inline bool caching_mode_enabled(void)

+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+   bool ret = false;
+
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd) {
+   if (cap_caching_mode(iommu->cap)) {
+   ret = true;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+   return ret;
+}
+
  static bool intel_iommu_capable(enum iommu_cap cap)
  {
if (cap == IOMMU_CAP_CACHE_COHERENCY)
return domain_update_iommu_snooping(NULL) == 1;
if (cap == IOMMU_CAP_INTR_REMAP)
return irq_remapping_enabled == 1;
+   if (cap == IOMMU_CAP_VIOMMU)
+   return caching_mode_enabled();



This part I don't understand. Does it mean Intel IOMMU can't be used in 
VM without caching mode?


Thanks


  
  	return false;

  }
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 2bfdd5734844..719793e103db 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -931,7 +931,16 @@ static int viommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
  }
  
+static bool viommu_capable(enum iommu_cap cap)

+{
+   if (cap == IOMMU_CAP_VIOMMU)
+   return true;
+
+   return false;
+}
+
  static struct iommu_ops viommu_ops = {
+   .capable= viommu_capable,
.domain_alloc   = viommu_domain_alloc,
.domain_free= viommu_domain_free,
.attach_dev = viommu_attach_dev,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b95a6f8db6ff..1d24be667a03 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -94,6 +94,7 @@ enum iommu_cap {
   transactions */
IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
+   IOMMU_CAP_VIOMMU,   /* IOMMU can declare running in a VM */
  };
  
  /*



Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool

2021-01-14 Thread Claire Chang
On Fri, Jan 15, 2021 at 2:52 AM Florian Fainelli  wrote:
>
> On 1/14/21 1:08 AM, Claire Chang wrote:
> > On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli  
> > wrote:
> >>
> >> On 1/5/21 7:41 PM, Claire Chang wrote:
> >>> If a device is not behind an IOMMU, we look up the device node and set
> >>> up the restricted DMA when the restricted-dma-pool is present.
> >>>
> >>> Signed-off-by: Claire Chang 
> >>> ---
> >>
> >> [snip]
> >>
> >>> +int of_dma_set_restricted_buffer(struct device *dev)
> >>> +{
> >>> + struct device_node *node;
> >>> + int count, i;
> >>> +
> >>> + if (!dev->of_node)
> >>> + return 0;
> >>> +
> >>> + count = of_property_count_elems_of_size(dev->of_node, 
> >>> "memory-region",
> >>> + sizeof(phandle));
> >>
> >> You could have an early check for count < 0, along with an error
> >> message, if that is deemed useful.
> >>
> >>> + for (i = 0; i < count; i++) {
> >>> + node = of_parse_phandle(dev->of_node, "memory-region", i);
> >>> + if (of_device_is_compatible(node, "restricted-dma-pool"))
> >>
> >> And you may want to add here an of_device_is_available(node). A platform
> >> that provides the Device Tree firmware and tries to support multiple
> >> different SoCs may try to determine if an IOMMU is present, and if it
> >> is, it could mark the restricted-dma-pool region with a 'status =
> >> "disabled"' property, or any variant of that scheme.
> >
> > This function is called only when there is no IOMMU present (check in
> > drivers/of/device.c). I can still add of_device_is_available(node)
> > here if you think it's helpful.
>
> I believe it is, since a boot loader can have a shared Device Tree blob
> skeleton and do various adaptations based on the chip (that's what we
> do), and adding a status property is much simpler than inserting new
> nodes at run time.
>
> >
> >>
> >>> + return of_reserved_mem_device_init_by_idx(
> >>> + dev, dev->of_node, i);
> >>
> >> This does not seem to be supporting more than one memory region; did you
> >> not want something like this instead:
> >>
> >> ret = of_reserved_mem_device_init_by_idx(...);
> >> if (ret)
> >> return ret;
> >>
> >
> > Yes. This implementation only supports one restricted-dma-pool memory region,
> > with the assumption that there is only one memory region with the
> > compatible string, restricted-dma-pool, in the dts. IIUC, it's similar
> > to shared-dma-pool.
>
> Then if there is such a known limitation, it should be both documented and
> enforced here; you should not be iterating over all of the phandles that
> you find. Stop at the first one and issue a warning if count > 1?

What I have in mind is that there might be multiple memory regions, but
only one is for restricted-dma-pool.
Say, if you want a separate region for coherent DMA and only do
streaming DMA in this restricted-dma-pool region, you can add another
reserved-memory node with shared-dma-pool in the dts, and the current
implementation will try to allocate the memory via
dma_alloc_from_dev_coherent() first (see dma_alloc_attrs() in
kernel/dma/mapping.c).
Or if you have a vendor-specific memory region, you can still set up a
restricted-dma-pool by adding another reserved-memory node in the dts.
Does this make sense to you? I'll document this for sure.

> --
> Florian
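
Putting Florian's review comments together, a sketch of what the helper could
look like with an early count check, the availability check, and the
single-region assumption made explicit (illustrative only, based on the
function from this series):

int of_dma_set_restricted_buffer(struct device *dev)
{
	struct device_node *node;
	int count, i;

	if (!dev->of_node)
		return 0;

	count = of_property_count_elems_of_size(dev->of_node, "memory-region",
						sizeof(phandle));
	if (count < 0)
		return 0;	/* no memory-region property at all */

	for (i = 0; i < count; i++) {
		node = of_parse_phandle(dev->of_node, "memory-region", i);
		if (of_device_is_compatible(node, "restricted-dma-pool") &&
		    of_device_is_available(node))
			/* Only the first matching, enabled region is used. */
			return of_reserved_mem_device_init_by_idx(dev,
							dev->of_node, i);
	}

	return 0;
}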


[PATCH 1/1] iommu/vt-d: Preset Access/Dirty bits for IOVA over FL

2021-01-14 Thread Lu Baolu
The Access/Dirty bits in the first level page table entry will be set
whenever a page table entry was used for address translation or write
permission was successfully translated. This is always true when using
the first-level page table for kernel IOVA. Instead of wasting hardware
cycles updating these bits, it's better to set them up at the
beginning.

Suggested-by: Ashok Raj 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 14 --
 include/linux/intel-iommu.h |  2 ++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0ea2e1440a9b..54b8d1bf2009 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1018,8 +1018,11 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain 
*domain,
 
domain_flush_cache(domain, tmp_page, VTD_PAGE_SIZE);
pteval = ((uint64_t)virt_to_dma_pfn(tmp_page) << 
VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE;
-   if (domain_use_first_level(domain))
+   if (domain_use_first_level(domain)) {
pteval |= DMA_FL_PTE_XD | DMA_FL_PTE_US;
+   if (domain->domain.type == IOMMU_DOMAIN_DMA)
+   pteval |= DMA_FL_PTE_ACCESS;
+   }
		if (cmpxchg64(&pte->val, 0ULL, pteval))
/* Someone else set it while we were thinking; 
use theirs. */
free_pgtable_page(tmp_page);
@@ -2293,9 +2296,16 @@ __domain_mapping(struct dmar_domain *domain, unsigned 
long iov_pfn,
return -EINVAL;
 
attr = prot & (DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP);
-   if (domain_use_first_level(domain))
+   if (domain_use_first_level(domain)) {
attr |= DMA_FL_PTE_PRESENT | DMA_FL_PTE_XD | DMA_FL_PTE_US;
 
+   if (domain->domain.type == IOMMU_DOMAIN_DMA) {
+   attr |= DMA_FL_PTE_ACCESS;
+   if (prot & DMA_PTE_WRITE)
+   attr |= DMA_FL_PTE_DIRTY;
+   }
+   }
+
pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
 
while (nr_pages > 0) {
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 65aa963cc115..832730549c52 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -42,6 +42,8 @@
 
 #define DMA_FL_PTE_PRESENT BIT_ULL(0)
 #define DMA_FL_PTE_US  BIT_ULL(2)
+#define DMA_FL_PTE_ACCESS  BIT_ULL(5)
+#define DMA_FL_PTE_DIRTY   BIT_ULL(6)
 #define DMA_FL_PTE_XD  BIT_ULL(63)
 
 #define ADDR_WIDTH_5LEVEL  (57)
-- 
2.25.1



Re: [RFC PATCH v3 1/2] iommu: Add capability IOMMU_CAP_VIOMMU

2021-01-14 Thread Lu Baolu

Hi Leon,

On 1/14/21 9:26 PM, Leon Romanovsky wrote:

On Thu, Jan 14, 2021 at 09:30:02AM +0800, Lu Baolu wrote:

Some vendor IOMMU drivers are able to declare that they are running in a VM
context. This is very valuable for features that should only be supported
on bare metal. Add a capability bit so that it can be used.


And how is it used? Who and how will set it?


Use the existing iommu_capable(). I should add more descriptions about
who and how to use it.





Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel/iommu.c  | 20 
  drivers/iommu/virtio-iommu.c |  9 +
  include/linux/iommu.h|  1 +
  3 files changed, 30 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cb205a04fe4c..8eb022d0e8aa 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5738,12 +5738,32 @@ static inline bool nested_mode_support(void)
return ret;
  }

+static inline bool caching_mode_enabled(void)
+{


Kernel coding style is not in favour of inline functions in *.c files.


Yes, agreed.

Best regards,
baolu




+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+   bool ret = false;
+
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd) {
+   if (cap_caching_mode(iommu->cap)) {
+   ret = true;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+   return ret;
+}
+
  static bool intel_iommu_capable(enum iommu_cap cap)
  {
if (cap == IOMMU_CAP_CACHE_COHERENCY)
return domain_update_iommu_snooping(NULL) == 1;
if (cap == IOMMU_CAP_INTR_REMAP)
return irq_remapping_enabled == 1;
+   if (cap == IOMMU_CAP_VIOMMU)
+   return caching_mode_enabled();

return false;
  }
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 2bfdd5734844..719793e103db 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -931,7 +931,16 @@ static int viommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
  }

+static bool viommu_capable(enum iommu_cap cap)
+{
+   if (cap == IOMMU_CAP_VIOMMU)
+   return true;
+
+   return false;
+}
+
  static struct iommu_ops viommu_ops = {
+   .capable= viommu_capable,
.domain_alloc   = viommu_domain_alloc,
.domain_free= viommu_domain_free,
.attach_dev = viommu_attach_dev,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b95a6f8db6ff..1d24be667a03 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -94,6 +94,7 @@ enum iommu_cap {
   transactions */
IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
+   IOMMU_CAP_VIOMMU,   /* IOMMU can declare running in a VM */
  };

  /*
--
2.25.1




Intel gvt-g with nvidia prime render offload causes several issues (regression)

2021-01-14 Thread Anton Gubar'kov
Dears,

there is a thread, https://github.com/intel/gvt-linux/issues/162, discussing
issues when gvt-g is used in a Windows guest while the host has prime render
offload enabled.

I have this issue as well and I did a kernel bisect between v5.7 (works
kind of OK, at least there are no qemu/guest crashes) and v5.8 and the
bisect landed on the commit 327d5b2fee91c404a3956c324193892cf2cc9528.

Thanks for your help.
Anton Gubarkov.

Re: [PATCH v6 06/33] of/device: Move dma_range_map before of_iommu_configure

2021-01-14 Thread Rob Herring
On Mon, Jan 11, 2021 at 07:18:47PM +0800, Yong Wu wrote:
> "dev->dma_range_map" contains the devices' dma_ranges information.
> This patch moves dma_range_map before of_iommu_configure. The iommu
> driver may need to know the dma_address requirements of its iommu
> consumer devices.
> 
> CC: Rob Herring 
> CC: Frank Rowand 
> Signed-off-by: Yong Wu 
> ---
>  drivers/of/device.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index aedfaaafd3e7..1d84636149df 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -170,9 +170,11 @@ int of_dma_configure_id(struct device *dev, struct 
> device_node *np,
>   dev_dbg(dev, "device is%sdma coherent\n",
>   coherent ? " " : " not ");
>  
> + dev->dma_range_map = map;
>   iommu = of_iommu_configure(dev, np, id);
>   if (PTR_ERR(iommu) == -EPROBE_DEFER) {
>   kfree(map);
> + dev->dma_range_map = NULL;

Not really going to matter, but you should probably clear dma_range_map 
before what it points to is freed.

With that,

Reviewed-by: Rob Herring 

>   return -EPROBE_DEFER;
>   }
>  
> @@ -181,7 +183,6 @@ int of_dma_configure_id(struct device *dev, struct 
> device_node *np,
>  
>   arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
>  
> - dev->dma_range_map = map;
>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(of_dma_configure_id);
> -- 
> 2.18.0
> 


Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool

2021-01-14 Thread Florian Fainelli
On 1/14/21 1:08 AM, Claire Chang wrote:
> On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli  wrote:
>>
>> On 1/5/21 7:41 PM, Claire Chang wrote:
>>> If a device is not behind an IOMMU, we look up the device node and set
>>> up the restricted DMA when the restricted-dma-pool is present.
>>>
>>> Signed-off-by: Claire Chang 
>>> ---
>>
>> [snip]
>>
>>> +int of_dma_set_restricted_buffer(struct device *dev)
>>> +{
>>> + struct device_node *node;
>>> + int count, i;
>>> +
>>> + if (!dev->of_node)
>>> + return 0;
>>> +
>>> + count = of_property_count_elems_of_size(dev->of_node, "memory-region",
>>> + sizeof(phandle));
>>
>> You could have an early check for count < 0, along with an error
>> message, if that is deemed useful.
>>
>>> + for (i = 0; i < count; i++) {
>>> + node = of_parse_phandle(dev->of_node, "memory-region", i);
>>> + if (of_device_is_compatible(node, "restricted-dma-pool"))
>>
>> And you may want to add here an of_device_is_available(node). A platform
>> that provides the Device Tree firmware and tries to support multiple
>> different SoCs may try to determine if an IOMMU is present, and if it
>> is, it could mark the restricted-dma-pool region with a 'status =
>> "disabled"' property, or any variant of that scheme.
> 
> This function is called only when there is no IOMMU present (check in
> drivers/of/device.c). I can still add of_device_is_available(node)
> here if you think it's helpful.

I believe it is, since a boot loader can have a shared Device Tree blob
skeleton and do various adaptations based on the chip (that's what we
do), and adding a status property is much simpler than inserting new
nodes at run time.

> 
>>
>>> + return of_reserved_mem_device_init_by_idx(
>>> + dev, dev->of_node, i);
>>
>> This does not seem to be supporting more than one memory region; did you
>> not want something like this instead:
>>
>> ret = of_reserved_mem_device_init_by_idx(...);
>> if (ret)
>> return ret;
>>
> 
> Yes. This implementation only supports one restricted-dma-pool memory region,
> with the assumption that there is only one memory region with the
> compatible string, restricted-dma-pool, in the dts. IIUC, it's similar
> to shared-dma-pool.

Then if there is such a known limitation, it should be both documented and
enforced here; you should not be iterating over all of the phandles that
you find. Stop at the first one and issue a warning if count > 1?
-- 
Florian


Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-01-14 Thread Auger Eric
Hi Jean,

On 1/14/21 6:33 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On Thu, Jan 14, 2021 at 05:58:27PM +0100, Auger Eric wrote:
  The uacce-devel branches from
> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
> (they track the latest sva/zip-devel branch
> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
>> As I plan to respin shortly, please could you confirm the best branch to
>> rebase on still is that one (uacce-devel from the linux-kernel-uadk git
>> repo). Is it up to date? Commits seem to be quite old there.
> 
> Right I meant the uacce-devel-X branches. The uacce-devel-5.11 branch
> currently has the latest patches

OK thanks!

Eric
> 
> Thanks,
> Jean
> 



Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-01-14 Thread Jean-Philippe Brucker
Hi Eric,

On Thu, Jan 14, 2021 at 05:58:27PM +0100, Auger Eric wrote:
> >>  The uacce-devel branches from
> >>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
> >>> (they track the latest sva/zip-devel branch
> >>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
> As I plan to respin shortly, please could you confirm the best branch to
> rebase on still is that one (uacce-devel from the linux-kernel-uadk git
> repo). Is it up to date? Commits seem to be quite old there.

Right I meant the uacce-devel-X branches. The uacce-devel-5.11 branch
currently has the latest patches

Thanks,
Jean


Re: [PATCH 1/5] vfio/iommu_type1: Fixes vfio_dma_populate_bitmap to avoid dirty lose

2021-01-14 Thread Alex Williamson
On Thu, 14 Jan 2021 21:05:23 +0800
Keqian Zhu  wrote:

> Hi Alex,
> 
> On 2021/1/13 5:20, Alex Williamson wrote:
> > On Thu, 7 Jan 2021 17:28:57 +0800
> > Keqian Zhu  wrote:
> >   
> >> Deferring the check of whether a vfio_dma is fully dirty to
> >> update_user_bitmap makes it easy to lose the dirty log. For example,
> >> after promoting the pinned_scope of vfio_iommu, a vfio_dma is no longer
> >> considered fully dirty, so we may lose dirty log that occurred before
> >> vfio_iommu was promoted.
> >>
> >> The key point is that pinned-dirty is not a real dirty tracking
> >> mechanism; it can't continuously track dirty pages, but just restricts
> >> the dirty scope. It is essentially the same as fully-dirty: fully-dirty
> >> has full scope and pinned-dirty has pinned scope.
> >>
> >> So we must mark pinned-dirty or fully-dirty after we start dirty
> >> tracking or clear the dirty bitmap, to ensure that the dirty log is
> >> marked right away.
> > 
> > I was initially convinced by these first three patches, but upon
> > further review, I think the premise is wrong.  AIUI, the concern across
> > these patches is that our dirty bitmap is only populated with pages
> > dirtied by pinning and we only take into account the pinned page dirty
> > scope at the time the bitmap is retrieved by the user.  You suppose
> > this presents a gap where if a vendor driver has not yet identified
> > with a page pinning scope that the entire bitmap should be considered
> > dirty regardless of whether that driver later pins pages prior to the
> > user retrieving the dirty bitmap.  
> Yes, this is my concern. And there are some other scenarios.
> 
> 1. If a non-pinned iommu-backed domain is detached between starting
> dirty log and retrieving the dirty bitmap, then the dirty log is missed
> (as you said in the last paragraph).
> 
> 2. If all vendor drivers have pinned (meaning the iommu pinned_scope is
> true), and a vendor driver does a pin/unpin pair between starting dirty
> log and retrieving the dirty bitmap, then the dirty log of these
> unpinned pages is missed.

Can you identify where this happens?  I believe this assertion is
incorrect.  When dirty logging is enabled vfio_dma_populate_bitmap()
marks all existing pinned pages as dirty.  In the scenario you
describe, the iommu pinned page dirty scope is irrelevant.  We only
track pinned or external DMA access pages for exactly this reason.
Unpinning a page never clears the dirty bitmap, only the user
retrieving the bitmap or disabling dirty logging clears the bitmap.  A
page that has been unpinned is transiently dirty, it is not repopulated
in the bitmap after the user has retrieved the bitmap because the
device no longer has access to it to consider it perpetually dirty.

> > I don't think this is how we intended the cooperation between the
> > iommu driver and vendor driver to work.  By pinning pages a vendor
> > driver is not declaring that only their future dirty page scope is
> > limited to pinned pages, instead they're declaring themselves as a
> > participant in dirty page tracking and take responsibility for
> > pinning any necessary pages.  For example we might extend
> > VFIO_IOMMU_DIRTY_PAGES_FLAG_START to trigger a blocking
> > notification to groups to not only begin dirty tracking, but also
> > to synchronously register their current device DMA footprint.  This
> > patch would require a vendor driver to possibly perform a
> > gratuitous page pinning in order to set the scope prior to dirty
> > logging being enabled, or else the initial bitmap will be fully
> > dirty.  
> I get what you mean ;-). You said that there is a time gap between when
> we attach a device and when this device declares itself dirty traceable.
> 
> However, this makes it difficult to manage dirty log tracking (I listed
> two bugs). If the vfio devices can declare themselves dirty traceable
> when attached to the container, then the logic of dirty tracking is much
> clearer ;-) .

There's only one bug above afaict, which should be easily fixed.  I
think if you actually dig into the problem of a device declaring
themselves as dirty tracking capable, when the IOMMU backend works on
group, not devices, and it's groups that are attached to containers,
you might see that the logic is not so clear.  I also don't see this as
a significant issue, you're focusing on a niche scenario where a device
is hot-added to a VM, which is immediately migrated before the device
identifies itself by pinning pages.  In that scenario the iommu dirty
scope would be overly broad and we'd indicate all pages are dirty.
This errs on the side of reporting too much as dirty, which still
provides a correct result to the user.  The more common scenario would
be migration of a "long" running, stable VM, where drivers are active
and devices have already pinned pages if they intend to.

> > Therefore, I don't see that this series is necessary or correct.
> > Kirti, does this match your thinking?
> > 
> > Thinking about these semantics, it seems there might still be an
> > issue if a group with 

RE: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-01-14 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 14 January 2021 16:58
> To: Shameerali Kolothum Thodi ;
> Jean-Philippe Brucker 
> Cc: Xieyingtai ; alex.william...@redhat.com;
> wangxingang ; k...@vger.kernel.org;
> m...@kernel.org; linux-ker...@vger.kernel.org; vivek.gau...@arm.com;
> iommu@lists.linux-foundation.org; qubingbing ;
> Zengtao (B) ; zhangfei@linaro.org;
> eric.auger@gmail.com; w...@kernel.org; kvm...@lists.cs.columbia.edu;
> robin.mur...@arm.com
> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
> unmanaged ASIDs
> 
> Hi Shameer, Jean-Philippe,
> 
> On 12/4/20 11:23 AM, Auger Eric wrote:
> > Hi Shameer, Jean-Philippe,
> >
> > On 12/4/20 11:20 AM, Shameerali Kolothum Thodi wrote:
> >> Hi Jean,
> >>
> >>> -Original Message-
> >>> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
> >>> Sent: 04 December 2020 09:54
> >>> To: Shameerali Kolothum Thodi
> 
> >>> Cc: Auger Eric ; wangxingang
> >>> ; Xieyingtai ;
> >>> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org;
> w...@kernel.org;
> >>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> >>> vivek.gau...@arm.com; alex.william...@redhat.com;
> >>> zhangfei@linaro.org; robin.mur...@arm.com;
> >>> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com; Zengtao (B)
> >>> ; qubingbing 
> >>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation
> with
> >>> unmanaged ASIDs
> >>>
> >>> Hi Shameer,
> >>>
> >>> On Thu, Dec 03, 2020 at 06:42:57PM +, Shameerali Kolothum Thodi
> wrote:
>  Hi Jean/zhangfei,
>  Is it possible to have a branch with minimum required SVA/UACCE related
> >>> patches
>  that are already public and can be a "stable" candidate for future respin
> of
> >>> Eric's series?
>  Please share your thoughts.
> >>>
> >>> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
> >>> based on mainline?
> >>
> >> Yes.
> >>
> >>  The uacce-devel branches from
> >>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
> >>> (they track the latest sva/zip-devel branch
> >>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
> As I plan to respin shortly, please could you confirm the best branch to
> rebase on still is that one (uacce-devel from the linux-kernel-uadk git
> repo). Is it up to date? Commits seem to be quite old there.

I think it is the uacce-devel-5.11 branch, but will wait for Jean or Zhangfei
to confirm.

Thanks,
Shameer

> Thanks
> 
> Eric
> >>
> >> Thanks.
> >>
> >> Hi Eric,
> >>
> >> Could you please take a look at the above branches and see whether it makes
> >> sense to rebase on top of either of those?
> >>
> >> From vSVA point of view, it will be less rebase hassle if we can do that.
> >
> > Sure. I will rebase on top of this ;-)
> >
> > Thanks
> >
> > Eric
> >>
> >> Thanks,
> >> Shameer
> >>
> >>> Thanks,
> >>> Jean
> >>
> >



Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-01-14 Thread Auger Eric
Hi Shameer, Jean-Philippe,

On 12/4/20 11:23 AM, Auger Eric wrote:
> Hi Shameer, Jean-Philippe,
> 
> On 12/4/20 11:20 AM, Shameerali Kolothum Thodi wrote:
>> Hi Jean,
>>
>>> -Original Message-
>>> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
>>> Sent: 04 December 2020 09:54
>>> To: Shameerali Kolothum Thodi 
>>> Cc: Auger Eric ; wangxingang
>>> ; Xieyingtai ;
>>> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org; w...@kernel.org;
>>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>>> vivek.gau...@arm.com; alex.william...@redhat.com;
>>> zhangfei@linaro.org; robin.mur...@arm.com;
>>> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com; Zengtao (B)
>>> ; qubingbing 
>>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
>>> unmanaged ASIDs
>>>
>>> Hi Shameer,
>>>
>>> On Thu, Dec 03, 2020 at 06:42:57PM +, Shameerali Kolothum Thodi wrote:
 Hi Jean/zhangfei,
 Is it possible to have a branch with minimum required SVA/UACCE related
>>> patches
 that are already public and can be a "stable" candidate for future respin 
 of
>>> Eric's series?
 Please share your thoughts.
>>>
>>> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
>>> based on mainline? 
>>
>> Yes. 
>>
>>  The uacce-devel branches from
>>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
>>> (they track the latest sva/zip-devel branch
>>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
As I plan to respin shortly, please could you confirm the best branch to
rebase on still is that one (uacce-devel from the linux-kernel-uadk git
repo). Is it up to date? Commits seem to be quite old there.

Thanks

Eric
>>
>> Thanks. 
>>
>> Hi Eric,
>>
>> Could you please take a look at the above branches and see whether it makes
>> sense to rebase on top of either of those?
>>
>> From vSVA point of view, it will be less rebase hassle if we can do that.
> 
> Sure. I will rebase on top of this ;-)
> 
> Thanks
> 
> Eric
>>
>> Thanks,
>> Shameer
>>
>>> Thanks,
>>> Jean
>>
> 



Re: [PATCH v9 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA

2021-01-14 Thread Jean-Philippe Brucker
On Wed, Jan 13, 2021 at 08:10:28AM +, Tian, Kevin wrote:
> > >> Is this only for SVA? We may see more scenarios of using IOPF. For
> > >> example, when passing through devices to user level, the user's pages
> > >> could be managed dynamically instead of being allocated and pinned
> > >> statically.
> > >
> > > Hm, isn't that precisely what SVA does?  I don't understand the
> > > difference. That said FEAT_IOPF doesn't have to be only for SVA. It could
> > > later be used as a prerequisite some another feature. For special cases
> > > device drivers can always use the iommu_register_device_fault_handler()
> > > API and handle faults themselves.
> > 
> >  From the perspective of IOMMU, there is a little difference between
> > these two. For SVA, the page table is from CPU side, so IOMMU only needs
> > to call handle_mm_fault(); For above pass-through case, the page table
> > is from IOMMU side, so the device driver (probably VFIO) needs to
> > register a fault handler and call iommu_map/unmap() to serve the page
> > faults.
> > 
> > If we think about the nested mode (or dual-stage translation), it's more
> > complicated since the kernel (probably VFIO) handles the second level
> > page faults, while the first level page faults need to be delivered to
> > user-level guest. Obviously, this hasn't been fully implemented in any
> > IOMMU driver.
> > 
> 
> Thinking more the confusion might come from the fact that we mixed
> hardware capability with software capability. IOMMU_FEAT describes
> the hardware capability. When FEAT_IOPF is set, it purely means whatever
> page faults that are enabled by the software are routed through the IOMMU.
> Nothing more. Then the software (IOMMU drivers) may choose to support
> only limited faulting scenarios and then evolve to support more complex 
> usages gradually. For example, the intel-iommu driver only supports 1st-level
> fault (thus SVA) for now, while FEAT_IOPF as a separate feature may give the
> impression that 2nd-level faults are also allowed. From this angle once we 
> start to separate page fault from SVA, we may also need a way to report 
> the software capability (e.g. a set of faulting categories) and also extend
> iommu_register_device_fault_handler to allow specifying which 
> category is enabled respectively. The example categories could be:
> 
> - IOPF_BIND, for page tables which are bound/linked to the IOMMU. 
> Apply to bare metal SVA and guest SVA case;

These don't seem to fit in the same software capability, since the action
to perform on incoming page faults is very different. In the first case
the fault handling is entirely contained within the IOMMU driver; in the
second case the IOMMU driver only tracks page requests, and offloads
handling to VFIO.

> - IOPF_MAP, for page tables which are managed through explicit IOMMU
> map interfaces. Apply to removing VFIO pinning restriction;

From the IOMMU perspective this is the same as guest SVA, no? VFIO
registering a fault handler and doing the bulk of the work itself.

> Both categories can be enabled together in nested translation, with 
> additional information provided to differentiate them in fault information.
> Using paging/staging level doesn't make much sense as it's IOMMU driver's 
> internal knowledge, e.g. VT-d driver plans to use 1st level for GPA if no 
> nesting and then turn to 2nd level when nesting is enabled.

I guess detailing what's needed for nested IOPF can help the discussion,
although I haven't seen any concrete plan about implementing it, and it
still seems a couple of years away. There are two important steps with
nested IOPF:

(1) Figuring out whether a fault comes from L1 or L2. A SMMU stall event
comes with this information, but a PRI page request doesn't. The IOMMU
driver has to first translate the IOVA to a GPA, injecting the fault
into the guest if this translation fails by using the usual
iommu_report_device_fault().

(2) Translating the faulting GPA to a HVA that can be fed to
handle_mm_fault(). That requires help from KVM, so another interface -
either KVM registering GPA->HVA translation tables or IOMMU driver
querying each translation. Either way it should be reusable by device
drivers that implement IOPF themselves.

(1) could be enabled with iommu_dev_enable_feature(). (2) requires a more
complex interface. (2) alone might also be desirable - demand-paging for
level 2 only, no SVA for level 1.

Anyway, back to this patch. What I'm trying to convey is "can the IOMMU
receive incoming I/O page faults for this device and, when SVA is enabled,
feed them to the mm subsystem?  Enable that or return an error." I'm stuck
on the name. IOPF alone is too vague. Not IOPF_L1 as Kevin noted, since L1
is also used in virtualization. IOPF_BIND and IOPF_SVA could also mean (2)
above. IOMMU_DEV_FEAT_IOPF_FLAT?

That leaves space for the nested extensions. (1) above could be
IOMMU_FEAT_IOPF_NESTED, and (2) requires some new interfacing with KVM 
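
As a concrete illustration of the split being discussed, a driver that wants
SVA backed by PRI would then enable the two features in order, roughly like
the sketch below (it assumes the separate IOPF feature keeps the
IOMMU_DEV_FEAT_IOPF name debated above):

	int ret;

	ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
	if (ret)
		return ret;

	ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
	if (ret) {
		/* SVA depends on IOPF here, so roll back on failure. */
		iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_IOPF);
		return ret;
	}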

Re: [PATCH] iommu/arm-smmu-v3: Handle duplicated Stream IDs from other masters

2021-01-14 Thread Robin Murphy

On 2021-01-12 20:29, Ajay Kumar wrote:

On Tue, Jan 12, 2021 at 12:57 AM Robin Murphy  wrote:


On 2021-01-07 13:03, Will Deacon wrote:

On Thu, Jan 07, 2021 at 03:03:40PM +0530, Ajay Kumar wrote:

When PCI function drivers (e.g. pci-endpoint-test) are probed for an already
initialized PCIe-RC (Root Complex), and the PCIe-RC is already bound to the
SMMU, then we encounter a situation where the function driver tries to attach
itself to the SMMU with the same stream-id as the PCIe-RC and re-initializes
an already initialized STE. This causes the ste_live BUG_ON() in the driver.


Note that this is actually expected behaviour, since Stream ID aliasing
has remained officially not supported until a sufficiently compelling
reason to do so appears. I always thought the most likely scenario would
be a legacy PCI bridge with multiple devices behind it, but even that
seems increasingly improbable for a modern SMMUv3-based system to ever see.

Thanks to Will and Robin for reviewing this. I am pretty new to PCI,
sorry about that.
I assumed that support for stream-id aliasing is already handled as
part of this patch:
https://www.spinics.net/lists/arm-kernel/msg626087.html
which prevents STE re-initialization. But what I do not understand is
why the path taken by the arm-smmu-v3 driver misses the aforementioned
check for my use case.


That case is where a single device, due to combinations of PCI DMA 
aliasing conditions, has multiple IDs of its own, and two or more of 
those IDs also happen to end up the same as each other. What you have is 
two distinct devices both claiming the same ID, since apparently they 
both represent the same underlying physical device (I don't know how the 
endpoint and pcieport drivers interoperate and/or coexist, but I can 
easily imagine that some liberties may be taken that the IOMMU layer 
doesn't really anticipate).



I don't understand why the endpoint is using the same stream ID as the root
complex in this case. Why is that? Is the grouping logic not working
properly?


It's not so much that it isn't working properly, it's more that it needs
to be implemented at all ;)

The pci_endpoint_test device picks up the same OF DMA config node as the
PCI RC because they sit on the same PCI bus [via pci_dma_configure()].
Meanwhile in the arm-smmu-v3 driver, I can see that pci_device_group() hands
over the same iommu group as the Root Complex to the newly added master
device (pci_endpoint_test in our case) because they share the same stream ID.
Shouldn't they?


I'd imagine it's most likely that the PCI grouping rules are just 
putting everything together due to a lack of ACS. Either way, I'm pretty 
sure the PCI logic *doesn't* consider actual PCI devices having 
overlapping Requester IDs, because that isn't a real thing (how would 
config accesses work?). You can consider yourself lucky that the devices 
do happen to be grouped already in your particular case, but that 
doesn't change the fact that there's basically no point in trying to 
handle Stream ID aliasing within groups without properly implementing 
the grouping-by-Stream-ID logic in the first place (note that even 
distinct ACS-isolated PCI endpoints could still alias beyond the host 
bridge at the SMMU input if the system's translation from Requester ID 
space to Stream ID space isn't 1:1)



There is an already existing check in the driver to manage duplicated ids
if duplicated ids are added in same master device, but there can be
scenarios like above where we need to extend the check for other masters
using the same stream-id.

Signed-off-by: Ajay Kumar 
---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 33 +
   1 file changed, 33 insertions(+)


It doesn't feel like the driver is the right place to fix this, as the same
issue could surely occur for other IOMMUs too, right? In which case, I think
we should avoid getting into the situation where different groups have
overlapping stream IDs.


Yes, this patch does not represent the correct thing to do either way.
The main reason that Stream ID aliasing hasn't been supported so far is
that the required Stream ID to group lookup is rather awkward, and
adding all of that complexity just for the sake of a rather unlikely
possibility seemed dubious. However, PRI support has always had a more
pressing need to implement almost the same thing (Stream ID to device),
so once that lands we can finally get round to adding the rest of proper
group support relatively easily.

I hope the support will be added soon. Also, can you point me to a few drivers
which already handle this type of stream-ID aliasing?


We handle it in the SMMUv2 driver - the way that architecture does 
stream mapping makes it really easy to hang group pointers off the S2CRs 
and have arm_smmu_master_alloc_smes() and arm_smmu_device_group() work 
nicely together. Unfortunately it's not feasible to take the same 
approach for SMMUv3, since the Stream Table there may be up to 2^32 
entries (vs. up to 128 S2CRs), and there just 

Re: [RFC PATCH v3 1/2] iommu: Add capability IOMMU_CAP_VIOMMU

2021-01-14 Thread Leon Romanovsky
On Thu, Jan 14, 2021 at 09:30:02AM +0800, Lu Baolu wrote:
> Some vendor IOMMU drivers are able to declare that they are running in a VM
> context. This is very valuable for features that should only be supported
> on bare metal. Add a capability bit so that it can be used.

And how is it used? Who and how will set it?

>
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c  | 20 
>  drivers/iommu/virtio-iommu.c |  9 +
>  include/linux/iommu.h|  1 +
>  3 files changed, 30 insertions(+)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index cb205a04fe4c..8eb022d0e8aa 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5738,12 +5738,32 @@ static inline bool nested_mode_support(void)
>   return ret;
>  }
>
> +static inline bool caching_mode_enabled(void)
> +{

Kernel coding style is not in favour of inline functions in *.c files.

> + struct dmar_drhd_unit *drhd;
> + struct intel_iommu *iommu;
> + bool ret = false;
> +
> + rcu_read_lock();
> + for_each_active_iommu(iommu, drhd) {
> + if (cap_caching_mode(iommu->cap)) {
> + ret = true;
> + break;
> + }
> + }
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
>  static bool intel_iommu_capable(enum iommu_cap cap)
>  {
>   if (cap == IOMMU_CAP_CACHE_COHERENCY)
>   return domain_update_iommu_snooping(NULL) == 1;
>   if (cap == IOMMU_CAP_INTR_REMAP)
>   return irq_remapping_enabled == 1;
> + if (cap == IOMMU_CAP_VIOMMU)
> + return caching_mode_enabled();
>
>   return false;
>  }
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 2bfdd5734844..719793e103db 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -931,7 +931,16 @@ static int viommu_of_xlate(struct device *dev, struct 
> of_phandle_args *args)
>   return iommu_fwspec_add_ids(dev, args->args, 1);
>  }
>
> +static bool viommu_capable(enum iommu_cap cap)
> +{
> + if (cap == IOMMU_CAP_VIOMMU)
> + return true;
> +
> + return false;
> +}
> +
>  static struct iommu_ops viommu_ops = {
> + .capable= viommu_capable,
>   .domain_alloc   = viommu_domain_alloc,
>   .domain_free= viommu_domain_free,
>   .attach_dev = viommu_attach_dev,
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b95a6f8db6ff..1d24be667a03 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -94,6 +94,7 @@ enum iommu_cap {
>  transactions */
>   IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
>   IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
> + IOMMU_CAP_VIOMMU,   /* IOMMU can declare running in a VM */
>  };
>
>  /*
> --
> 2.25.1
>


Re: [PATCH 1/5] vfio/iommu_type1: Fixes vfio_dma_populate_bitmap to avoid dirty lose

2021-01-14 Thread Keqian Zhu
Hi Alex,

On 2021/1/13 5:20, Alex Williamson wrote:
> On Thu, 7 Jan 2021 17:28:57 +0800
> Keqian Zhu  wrote:
> 
>> Deferring the check of whether a vfio_dma is fully dirty to
>> update_user_bitmap makes it easy to lose the dirty log. For example,
>> after promoting the pinned_scope of vfio_iommu, a vfio_dma is no longer
>> considered fully dirty, so we may lose dirty log that occurred before
>> vfio_iommu was promoted.
>>
>> The key point is that pinned-dirty is not a real dirty tracking
>> mechanism; it can't continuously track dirty pages, but just restricts
>> the dirty scope. It is essentially the same as fully-dirty: fully-dirty
>> has full scope and pinned-dirty has pinned scope.
>>
>> So we must mark pinned-dirty or fully-dirty after we start dirty
>> tracking or clear the dirty bitmap, to ensure that the dirty log is
>> marked right away.
> 
> I was initially convinced by these first three patches, but upon
> further review, I think the premise is wrong.  AIUI, the concern across
> these patches is that our dirty bitmap is only populated with pages
> dirtied by pinning and we only take into account the pinned page dirty
> scope at the time the bitmap is retrieved by the user.  You suppose
> this presents a gap where if a vendor driver has not yet identified
> with a page pinning scope that the entire bitmap should be considered
> dirty regardless of whether that driver later pins pages prior to the
> user retrieving the dirty bitmap.
Yes, this is my concern. And there are some other scenarios.

1. If a non-pinned iommu-backed domain is detached between starting dirty log
and retrieving the dirty bitmap, then the dirty log is missed (as you said in
the last paragraph).

2. If all vendor drivers have pinned (meaning the iommu pinned_scope is true),
and a vendor driver does a pin/unpin pair between starting dirty log and
retrieving the dirty bitmap, then the dirty log of these unpinned pages is
missed.

> 
> I don't think this is how we intended the cooperation between the iommu
> driver and vendor driver to work.  By pinning pages a vendor driver is
> not declaring that only their future dirty page scope is limited to
> pinned pages, instead they're declaring themselves as a participant in
> dirty page tracking and take responsibility for pinning any necessary
> pages.  For example we might extend VFIO_IOMMU_DIRTY_PAGES_FLAG_START
> to trigger a blocking notification to groups to not only begin dirty
> tracking, but also to synchronously register their current device DMA
> footprint.  This patch would require a vendor driver to possibly perform
> a gratuitous page pinning in order to set the scope prior to dirty
> logging being enabled, or else the initial bitmap will be fully dirty.
I get what you mean ;-). You said that there is a time gap between when we
attach a device and when this device declares itself dirty traceable.

However, this makes it difficult to manage dirty log tracking (I listed two
bugs). If the vfio devices can declare themselves dirty traceable when
attached to the container, then the logic of dirty tracking is much
clearer ;-) .

> 
> Therefore, I don't see that this series is necessary or correct.  Kirti,
> does this match your thinking?
> 
> Thinking about these semantics, it seems there might still be an issue
> if a group with non-pinned-page dirty scope is detached with dirty
> logging enabled.  It seems this should in fact fully populate the dirty
> bitmaps at the time it's removed since we don't know the extent of its
> previous DMA, nor will the group be present to trigger the full bitmap
> when the user retrieves the dirty bitmap.  Creating fully populated
> bitmaps at the time tracking is enabled negates our ability to take
> advantage of later enlightenment though.  Thanks,
> 
Since you want to allow the time gap between when we attach a device and when
the device declares itself dirty traceable, I am going to fix these two bugs
in patch v2. Do you agree?

Thanks,
Keqian

> Alex
> 
>> Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages 
>> tracking")
>> Signed-off-by: Keqian Zhu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 33 ++---
>>  1 file changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index bceda5e8baaa..b0a26e8e0adf 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -224,7 +224,7 @@ static void vfio_dma_bitmap_free(struct vfio_dma *dma)
>>  dma->bitmap = NULL;
>>  }
>>  
>> -static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize)
>> +static void vfio_dma_populate_bitmap_pinned(struct vfio_dma *dma, size_t 
>> pgsize)
>>  {
>>  struct rb_node *p;
>>  unsigned long pgshift = __ffs(pgsize);
>> @@ -236,6 +236,25 @@ static void vfio_dma_populate_bitmap(struct vfio_dma 
>> *dma, size_t pgsize)
>>  }
>>  }
>>  
>> +static void vfio_dma_populate_bitmap_full(struct vfio_dma *dma, size_t 
>> pgsize)
>> +{
>> +unsigned long pgshift 

Re: [PATCH 4/5] vfio/iommu_type1: Carefully use unmap_unpin_all during dirty tracking

2021-01-14 Thread Keqian Zhu



On 2021/1/13 3:53, Alex Williamson wrote:
> On Tue, 12 Jan 2021 20:04:38 +0800
> Keqian Zhu  wrote:
> 
>> On 2021/1/12 5:49, Alex Williamson wrote:
>>> On Thu, 7 Jan 2021 17:29:00 +0800
>>> Keqian Zhu  wrote:
>>>   
 If we detach a group during dirty page tracking, we shouldn't remove
 the vfio_dma, because the dirty log will be lost.

 But we don't prevent unmap_unpin_all in vfio_iommu_release, because
 under normal procedure, dirty tracking has been stopped.
>>>
>>> This looks like it's creating a larger problem than it's fixing, it's
>>> not our job to maintain the dirty bitmap regardless of what the user
>>> does.  If the user detaches the last group in a container causing the
>>> mappings within that container to be deconstructed before the user has
>>> collected dirty pages, that sounds like a user error.  A container with
>>> no groups is de-privileged and therefore loses all state.  Thanks,
>>>
>>> Alex  
>>
>> Hi Alex,
>>
>> This looks good to me ;-). That's a reasonable constraint for user behavior.
>>
>> What about replacing this patch with an addition to the uapi document of
>> VFIO_GROUP_UNSET_CONTAINER? User should pay attention to this when call this
>> ioctl during dirty tracking.
> 
> Here's the current uapi comment:
> 
> /**
>  * VFIO_GROUP_UNSET_CONTAINER - _IO(VFIO_TYPE, VFIO_BASE + 5)
>  *
>  * Remove the group from the attached container.  This is the
>  * opposite of the SET_CONTAINER call and returns the group to
>  * an initial state.  All device file descriptors must be released
>  * prior to calling this interface.  When removing the last group
>  * from a container, the IOMMU will be disabled and all state lost,
>  * effectively also returning the VFIO file descriptor to an initial
>  * state.
>  * Return: 0 on success, -errno on failure.
>  * Availability: When attached to container
>  */
> 
> So we already indicate that "all state" of the container is lost when
> removing the last group; I don't see that it's necessary to
> explicitly include dirty bitmap state beyond that statement.  Without
> mappings there can be no dirty bitmap to track.
OK :-) .

> 
>  > And any comments on other patches? thanks.
> 
> I had a difficult time mapping the commit log to the actual code
> change, I'll likely have some wording suggestions.  Is patch 5/5 still
> necessary if this patch is dropped?  Thanks,
> 
I think the 5th patch is still necessary. vfio_sanity_check_pfn_list() is
used to check whether the pfn_list of a vfio_dma is empty, but we apply this
check just for the external domain. If the iommu-backed domain also pins some
pages, then this check fails. So I think we should use this check only when
all domains are about to be removed.

Besides, this patch should make the "WARN_ON(iommu->notifier.head);" apply
just to the external domain.

Thanks,
Keqian

> Alex
> 
 Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages 
 tracking")
 Signed-off-by: Keqian Zhu 
 ---
  drivers/vfio/vfio_iommu_type1.c | 14 --
  1 file changed, 12 insertions(+), 2 deletions(-)

 diff --git a/drivers/vfio/vfio_iommu_type1.c 
 b/drivers/vfio/vfio_iommu_type1.c
 index 26b7eb2a5cfc..9776a059904d 100644
 --- a/drivers/vfio/vfio_iommu_type1.c
 +++ b/drivers/vfio/vfio_iommu_type1.c
 @@ -2373,7 +2373,12 @@ static void vfio_iommu_type1_detach_group(void 
 *iommu_data,
	if (list_empty(&iommu->external_domain->group_list)) {
vfio_sanity_check_pfn_list(iommu);
  
 -  if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
 +  /*
 +   * During dirty page tracking, we can't remove
 +   * vfio_dma because dirty log will lose.
 +   */
 +  if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) &&
 +  !iommu->dirty_page_tracking)
vfio_iommu_unmap_unpin_all(iommu);
  
kfree(iommu->external_domain);
 @@ -2406,10 +2411,15 @@ static void vfio_iommu_type1_detach_group(void 
 *iommu_data,
 * iommu and external domain doesn't exist, then all the
 * mappings go away too. If it's the last domain with iommu and
 * external domain exist, update accounting
 +   *
 +   * Note: During dirty page tracking, we can't remove vfio_dma
 +   * because dirty log will lose. Just update accounting is a good
 +   * choice.
 */
	if (list_empty(&domain->group_list)) {
		if (list_is_singular(&iommu->domain_list)) {
 -  if (!iommu->external_domain)
 +  if (!iommu->external_domain &&
 +  !iommu->dirty_page_tracking)

[PATCH 1/1] iommu/vt-d: Add qi_submit trace event

2021-01-14 Thread Lu Baolu
This adds a new trace event to track the submissions of requests to the
invalidation queue. This event will provide the information like:
- IOMMU name
- Invalidation type
- Descriptor raw data

A sample output like:
| qi_submit: iotlb_inv dmar1: 0x100e2 0x0 0x0 0x0
| qi_submit: dev_tlb_inv dmar1: 0x13 0x7001 0x0 0x0
| qi_submit: iotlb_inv dmar2: 0x800f2 0xf9a5 0x0 0x0

This will be helpful for queued invalidation related debugging.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/dmar.c |  3 +++
 include/trace/events/intel_iommu.h | 37 ++
 2 files changed, 40 insertions(+)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 004feaed3c72..bd51f33642e0 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include <trace/events/intel_iommu.h>
 
 #include "../irq_remapping.h"
 
@@ -1307,6 +1308,8 @@ int qi_submit_sync(struct intel_iommu *iommu, struct 
qi_desc *desc,
offset = ((index + i) % QI_LENGTH) << shift;
		memcpy(qi->desc + offset, &desc[i], 1 << shift);
qi->desc_status[(index + i) % QI_LENGTH] = QI_IN_USE;
+   trace_qi_submit(iommu, desc[i].qw0, desc[i].qw1,
+   desc[i].qw2, desc[i].qw3);
}
qi->desc_status[wait_index] = QI_IN_USE;
 
diff --git a/include/trace/events/intel_iommu.h 
b/include/trace/events/intel_iommu.h
index 112bd06487bf..aad2ff0c1e2e 100644
--- a/include/trace/events/intel_iommu.h
+++ b/include/trace/events/intel_iommu.h
@@ -135,6 +135,43 @@ DEFINE_EVENT(dma_map_sg, bounce_map_sg,
 struct scatterlist *sg),
TP_ARGS(dev, index, total, sg)
 );
+
+TRACE_EVENT(qi_submit,
+   TP_PROTO(struct intel_iommu *iommu, u64 qw0, u64 qw1, u64 qw2, u64 qw3),
+
+   TP_ARGS(iommu, qw0, qw1, qw2, qw3),
+
+   TP_STRUCT__entry(
+   __field(u64, qw0)
+   __field(u64, qw1)
+   __field(u64, qw2)
+   __field(u64, qw3)
+   __string(iommu, iommu->name)
+   ),
+
+   TP_fast_assign(
+   __assign_str(iommu, iommu->name);
+   __entry->qw0 = qw0;
+   __entry->qw1 = qw1;
+   __entry->qw2 = qw2;
+   __entry->qw3 = qw3;
+   ),
+
+   TP_printk("%s %s: 0x%llx 0x%llx 0x%llx 0x%llx",
+ __print_symbolic(__entry->qw0 & 0xf,
+  { QI_CC_TYPE,"cc_inv" },
+  { QI_IOTLB_TYPE, "iotlb_inv" },
+  { QI_DIOTLB_TYPE,"dev_tlb_inv" },
+  { QI_IEC_TYPE,   "iec_inv" },
+  { QI_IWD_TYPE,   "inv_wait" },
+  { QI_EIOTLB_TYPE,"p_iotlb_inv" },
+  { QI_PC_TYPE,"pc_inv" },
+  { QI_DEIOTLB_TYPE,   "p_dev_tlb_inv" },
+  { QI_PGRP_RESP_TYPE, "page_grp_resp" }),
+   __get_str(iommu),
+   __entry->qw0, __entry->qw1, __entry->qw2, __entry->qw3
+   )
+);
 #endif /* _TRACE_INTEL_IOMMU_H */
 
 /* This part must be outside protection */
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

2021-01-14 Thread Claire Chang
On Wed, Jan 13, 2021 at 8:42 PM Christoph Hellwig  wrote:
>
> > +#ifdef CONFIG_SWIOTLB
> > + struct io_tlb_mem   *dma_io_tlb_mem;
> >  #endif
>
> Please add a new config option for this code instead of always building
> it when swiotlb is enabled.
>
> > +static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> > +size_t size)
>
> Can you split the refactoring in swiotlb.c into one or more prep
> patches?
>
> > +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> > + struct device *dev)
> > +{
> > + struct io_tlb_mem *mem = rmem->priv;
> > + int ret;
> > +
> > + if (dev->dma_io_tlb_mem)
> > + return -EBUSY;
> > +
> > + if (!mem) {
> > + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > + if (!mem)
> > + return -ENOMEM;
>
> What is the calling convention here that allows for a NULL and non-NULL
> private data?

Since multiple devices can share the same pool, the private data (the
io_tlb_mem struct) is initialized by the first device attached to the
pool; devices attached later simply reuse it.
This is similar to rmem_dma_device_init() in kernel/dma/coherent.c.
I'll add a comment explaining this in the next version.
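
To make the convention explicit, here is a minimal sketch of what I have
in mind (not the final code of the next version; error handling and the
actual pool setup are omitted):

static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
				    struct device *dev)
{
	struct io_tlb_mem *mem = rmem->priv;

	if (dev->dma_io_tlb_mem)
		return -EBUSY;

	if (!mem) {
		/* First device attached: allocate and initialize the pool. */
		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
		if (!mem)
			return -ENOMEM;
		/* ... set up the pool from rmem->base and rmem->size ... */
		rmem->priv = mem;
	}

	/* Devices attached later simply reuse the initialized pool. */
	dev->dma_io_tlb_mem = mem;
	return 0;
}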
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool

2021-01-14 Thread Claire Chang
On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli  wrote:
>
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > If a device is not behind an IOMMU, we look up the device node and set
> > up the restricted DMA when the restricted-dma-pool is presented.
> >
> > Signed-off-by: Claire Chang 
> > ---
>
> [snip]
>
> > +int of_dma_set_restricted_buffer(struct device *dev)
> > +{
> > + struct device_node *node;
> > + int count, i;
> > +
> > + if (!dev->of_node)
> > + return 0;
> > +
> > + count = of_property_count_elems_of_size(dev->of_node, "memory-region",
> > + sizeof(phandle));
>
> You could have an early check for count < 0, along with an error
> message, if that is deemed useful.
>
> > + for (i = 0; i < count; i++) {
> > + node = of_parse_phandle(dev->of_node, "memory-region", i);
> > + if (of_device_is_compatible(node, "restricted-dma-pool"))
>
> And you may want to add here an of_device_is_available(node). A platform
> that provides the Device Tree firmware and try to support multiple
> different SoCs may try to determine if an IOMMU is present, and if it
> is, it could be marking the restriced-dma-pool region with a 'status =
> "disabled"' property, or any variant of that scheme.

This function is called only when there is no IOMMU present (check in
drivers/of/device.c). I can still add of_device_is_available(node)
here if you think it's helpful.

>
> > + return of_reserved_mem_device_init_by_idx(
> > + dev, dev->of_node, i);
>
> This does not seem to be supporting more than one memory region, did not
> you want something like instead:
>
> ret = of_reserved_mem_device_init_by_idx(...);
> if (ret)
> return ret;
>

Yes. This implementation only supports one restricted-dma-pool memory
region, on the assumption that there is only one memory region with the
compatible string "restricted-dma-pool" in the dts. IIUC, it's similar
to shared-dma-pool in that respect.
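
Just to make it concrete, a rough sketch of the lookup with your
suggestions folded in (availability check, early return on the first
matching region) could look like this; naming follows the patch and it
is not meant as final code:

int of_dma_set_restricted_buffer(struct device *dev)
{
	struct device_node *node;
	int count, i;

	if (!dev->of_node)
		return 0;

	count = of_property_count_elems_of_size(dev->of_node, "memory-region",
						sizeof(phandle));
	for (i = 0; i < count; i++) {
		node = of_parse_phandle(dev->of_node, "memory-region", i);
		/* Skip pools that the firmware marked as disabled. */
		if (of_device_is_compatible(node, "restricted-dma-pool") &&
		    of_device_is_available(node))
			return of_reserved_mem_device_init_by_idx(dev,
							dev->of_node, i);
	}

	return 0;
}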


> > + }
> > +
> > + return 0;
> > +}
> > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > index aedfaaafd3e7..e2c7409956ab 100644
> > --- a/drivers/of/device.c
> > +++ b/drivers/of/device.c
> > @@ -182,6 +182,10 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
> >   arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
> >
> >   dev->dma_range_map = map;
> > +
> > + if (!iommu)
> > + return of_dma_set_restricted_buffer(dev);
> > +
> >   return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(of_dma_configure_id);
> > diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
> > index d9e6a324de0a..28a2dfa197ba 100644
> > --- a/drivers/of/of_private.h
> > +++ b/drivers/of/of_private.h
> > @@ -161,12 +161,17 @@ struct bus_dma_region;
> >  #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
> >  int of_dma_get_range(struct device_node *np,
> >   const struct bus_dma_region **map);
> > +int of_dma_set_restricted_buffer(struct device *dev);
> >  #else
> >  static inline int of_dma_get_range(struct device_node *np,
> >   const struct bus_dma_region **map)
> >  {
> >   return -ENODEV;
> >  }
> > +static inline int of_dma_get_restricted_buffer(struct device *dev)
> > +{
> > + return -ENODEV;
> > +}
> >  #endif
> >
> >  #endif /* _LINUX_OF_PRIVATE_H */
> >
>
>
> --
> Florian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/1] iommu/vt-d: Consolidate duplicate cache invalidation code

2021-01-14 Thread Lu Baolu
The PASID-based IOTLB and devTLB invalidation code is duplicated in
several places. Consolidate it by using the common helpers.
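
For reference, these are the two helpers the duplicated code is converted
to. The prototypes below are paraphrased from include/linux/intel-iommu.h
as of this series; see the header for the authoritative declarations and
treat the parameter names as illustrative:

void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid,
		     u64 addr, unsigned long npages, bool ih);
void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
			      u32 pasid, u16 qdep, u64 addr,
			      unsigned int size_order);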

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/pasid.c | 18 ++--
 drivers/iommu/intel/svm.c   | 55 ++---
 2 files changed, 11 insertions(+), 62 deletions(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index b92af83b79bd..f26cb6195b2c 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -456,20 +456,6 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
qi_submit_sync(iommu, , 1, 0);
 }
 
-static void
-iotlb_invalidation_with_pasid(struct intel_iommu *iommu, u16 did, u32 pasid)
-{
-   struct qi_desc desc;
-
-   desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
-   QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) | QI_EIOTLB_TYPE;
-   desc.qw1 = 0;
-   desc.qw2 = 0;
-   desc.qw3 = 0;
-
-   qi_submit_sync(iommu, , 1, 0);
-}
-
 static void
 devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
   struct device *dev, u32 pasid)
@@ -514,7 +500,7 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
clflush_cache_range(pte, sizeof(*pte));
 
pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-   iotlb_invalidation_with_pasid(iommu, did, pasid);
+   qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
 
/* Device IOTLB doesn't need to be flushed in caching mode. */
if (!cap_caching_mode(iommu->cap))
@@ -530,7 +516,7 @@ static void pasid_flush_caches(struct intel_iommu *iommu,
 
if (cap_caching_mode(iommu->cap)) {
pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-   iotlb_invalidation_with_pasid(iommu, did, pasid);
+   qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
} else {
iommu_flush_write_buffer(iommu);
}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 18a9f05df407..033b25886e57 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -123,53 +123,16 @@ static void __flush_svm_range_dev(struct intel_svm *svm,
  unsigned long address,
  unsigned long pages, int ih)
 {
-   struct qi_desc desc;
+   struct device_domain_info *info = get_domain_info(sdev->dev);
 
-   if (pages == -1) {
-   desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
-   QI_EIOTLB_DID(sdev->did) |
-   QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
-   QI_EIOTLB_TYPE;
-   desc.qw1 = 0;
-   } else {
-   int mask = ilog2(__roundup_pow_of_two(pages));
-
-   desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
-   QI_EIOTLB_DID(sdev->did) |
-   QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) |
-   QI_EIOTLB_TYPE;
-   desc.qw1 = QI_EIOTLB_ADDR(address) |
-   QI_EIOTLB_IH(ih) |
-   QI_EIOTLB_AM(mask);
-   }
-   desc.qw2 = 0;
-   desc.qw3 = 0;
-   qi_submit_sync(sdev->iommu, , 1, 0);
-
-   if (sdev->dev_iotlb) {
-   desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
-   QI_DEV_EIOTLB_SID(sdev->sid) |
-   QI_DEV_EIOTLB_QDEP(sdev->qdep) |
-   QI_DEIOTLB_TYPE;
-   if (pages == -1) {
-   desc.qw1 = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) |
-   QI_DEV_EIOTLB_SIZE;
-   } else if (pages > 1) {
-   /* The least significant zero bit indicates the size. So,
-* for example, an "address" value of 0x12345f000 will
-* flush from 0x123440000 to 0x12347ffff (256KiB). */
-   unsigned long last = address + ((unsigned long)(pages - 1) << VTD_PAGE_SHIFT);
-   unsigned long mask = __rounddown_pow_of_two(address ^ last);
-
-   desc.qw1 = QI_DEV_EIOTLB_ADDR((address & ~mask) |
-   (mask - 1)) | QI_DEV_EIOTLB_SIZE;
-   } else {
-   desc.qw1 = QI_DEV_EIOTLB_ADDR(address);
-   }
-   desc.qw2 = 0;
-   desc.qw3 = 0;
-   qi_submit_sync(sdev->iommu, , 1, 0);
-   }
+   if (WARN_ON(!pages))
+   return;
+
+   qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih);
+   if (info->ats_enabled)
+   qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
+svm->pasid, sdev->qdep, address,
+order_base_2(pages));
 }
 
 static void