Re: [PATCH 3/5] swiotlb: Add alloc and free APIs

2020-04-29 Thread kbuild test robot
Hi Srivatsa,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on xen-tip/linux-next linus/master v5.7-rc3 
next-20200429]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest using the '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: i386-randconfig-b002-20200429 (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/i915_scatterlist.h:12:0,
from drivers/gpu/drm/i915/i915_scatterlist.c:7:
   include/linux/swiotlb.h: In function 'swiotlb_alloc':
>> include/linux/swiotlb.h:231:9: error: 'DMA_MAPPING_ERROR' undeclared (first 
>> use in this function); did you mean 'APM_NO_ERROR'?
 return DMA_MAPPING_ERROR;
^
APM_NO_ERROR
   include/linux/swiotlb.h:231:9: note: each undeclared identifier is reported 
only once for each function it appears in

vim +231 include/linux/swiotlb.h

   226  
   227  static inline phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool,
   228  size_t alloc_size, unsigned long tbl_dma_addr,
   229  unsigned long mask)
   230  {
 > 231  return DMA_MAPPING_ERROR;
   232  }
   233  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-29 Thread Qian Cai


> On Apr 29, 2020, at 7:20 AM, Joerg Roedel  wrote:
> 
> On Mon, Apr 20, 2020 at 09:26:12AM -0400, Qian Cai wrote:
>> No dice. There could be some other races. For example,
> 
> Can you please test this branch:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=amd-iommu-fixes
> 
> It has the previous fix in it and a couple more to make sure the
> device-table is updated and flushed before increase_address_space()
> updates domain->pt_root.

It looks promising as it survives for a day’s stress testing. I’ll keep it 
running for a few days to be sure.

Re: [PATCH v3 16/34] iommu/vt-d: Convert to probe/release_device() call-backs

2020-04-29 Thread Lu Baolu

On 2020/4/29 21:36, Joerg Roedel wrote:

From: Joerg Roedel 

Convert the Intel IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 


Reviewed-by: Lu Baolu 

Best regards,
baolu


---
  drivers/iommu/intel-iommu.c | 67 -
  1 file changed, 6 insertions(+), 61 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b9f905a55dda..b906727f5b85 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5781,78 +5781,27 @@ static bool intel_iommu_capable(enum iommu_cap cap)
return false;
  }
  
-static int intel_iommu_add_device(struct device *dev)

+static struct iommu_device *intel_iommu_probe_device(struct device *dev)
  {
-   struct dmar_domain *dmar_domain;
-   struct iommu_domain *domain;
struct intel_iommu *iommu;
-   struct iommu_group *group;
u8 bus, devfn;
-   int ret;
  
  	iommu = device_to_iommu(dev, &bus, &devfn);

if (!iommu)
-   return -ENODEV;
-
-   iommu_device_link(&iommu->iommu, dev);
+   return ERR_PTR(-ENODEV);
  
  	if (translation_pre_enabled(iommu))

dev->archdata.iommu = DEFER_DEVICE_DOMAIN_INFO;
  
-	group = iommu_group_get_for_dev(dev);

-
-   if (IS_ERR(group)) {
-   ret = PTR_ERR(group);
-   goto unlink;
-   }
-
-   iommu_group_put(group);
-
-   domain = iommu_get_domain_for_dev(dev);
-   dmar_domain = to_dmar_domain(domain);
-   if (domain->type == IOMMU_DOMAIN_DMA) {
-   if (device_def_domain_type(dev) == IOMMU_DOMAIN_IDENTITY) {
-   ret = iommu_request_dm_for_dev(dev);
-   if (ret) {
-   dmar_remove_one_dev_info(dev);
-   dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
-   domain_add_dev_info(si_domain, dev);
-   dev_info(dev,
-"Device uses a private identity domain.\n");
-   }
-   }
-   } else {
-   if (device_def_domain_type(dev) == IOMMU_DOMAIN_DMA) {
-   ret = iommu_request_dma_domain_for_dev(dev);
-   if (ret) {
-   dmar_remove_one_dev_info(dev);
-   dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
-   if (!get_private_domain_for_dev(dev)) {
-   dev_warn(dev,
-"Failed to get a private domain.\n");
-   ret = -ENOMEM;
-   goto unlink;
-   }
-
-   dev_info(dev,
-"Device uses a private dma domain.\n");
-   }
-   }
-   }
-
if (device_needs_bounce(dev)) {
dev_info(dev, "Use Intel IOMMU bounce page dma_ops\n");
		set_dma_ops(dev, &bounce_dma_ops);
}
  
-	return 0;

-
-unlink:
-   iommu_device_unlink(&iommu->iommu, dev);
-   return ret;
+   return &iommu->iommu;
  }
  
-static void intel_iommu_remove_device(struct device *dev)

+static void intel_iommu_release_device(struct device *dev)
  {
struct intel_iommu *iommu;
u8 bus, devfn;
@@ -5863,10 +5812,6 @@ static void intel_iommu_remove_device(struct device *dev)
  
  	dmar_remove_one_dev_info(dev);
  
-	iommu_group_remove_device(dev);

-
-   iommu_device_unlink(&iommu->iommu, dev);
-
if (device_needs_bounce(dev))
set_dma_ops(dev, NULL);
  }
@@ -6198,8 +6143,8 @@ const struct iommu_ops intel_iommu_ops = {
.map= intel_iommu_map,
.unmap  = intel_iommu_unmap,
.iova_to_phys   = intel_iommu_iova_to_phys,
-   .add_device = intel_iommu_add_device,
-   .remove_device  = intel_iommu_remove_device,
+   .probe_device   = intel_iommu_probe_device,
+   .release_device = intel_iommu_release_device,
.get_resv_regions   = intel_iommu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
.apply_resv_region  = intel_iommu_apply_resv_region,




Re: [PATCH v3 04/34] iommu/vt-d: Wire up iommu_ops->def_domain_type

2020-04-29 Thread Lu Baolu

On 2020/4/29 21:36, Joerg Roedel wrote:

From: Joerg Roedel 

The Intel VT-d driver already has a matching function to determine the
default domain type for a device. Wire it up in intel_iommu_ops.

Signed-off-by: Joerg Roedel 


Reviewed-by: Lu Baolu 

Best regards,
baolu


---
  drivers/iommu/intel-iommu.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ef0a5246700e..b9f905a55dda 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -6209,6 +6209,7 @@ const struct iommu_ops intel_iommu_ops = {
.dev_enable_feat= intel_iommu_dev_enable_feat,
.dev_disable_feat   = intel_iommu_dev_disable_feat,
.is_attach_deferred = intel_iommu_is_attach_deferred,
+   .def_domain_type= device_def_domain_type,
.pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
  };
  




Re: [PATCH v12 4/8] iommu/vt-d: Add bind guest PASID support

2020-04-29 Thread Jacob Pan
On Wed, 29 Apr 2020 16:12:01 +0200
Auger Eric  wrote:

> >> in last review Eric raised the open about what about binding the
> >> same PASID to the same pdev multiple times. We discussed that
> >> should be disallowed. Here can you check whether aux_domain is
> >> enabled on pdev to restrict multiple-binding only for
> >> sub-devices?  
> > Why aux_domain is sufficient? A pdev could have aux_domain enabled
> > but still bind pdev many times more than its mdevs.
> > 
> > Either we allow multiple bind or not.  
> 
> I tried to figure out whether binding the same PASID to the same pdev
> was meaningful. I understood it is not. If this case can be detected
> at VFIO level I am fine as well.
I will remove the multiple bind support for now. Reintroduce it when we
enable mdev.

Thanks,

Jacob


Re: [PATCH v3 3/4] iommu/vt-d: Add page request draining support

2020-04-29 Thread Jacob Pan
On Wed, 29 Apr 2020 14:00:05 +0800
Lu Baolu  wrote:

> Hi Jacob,
> 
> On 2020/4/29 11:36, Jacob Pan wrote:
> > On Wed, 22 Apr 2020 16:06:10 +0800
> > Lu Baolu  wrote:
> >   
> >> When a PASID is stopped or terminated, there can be pending PRQs
> >> (requests that haven't received responses) in remapping hardware.
> >> This adds the interface to drain page requests and call it when a
> >> PASID is terminated.
> >>
> >> Signed-off-by: Jacob Pan
> >> Signed-off-by: Liu Yi L
> >> Signed-off-by: Lu Baolu
> >> ---
> >>   drivers/iommu/intel-svm.c   | 103
> >> ++-- include/linux/intel-iommu.h |
> >> 4 ++ 2 files changed, 102 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> >> index 83dc4319f661..2534641ef707 100644
> >> --- a/drivers/iommu/intel-svm.c
> >> +++ b/drivers/iommu/intel-svm.c
> >> @@ -23,6 +23,7 @@
> >>   #include "intel-pasid.h"
> >>   
> >>   static irqreturn_t prq_event_thread(int irq, void *d);
> >> +static void intel_svm_drain_prq(struct device *dev, int pasid);
> >>   
> >>   #define PRQ_ORDER 0
> >>   
> >> @@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu
> >> *iommu) dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
> >>dmar_writeq(iommu->reg + DMAR_PQA_REG,
> >> virt_to_phys(iommu->prq) | PRQ_ORDER);
> >> +  init_completion(&iommu->prq_complete);
> >> +
> >>return 0;
> >>   }
> >>   
> >> @@ -208,6 +211,7 @@ static void intel_mm_release(struct
> >> mmu_notifier *mn, struct mm_struct *mm) rcu_read_lock();
> >>    list_for_each_entry_rcu(sdev, &svm->devs, list) {
> >>intel_pasid_tear_down_entry(svm->iommu,
> >> sdev->dev, svm->pasid);
> >> +  intel_svm_drain_prq(sdev->dev, svm->pasid);  
> > mmu_notifier release is called in atomic context, drain_prq needs to
> > wait for completion. I tested exit path when a process crashes. I
> > got
> > 
> > [  +0.696214] BUG: sleeping function called from invalid context at
> > kernel/sched/completion.c:101 [  +0.68] in_atomic(): 1,
> > irqs_disabled(): 0, non_block: 0, pid: 3235, name: dsatest
> > [  +0.46] INFO: lockdep is turned off. [  +0.02] CPU: 1
> > PID: 3235 Comm: dsatest Not tainted 5.7.0-rc1-z-svmtest+ #1637
> > [  +0.00] Hardware name: Intel Corporation Kabylake Client
> > platform/Skylake Halo DDR4 RVP11, BIOS 04.1709050855 09/05/2017
> > [  +0.01] Call Trace: [  +0.04]  dump_stack+0x68/0x9b
> > [  +0.03]  ___might_sleep+0x229/0x250
> > [  +0.03]  wait_for_completion_timeout+0x3c/0x110
> > [  +0.03]  intel_svm_drain_prq+0x12f/0x210
> > [  +0.03]  intel_mm_release+0x73/0x110
> > [  +0.03]  __mmu_notifier_release+0x94/0x220
> > [  +0.02]  ? do_munmap+0x10/0x10
> > [  +0.02]  ? prepare_ftrace_return+0x5c/0x80
> > [  +0.03]  exit_mmap+0x156/0x1a0
> > [  +0.02]  ? mmput+0x44/0x120
> > [  +0.03]  ? exit_mmap+0x5/0x1a0
> > [  +0.02]  ? ftrace_graph_caller+0xa0/0xa0
> > [  +0.01]  mmput+0x5e/0x120
> > 
> >   
> 
> Thanks a lot!
> 
> Actually, we can't drain page requests in this mm_notifier code path,
> right? The assumptions of page request draining are that 1) the device
> driver has drained DMA requests in the device end; 2) the pasid entry
> has been marked as non-present. So we could only drain page requests
> in the unbind path.
> 
> Thought?
> 
Right, we could save the drain here. unbind will come soon when mm
exits. So even the in flight PRs come through, we could just respond
with "Invalid Response" after mm exit starts. The current code already
checks if the mm is exiting by mmget_not_zero.

> Best regards,
> baolu

[Jacob Pan]


[RFC PATCH] virtio: virtio_pool can be static

2020-04-29 Thread kbuild test robot


Signed-off-by: kbuild test robot 
---
 virtio_bounce.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_bounce.c b/drivers/virtio/virtio_bounce.c
index 3de8e0eb71e48..5a68d48667c42 100644
--- a/drivers/virtio/virtio_bounce.c
+++ b/drivers/virtio/virtio_bounce.c
@@ -19,7 +19,7 @@
 static phys_addr_t bounce_buf_paddr;
 static void *bounce_buf_vaddr;
 static size_t bounce_buf_size;
-struct swiotlb_pool *virtio_pool;
+static struct swiotlb_pool *virtio_pool;
 
 #define VIRTIO_MAX_BOUNCE_SIZE (16*4096)
 
@@ -76,7 +76,7 @@ static void virtio_unmap_page(struct device *dev, dma_addr_t dev_addr,
size, dir, attrs);
 }
 
-size_t virtio_max_mapping_size(struct device *dev)
+static size_t virtio_max_mapping_size(struct device *dev)
 {
return VIRTIO_MAX_BOUNCE_SIZE;
 }


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread kbuild test robot
Hi Srivatsa,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on vhost/linux-next]
[also build test WARNING on xen-tip/linux-next linus/master v5.7-rc3 
next-20200428]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest using the '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-191-gc51a0382-dirty
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 


sparse warnings: (new ones prefixed by >>)

>> drivers/virtio/virtio_bounce.c:22:21: sparse: sparse: symbol 'virtio_pool' 
>> was not declared. Should it be static?
>> drivers/virtio/virtio_bounce.c:79:8: sparse: sparse: symbol 
>> 'virtio_max_mapping_size' was not declared. Should it be static?

Please review and possibly fold the followup patch.



Re: [PATCH] perf/smmuv3: Allow sharing MMIO registers with the SMMU driver

2020-04-29 Thread Tuan Phan


> On Apr 29, 2020, at 12:21 AM, Jean-Philippe Brucker 
>  wrote:
> 
> On Tue, Apr 28, 2020 at 11:10:09AM -0700, Tuan Phan wrote:
>> I tested this patch on HW, however I need to add one more following change 
>> to make it works
> 
> Thanks for testing. I don't understand why you need the change below
> though, do you know which other region is conflicting with the SMMU?
> It should be displayed in the error message and /proc/iomem.
> 
> Thanks,
> Jean

The error if I don’t apply that patch:
[4.943655] arm-smmu-v3 arm-smmu-v3.0.auto: can't request region for 
resource [mem 0x3bffe000-0x3bffe001]

The output of /proc/iomem for that region:
3bffe000-3bffe001 : arm-smmu-v3.0.auto
  3bffe0002000-3bffe0002fff : arm-smmu-v3-pmcg.17.auto
  3bffe0012000-3bffe0012fff : arm-smmu-v3-pmcg.17.auto
3bffe0042000-3bffe0042fff : arm-smmu-v3-pmcg.11.auto
3bffe0052000-3bffe0052fff : arm-smmu-v3-pmcg.11.auto
3bffe0062000-3bffe0062fff : arm-smmu-v3-pmcg.12.auto
3bffe0072000-3bffe0072fff : arm-smmu-v3-pmcg.12.auto
3bffe00a2000-3bffe00a2fff : arm-smmu-v3-pmcg.13.auto
3bffe00b2000-3bffe00b2fff : arm-smmu-v3-pmcg.13.auto
3bffe00e2000-3bffe00e2fff : arm-smmu-v3-pmcg.14.auto
3bffe00f2000-3bffe00f2fff : arm-smmu-v3-pmcg.14.auto
3bffe0102000-3bffe0102fff : arm-smmu-v3-pmcg.15.auto
3bffe0112000-3bffe0112fff : arm-smmu-v3-pmcg.15.auto
3bffe0142000-3bffe0142fff : arm-smmu-v3-pmcg.16.auto
3bffe0152000-3bffe0152fff : arm-smmu-v3-pmcg.16.auto

> 
>> @@ -2854,7 +2854,7 @@ static int arm_smmu_device_probe(struct 
>> platform_device *pdev)
>>}
>>ioaddr = res->start;
>> 
>> -   smmu->base = devm_ioremap_resource(dev, res);
>> +   smmu->base = devm_ioremap(dev, res->start, resource_size(res));
>>if (IS_ERR(smmu->base))
>>return PTR_ERR(smmu->base);
>> 
>> 
>>> } else {
>>> smmu_pmu->reloc_base = smmu_pmu->reg_base;
>>> }
>>> -- 
>>> 2.26.0
>>> 
>>> 
>>> ___
>>> linux-arm-kernel mailing list
>>> linux-arm-ker...@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>> 
>> 


RE: iommu_iova slab eats too much memory

2020-04-29 Thread Salil Mehta
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Wednesday, April 29, 2020 3:00 PM
> To: Salil Mehta ; Bin 
> 
> On 2020-04-29 2:37 pm, Salil Mehta wrote:
> > Hi Bin,
> >
> >> From: Bin [mailto:anole1...@gmail.com]
> >> Sent: Wednesday, April 29, 2020 5:14 AM
> >> To: Salil Mehta 
> >> Hi Shlil:
> >>
> >> Thank you for your attention, and these are my answers:
> >>
> >> 1. I don't really understand what you're saying. What's the difference 
> >> between
> DMA buffer and DMA mapping?
> >> It's like a memory block pool and a memory block or something like that?
> >
> >
> > DMA Mapping: Mapping are translations/associations [IOVA<->HPA OR 
> > IOVA<->GPA(further translated
> > to HPA by Stage-2)] which are created by the NIC  driver. IOMMU hardware 
> > responsible for NIC
> > IOVA translations is populated with the mappings by the driver before 
> > submitting the DMA buffer
> > to the hardware for TX/RX.
> >
> > DMA buffers: Actual Memory allocated by the driver where data could be 
> > DMA'ed (RX'ed or TX'ed)
> >
> >
> > I think you have missed the important point I mentioned earlier:
> > If there is a leak of IOVA mapping due to dma_unmap_* not being called 
> > somewhere then at
> > certain point the throughput will drastically fall and will almost become 
> > equal to zero.
> > This is due to the exhaustion of available IOVA mapping space in the IOMMU 
> > hardware.
> 
> With 64-bit address spaces, you're still likely to run out of memory for
> the IOVA structures and pagetables before you run out of the actual
> address space that they represent.

I see. Good point and it was non-obvious.

> The slowdown comes from having to
> walk the whole rbtree to search for free space or free a PFN, but
> depending on how the allocation pattern interacts with the caching
> mechanism that may never happen to a significant degree.


So assuming that, due to the above limitation of the algorithm,
allocation of such free mapping space gets delayed, this should only
mean more availability of system memory in general, unless it also
affects the release of the mappings - perhaps I am missing something here?


> > Above condition is very much different than a *memory leak* of the DMA 
> > buffer
> itself which
> > will eventually lead to OOM.
> >
> >
> > Salil.
> >
> >> FYI:
> >> I found an interesting phenomenon that it's just a small part of the 
> >> running
> hosts has this issue, even though they all
> >> have the same kernel, configuration and hardwares, I don't know if this 
> >> really
> mean something.
> 
> Another thought for a debugging sanity check is to look at the
> intel-iommu tracepoints on a misbehaving system and see whether maps vs.
> unmaps look significantly out of balance. You could probably do
> something clever with ftrace to look for that kind of pattern in the DMA
> API calls, too.
> 
> Robin.
> 
> >>
> >>
> >> Salil Mehta  于2020年4月28日周二 下午5:17写道:
> >> Hi Bin,
> >>
> >> Few questions:
> >>
> >> 1. If there is a leak of IOVA due to dma_unmap_* not being called somewhere
> then
> >> at certain point the throughput will drastically fall and will almost 
> >> become
> >> equal
> >> to zero. This should be due to unavailability of the mapping anymore. But
> in
> >> your
> >> case VM is getting killed so this could be actual DMA buffer leak not DMA
> mapping
> >> leak. I doubt VM will get killed due to exhaustion of the DMA mappings in
> the
> >> IOMMU
> >> Layer for a transient reason or even due to mapping/unmapping leak.
> >>
> >> 2. Could you check if you have TSO offload enabled on Intel 82599? It will
> help
> >> in reducing the number of mappings and will take off IOVA mapping pressure
> from
> >> the IOMMU/VT-d? Though I am not sure it will help in reducing the amount of
> memory
> >> required for the buffers.
> >>
> >> 3. Also, have you checked the cpu-usage while your experiment is going on?
> >>
> >> Thanks
> >> Salil.
> >>
> >>> -Original Message-
> >>> From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
> >>> Robin Murphy
> >>> Sent: Friday, April 24, 2020 5:31 PM
> >>> To: Bin 
> >>> Cc: iommu@lists.linux-foundation.org
> >>> Subject: Re: iommu_iova slab eats too much memory
> >>>
> >>> On 2020-04-24 2:20 pm, Bin wrote:
>  Dear Robin:
>    Thank you for your explanation. Now, I understand that this could 
>  be
>  NIC driver's fault, but how could I confirm it? Do I have to debug the
>  driver myself?
> >>>
> >>> I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
> >>> memory about an order of magnitude faster than the IOVAs alone, but it
> >>> should shed some light on whether DMA API usage looks suspicious, and
> >>> dumping the mappings should help track down the responsible driver(s).
> >>> Although the debugfs code doesn't show the stacktrace of where each
> >>> mapping was made, I guess it would be fairly simple to tweak that for a
> >>> quick way to narrow down where to start looking in an offending driver.

Re: [PATCH v12 4/8] iommu/vt-d: Add bind guest PASID support

2020-04-29 Thread Auger Eric
Hi,

On 4/27/20 10:34 PM, Jacob Pan wrote:
> On Fri, 24 Apr 2020 10:47:45 +
> "Tian, Kevin"  wrote:
> 
>>> From: Jacob Pan 
>>> Sent: Wednesday, April 22, 2020 2:53 AM
>>>
>>> When supporting guest SVA with emulated IOMMU, the guest PASID
>>> table is shadowed in VMM. Updates to guest vIOMMU PASID table
>>> will result in PASID cache flush which will be passed down to
>>> the host as bind guest PASID calls.  
>>
>> Above description is not accurate. Guest PASID table updates don't
>> 'result in' PASID cache flush automatically. What about:
>> --
>> The guest needs to invalidate the PASID cache for any update to
>> guest PASID table. Those invalidation requests are intercepted
>> by the VMM and passed down to the host as binding guest PASID
>> calls.
>> --
> It is good to add more details, thanks.
> 
>>>
>>> For the SL page tables, it will be harvested from device's
>>> default domain (request w/o PASID), or aux domain in case of
>>> mediated device.
>>>
>>> .-------------.  .---------------------------.
>>> |   vIOMMU    |  | Guest process CR3, FL only|
>>> |             |  '---------------------------'
>>> .----------------/
>>> | PASID Entry |--- PASID cache flush -
>>> '-------------'   |
>>> |             |   V
>>> |             |CR3 in GPA
>>> '-------------'
>>> Guest
>>> ------| Shadow |--------------------------|------------
>>>       v        v                          v
>>> Host
>>> .-------------.  .----------------------.
>>> |   pIOMMU    |  | Bind FL for GVA-GPA  |
>>> |             |  '----------------------'
>>> .----------------/  |
>>> | PASID Entry |     V (Nested xlate)
>>> '----------------\.------------------------------.
>>> |             |   |SL for GPA-HPA, default domain|
>>> |             |   '------------------------------'
>>> '-------------'
>>> Where:
>>>  - FL = First level/stage one page tables
>>>  - SL = Second level/stage two page tables
>>>
>>> Signed-off-by: Jacob Pan 
>>> Signed-off-by: Liu, Yi L 
>>> ---
>>>  drivers/iommu/intel-iommu.c |   4 +
>>>  drivers/iommu/intel-svm.c   | 204
>>> 
>>>  include/linux/intel-iommu.h |   8 +-
>>>  include/linux/intel-svm.h   |  17 
>>>  4 files changed, 232 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c
>>> b/drivers/iommu/intel-iommu.c index 9c01e391a931..8862d6b0ef21
>>> 100644 --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -6179,6 +6179,10 @@ const struct iommu_ops intel_iommu_ops = {
>>> .dev_disable_feat   = intel_iommu_dev_disable_feat,
>>> .is_attach_deferred =
>>> intel_iommu_is_attach_deferred, .pgsize_bitmap  =
>>> INTEL_IOMMU_PGSIZES, +#ifdef CONFIG_INTEL_IOMMU_SVM
>>> +   .sva_bind_gpasid= intel_svm_bind_gpasid,
>>> +   .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
>>> +#endif
>>>  };
>>>
>>>  static void quirk_iommu_igfx(struct pci_dev *dev)
>>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>>> index 2998418f0a38..69b2070b843d 100644
>>> --- a/drivers/iommu/intel-svm.c
>>> +++ b/drivers/iommu/intel-svm.c
>>> @@ -226,6 +226,210 @@ static LIST_HEAD(global_svm_list);
>>> list_for_each_entry((sdev), &(svm)->devs, list) \
>>> if ((d) != (sdev)->dev) {} else
>>>
>>> +static inline void intel_svm_free_if_empty(struct intel_svm *svm,
>>> u64 pasid) +{
> >>> +   if (list_empty(&svm->devs)) {
>>> +   ioasid_set_data(pasid, NULL);
>>> +   kfree(svm);
>>> +   }
>>> +}  
>>
>> Do we really need a function form instead of putting the 4 lines
>> directly after the 'out' label?
>>
> it is more readable and good for code sharing.
That's my fault: I suggested to add this helper because I noticed this
was repeated several times in the code. But adding a new goto label to
handle that job is identical.


> 
>>> +
>>> +int intel_svm_bind_gpasid(struct iommu_domain *domain, struct
>>> device *dev,
>>> + struct iommu_gpasid_bind_data *data)
>>> +{
>>> +   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
>>> +   struct dmar_domain *dmar_domain;
>>> +   struct intel_svm_dev *sdev;
>>> +   struct intel_svm *svm;
>>> +   int ret = 0;
>>> +
>>> +   if (WARN_ON(!iommu) || !data)
>>> +   return -EINVAL;  
>>
>> well, why not checking !dev together?
> This is kernel API, unlike iommu and data caller fills in dev directly.
> 
>>
>>> +
>>> +   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
>>> +   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
>>> +   return -EINVAL;
>>> +
>>> +   if (dev_is_pci(dev)) {
>>> +   /* VT-d supports devices with full 20 bit PASIDs
>>> only */
>>> +   if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
>>> +   return -EINVAL;
>>> +   } else {
>>> +   return -ENOTSUPP;
>>> +   }
>>> +
>>> +   /*
>>> +* We 

Re: iommu_iova slab eats too much memory

2020-04-29 Thread Robin Murphy

On 2020-04-29 2:37 pm, Salil Mehta wrote:

Hi Bin,


From: Bin [mailto:anole1...@gmail.com]
Sent: Wednesday, April 29, 2020 5:14 AM
To: Salil Mehta 
Hi Shlil:

Thank you for your attention, and these are my answers:

1. I don't really understand what you're saying. What's the difference between 
DMA buffer and DMA mapping?
It's like a memory block pool and a memory block or something like that?



DMA Mapping: Mapping are translations/associations [IOVA<->HPA OR 
IOVA<->GPA(further translated
to HPA by Stage-2)] which are created by the NIC  driver. IOMMU hardware 
responsible for NIC
IOVA translations is populated with the mappings by the driver before 
submitting the DMA buffer
to the hardware for TX/RX.

DMA buffers: Actual Memory allocated by the driver where data could be DMA'ed 
(RX'ed or TX'ed)


I think you have missed the important point I mentioned earlier:
If there is a leak of IOVA mapping due to dma_unmap_* not being called 
somewhere then at
certain point the throughput will drastically fall and will almost become equal 
to zero.
This is due to the exhaustion of available IOVA mapping space in the IOMMU 
hardware.


With 64-bit address spaces, you're still likely to run out of memory for 
the IOVA structures and pagetables before you run out of the actual 
address space that they represent. The slowdown comes from having to 
walk the whole rbtree to search for free space or free a PFN, but 
depending on how the allocation pattern interacts with the caching 
mechanism that may never happen to a significant degree.



Above condition is very much different than a *memory leak* of the DMA buffer 
itself which
will eventually lead to OOM.
  


Salil.


FYI:
I found an interesting phenomenon that it's just a small part of the running 
hosts has this issue, even though they all
have the same kernel, configuration and hardwares, I don't know if this really 
mean something.


Another thought for a debugging sanity check is to look at the 
intel-iommu tracepoints on a misbehaving system and see whether maps vs. 
unmaps look significantly out of balance. You could probably do 
something clever with ftrace to look for that kind of pattern in the DMA 
API calls, too.


Robin.




Salil Mehta  于2020年4月28日周二 下午5:17写道:
Hi Bin,

Few questions:

1. If there is a leak of IOVA due to dma_unmap_* not being called somewhere then
at certain point the throughput will drastically fall and will almost become
equal
to zero. This should be due to unavailability of the mapping anymore. But in
your
case VM is getting killed so this could be actual DMA buffer leak not DMA 
mapping
leak. I doubt VM will get killed due to exhaustion of the DMA mappings in the
IOMMU
Layer for a transient reason or even due to mapping/unmapping leak.

2. Could you check if you have TSO offload enabled on Intel 82599? It will help
in reducing the number of mappings and will take off IOVA mapping pressure from
the IOMMU/VT-d? Though I am not sure it will help in reducing the amount of 
memory
required for the buffers.

3. Also, have you checked the cpu-usage while your experiment is going on?

Thanks
Salil.


-Original Message-
From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
Robin Murphy
Sent: Friday, April 24, 2020 5:31 PM
To: Bin 
Cc: iommu@lists.linux-foundation.org
Subject: Re: iommu_iova slab eats too much memory

On 2020-04-24 2:20 pm, Bin wrote:

Dear Robin:
  Thank you for your explanation. Now, I understand that this could be
NIC driver's fault, but how could I confirm it? Do I have to debug the
driver myself?


I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
memory about an order of magnitude faster than the IOVAs alone, but it
should shed some light on whether DMA API usage looks suspicious, and
dumping the mappings should help track down the responsible driver(s).
Although the debugfs code doesn't show the stacktrace of where each
mapping was made, I guess it would be fairly simple to tweak that for a
quick way to narrow down where to start looking in an offending driver.

Robin.


Robin Murphy  于2020年4月24日周五 下午8:15写道:


On 2020-04-24 1:06 pm, Bin wrote:

I'm not familiar with the mmu stuff, so what you mean by "some driver
leaking DMA mappings", is it possible that some other kernel module like
KVM or NIC driver leads to the leaking problem instead of the iommu

module

itself?


Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
should, since I'd expect a lot of people to have noticed that. It's far
more likely that some driver is failing to call dma_unmap_* when it's
finished with a buffer - with the IOMMU disabled that would be a no-op
on x86 with a modern 64-bit-capable device, so such a latent bug could
have been easily overlooked.

Robin.


Bin  于 2020年4月24日周五 20:00写道:


Well, that's the problem! I'm assuming the iommu kernel module is

leaking

memory. But I don't know why and how.

Do you have any idea about it? Or any further information 

Re: [PATCH v12 6/8] iommu/vt-d: Add svm/sva invalidate function

2020-04-29 Thread Auger Eric
Hi Jacob,

On 4/21/20 8:52 PM, Jacob Pan wrote:
> When Shared Virtual Address (SVA) is enabled for a guest OS via
> vIOMMU, we need to provide invalidation support at IOMMU API and driver
> level. This patch adds Intel VT-d specific function to implement
> iommu passdown invalidate API for shared virtual address.
> 
> The use case is for supporting caching structure invalidation
> of assigned SVM capable devices. Emulated IOMMU exposes queue
> invalidation capability and passes down all descriptors from the guest
> to the physical IOMMU.
> 
> The assumption is that guest to host device ID mapping should be
> resolved prior to calling IOMMU driver. Based on the device handle,
> host IOMMU driver can replace certain fields before submit to the
> invalidation queue.
> 
> ---
> v12   - Use ratelimited prints for all user called APIs.
>   - Check for domain nesting attr
> ---
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Liu, Yi L 
Reviewed-by: Eric Auger 

Thanks

Eric
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel-iommu.c | 175 
> 
>  1 file changed, 175 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 8862d6b0ef21..24de233faaf5 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5595,6 +5595,180 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
>   aux_domain_remove_dev(to_dmar_domain(domain), dev);
>  }
>  
> +/*
> + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
> + * VT-d granularity. Invalidation is typically included in the unmap operation
> + * as a result of DMA or VFIO unmap. However, for assigned devices guest
> + * owns the first level page tables. Invalidations of translation caches in the
> + * guest are trapped and passed down to the host.
> + *
> + * vIOMMU in the guest will only expose first level page tables, therefore
> + * we do not support IOTLB granularity for request without PASID (second level).
> + *
> + * For example, to find the VT-d granularity encoding for IOTLB
> + * type and page selective granularity within PASID:
> + * X: indexed by iommu cache type
> + * Y: indexed by enum iommu_inv_granularity
> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> + */
> +
> +const static int
> +inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> + /*
> +  * PASID based IOTLB invalidation: PASID selective (per PASID),
> +  * page selective (address granularity)
> +  */
> + {-EINVAL, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> + /* PASID based dev TLBs */
> + {-EINVAL, -EINVAL, QI_DEV_IOTLB_GRAN_PASID_SEL},
> + /* PASID cache */
> + {-EINVAL, -EINVAL, -EINVAL}
> +};
> +
> +static inline int to_vtd_granularity(int type, int granu)
> +{
> + return inv_type_granu_table[type][granu];
> +}
> +
> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> +{
> + u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> +
> + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> +  * IOMMU cache invalidate API passes granu_size in bytes, and number of
> +  * granu size in contiguous memory.
> +  */
> + return order_base_2(nr_pages);
> +}
> +
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static int
> +intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
> +struct iommu_cache_invalidate_info *inv_info)
> +{
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct device_domain_info *info;
> + struct intel_iommu *iommu;
> + unsigned long flags;
> + int cache_type;
> + u8 bus, devfn;
> + u16 did, sid;
> + int ret = 0;
> + u64 size = 0;
> +
> + if (!inv_info || !dmar_domain ||
> + inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> + return -EINVAL;
> +
> + if (!dev || !dev_is_pci(dev))
> + return -ENODEV;
> +
> + iommu = device_to_iommu(dev, &bus, &devfn);
> + if (!iommu)
> + return -ENODEV;
> +
> + if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
> + return -EINVAL;
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + spin_lock(&iommu->lock);
> + info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
> + if (!info) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> + did = dmar_domain->iommu_did[iommu->seq_id];
> + sid = PCI_DEVID(bus, devfn);
> +
> + /* Size is only valid in non-PASID selective invalidation */
> + if (inv_info->granularity != IOMMU_INV_GRANU_PASID)
> + size = to_vtd_size(inv_info->addr_info.granule_size,
> +inv_info->addr_info.nb_granules);
> +
> + for_each_set_bit(cache_type,
> +  (unsigned long *)&inv_info->cache,
> +  

[PATCH v3 17/34] iommu/arm-smmu: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the arm-smmu and arm-smmu-v3 drivers to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code does the
group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/arm-smmu-v3.c | 38 ++--
 drivers/iommu/arm-smmu.c| 39 ++---
 2 files changed, 25 insertions(+), 52 deletions(-)
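The core change in all of these conversions is the return-value contract: `add_device()` returned an `int`, while `probe_device()` returns either a valid `struct iommu_device *` or an error encoded in the pointer itself. A minimal userspace model of the kernel's `ERR_PTR` convention (from `include/linux/err.h`; simplified here, and `toy_probe_device()` is a made-up stand-in, not a real driver callback):

```c
/* Userspace model of the kernel's ERR_PTR convention: errors are encoded
 * as pointer values in the top 4095 addresses, so one return slot can
 * carry either a valid struct pointer or a -errno. */
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct iommu_device { int dummy; };
static struct iommu_device fake_iommu;

/* Shape of a converted probe_device() callback: the iommu_device on
 * success, ERR_PTR(-errno) on failure (toy stand-in only). */
static struct iommu_device *toy_probe_device(int has_fwspec)
{
    if (!has_fwspec)
        return ERR_PTR(-ENODEV);
    return &fake_iommu;
}
```

The caller (here, `__iommu_probe_device()`) then checks `IS_ERR()` and recovers the errno with `PTR_ERR()`, which is exactly the pattern each `return ERR_PTR(-ENODEV)` in the diffs below feeds into.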

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82508730feb7..42e1ee7e5197 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2914,27 +2914,26 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 
 static struct iommu_ops arm_smmu_ops;
 
-static int arm_smmu_add_device(struct device *dev)
+static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 {
int i, ret;
struct arm_smmu_device *smmu;
struct arm_smmu_master *master;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct iommu_group *group;
 
if (!fwspec || fwspec->ops != _smmu_ops)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
if (WARN_ON_ONCE(dev_iommu_priv_get(dev)))
-   return -EBUSY;
+   return ERR_PTR(-EBUSY);
 
smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
if (!smmu)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
master = kzalloc(sizeof(*master), GFP_KERNEL);
if (!master)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
 
master->dev = dev;
master->smmu = smmu;
@@ -2975,30 +2974,15 @@ static int arm_smmu_add_device(struct device *dev)
master->ssid_bits = min_t(u8, master->ssid_bits,
  CTXDESC_LINEAR_CDMAX);
 
-   ret = iommu_device_link(&smmu->iommu, dev);
-   if (ret)
-   goto err_disable_pasid;
+   return &smmu->iommu;
 
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group)) {
-   ret = PTR_ERR(group);
-   goto err_unlink;
-   }
-
-   iommu_group_put(group);
-   return 0;
-
-err_unlink:
-   iommu_device_unlink(&smmu->iommu, dev);
-err_disable_pasid:
-   arm_smmu_disable_pasid(master);
 err_free_master:
kfree(master);
dev_iommu_priv_set(dev, NULL);
-   return ret;
+   return ERR_PTR(ret);
 }
 
-static void arm_smmu_remove_device(struct device *dev)
+static void arm_smmu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct arm_smmu_master *master;
@@ -3010,8 +2994,6 @@ static void arm_smmu_remove_device(struct device *dev)
master = dev_iommu_priv_get(dev);
smmu = master->smmu;
arm_smmu_detach_dev(master);
-   iommu_group_remove_device(dev);
-   iommu_device_unlink(&smmu->iommu, dev);
arm_smmu_disable_pasid(master);
kfree(master);
iommu_fwspec_free(dev);
@@ -3138,8 +3120,8 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
-   .add_device = arm_smmu_add_device,
-   .remove_device  = arm_smmu_remove_device,
+   .probe_device   = arm_smmu_probe_device,
+   .release_device = arm_smmu_release_device,
.device_group   = arm_smmu_device_group,
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a6a5796e9c41..e622f4e33379 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -220,7 +220,7 @@ static int arm_smmu_register_legacy_master(struct device *dev,
  * With the legacy DT binding in play, we have no guarantees about
  * probe order, but then we're also not doing default domains, so we can
  * delay setting bus ops until we're sure every possible SMMU is ready,
- * and that way ensure that no add_device() calls get missed.
+ * and that way ensure that no probe_device() calls get missed.
  */
 static int arm_smmu_legacy_bus_init(void)
 {
@@ -1062,7 +1062,6 @@ static int arm_smmu_master_alloc_smes(struct device *dev)
struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
struct arm_smmu_device *smmu = cfg->smmu;
struct arm_smmu_smr *smrs = smmu->smrs;
-   struct iommu_group *group;
int i, idx, ret;
 
mutex_lock(&smmu->stream_map_mutex);
@@ -1090,18 +1089,9 @@ static int arm_smmu_master_alloc_smes(struct device *dev)
cfg->smendx[i] = (s16)idx;
}
 
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group)) {
-   ret = PTR_ERR(group);
-   goto out_err;
-   }
-   

[PATCH v3 24/34] iommu/qcom: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the QCOM IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/qcom_iommu.c | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 0e2a96467767..054e476ebd49 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -524,14 +524,13 @@ static bool qcom_iommu_capable(enum iommu_cap cap)
}
 }
 
-static int qcom_iommu_add_device(struct device *dev)
+static struct iommu_device *qcom_iommu_probe_device(struct device *dev)
 {
struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
-   struct iommu_group *group;
struct device_link *link;
 
if (!qcom_iommu)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
/*
 * Establish the link between iommu and master, so that the
@@ -542,28 +541,19 @@ static int qcom_iommu_add_device(struct device *dev)
if (!link) {
dev_err(qcom_iommu->dev, "Unable to create device link between %s and %s\n",
dev_name(qcom_iommu->dev), dev_name(dev));
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
}
 
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
-   iommu_device_link(&qcom_iommu->iommu, dev);
-
-   return 0;
+   return &qcom_iommu->iommu;
 }
 
-static void qcom_iommu_remove_device(struct device *dev)
+static void qcom_iommu_release_device(struct device *dev)
 {
struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
 
if (!qcom_iommu)
return;
 
-   iommu_device_unlink(&qcom_iommu->iommu, dev);
-   iommu_group_remove_device(dev);
iommu_fwspec_free(dev);
 }
 
@@ -619,8 +609,8 @@ static const struct iommu_ops qcom_iommu_ops = {
.flush_iotlb_all = qcom_iommu_flush_iotlb_all,
.iotlb_sync = qcom_iommu_iotlb_sync,
.iova_to_phys   = qcom_iommu_iova_to_phys,
-   .add_device = qcom_iommu_add_device,
-   .remove_device  = qcom_iommu_remove_device,
+   .probe_device   = qcom_iommu_probe_device,
+   .release_device = qcom_iommu_release_device,
.device_group   = generic_device_group,
.of_xlate   = qcom_iommu_of_xlate,
.pgsize_bitmap  = SZ_4K | SZ_64K | SZ_1M | SZ_16M,
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 31/34] iommu/exynos: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Exynos IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/exynos-iommu.c | 26 ++
 1 file changed, 6 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 09cdd163560a..60c8a56e4a3f 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -1235,19 +1235,13 @@ static phys_addr_t exynos_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
return phys;
 }
 
-static int exynos_iommu_add_device(struct device *dev)
+static struct iommu_device *exynos_iommu_probe_device(struct device *dev)
 {
struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct sysmmu_drvdata *data;
-   struct iommu_group *group;
 
if (!has_sysmmu(dev))
-   return -ENODEV;
-
-   group = iommu_group_get_for_dev(dev);
-
-   if (IS_ERR(group))
-   return PTR_ERR(group);
+   return ERR_PTR(-ENODEV);
 
list_for_each_entry(data, &owner->controllers, owner_node) {
/*
@@ -1259,17 +1253,15 @@ static int exynos_iommu_add_device(struct device *dev)
 DL_FLAG_STATELESS |
 DL_FLAG_PM_RUNTIME);
}
-   iommu_group_put(group);
 
/* There is always at least one entry, see exynos_iommu_of_xlate() */
data = list_first_entry(&owner->controllers,
struct sysmmu_drvdata, owner_node);
-   iommu_device_link(&data->iommu, dev);
 
-   return 0;
+   return &data->iommu;
 }
 
-static void exynos_iommu_remove_device(struct device *dev)
+static void exynos_iommu_release_device(struct device *dev)
 {
struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct sysmmu_drvdata *data;
@@ -1287,15 +1279,9 @@ static void exynos_iommu_remove_device(struct device *dev)
iommu_group_put(group);
}
}
-   iommu_group_remove_device(dev);
 
list_for_each_entry(data, &owner->controllers, owner_node)
device_link_del(data->link);
-
-   /* There is always at least one entry, see exynos_iommu_of_xlate() */
-   data = list_first_entry(&owner->controllers,
-   struct sysmmu_drvdata, owner_node);
-   iommu_device_unlink(&data->iommu, dev);
 }
 
 static int exynos_iommu_of_xlate(struct device *dev,
@@ -1341,8 +1327,8 @@ static const struct iommu_ops exynos_iommu_ops = {
.unmap = exynos_iommu_unmap,
.iova_to_phys = exynos_iommu_iova_to_phys,
.device_group = generic_device_group,
-   .add_device = exynos_iommu_add_device,
-   .remove_device = exynos_iommu_remove_device,
+   .probe_device = exynos_iommu_probe_device,
+   .release_device = exynos_iommu_release_device,
.pgsize_bitmap = SECT_SIZE | LPAGE_SIZE | SPAGE_SIZE,
.of_xlate = exynos_iommu_of_xlate,
 };
-- 
2.17.1



[PATCH v3 15/34] iommu/amd: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the AMD IOMMU Driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 71 ---
 1 file changed, 22 insertions(+), 49 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 0b4b4faa876d..c30367413683 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -343,21 +343,9 @@ static bool check_device(struct device *dev)
return true;
 }
 
-static void init_iommu_group(struct device *dev)
-{
-   struct iommu_group *group;
-
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return;
-
-   iommu_group_put(group);
-}
-
 static int iommu_init_device(struct device *dev)
 {
struct iommu_dev_data *dev_data;
-   struct amd_iommu *iommu;
int devid;
 
if (dev->archdata.iommu)
@@ -367,8 +355,6 @@ static int iommu_init_device(struct device *dev)
if (devid < 0)
return devid;
 
-   iommu = amd_iommu_rlookup_table[devid];
-
dev_data = find_dev_data(devid);
if (!dev_data)
return -ENOMEM;
@@ -391,8 +377,6 @@ static int iommu_init_device(struct device *dev)
 
dev->archdata.iommu = dev_data;
 
-   iommu_device_link(&iommu->iommu, dev);
-
return 0;
 }
 
@@ -410,7 +394,7 @@ static void iommu_ignore_device(struct device *dev)
setup_aliases(dev);
 }
 
-static void iommu_uninit_device(struct device *dev)
+static void amd_iommu_uninit_device(struct device *dev)
 {
struct iommu_dev_data *dev_data;
struct amd_iommu *iommu;
@@ -429,13 +413,6 @@ static void iommu_uninit_device(struct device *dev)
if (dev_data->domain)
detach_device(dev);
 
-   iommu_device_unlink(&iommu->iommu, dev);
-
-   iommu_group_remove_device(dev);
-
-   /* Remove dma-ops */
-   dev->dma_ops = NULL;
-
/*
 * We keep dev_data around for unplugged devices and reuse it when the
 * device is re-plugged - not doing so would introduce a ton of races.
@@ -2152,55 +2129,50 @@ static void detach_device(struct device *dev)
spin_unlock_irqrestore(&domain->lock, flags);
 }
 
-static int amd_iommu_add_device(struct device *dev)
+static struct iommu_device *amd_iommu_probe_device(struct device *dev)
 {
-   struct iommu_dev_data *dev_data;
-   struct iommu_domain *domain;
+   struct iommu_device *iommu_dev;
struct amd_iommu *iommu;
int ret, devid;
 
-   if (get_dev_data(dev))
-   return 0;
-
if (!check_device(dev))
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
devid = get_device_id(dev);
if (devid < 0)
-   return devid;
+   return ERR_PTR(devid);
 
iommu = amd_iommu_rlookup_table[devid];
 
+   if (get_dev_data(dev))
+   return &iommu->iommu;
+
ret = iommu_init_device(dev);
if (ret) {
if (ret != -ENOTSUPP)
dev_err(dev, "Failed to initialize - trying to proceed anyway\n");
-
+   iommu_dev = ERR_PTR(ret);
iommu_ignore_device(dev);
-   dev->dma_ops = NULL;
-   goto out;
+   } else {
+   iommu_dev = &iommu->iommu;
}
-   init_iommu_group(dev);
 
-   dev_data = get_dev_data(dev);
+   iommu_completion_wait(iommu);
 
-   BUG_ON(!dev_data);
+   return iommu_dev;
+}
 
-   if (dev_data->iommu_v2)
-   iommu_request_dm_for_dev(dev);
+static void amd_iommu_probe_finalize(struct device *dev)
+{
+   struct iommu_domain *domain;
 
/* Domains are initialized for this device - have a look what we ended up with */
domain = iommu_get_domain_for_dev(dev);
if (domain->type == IOMMU_DOMAIN_DMA)
iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
-
-out:
-   iommu_completion_wait(iommu);
-
-   return 0;
 }
 
-static void amd_iommu_remove_device(struct device *dev)
+static void amd_iommu_release_device(struct device *dev)
 {
struct amd_iommu *iommu;
int devid;
@@ -2214,7 +2186,7 @@ static void amd_iommu_remove_device(struct device *dev)
 
iommu = amd_iommu_rlookup_table[devid];
 
-   iommu_uninit_device(dev);
+   amd_iommu_uninit_device(dev);
iommu_completion_wait(iommu);
 }
 
@@ -2687,8 +2659,9 @@ const struct iommu_ops amd_iommu_ops = {
.map = amd_iommu_map,
.unmap = amd_iommu_unmap,
.iova_to_phys = amd_iommu_iova_to_phys,
-   .add_device = amd_iommu_add_device,
-   .remove_device = amd_iommu_remove_device,
+   .probe_device = amd_iommu_probe_device,
+   .release_device = amd_iommu_release_device,
+   .probe_finalize = amd_iommu_probe_finalize,
.device_group = 

[PATCH v3 33/34] iommu: Move more initialization to __iommu_probe_device()

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Move the calls to dev_iommu_get() and try_module_get() into
__iommu_probe_device(), so that the callers don't have to do it on
their own.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 47 +--
 1 file changed, 18 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7f99e5ae432c..48a95f7d7999 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -194,9 +194,19 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
struct iommu_group *group;
int ret;
 
+   if (!dev_iommu_get(dev))
+   return -ENOMEM;
+
+   if (!try_module_get(ops->owner)) {
+   ret = -EINVAL;
+   goto err_free;
+   }
+
iommu_dev = ops->probe_device(dev);
-   if (IS_ERR(iommu_dev))
-   return PTR_ERR(iommu_dev);
+   if (IS_ERR(iommu_dev)) {
+   ret = PTR_ERR(iommu_dev);
+   goto out_module_put;
+   }
 
dev->iommu->iommu_dev = iommu_dev;
 
@@ -217,6 +227,12 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 out_release:
ops->release_device(dev);
 
+out_module_put:
+   module_put(ops->owner);
+
+err_free:
+   dev_iommu_free(dev);
+
return ret;
 }
 
@@ -226,14 +242,6 @@ int iommu_probe_device(struct device *dev)
struct iommu_group *group;
int ret;
 
-   if (!dev_iommu_get(dev))
-   return -ENOMEM;
-
-   if (!try_module_get(ops->owner)) {
-   ret = -EINVAL;
-   goto err_out;
-   }
-
ret = __iommu_probe_device(dev, NULL);
if (ret)
goto err_out;
@@ -1532,14 +1540,10 @@ struct iommu_domain *iommu_group_default_domain(struct iommu_group *group)
 
 static int probe_iommu_group(struct device *dev, void *data)
 {
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
struct list_head *group_list = data;
struct iommu_group *group;
int ret;
 
-   if (!dev_iommu_get(dev))
-   return -ENOMEM;
-
/* Device is probed already if in a group */
group = iommu_group_get(dev);
if (group) {
@@ -1547,22 +1551,7 @@ static int probe_iommu_group(struct device *dev, void *data)
return 0;
}
 
-   if (!try_module_get(ops->owner)) {
-   ret = -EINVAL;
-   goto err_free_dev_iommu;
-   }
-
ret = __iommu_probe_device(dev, group_list);
-   if (ret)
-   goto err_module_put;
-
-   return 0;
-
-err_module_put:
-   module_put(ops->owner);
-err_free_dev_iommu:
-   dev_iommu_free(dev);
-
if (ret == -ENODEV)
ret = 0;
 
-- 
2.17.1



[PATCH v3 32/34] iommu: Remove add_device()/remove_device() code-paths

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

All drivers are converted to use the probe/release_device()
call-backs, so the add_device/remove_device() pointers are unused and
the code using them can be removed.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 158 ++
 include/linux/iommu.h |   4 --
 2 files changed, 38 insertions(+), 124 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 397fd4fd0c32..7f99e5ae432c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -220,12 +220,20 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
return ret;
 }
 
-static int __iommu_probe_device_helper(struct device *dev)
+int iommu_probe_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
struct iommu_group *group;
int ret;
 
+   if (!dev_iommu_get(dev))
+   return -ENOMEM;
+
+   if (!try_module_get(ops->owner)) {
+   ret = -EINVAL;
+   goto err_out;
+   }
+
ret = __iommu_probe_device(dev, NULL);
if (ret)
goto err_out;
@@ -259,75 +267,23 @@ static int __iommu_probe_device_helper(struct device *dev)
 
 err_release:
iommu_release_device(dev);
+
 err_out:
return ret;
 
 }
 
-int iommu_probe_device(struct device *dev)
+void iommu_release_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
-   struct iommu_group *group;
-   int ret;
-
-   WARN_ON(dev->iommu_group);
-
-   if (!ops)
-   return -EINVAL;
-
-   if (!dev_iommu_get(dev))
-   return -ENOMEM;
-
-   if (!try_module_get(ops->owner)) {
-   ret = -EINVAL;
-   goto err_free_dev_param;
-   }
-
-   if (ops->probe_device)
-   return __iommu_probe_device_helper(dev);
-
-   ret = ops->add_device(dev);
-   if (ret)
-   goto err_module_put;
 
-   group = iommu_group_get(dev);
-   iommu_create_device_direct_mappings(group, dev);
-   iommu_group_put(group);
-
-   if (ops->probe_finalize)
-   ops->probe_finalize(dev);
-
-   return 0;
-
-err_module_put:
-   module_put(ops->owner);
-err_free_dev_param:
-   dev_iommu_free(dev);
-   return ret;
-}
-
-static void __iommu_release_device(struct device *dev)
-{
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   if (!dev->iommu)
+   return;
 
iommu_device_unlink(dev->iommu->iommu_dev, dev);
-
iommu_group_remove_device(dev);
 
ops->release_device(dev);
-}
-
-void iommu_release_device(struct device *dev)
-{
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
-
-   if (!dev->iommu)
-   return;
-
-   if (ops->release_device)
-   __iommu_release_device(dev);
-   else if (dev->iommu_group)
-   ops->remove_device(dev);
 
module_put(ops->owner);
dev_iommu_free(dev);
@@ -1560,23 +1516,6 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
if (ret)
goto out_put_group;
 
-   /*
-* Try to allocate a default domain - needs support from the
-* IOMMU driver. There are still some drivers which don't support
-* default domains, so the return value is not yet checked. Only
-* allocate the domain here when the driver still has the
-* add_device/remove_device call-backs implemented.
-*/
-   if (!ops->probe_device) {
-   iommu_alloc_default_domain(dev);
-
-   if (group->default_domain)
-   ret = __iommu_attach_device(group->default_domain, dev);
-
-   if (ret)
-   goto out_put_group;
-   }
-
return group;
 
 out_put_group:
@@ -1591,21 +1530,6 @@ struct iommu_domain *iommu_group_default_domain(struct iommu_group *group)
return group->default_domain;
 }
 
-static int add_iommu_group(struct device *dev, void *data)
-{
-   int ret = iommu_probe_device(dev);
-
-   /*
-* We ignore -ENODEV errors for now, as they just mean that the
-* device is not translated by an IOMMU. We still care about
-* other errors and fail to initialize when they happen.
-*/
-   if (ret == -ENODEV)
-   ret = 0;
-
-   return ret;
-}
-
 static int probe_iommu_group(struct device *dev, void *data)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
@@ -1793,47 +1717,41 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group)
 
 int bus_iommu_probe(struct bus_type *bus)
 {
-   const struct iommu_ops *ops = bus->iommu_ops;
+   struct iommu_group *group, *next;
+   LIST_HEAD(group_list);
int ret;
 
-   if (ops->probe_device) {
-   struct iommu_group *group, *next;
-  

[PATCH v3 06/34] iommu/amd: Return -ENODEV in add_device when device is not handled by IOMMU

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

When check_device() fails on the device, it is not handled by the
IOMMU and amd_iommu_add_device() needs to return -ENODEV.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 504f2db75eda..3e0d27f7622e 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2157,9 +2157,12 @@ static int amd_iommu_add_device(struct device *dev)
struct amd_iommu *iommu;
int ret, devid;
 
-   if (!check_device(dev) || get_dev_data(dev))
+   if (get_dev_data(dev))
return 0;
 
+   if (!check_device(dev))
+   return -ENODEV;
+
devid = get_device_id(dev);
if (devid < 0)
return devid;
-- 
2.17.1



[PATCH v3 34/34] iommu: Unexport iommu_group_get_for_dev()

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

The function is now only used in IOMMU core code and shouldn't be used
outside of it anyway, so remove the export for it.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 4 ++--
 include/linux/iommu.h | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 48a95f7d7999..a9e5618cde80 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -91,6 +91,7 @@ static void __iommu_detach_group(struct iommu_domain *domain,
 struct iommu_group *group);
 static int iommu_create_device_direct_mappings(struct iommu_group *group,
   struct device *dev);
+static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
@@ -1500,7 +1501,7 @@ static int iommu_alloc_default_domain(struct device *dev)
  * to the returned IOMMU group, which will already include the provided
  * device.  The reference should be released with iommu_group_put().
  */
-struct iommu_group *iommu_group_get_for_dev(struct device *dev)
+static struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
struct iommu_group *group;
@@ -1531,7 +1532,6 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 
return ERR_PTR(ret);
 }
-EXPORT_SYMBOL(iommu_group_get_for_dev);
 
 struct iommu_domain *iommu_group_default_domain(struct iommu_group *group)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index dd076366383f..7cfd2dddb49d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -527,7 +527,6 @@ extern int iommu_page_response(struct device *dev,
   struct iommu_page_response *msg);
 
 extern int iommu_group_id(struct iommu_group *group);
-extern struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
 
 extern int iommu_domain_get_attr(struct iommu_domain *domain, enum iommu_attr,
-- 
2.17.1



[PATCH v3 26/34] iommu/tegra: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Tegra IOMMU drivers to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/tegra-gart.c | 24 ++--
 drivers/iommu/tegra-smmu.c | 31 ---
 2 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index db6559e8336f..5fbdff6ff41a 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -243,28 +243,16 @@ static bool gart_iommu_capable(enum iommu_cap cap)
return false;
 }
 
-static int gart_iommu_add_device(struct device *dev)
+static struct iommu_device *gart_iommu_probe_device(struct device *dev)
 {
-   struct iommu_group *group;
-
if (!dev_iommu_fwspec_get(dev))
-   return -ENODEV;
-
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
+   return ERR_PTR(-ENODEV);
 
-   iommu_device_link(&gart_handle->iommu, dev);
-
-   return 0;
+   return &gart_handle->iommu;
 }
 
-static void gart_iommu_remove_device(struct device *dev)
+static void gart_iommu_release_device(struct device *dev)
 {
-   iommu_group_remove_device(dev);
-   iommu_device_unlink(&gart_handle->iommu, dev);
 }
 
 static int gart_iommu_of_xlate(struct device *dev,
@@ -290,8 +278,8 @@ static const struct iommu_ops gart_iommu_ops = {
.domain_free= gart_iommu_domain_free,
.attach_dev = gart_iommu_attach_dev,
.detach_dev = gart_iommu_detach_dev,
-   .add_device = gart_iommu_add_device,
-   .remove_device  = gart_iommu_remove_device,
+   .probe_device   = gart_iommu_probe_device,
+   .release_device = gart_iommu_release_device,
.device_group   = generic_device_group,
.map= gart_iommu_map,
.unmap  = gart_iommu_unmap,
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 63a147b623e6..7426b7666e2b 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -757,11 +757,10 @@ static int tegra_smmu_configure(struct tegra_smmu *smmu, struct device *dev,
return 0;
 }
 
-static int tegra_smmu_add_device(struct device *dev)
+static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
 {
struct device_node *np = dev->of_node;
struct tegra_smmu *smmu = NULL;
-   struct iommu_group *group;
struct of_phandle_args args;
unsigned int index = 0;
int err;
@@ -774,7 +773,7 @@ static int tegra_smmu_add_device(struct device *dev)
of_node_put(args.np);
 
if (err < 0)
-   return err;
+   return ERR_PTR(err);
 
/*
 * Only a single IOMMU master interface is currently
@@ -783,8 +782,6 @@ static int tegra_smmu_add_device(struct device *dev)
 */
dev->archdata.iommu = smmu;
 
-   iommu_device_link(&smmu->iommu, dev);
-
break;
}
 
@@ -793,26 +790,14 @@ static int tegra_smmu_add_device(struct device *dev)
}
 
if (!smmu)
-   return -ENODEV;
-
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
+   return ERR_PTR(-ENODEV);
 
-   return 0;
+   return &smmu->iommu;
 }
 
-static void tegra_smmu_remove_device(struct device *dev)
+static void tegra_smmu_release_device(struct device *dev)
 {
-   struct tegra_smmu *smmu = dev->archdata.iommu;
-
-   if (smmu)
-   iommu_device_unlink(&smmu->iommu, dev);
-
dev->archdata.iommu = NULL;
-   iommu_group_remove_device(dev);
 }
 
 static const struct tegra_smmu_group_soc *
@@ -895,8 +880,8 @@ static const struct iommu_ops tegra_smmu_ops = {
.domain_free = tegra_smmu_domain_free,
.attach_dev = tegra_smmu_attach_dev,
.detach_dev = tegra_smmu_detach_dev,
-   .add_device = tegra_smmu_add_device,
-   .remove_device = tegra_smmu_remove_device,
+   .probe_device = tegra_smmu_probe_device,
+   .release_device = tegra_smmu_release_device,
.device_group = tegra_smmu_device_group,
.map = tegra_smmu_map,
.unmap = tegra_smmu_unmap,
@@ -1015,7 +1000,7 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
 * value. However the IOMMU registration process will attempt to add
 * all devices to the IOMMU when bus_set_iommu() is called. In order
 * not to rely on global variables to track the IOMMU instance, we
-* set it here so that it can be looked up from the .add_device()
+* set it here so that it can be looked up from the .probe_device()

[PATCH v3 30/34] iommu/exynos: Use first SYSMMU in controllers list for IOMMU core

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

On Exynos platforms there can be more than one SYSMMU (IOMMU) for one
DMA master device. Since the IOMMU core code expects only one hardware
IOMMU, use the first SYSMMU in the list.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/exynos-iommu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 186ff5cc975c..09cdd163560a 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -1261,6 +1261,11 @@ static int exynos_iommu_add_device(struct device *dev)
}
iommu_group_put(group);
 
+   /* There is always at least one entry, see exynos_iommu_of_xlate() */
+   data = list_first_entry(&owner->controllers,
+   struct sysmmu_drvdata, owner_node);
+   iommu_device_link(&data->iommu, dev);
+
return 0;
 }
 
@@ -1286,6 +1291,11 @@ static void exynos_iommu_remove_device(struct device *dev)
 
	list_for_each_entry(data, &owner->controllers, owner_node)
device_link_del(data->link);
+
+   /* There is always at least one entry, see exynos_iommu_of_xlate() */
+   data = list_first_entry(&owner->controllers,
+   struct sysmmu_drvdata, owner_node);
+   iommu_device_unlink(&data->iommu, dev);
 }
 
 static int exynos_iommu_of_xlate(struct device *dev,
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 27/34] iommu/renesas: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Renesas IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/ipmmu-vmsa.c | 60 +-
 1 file changed, 20 insertions(+), 40 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 310cf09feea3..fb7e702dee23 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -805,24 +805,8 @@ static int ipmmu_of_xlate(struct device *dev,
 static int ipmmu_init_arm_mapping(struct device *dev)
 {
struct ipmmu_vmsa_device *mmu = to_ipmmu(dev);
-   struct iommu_group *group;
int ret;
 
-   /* Create a device group and add the device to it. */
-   group = iommu_group_alloc();
-   if (IS_ERR(group)) {
-   dev_err(dev, "Failed to allocate IOMMU group\n");
-   return PTR_ERR(group);
-   }
-
-   ret = iommu_group_add_device(group, dev);
-   iommu_group_put(group);
-
-   if (ret < 0) {
-   dev_err(dev, "Failed to add device to IPMMU group\n");
-   return ret;
-   }
-
/*
 * Create the ARM mapping, used by the ARM DMA mapping core to allocate
 * VAs. This will allocate a corresponding IOMMU domain.
@@ -856,48 +840,39 @@ static int ipmmu_init_arm_mapping(struct device *dev)
return 0;
 
 error:
-   iommu_group_remove_device(dev);
if (mmu->mapping)
arm_iommu_release_mapping(mmu->mapping);
 
return ret;
 }
 
-static int ipmmu_add_device(struct device *dev)
+static struct iommu_device *ipmmu_probe_device(struct device *dev)
 {
struct ipmmu_vmsa_device *mmu = to_ipmmu(dev);
-   struct iommu_group *group;
-   int ret;
 
/*
 * Only let through devices that have been verified in xlate()
 */
if (!mmu)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
-   if (IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA)) {
-   ret = ipmmu_init_arm_mapping(dev);
-   if (ret)
-   return ret;
-   } else {
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
+   return &mmu->iommu;
+}
 
-   iommu_group_put(group);
-   }
+static void ipmmu_probe_finalize(struct device *dev)
+{
+   int ret = 0;
 
-   iommu_device_link(&mmu->iommu, dev);
-   return 0;
+   if (IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA))
+   ret = ipmmu_init_arm_mapping(dev);
+
+   if (ret)
+   dev_err(dev, "Can't create IOMMU mapping - DMA-OPS will not work\n");
 }
 
-static void ipmmu_remove_device(struct device *dev)
+static void ipmmu_release_device(struct device *dev)
 {
-   struct ipmmu_vmsa_device *mmu = to_ipmmu(dev);
-
-   iommu_device_unlink(&mmu->iommu, dev);
arm_iommu_detach_device(dev);
-   iommu_group_remove_device(dev);
 }
 
 static struct iommu_group *ipmmu_find_group(struct device *dev)
@@ -925,9 +900,14 @@ static const struct iommu_ops ipmmu_ops = {
.flush_iotlb_all = ipmmu_flush_iotlb_all,
.iotlb_sync = ipmmu_iotlb_sync,
.iova_to_phys = ipmmu_iova_to_phys,
-   .add_device = ipmmu_add_device,
-   .remove_device = ipmmu_remove_device,
+   .probe_device = ipmmu_probe_device,
+   .release_device = ipmmu_release_device,
+   .probe_finalize = ipmmu_probe_finalize,
+#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
+   .device_group = generic_device_group,
+#else
.device_group = ipmmu_find_group,
+#endif
.pgsize_bitmap = SZ_1G | SZ_2M | SZ_4K,
.of_xlate = ipmmu_of_xlate,
 };
-- 
2.17.1



[PATCH v3 08/34] iommu: Move default domain allocation to iommu_probe_device()

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Well, not really. The call to iommu_alloc_default_domain() in
iommu_group_get_for_dev() has to stay around as long as there are
IOMMU drivers using the add/remove_device() call-backs instead of
probe/release_device().

Those drivers expect that iommu_group_get_for_dev() returns the device
attached to a group and the group set up with a default domain (and
the device attached to the group's current domain).

But when all drivers are converted, this compatibility mess can be
removed.
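
For illustration only, the contract those legacy drivers rely on can be
sketched roughly like this (a hypothetical add_device() implementation,
not code from this series; the function name is made up):

```c
/* Sketch: legacy drivers assume iommu_group_get_for_dev() hands back a
 * group that already has a default domain allocated and the device
 * attached to the group's current domain. */
static int example_add_device(struct device *dev)
{
	struct iommu_group *group;

	group = iommu_group_get_for_dev(dev);
	if (IS_ERR(group))
		return PTR_ERR(group);

	/* Nothing else to do - the core set everything up. */
	iommu_group_put(group);
	return 0;
}
```

This is why the iommu_alloc_default_domain() call has to stay in
iommu_group_get_for_dev() until the last add_device() user is converted.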

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 102 +-
 1 file changed, 71 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6cfe7799dc8c..7a385c18e1a5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -79,6 +79,16 @@ static bool iommu_cmd_line_dma_api(void)
return !!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API);
 }
 
+static int iommu_alloc_default_domain(struct device *dev);
+static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
+unsigned type);
+static int __iommu_attach_device(struct iommu_domain *domain,
+struct device *dev);
+static int __iommu_attach_group(struct iommu_domain *domain,
+   struct iommu_group *group);
+static void __iommu_detach_group(struct iommu_domain *domain,
+struct iommu_group *group);
+
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
__ATTR(_name, _mode, _show, _store)
@@ -221,10 +231,29 @@ int iommu_probe_device(struct device *dev)
goto err_free_dev_param;
}
 
-   if (ops->probe_device)
+   if (ops->probe_device) {
+   struct iommu_group *group;
+
ret = __iommu_probe_device(dev);
-   else
+
+   /*
+* Try to allocate a default domain - needs support from the
+* IOMMU driver. There are still some drivers which don't
+* support default domains, so the return value is not yet
+* checked.
+*/
+   if (!ret)
+   iommu_alloc_default_domain(dev);
+
+   group = iommu_group_get(dev);
+   if (group && group->default_domain) {
+   ret = __iommu_attach_device(group->default_domain, dev);
+   iommu_group_put(group);
+   }
+
+   } else {
ret = ops->add_device(dev);
+   }
 
if (ret)
goto err_module_put;
@@ -268,15 +297,6 @@ void iommu_release_device(struct device *dev)
dev_iommu_free(dev);
 }
 
-static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
-unsigned type);
-static int __iommu_attach_device(struct iommu_domain *domain,
-struct device *dev);
-static int __iommu_attach_group(struct iommu_domain *domain,
-   struct iommu_group *group);
-static void __iommu_detach_group(struct iommu_domain *domain,
-struct iommu_group *group);
-
 static int __init iommu_set_def_domain_type(char *str)
 {
bool pt;
@@ -1423,25 +1443,18 @@ static int iommu_get_def_domain_type(struct device *dev)
return (type == 0) ? iommu_def_domain_type : type;
 }
 
-static int iommu_alloc_default_domain(struct device *dev,
- struct iommu_group *group)
+static int iommu_group_alloc_default_domain(struct bus_type *bus,
+   struct iommu_group *group,
+   unsigned int type)
 {
struct iommu_domain *dom;
-   unsigned int type;
-
-   if (group->default_domain)
-   return 0;
 
-   type = iommu_get_def_domain_type(dev);
-
-   dom = __iommu_domain_alloc(dev->bus, type);
+   dom = __iommu_domain_alloc(bus, type);
if (!dom && type != IOMMU_DOMAIN_DMA) {
-   dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
-   if (dom) {
-   dev_warn(dev,
-"failed to allocate default IOMMU domain of 
type %u; falling back to IOMMU_DOMAIN_DMA",
-type);
-   }
+   dom = __iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA);
+   if (dom)
+   pr_warn("Failed to allocate default IOMMU domain of 
type %u for group %s - Falling back to IOMMU_DOMAIN_DMA",
+   type, group->name);
}
 
if (!dom)
@@ -1461,6 +1474,23 @@ static int iommu_alloc_default_domain(struct device *dev,
return 0;
 }
 

[PATCH v3 18/34] iommu/pamu: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the PAMU IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/fsl_pamu_domain.c | 22 +-
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 06828e2698d5..928d37771ece 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -1016,25 +1016,13 @@ static struct iommu_group *fsl_pamu_device_group(struct device *dev)
return group;
 }
 
-static int fsl_pamu_add_device(struct device *dev)
+static struct iommu_device *fsl_pamu_probe_device(struct device *dev)
 {
-   struct iommu_group *group;
-
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
-
-   iommu_device_link(&pamu_iommu, dev);
-
-   return 0;
+   return &pamu_iommu;
 }
 
-static void fsl_pamu_remove_device(struct device *dev)
+static void fsl_pamu_release_device(struct device *dev)
 {
-   iommu_device_unlink(&pamu_iommu, dev);
-   iommu_group_remove_device(dev);
 }
 
 static const struct iommu_ops fsl_pamu_ops = {
@@ -1048,8 +1036,8 @@ static const struct iommu_ops fsl_pamu_ops = {
.iova_to_phys   = fsl_pamu_iova_to_phys,
.domain_set_attr = fsl_pamu_set_domain_attr,
.domain_get_attr = fsl_pamu_get_domain_attr,
-   .add_device = fsl_pamu_add_device,
-   .remove_device  = fsl_pamu_remove_device,
+   .probe_device   = fsl_pamu_probe_device,
+   .release_device = fsl_pamu_release_device,
.device_group   = fsl_pamu_device_group,
 };
 
-- 
2.17.1



[PATCH v3 25/34] iommu/rockchip: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Rockchip IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/rockchip-iommu.c | 26 +++---
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index b33cdd5aad81..d25c2486ca07 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1054,40 +1054,28 @@ static void rk_iommu_domain_free(struct iommu_domain *domain)
kfree(rk_domain);
 }
 
-static int rk_iommu_add_device(struct device *dev)
+static struct iommu_device *rk_iommu_probe_device(struct device *dev)
 {
-   struct iommu_group *group;
-   struct rk_iommu *iommu;
struct rk_iommudata *data;
+   struct rk_iommu *iommu;
 
data = dev->archdata.iommu;
if (!data)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
iommu = rk_iommu_from_dev(dev);
 
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-   iommu_group_put(group);
-
-   iommu_device_link(&iommu->iommu, dev);
data->link = device_link_add(dev, iommu->dev,
 DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
 
-   return 0;
+   return &iommu->iommu;
 }
 
-static void rk_iommu_remove_device(struct device *dev)
+static void rk_iommu_release_device(struct device *dev)
 {
-   struct rk_iommu *iommu;
struct rk_iommudata *data = dev->archdata.iommu;
 
-   iommu = rk_iommu_from_dev(dev);
-
device_link_del(data->link);
-   iommu_device_unlink(&iommu->iommu, dev);
-   iommu_group_remove_device(dev);
 }
 
 static struct iommu_group *rk_iommu_device_group(struct device *dev)
@@ -1126,8 +1114,8 @@ static const struct iommu_ops rk_iommu_ops = {
.detach_dev = rk_iommu_detach_device,
.map = rk_iommu_map,
.unmap = rk_iommu_unmap,
-   .add_device = rk_iommu_add_device,
-   .remove_device = rk_iommu_remove_device,
+   .probe_device = rk_iommu_probe_device,
+   .release_device = rk_iommu_release_device,
.iova_to_phys = rk_iommu_iova_to_phys,
.device_group = rk_iommu_device_group,
.pgsize_bitmap = RK_IOMMU_PGSIZE_BITMAP,
-- 
2.17.1



[PATCH v3 19/34] iommu/s390: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the S390 IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/s390-iommu.c | 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 1137f3ddcb85..610f0828f22d 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -166,21 +166,14 @@ static void s390_iommu_detach_device(struct iommu_domain *domain,
}
 }
 
-static int s390_iommu_add_device(struct device *dev)
+static struct iommu_device *s390_iommu_probe_device(struct device *dev)
 {
-   struct iommu_group *group = iommu_group_get_for_dev(dev);
struct zpci_dev *zdev = to_pci_dev(dev)->sysdata;
 
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
-   iommu_device_link(&zdev->iommu_dev, dev);
-
-   return 0;
+   return &zdev->iommu_dev;
 }
 
-static void s390_iommu_remove_device(struct device *dev)
+static void s390_iommu_release_device(struct device *dev)
 {
struct zpci_dev *zdev = to_pci_dev(dev)->sysdata;
struct iommu_domain *domain;
@@ -191,7 +184,7 @@ static void s390_iommu_remove_device(struct device *dev)
 * to vfio-pci and completing the VFIO_SET_IOMMU ioctl (which triggers
 * the attach_dev), removing the device via
 * "echo 1 > /sys/bus/pci/devices/.../remove" won't trigger detach_dev,
-* only remove_device will be called via the BUS_NOTIFY_REMOVED_DEVICE
+* only release_device will be called via the BUS_NOTIFY_REMOVED_DEVICE
 * notifier.
 *
 * So let's call detach_dev from here if it hasn't been called before.
@@ -201,9 +194,6 @@ static void s390_iommu_remove_device(struct device *dev)
if (domain)
s390_iommu_detach_device(domain, dev);
}
-
-   iommu_device_unlink(&zdev->iommu_dev, dev);
-   iommu_group_remove_device(dev);
 }
 
 static int s390_iommu_update_trans(struct s390_domain *s390_domain,
@@ -373,8 +363,8 @@ static const struct iommu_ops s390_iommu_ops = {
.map = s390_iommu_map,
.unmap = s390_iommu_unmap,
.iova_to_phys = s390_iommu_iova_to_phys,
-   .add_device = s390_iommu_add_device,
-   .remove_device = s390_iommu_remove_device,
+   .probe_device = s390_iommu_probe_device,
+   .release_device = s390_iommu_release_device,
.device_group = generic_device_group,
.pgsize_bitmap = S390_IOMMU_PGSIZES,
 };
-- 
2.17.1



[PATCH v3 04/34] iommu/vt-d: Wire up iommu_ops->def_domain_type

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

The Intel VT-d driver already has a matching function to determine the
default domain type for a device. Wire it up in intel_iommu_ops.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ef0a5246700e..b9f905a55dda 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -6209,6 +6209,7 @@ const struct iommu_ops intel_iommu_ops = {
.dev_enable_feat= intel_iommu_dev_enable_feat,
.dev_disable_feat   = intel_iommu_dev_disable_feat,
.is_attach_deferred = intel_iommu_is_attach_deferred,
+   .def_domain_type= device_def_domain_type,
.pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
 };
 
-- 
2.17.1



[PATCH v3 10/34] iommu: Move new probe_device path to separate function

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

This makes it easier to remove the old code-path when all drivers are
converted. As a side effect, it also fixes the error cleanup path.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 69 ---
 1 file changed, 46 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 18eb3623bd00..8be047a4808f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -218,12 +218,55 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
return ret;
 }
 
+static int __iommu_probe_device_helper(struct device *dev)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   struct iommu_group *group;
+   int ret;
+
+   ret = __iommu_probe_device(dev, NULL);
+   if (ret)
+   goto err_out;
+
+   /*
+* Try to allocate a default domain - needs support from the
+* IOMMU driver. There are still some drivers which don't
+* support default domains, so the return value is not yet
+* checked.
+*/
+   iommu_alloc_default_domain(dev);
+
+   group = iommu_group_get(dev);
+   if (!group)
+   goto err_release;
+
+   if (group->default_domain)
+   ret = __iommu_attach_device(group->default_domain, dev);
+
+   iommu_group_put(group);
+
+   if (ret)
+   goto err_release;
+
+   if (ops->probe_finalize)
+   ops->probe_finalize(dev);
+
+   return 0;
+
+err_release:
+   iommu_release_device(dev);
+err_out:
+   return ret;
+
+}
+
 int iommu_probe_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
int ret;
 
WARN_ON(dev->iommu_group);
+
if (!ops)
return -EINVAL;
 
@@ -235,30 +278,10 @@ int iommu_probe_device(struct device *dev)
goto err_free_dev_param;
}
 
-   if (ops->probe_device) {
-   struct iommu_group *group;
-
-   ret = __iommu_probe_device(dev, NULL);
-
-   /*
-* Try to allocate a default domain - needs support from the
-* IOMMU driver. There are still some drivers which don't
-* support default domains, so the return value is not yet
-* checked.
-*/
-   if (!ret)
-   iommu_alloc_default_domain(dev);
-
-   group = iommu_group_get(dev);
-   if (group && group->default_domain) {
-   ret = __iommu_attach_device(group->default_domain, dev);
-   iommu_group_put(group);
-   }
-
-   } else {
-   ret = ops->add_device(dev);
-   }
+   if (ops->probe_device)
+   return __iommu_probe_device_helper(dev);
 
+   ret = ops->add_device(dev);
if (ret)
goto err_module_put;
 
-- 
2.17.1



[PATCH v3 16/34] iommu/vt-d: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Intel IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 67 -
 1 file changed, 6 insertions(+), 61 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b9f905a55dda..b906727f5b85 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5781,78 +5781,27 @@ static bool intel_iommu_capable(enum iommu_cap cap)
return false;
 }
 
-static int intel_iommu_add_device(struct device *dev)
+static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 {
-   struct dmar_domain *dmar_domain;
-   struct iommu_domain *domain;
struct intel_iommu *iommu;
-   struct iommu_group *group;
u8 bus, devfn;
-   int ret;
 
	iommu = device_to_iommu(dev, &bus, &devfn);
if (!iommu)
-   return -ENODEV;
-
-   iommu_device_link(&iommu->iommu, dev);
+   return ERR_PTR(-ENODEV);
 
if (translation_pre_enabled(iommu))
dev->archdata.iommu = DEFER_DEVICE_DOMAIN_INFO;
 
-   group = iommu_group_get_for_dev(dev);
-
-   if (IS_ERR(group)) {
-   ret = PTR_ERR(group);
-   goto unlink;
-   }
-
-   iommu_group_put(group);
-
-   domain = iommu_get_domain_for_dev(dev);
-   dmar_domain = to_dmar_domain(domain);
-   if (domain->type == IOMMU_DOMAIN_DMA) {
-   if (device_def_domain_type(dev) == IOMMU_DOMAIN_IDENTITY) {
-   ret = iommu_request_dm_for_dev(dev);
-   if (ret) {
-   dmar_remove_one_dev_info(dev);
-   dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
-   domain_add_dev_info(si_domain, dev);
-   dev_info(dev,
-"Device uses a private identity 
domain.\n");
-   }
-   }
-   } else {
-   if (device_def_domain_type(dev) == IOMMU_DOMAIN_DMA) {
-   ret = iommu_request_dma_domain_for_dev(dev);
-   if (ret) {
-   dmar_remove_one_dev_info(dev);
-   dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
-   if (!get_private_domain_for_dev(dev)) {
-   dev_warn(dev,
-"Failed to get a private 
domain.\n");
-   ret = -ENOMEM;
-   goto unlink;
-   }
-
-   dev_info(dev,
-"Device uses a private dma domain.\n");
-   }
-   }
-   }
-
if (device_needs_bounce(dev)) {
dev_info(dev, "Use Intel IOMMU bounce page dma_ops\n");
set_dma_ops(dev, _dma_ops);
}
 
-   return 0;
-
-unlink:
-   iommu_device_unlink(&iommu->iommu, dev);
-   return ret;
+   return &iommu->iommu;
 }
 
-static void intel_iommu_remove_device(struct device *dev)
+static void intel_iommu_release_device(struct device *dev)
 {
struct intel_iommu *iommu;
u8 bus, devfn;
@@ -5863,10 +5812,6 @@ static void intel_iommu_remove_device(struct device *dev)
 
dmar_remove_one_dev_info(dev);
 
-   iommu_group_remove_device(dev);
-
-   iommu_device_unlink(&iommu->iommu, dev);
-
if (device_needs_bounce(dev))
set_dma_ops(dev, NULL);
 }
@@ -6198,8 +6143,8 @@ const struct iommu_ops intel_iommu_ops = {
.map= intel_iommu_map,
.unmap  = intel_iommu_unmap,
.iova_to_phys   = intel_iommu_iova_to_phys,
-   .add_device = intel_iommu_add_device,
-   .remove_device  = intel_iommu_remove_device,
+   .probe_device   = intel_iommu_probe_device,
+   .release_device = intel_iommu_release_device,
.get_resv_regions   = intel_iommu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
.apply_resv_region  = intel_iommu_apply_resv_region,
-- 
2.17.1



[PATCH v3 01/34] iommu: Move default domain allocation to separate function

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Move the code out of iommu_group_get_for_dev() into a separate
function.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 74 ++-
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2b471419e26c..bfe011760ed1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1361,6 +1361,41 @@ struct iommu_group *fsl_mc_device_group(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(fsl_mc_device_group);
 
+static int iommu_alloc_default_domain(struct device *dev,
+ struct iommu_group *group)
+{
+   struct iommu_domain *dom;
+
+   if (group->default_domain)
+   return 0;
+
+   dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
+   if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
+   dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
+   if (dom) {
+   dev_warn(dev,
+"failed to allocate default IOMMU domain of 
type %u; falling back to IOMMU_DOMAIN_DMA",
+iommu_def_domain_type);
+   }
+   }
+
+   if (!dom)
+   return -ENOMEM;
+
+   group->default_domain = dom;
+   if (!group->domain)
+   group->domain = dom;
+
+   if (!iommu_dma_strict) {
+   int attr = 1;
+   iommu_domain_set_attr(dom,
+ DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+ &attr);
+   }
+
+   return 0;
+}
+
 /**
  * iommu_group_get_for_dev - Find or create the IOMMU group for a device
  * @dev: target device
@@ -1393,40 +1428,21 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 
/*
 * Try to allocate a default domain - needs support from the
-* IOMMU driver.
+* IOMMU driver. There are still some drivers which don't support
+* default domains, so the return value is not yet checked.
 */
-   if (!group->default_domain) {
-   struct iommu_domain *dom;
-
-   dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
-   if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
-   dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
-   if (dom) {
-   dev_warn(dev,
-"failed to allocate default IOMMU 
domain of type %u; falling back to IOMMU_DOMAIN_DMA",
-iommu_def_domain_type);
-   }
-   }
-
-   group->default_domain = dom;
-   if (!group->domain)
-   group->domain = dom;
-
-   if (dom && !iommu_dma_strict) {
-   int attr = 1;
-   iommu_domain_set_attr(dom,
- DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
- &attr);
-   }
-   }
+   iommu_alloc_default_domain(dev, group);
 
ret = iommu_group_add_device(group, dev);
-   if (ret) {
-   iommu_group_put(group);
-   return ERR_PTR(ret);
-   }
+   if (ret)
+   goto out_put_group;
 
return group;
+
+out_put_group:
+   iommu_group_put(group);
+
+   return ERR_PTR(ret);
 }
 EXPORT_SYMBOL(iommu_group_get_for_dev);
 
-- 
2.17.1



[PATCH v3 13/34] iommu: Export bus_iommu_probe() and make it safe for re-probing

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Add a check to the bus_iommu_probe() call-path to make sure it ignores
devices which have already been successfully probed. Then export the
bus_iommu_probe() function so it can be used by IOMMU drivers.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 10 +-
 include/linux/iommu.h |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 834a45da0ed0..397fd4fd0c32 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1610,11 +1610,19 @@ static int probe_iommu_group(struct device *dev, void *data)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
struct list_head *group_list = data;
+   struct iommu_group *group;
int ret;
 
if (!dev_iommu_get(dev))
return -ENOMEM;
 
+   /* Device is probed already if in a group */
+   group = iommu_group_get(dev);
+   if (group) {
+   iommu_group_put(group);
+   return 0;
+   }
+
if (!try_module_get(ops->owner)) {
ret = -EINVAL;
goto err_free_dev_iommu;
@@ -1783,7 +1791,7 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group)
  iommu_do_create_direct_mappings);
 }
 
-static int bus_iommu_probe(struct bus_type *bus)
+int bus_iommu_probe(struct bus_type *bus)
 {
const struct iommu_ops *ops = bus->iommu_ops;
int ret;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 30170d191e5e..fea1622408ad 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -445,6 +445,7 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
 #define IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER  6 /* Post Driver unbind */
 
 extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops);
+extern int bus_iommu_probe(struct bus_type *bus);
 extern bool iommu_present(struct bus_type *bus);
 extern bool iommu_capable(struct bus_type *bus, enum iommu_cap cap);
 extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
-- 
2.17.1



RE: iommu_iova slab eats too much memory

2020-04-29 Thread Salil Mehta
Hi Bin,

> From: Bin [mailto:anole1...@gmail.com] 
> Sent: Wednesday, April 29, 2020 5:14 AM
> To: Salil Mehta 
> Hi Salil:
> 
> Thank you for your attention; here are my answers:
> 
> 1. I don't really understand what you're saying. What's the difference 
> between DMA buffer and DMA mapping? 
> It's like a memory block pool and a memory block or something like that? 


DMA Mapping: Mappings are translations/associations [IOVA<->HPA, or
IOVA<->GPA (further translated to HPA by Stage-2)] which are created by
the NIC driver. The IOMMU hardware responsible for the NIC's IOVA
translations is populated with these mappings by the driver before the
DMA buffer is submitted to the hardware for TX/RX.

DMA buffers: Actual memory allocated by the driver, into or out of
which data can be DMA'ed (RX'ed or TX'ed).


I think you have missed the important point I mentioned earlier:
If there is a leak of IOVA mappings due to dma_unmap_* not being called
somewhere, then at a certain point the throughput will drastically fall
and almost become equal to zero.
This is due to the exhaustion of available IOVA mapping space in the IOMMU 
hardware.

The above condition is very different from a *memory leak* of the DMA
buffer itself, which will eventually lead to OOM.
 

Salil.

> FYI:
> I found an interesting phenomenon: only a small part of the running
> hosts have this issue, even though they all have the same kernel,
> configuration and hardware. I don't know if this really means
> something.
> 
>
> Salil Mehta wrote on Tue, Apr 28, 2020 at 5:17 PM:
> Hi Bin,
> 
> Few questions:
> 
> 1. If there is a leak of IOVA due to dma_unmap_* not being called somewhere 
> then
> at certain point the throughput will drastically fall and will almost become
> equal
> to zero. This should be due to unavailability of the mapping anymore. But in
> your
> case VM is getting killed so this could be actual DMA buffer leak not DMA 
> mapping
> leak. I doubt VM will get killed due to exhaustion of the DMA mappings in the
> IOMMU
> Layer for a transient reason or even due to mapping/unmapping leak.
> 
> 2. Could you check if you have TSO offload enabled on Intel 82599? It will 
> help
> in reducing the number of mappings and will take off IOVA mapping pressure 
> from
> the IOMMU/VT-d? Though I am not sure it will help in reducing the amount of 
> memory
> required for the buffers.
> 
> 3. Also, have you checked the cpu-usage while your experiment is going on?
> 
> Thanks
> Salil.
> 
> > -Original Message-
> > From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
> > Robin Murphy
> > Sent: Friday, April 24, 2020 5:31 PM
> > To: Bin 
> > Cc: iommu@lists.linux-foundation.org
> > Subject: Re: iommu_iova slab eats too much memory
> >
> > On 2020-04-24 2:20 pm, Bin wrote:
> > > Dear Robin:
> > >  Thank you for your explanation. Now, I understand that this could be
> > > NIC driver's fault, but how could I confirm it? Do I have to debug the
> > > driver myself?
> >
> > I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
> > memory about an order of magnitude faster than the IOVAs alone, but it
> > should shed some light on whether DMA API usage looks suspicious, and
> > dumping the mappings should help track down the responsible driver(s).
> > Although the debugfs code doesn't show the stacktrace of where each
> > mapping was made, I guess it would be fairly simple to tweak that for a
> > quick way to narrow down where to start looking in an offending driver.
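For reference, the instrumentation Robin mentions is enabled via a kernel config option and inspected through debugfs. The sketch below follows the in-tree DMA-API documentation; exact file names can vary by kernel version, and `ixgbe` (the 82599 driver discussed earlier) is used only as an example filter:

```
# Kernel config fragment (requires rebuild and reboot)
CONFIG_DMA_API_DEBUG=y

# At runtime, statistics and the mapping dump live under debugfs:
mount -t debugfs none /sys/kernel/debug
cat /sys/kernel/debug/dma-api/error_count       # errors detected so far
cat /sys/kernel/debug/dma-api/num_free_entries  # free dma_debug_entry slots
cat /sys/kernel/debug/dma-api/dump              # current mappings, per device

# Optionally restrict reports to one suspect driver:
echo ixgbe > /sys/kernel/debug/dma-api/driver_filter
```

A steadily shrinking `num_free_entries` while `dump` grows for one device is the classic signature of a driver that maps buffers without unmapping them.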
> >
> > Robin.
> >
> > > Robin Murphy  于2020年4月24日周五 下午8:15写道:
> > >
> > >> On 2020-04-24 1:06 pm, Bin wrote:
> > >>> I'm not familiar with the mmu stuff, so what you mean by "some driver
> > >>> leaking DMA mappings", is it possible that some other kernel module like
> > >>> KVM or NIC driver leads to the leaking problem instead of the iommu
> > >> module
> > >>> itself?
> > >>
> > >> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
> > >> should, since I'd expect a lot of people to have noticed that. It's far
> > >> more likely that some driver is failing to call dma_unmap_* when it's
> > >> finished with a buffer - with the IOMMU disabled that would be a no-op
> > >> on x86 with a modern 64-bit-capable device, so such a latent bug could
> > >> have been easily overlooked.
> > >>
> > >> Robin.
> > >>
> > >>> Bin  于 2020年4月24日周五 20:00写道:
> > >>>
> >  Well, that's the problem! I'm assuming the iommu kernel module is
> > >> leaking
> >  memory. But I don't know why and how.
> > 
> >  Do you have any idea about it? Or any further information is needed?
> > 
> >  Robin Murphy  于 2020年4月24日周五 19:20写道:
> > 
> > > On 2020-04-24 1:40 am, Bin wrote:
> > >> Hello? anyone there?
> > >>
> > >> Bin  于2020年4月23日周四 下午5:14写道:
> > >>
> > >>> Forget to mention, I've already disabled the slab merge, so this is
> > > what
> > >>> it is.
> > >>>
> > >>> Bin  于2020年4月23日周四 下午5:11写道:
> > >>>
> > 

[PATCH v3 14/34] iommu/amd: Remove dev_data->passthrough

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Make use of generic IOMMU infrastructure to gather the same information
carried in dev_data->passthrough and remove the struct member.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c   | 10 +-
 drivers/iommu/amd_iommu_types.h |  1 -
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 3e0d27f7622e..0b4b4faa876d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2047,8 +2047,8 @@ static int pdev_iommuv2_enable(struct pci_dev *pdev)
 static int attach_device(struct device *dev,
 struct protection_domain *domain)
 {
-   struct pci_dev *pdev;
struct iommu_dev_data *dev_data;
+   struct pci_dev *pdev;
unsigned long flags;
int ret;
 
@@ -2067,8 +2067,10 @@ static int attach_device(struct device *dev,
 
pdev = to_pci_dev(dev);
if (domain->flags & PD_IOMMUV2_MASK) {
+   struct iommu_domain *def_domain = iommu_get_dma_domain(dev);
+
ret = -EINVAL;
-   if (!dev_data->passthrough)
+   if (def_domain->type != IOMMU_DOMAIN_IDENTITY)
goto out;
 
if (dev_data->iommu_v2) {
@@ -2189,9 +2191,7 @@ static int amd_iommu_add_device(struct device *dev)
 
/* Domains are initialized for this device - have a look what we ended 
up with */
domain = iommu_get_domain_for_dev(dev);
-   if (domain->type == IOMMU_DOMAIN_IDENTITY)
-   dev_data->passthrough = true;
-   else if (domain->type == IOMMU_DOMAIN_DMA)
+   if (domain->type == IOMMU_DOMAIN_DMA)
iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
 
 out:
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index ca8c4522045b..d0d7b6a0c3d8 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -640,7 +640,6 @@ struct iommu_dev_data {
struct pci_dev *pdev;
u16 devid;/* PCI Device ID */
bool iommu_v2;/* Device can make use of IOMMUv2 */
-   bool passthrough; /* Device is identity mapped */
struct {
bool enabled;
int qdep;
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 12/34] iommu: Move iommu_group_create_direct_mappings() out of iommu_group_add_device()

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

After the previous changes the iommu group may not have a default
domain when iommu_group_add_device() is called. With no default domain
iommu_group_create_direct_mappings() will do nothing and no direct
mappings will be created.

Rename iommu_group_create_direct_mappings() to
iommu_create_device_direct_mappings() to better reflect that the
function creates direct mappings only for one device and not for all
devices in the group. Then move the call to the places where a default
domain actually exists.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 35 ++-
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7de0e29db333..834a45da0ed0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -89,6 +89,8 @@ static int __iommu_attach_group(struct iommu_domain *domain,
struct iommu_group *group);
 static void __iommu_detach_group(struct iommu_domain *domain,
 struct iommu_group *group);
+static int iommu_create_device_direct_mappings(struct iommu_group *group,
+  struct device *dev);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
@@ -243,6 +245,8 @@ static int __iommu_probe_device_helper(struct device *dev)
if (group->default_domain)
ret = __iommu_attach_device(group->default_domain, dev);
 
+   iommu_create_device_direct_mappings(group, dev);
+
iommu_group_put(group);
 
if (ret)
@@ -263,6 +267,7 @@ static int __iommu_probe_device_helper(struct device *dev)
 int iommu_probe_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
+   struct iommu_group *group;
int ret;
 
WARN_ON(dev->iommu_group);
@@ -285,6 +290,10 @@ int iommu_probe_device(struct device *dev)
if (ret)
goto err_module_put;
 
+   group = iommu_group_get(dev);
+   iommu_create_device_direct_mappings(group, dev);
+   iommu_group_put(group);
+
if (ops->probe_finalize)
ops->probe_finalize(dev);
 
@@ -736,8 +745,8 @@ int iommu_group_set_name(struct iommu_group *group, const 
char *name)
 }
 EXPORT_SYMBOL_GPL(iommu_group_set_name);
 
-static int iommu_group_create_direct_mappings(struct iommu_group *group,
- struct device *dev)
+static int iommu_create_device_direct_mappings(struct iommu_group *group,
+  struct device *dev)
 {
struct iommu_domain *domain = group->default_domain;
struct iommu_resv_region *entry;
@@ -841,8 +850,6 @@ int iommu_group_add_device(struct iommu_group *group, 
struct device *dev)
 
dev->iommu_group = group;
 
-   iommu_group_create_direct_mappings(group, dev);
-
	mutex_lock(&group->mutex);
	list_add_tail(&device->list, &group->devices);
if (group->domain)
@@ -1736,6 +1743,7 @@ static void probe_alloc_default_domain(struct bus_type 
*bus,
gtype.type = iommu_def_domain_type;
 
iommu_group_alloc_default_domain(bus, group, gtype.type);
+
 }
 
 static int iommu_group_do_dma_attach(struct device *dev, void *data)
@@ -1760,6 +1768,21 @@ static int __iommu_group_dma_attach(struct iommu_group 
*group)
  iommu_group_do_dma_attach);
 }
 
+static int iommu_do_create_direct_mappings(struct device *dev, void *data)
+{
+   struct iommu_group *group = data;
+
+   iommu_create_device_direct_mappings(group, dev);
+
+   return 0;
+}
+
+static int iommu_group_create_direct_mappings(struct iommu_group *group)
+{
+   return __iommu_group_for_each_dev(group, group,
+ iommu_do_create_direct_mappings);
+}
+
 static int bus_iommu_probe(struct bus_type *bus)
 {
const struct iommu_ops *ops = bus->iommu_ops;
@@ -1792,6 +1815,8 @@ static int bus_iommu_probe(struct bus_type *bus)
continue;
}
 
+   iommu_group_create_direct_mappings(group);
+
ret = __iommu_group_dma_attach(group);
 
mutex_unlock(>mutex);
@@ -2632,7 +2657,7 @@ request_default_domain_for_dev(struct device *dev, 
unsigned long type)
iommu_domain_free(group->default_domain);
group->default_domain = domain;
 
-   iommu_group_create_direct_mappings(group, dev);
+   iommu_create_device_direct_mappings(group, dev);
 
dev_info(dev, "Using iommu %s mapping\n",
 type == IOMMU_DOMAIN_DMA ? "dma" : "direct");
-- 
2.17.1


[PATCH v3 11/34] iommu: Split off default domain allocation from group assignment

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

When a bus is initialized with iommu-ops, all devices on the bus are
scanned, iommu-groups are allocated for them, and each group will
also get a default domain allocated.

Until now this happened as soon as the group was created and the first
device added to it. When other devices with different default domain
requirements were added to the group later on, the default domain was
re-allocated, if possible.

This resulted in some back and forth and unnecessary allocations, so
change the flow to defer default domain allocation until all devices
have been added to their respective IOMMU groups.

The default domains are allocated for newly allocated groups after
each device on the bus is handled and was probed by the IOMMU driver.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 154 +-
 1 file changed, 151 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8be047a4808f..7de0e29db333 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -199,7 +199,7 @@ static int __iommu_probe_device(struct device *dev, struct 
list_head *group_list
dev->iommu->iommu_dev = iommu_dev;
 
group = iommu_group_get_for_dev(dev);
-   if (!IS_ERR(group)) {
+   if (IS_ERR(group)) {
ret = PTR_ERR(group);
goto out_release;
}
@@ -1599,6 +1599,37 @@ static int add_iommu_group(struct device *dev, void 
*data)
return ret;
 }
 
+static int probe_iommu_group(struct device *dev, void *data)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   struct list_head *group_list = data;
+   int ret;
+
+   if (!dev_iommu_get(dev))
+   return -ENOMEM;
+
+   if (!try_module_get(ops->owner)) {
+   ret = -EINVAL;
+   goto err_free_dev_iommu;
+   }
+
+   ret = __iommu_probe_device(dev, group_list);
+   if (ret)
+   goto err_module_put;
+
+   return 0;
+
+err_module_put:
+   module_put(ops->owner);
+err_free_dev_iommu:
+   dev_iommu_free(dev);
+
+   if (ret == -ENODEV)
+   ret = 0;
+
+   return ret;
+}
+
 static int remove_iommu_group(struct device *dev, void *data)
 {
iommu_release_device(dev);
@@ -1658,10 +1689,127 @@ static int iommu_bus_notifier(struct notifier_block 
*nb,
return 0;
 }
 
+struct __group_domain_type {
+   struct device *dev;
+   unsigned int type;
+};
+
+static int probe_get_default_domain_type(struct device *dev, void *data)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   struct __group_domain_type *gtype = data;
+   unsigned int type = 0;
+
+   if (ops->def_domain_type)
+   type = ops->def_domain_type(dev);
+
+   if (type) {
+   if (gtype->type && gtype->type != type) {
+   dev_warn(dev, "Device needs domain type %s, but device 
%s in the same iommu group requires type %s - using default\n",
+iommu_domain_type_str(type),
+dev_name(gtype->dev),
+iommu_domain_type_str(gtype->type));
+   gtype->type = 0;
+   }
+
+   if (!gtype->dev) {
+   gtype->dev  = dev;
+   gtype->type = type;
+   }
+   }
+
+   return 0;
+}
+
+static void probe_alloc_default_domain(struct bus_type *bus,
+  struct iommu_group *group)
+{
+   struct __group_domain_type gtype;
+
+   memset(&gtype, 0, sizeof(gtype));
+
+   /* Ask for default domain requirements of all devices in the group */
+   __iommu_group_for_each_dev(group, &gtype,
+  probe_get_default_domain_type);
+
+   if (!gtype.type)
+   gtype.type = iommu_def_domain_type;
+
+   iommu_group_alloc_default_domain(bus, group, gtype.type);
+}
+
+static int iommu_group_do_dma_attach(struct device *dev, void *data)
+{
+   struct iommu_domain *domain = data;
+   const struct iommu_ops *ops;
+   int ret;
+
+   ret = __iommu_attach_device(domain, dev);
+
+   ops = domain->ops;
+
+   if (ret == 0 && ops->probe_finalize)
+   ops->probe_finalize(dev);
+
+   return ret;
+}
+
+static int __iommu_group_dma_attach(struct iommu_group *group)
+{
+   return __iommu_group_for_each_dev(group, group->default_domain,
+ iommu_group_do_dma_attach);
+}
+
+static int bus_iommu_probe(struct bus_type *bus)
+{
+   const struct iommu_ops *ops = bus->iommu_ops;
+   int ret;
+
+   if (ops->probe_device) {
+   struct iommu_group *group, *next;
+   LIST_HEAD(group_list);
+
+   /*
+* This code-path does not allocate the default domain when
+   

[PATCH v3 28/34] iommu/omap: Remove orphan_dev tracking

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Remove the tracking of device which could not be probed because
their IOMMU is not probed yet. Replace it with a call to
bus_iommu_probe() when a new IOMMU is probed.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/omap-iommu.c | 54 +++---
 1 file changed, 4 insertions(+), 50 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 887fefcb03b4..ecc9d0829a91 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -35,15 +35,6 @@
 
 static const struct iommu_ops omap_iommu_ops;
 
-struct orphan_dev {
-   struct device *dev;
-   struct list_head node;
-};
-
-static LIST_HEAD(orphan_dev_list);
-
-static DEFINE_SPINLOCK(orphan_lock);
-
 #define to_iommu(dev)  ((struct omap_iommu *)dev_get_drvdata(dev))
 
 /* bitmap of the page sizes currently supported */
@@ -62,8 +53,6 @@ static DEFINE_SPINLOCK(orphan_lock);
 static struct platform_driver omap_iommu_driver;
 static struct kmem_cache *iopte_cachep;
 
-static int _omap_iommu_add_device(struct device *dev);
-
 /**
  * to_omap_domain - Get struct omap_iommu_domain from generic iommu_domain
  * @dom:   generic iommu domain handle
@@ -1177,7 +1166,6 @@ static int omap_iommu_probe(struct platform_device *pdev)
struct omap_iommu *obj;
struct resource *res;
struct device_node *of = pdev->dev.of_node;
-   struct orphan_dev *orphan_dev, *tmp;
 
if (!of) {
pr_err("%s: only DT-based devices are supported\n", __func__);
@@ -1260,13 +1248,8 @@ static int omap_iommu_probe(struct platform_device *pdev)
 
	dev_info(&pdev->dev, "%s registered\n", obj->name);
 
-   list_for_each_entry_safe(orphan_dev, tmp, &orphan_dev_list, node) {
-   err = _omap_iommu_add_device(orphan_dev->dev);
-   if (!err) {
-   list_del(&orphan_dev->node);
-   kfree(orphan_dev);
-   }
-   }
+   /* Re-probe bus to probe device attached to this IOMMU */
+   bus_iommu_probe(&platform_bus_type);
 
return 0;
 
@@ -1657,7 +1640,7 @@ static phys_addr_t omap_iommu_iova_to_phys(struct 
iommu_domain *domain,
return ret;
 }
 
-static int _omap_iommu_add_device(struct device *dev)
+static int omap_iommu_add_device(struct device *dev)
 {
struct omap_iommu_arch_data *arch_data, *tmp;
struct omap_iommu *oiommu;
@@ -1666,8 +1649,6 @@ static int _omap_iommu_add_device(struct device *dev)
struct platform_device *pdev;
int num_iommus, i;
int ret;
-   struct orphan_dev *orphan_dev;
-   unsigned long flags;
 
/*
 * Allocate the archdata iommu structure for DT-based devices.
@@ -1702,23 +1683,7 @@ static int _omap_iommu_add_device(struct device *dev)
if (!pdev) {
of_node_put(np);
kfree(arch_data);
-   spin_lock_irqsave(&orphan_lock, flags);
-   list_for_each_entry(orphan_dev, &orphan_dev_list,
-   node) {
-   if (orphan_dev->dev == dev)
-   break;
-   }
-   spin_unlock_irqrestore(&orphan_lock, flags);
-
-   if (orphan_dev && orphan_dev->dev == dev)
-   return -EPROBE_DEFER;
-
-   orphan_dev = kzalloc(sizeof(*orphan_dev), GFP_KERNEL);
-   orphan_dev->dev = dev;
-   spin_lock_irqsave(&orphan_lock, flags);
-   list_add(&orphan_dev->node, &orphan_dev_list);
-   spin_unlock_irqrestore(&orphan_lock, flags);
-   return -EPROBE_DEFER;
+   return -ENODEV;
}
 
oiommu = platform_get_drvdata(pdev);
@@ -1764,17 +1729,6 @@ static int _omap_iommu_add_device(struct device *dev)
return 0;
 }
 
-static int omap_iommu_add_device(struct device *dev)
-{
-   int ret;
-
-   ret = _omap_iommu_add_device(dev);
-   if (ret == -EPROBE_DEFER)
-   return 0;
-
-   return ret;
-}
-
 static void omap_iommu_remove_device(struct device *dev)
 {
struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
-- 
2.17.1



[PATCH v3 05/34] iommu/amd: Remove dma_mask check from check_device()

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

The check was only needed for the DMA-API implementation in the AMD
IOMMU driver, which no longer exists.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 73b4f84cf449..504f2db75eda 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -326,7 +326,7 @@ static bool check_device(struct device *dev)
 {
int devid;
 
-   if (!dev || !dev->dma_mask)
+   if (!dev)
return false;
 
devid = get_device_id(dev);
-- 
2.17.1



[PATCH v3 22/34] iommu/mediatek: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Mediatek IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/mtk_iommu.c | 24 ++--
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 5f4d6df59cf6..2be96f1cdbd2 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -441,38 +441,26 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct 
iommu_domain *domain,
return pa;
 }
 
-static int mtk_iommu_add_device(struct device *dev)
+static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct mtk_iommu_data *data;
-   struct iommu_group *group;
 
	if (!fwspec || fwspec->ops != &mtk_iommu_ops)
-   return -ENODEV; /* Not a iommu client device */
+   return ERR_PTR(-ENODEV); /* Not a iommu client device */
 
data = dev_iommu_priv_get(dev);
-   iommu_device_link(&data->iommu, dev);
 
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
-   return 0;
+   return &data->iommu;
 }
 
-static void mtk_iommu_remove_device(struct device *dev)
+static void mtk_iommu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct mtk_iommu_data *data;
 
	if (!fwspec || fwspec->ops != &mtk_iommu_ops)
return;
 
-   data = dev_iommu_priv_get(dev);
-   iommu_device_unlink(&data->iommu, dev);
-
-   iommu_group_remove_device(dev);
iommu_fwspec_free(dev);
 }
 
@@ -526,8 +514,8 @@ static const struct iommu_ops mtk_iommu_ops = {
.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
.iotlb_sync = mtk_iommu_iotlb_sync,
.iova_to_phys   = mtk_iommu_iova_to_phys,
-   .add_device = mtk_iommu_add_device,
-   .remove_device  = mtk_iommu_remove_device,
+   .probe_device   = mtk_iommu_probe_device,
+   .release_device = mtk_iommu_release_device,
.device_group   = mtk_iommu_device_group,
.of_xlate   = mtk_iommu_of_xlate,
.pgsize_bitmap  = SZ_4K | SZ_64K | SZ_1M | SZ_16M,
-- 
2.17.1



[PATCH v3 23/34] iommu/mediatek-v1 Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Mediatek-v1 IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/mtk_iommu_v1.c | 50 +++-
 1 file changed, 20 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index a31be05601c9..7bdd74c7cb9f 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -416,14 +416,12 @@ static int mtk_iommu_create_mapping(struct device *dev,
return 0;
 }
 
-static int mtk_iommu_add_device(struct device *dev)
+static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct dma_iommu_mapping *mtk_mapping;
struct of_phandle_args iommu_spec;
struct of_phandle_iterator it;
struct mtk_iommu_data *data;
-   struct iommu_group *group;
int err;
 
	of_for_each_phandle(&it, err, dev->of_node, "iommus",
@@ -442,35 +440,28 @@ static int mtk_iommu_add_device(struct device *dev)
}
 
	if (!fwspec || fwspec->ops != &mtk_iommu_ops)
-   return -ENODEV; /* Not a iommu client device */
+   return ERR_PTR(-ENODEV); /* Not a iommu client device */
 
-   /*
-* This is a short-term bodge because the ARM DMA code doesn't
-* understand multi-device groups, but we have to call into it
-* successfully (and not just rely on a normal IOMMU API attach
-* here) in order to set the correct DMA API ops on @dev.
-*/
-   group = iommu_group_alloc();
-   if (IS_ERR(group))
-   return PTR_ERR(group);
+   data = dev_iommu_priv_get(dev);
 
-   err = iommu_group_add_device(group, dev);
-   iommu_group_put(group);
-   if (err)
-   return err;
+   return &data->iommu;
+}
 
-   data = dev_iommu_priv_get(dev);
+static void mtk_iommu_probe_finalize(struct device *dev)
+{
+   struct dma_iommu_mapping *mtk_mapping;
+   struct mtk_iommu_data *data;
+   int err;
+
+   data = dev_iommu_priv_get(dev);
mtk_mapping = data->dev->archdata.iommu;
-   err = arm_iommu_attach_device(dev, mtk_mapping);
-   if (err) {
-   iommu_group_remove_device(dev);
-   return err;
-   }
 
-   return iommu_device_link(&data->iommu, dev);
+   err = arm_iommu_attach_device(dev, mtk_mapping);
+   if (err)
+   dev_err(dev, "Can't create IOMMU mapping - DMA-OPS will not 
work\n");
 }
 
-static void mtk_iommu_remove_device(struct device *dev)
+static void mtk_iommu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct mtk_iommu_data *data;
@@ -479,9 +470,6 @@ static void mtk_iommu_remove_device(struct device *dev)
return;
 
data = dev_iommu_priv_get(dev);
-   iommu_device_unlink(&data->iommu, dev);
-
-   iommu_group_remove_device(dev);
iommu_fwspec_free(dev);
 }
 
@@ -534,8 +522,10 @@ static const struct iommu_ops mtk_iommu_ops = {
.map= mtk_iommu_map,
.unmap  = mtk_iommu_unmap,
.iova_to_phys   = mtk_iommu_iova_to_phys,
-   .add_device = mtk_iommu_add_device,
-   .remove_device  = mtk_iommu_remove_device,
+   .probe_device   = mtk_iommu_probe_device,
+   .probe_finalize = mtk_iommu_probe_finalize,
+   .release_device = mtk_iommu_release_device,
+   .device_group   = generic_device_group,
.pgsize_bitmap  = ~0UL << MT2701_IOMMU_PAGE_SHIFT,
 };
 
-- 
2.17.1



[PATCH v3 03/34] iommu/amd: Implement iommu_ops->def_domain_type call-back

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Implement the new def_domain_type call-back for the AMD IOMMU driver.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 20cce366e951..73b4f84cf449 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2661,6 +2661,20 @@ static void amd_iommu_iotlb_sync(struct iommu_domain 
*domain,
amd_iommu_flush_iotlb_all(domain);
 }
 
+static int amd_iommu_def_domain_type(struct device *dev)
+{
+   struct iommu_dev_data *dev_data;
+
+   dev_data = get_dev_data(dev);
+   if (!dev_data)
+   return 0;
+
+   if (dev_data->iommu_v2)
+   return IOMMU_DOMAIN_IDENTITY;
+
+   return 0;
+}
+
 const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -2680,6 +2694,7 @@ const struct iommu_ops amd_iommu_ops = {
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
.flush_iotlb_all = amd_iommu_flush_iotlb_all,
.iotlb_sync = amd_iommu_iotlb_sync,
+   .def_domain_type = amd_iommu_def_domain_type,
 };
 
 /*
-- 
2.17.1



[PATCH v3 02/34] iommu: Add def_domain_type() callback in iommu_ops

2020-04-29 Thread Joerg Roedel
From: Sai Praneeth Prakhya 

Some devices are required to use a specific type (identity or dma)
of default domain when they are used with a vendor iommu. When the
system-level default domain type is different from it, the vendor
iommu driver has to request a new default domain with
iommu_request_dma_domain_for_dev() and iommu_request_dm_for_dev()
in the add_device() callback. Unfortunately, these two helpers only
work when the group hasn't been assigned to any other devices;
hence, some vendor iommu drivers have to use a private domain if
they fail to request a new default one.

This adds a def_domain_type() callback to iommu_ops, so that any
special default-domain requirement of a device can be made known
to the generic iommu layer.

Signed-off-by: Sai Praneeth Prakhya 
Signed-off-by: Lu Baolu 
[ jroe...@suse.de: Added iommu_get_def_domain_type() function and use
   it to allocate the default domain ]
Co-developed-by: Joerg Roedel 
Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 20 +---
 include/linux/iommu.h |  6 ++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bfe011760ed1..5877abd9b693 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1361,21 +1361,35 @@ struct iommu_group *fsl_mc_device_group(struct device 
*dev)
 }
 EXPORT_SYMBOL_GPL(fsl_mc_device_group);
 
+static int iommu_get_def_domain_type(struct device *dev)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   unsigned int type = 0;
+
+   if (ops->def_domain_type)
+   type = ops->def_domain_type(dev);
+
+   return (type == 0) ? iommu_def_domain_type : type;
+}
+
 static int iommu_alloc_default_domain(struct device *dev,
  struct iommu_group *group)
 {
struct iommu_domain *dom;
+   unsigned int type;
 
if (group->default_domain)
return 0;
 
-   dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
-   if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
+   type = iommu_get_def_domain_type(dev);
+
+   dom = __iommu_domain_alloc(dev->bus, type);
+   if (!dom && type != IOMMU_DOMAIN_DMA) {
dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
if (dom) {
dev_warn(dev,
 "failed to allocate default IOMMU domain of 
type %u; falling back to IOMMU_DOMAIN_DMA",
-iommu_def_domain_type);
+type);
}
}
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ef8b0bda695..1f027b07e499 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -248,6 +248,10 @@ struct iommu_iotlb_gather {
  * @cache_invalidate: invalidate translation caches
  * @sva_bind_gpasid: bind guest pasid and mm
  * @sva_unbind_gpasid: unbind guest pasid and mm
+ * @def_domain_type: device default domain type, return value:
+ * - IOMMU_DOMAIN_IDENTITY: must use an identity domain
+ * - IOMMU_DOMAIN_DMA: must use a dma domain
+ * - 0: use the default setting
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  * @owner: Driver module providing these ops
  */
@@ -318,6 +322,8 @@ struct iommu_ops {
 
int (*sva_unbind_gpasid)(struct device *dev, int pasid);
 
+   int (*def_domain_type)(struct device *dev);
+
unsigned long pgsize_bitmap;
struct module *owner;
 };
-- 
2.17.1



[PATCH v3 07/34] iommu: Add probe_device() and release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Add call-backs to 'struct iommu_ops' as an alternative to the
add_device() and remove_device() call-backs, which will be removed when
all drivers are converted.

The new call-backs will not set up IOMMU groups and domains anymore,
so also add a probe_finalize() call-back where the IOMMU driver can do
per-device setup work that requires the device to be set up with a
group and a domain.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 63 ++-
 include/linux/iommu.h |  9 +++
 2 files changed, 66 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5877abd9b693..6cfe7799dc8c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -174,6 +174,36 @@ static void dev_iommu_free(struct device *dev)
dev->iommu = NULL;
 }
 
+static int __iommu_probe_device(struct device *dev)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   struct iommu_device *iommu_dev;
+   struct iommu_group *group;
+   int ret;
+
+   iommu_dev = ops->probe_device(dev);
+   if (IS_ERR(iommu_dev))
+   return PTR_ERR(iommu_dev);
+
+   dev->iommu->iommu_dev = iommu_dev;
+
+   group = iommu_group_get_for_dev(dev);
+   if (!IS_ERR(group)) {
+   ret = PTR_ERR(group);
+   goto out_release;
+   }
+   iommu_group_put(group);
+
+   iommu_device_link(iommu_dev, dev);
+
+   return 0;
+
+out_release:
+   ops->release_device(dev);
+
+   return ret;
+}
+
 int iommu_probe_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
@@ -191,10 +221,17 @@ int iommu_probe_device(struct device *dev)
goto err_free_dev_param;
}
 
-   ret = ops->add_device(dev);
+   if (ops->probe_device)
+   ret = __iommu_probe_device(dev);
+   else
+   ret = ops->add_device(dev);
+
if (ret)
goto err_module_put;
 
+   if (ops->probe_finalize)
+   ops->probe_finalize(dev);
+
return 0;
 
 err_module_put:
@@ -204,17 +241,31 @@ int iommu_probe_device(struct device *dev)
return ret;
 }
 
+static void __iommu_release_device(struct device *dev)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   iommu_device_unlink(dev->iommu->iommu_dev, dev);
+
+   iommu_group_remove_device(dev);
+
+   ops->release_device(dev);
+}
+
 void iommu_release_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
 
-   if (dev->iommu_group)
+   if (!dev->iommu)
+   return;
+
+   if (ops->release_device)
+   __iommu_release_device(dev);
+   else if (dev->iommu_group)
ops->remove_device(dev);
 
-   if (dev->iommu) {
-   module_put(ops->owner);
-   dev_iommu_free(dev);
-   }
+   module_put(ops->owner);
+   dev_iommu_free(dev);
 }
 
 static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1f027b07e499..30170d191e5e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -225,6 +225,10 @@ struct iommu_iotlb_gather {
  * @iova_to_phys: translate iova to physical address
  * @add_device: add device to iommu grouping
  * @remove_device: remove device from iommu grouping
+ * @probe_device: Add device to iommu driver handling
+ * @release_device: Remove device from iommu driver handling
+ * @probe_finalize: Do final setup work after the device is added to an IOMMU
+ *  group and attached to the groups domain
  * @device_group: find iommu group for a particular device
  * @domain_get_attr: Query domain attributes
  * @domain_set_attr: Change domain attributes
@@ -275,6 +279,9 @@ struct iommu_ops {
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
int (*add_device)(struct device *dev);
void (*remove_device)(struct device *dev);
+   struct iommu_device *(*probe_device)(struct device *dev);
+   void (*release_device)(struct device *dev);
+   void (*probe_finalize)(struct device *dev);
struct iommu_group *(*device_group)(struct device *dev);
int (*domain_get_attr)(struct iommu_domain *domain,
   enum iommu_attr attr, void *data);
@@ -375,6 +382,7 @@ struct iommu_fault_param {
  *
  * @fault_param: IOMMU detected device fault reporting data
  * @fwspec: IOMMU fwspec data
+ * @iommu_dev:  IOMMU device this device is linked to
  * @priv:   IOMMU Driver private data
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
@@ -384,6 +392,7 @@ struct dev_iommu {
struct mutex lock;
struct iommu_fault_param*fault_param;
struct iommu_fwspec *fwspec;
+   struct iommu_device *iommu_dev;

[PATCH v3 09/34] iommu: Keep a list of allocated groups in __iommu_probe_device()

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

This is needed to defer default_domain allocation for new IOMMU groups
until all devices have been added to the group.
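The two-pass scheme this enables can be sketched in a few lines of self-contained userspace C. This is a simplified model with stand-in structures, not the kernel code; the names probe_device()/probe_finalize() mirror the series, everything else is illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUPS 4

/* Simplified stand-in for struct iommu_group; not the kernel layout. */
struct iommu_group {
	int ndevices;
	bool default_domain;	/* stands in for a struct iommu_domain * */
	bool on_list;		/* stands in for !list_empty(&group->entry) */
};

/* Pass 1: add a device to its group and queue the group exactly once
 * if it still lacks a default domain. */
static void probe_device(struct iommu_group *grp,
			 struct iommu_group **group_list, size_t *n)
{
	grp->ndevices++;
	if (!grp->default_domain && !grp->on_list) {
		grp->on_list = true;
		group_list[(*n)++] = grp;
	}
}

/* Pass 2: allocate one default domain per queued group, now that the
 * group's membership is complete. */
static void probe_finalize(struct iommu_group **group_list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		group_list[i]->default_domain = true;
}
```

With this split, a group probed with two devices is queued only once, and its default domain is created only after both devices are known.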

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7a385c18e1a5..18eb3623bd00 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -44,6 +44,7 @@ struct iommu_group {
int id;
struct iommu_domain *default_domain;
struct iommu_domain *domain;
+   struct list_head entry;
 };
 
 struct group_device {
@@ -184,7 +185,7 @@ static void dev_iommu_free(struct device *dev)
dev->iommu = NULL;
 }
 
-static int __iommu_probe_device(struct device *dev)
+static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
struct iommu_device *iommu_dev;
@@ -204,6 +205,9 @@ static int __iommu_probe_device(struct device *dev)
}
iommu_group_put(group);
 
+   if (group_list && !group->default_domain && list_empty(&group->entry))
+   list_add_tail(&group->entry, group_list);
+
iommu_device_link(iommu_dev, dev);
 
return 0;
@@ -234,7 +238,7 @@ int iommu_probe_device(struct device *dev)
if (ops->probe_device) {
struct iommu_group *group;
 
-   ret = __iommu_probe_device(dev);
+   ret = __iommu_probe_device(dev, NULL);
 
/*
 * Try to allocate a default domain - needs support from the
@@ -567,6 +571,7 @@ struct iommu_group *iommu_group_alloc(void)
group->kobj.kset = iommu_group_kset;
mutex_init(&group->mutex);
INIT_LIST_HEAD(&group->devices);
+   INIT_LIST_HEAD(&group->entry);
BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
 
ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 20/34] iommu/virtio: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the VirtIO IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/virtio-iommu.c | 41 +---
 1 file changed, 10 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index d5cac4f46ca5..bda300c2a438 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -865,24 +865,23 @@ static struct viommu_dev *viommu_get_by_fwnode(struct fwnode_handle *fwnode)
return dev ? dev_to_virtio(dev)->priv : NULL;
 }
 
-static int viommu_add_device(struct device *dev)
+static struct iommu_device *viommu_probe_device(struct device *dev)
 {
int ret;
-   struct iommu_group *group;
struct viommu_endpoint *vdev;
struct viommu_dev *viommu = NULL;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 
if (!fwspec || fwspec->ops != &viommu_ops)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
viommu = viommu_get_by_fwnode(fwspec->iommu_fwnode);
if (!viommu)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
 
vdev->dev = dev;
vdev->viommu = viommu;
@@ -896,45 +895,25 @@ static int viommu_add_device(struct device *dev)
goto err_free_dev;
}
 
-   ret = iommu_device_link(&viommu->iommu, dev);
-   if (ret)
-   goto err_free_dev;
+   return &viommu->iommu;
 
-   /*
-* Last step creates a default domain and attaches to it. Everything
-* must be ready.
-*/
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group)) {
-   ret = PTR_ERR(group);
-   goto err_unlink_dev;
-   }
-
-   iommu_group_put(group);
-
-   return PTR_ERR_OR_ZERO(group);
-
-err_unlink_dev:
-   iommu_device_unlink(&viommu->iommu, dev);
 err_free_dev:
generic_iommu_put_resv_regions(dev, &vdev->resv_regions);
kfree(vdev);
 
-   return ret;
+   return ERR_PTR(ret);
 }
 
-static void viommu_remove_device(struct device *dev)
+static void viommu_release_device(struct device *dev)
 {
-   struct viommu_endpoint *vdev;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   struct viommu_endpoint *vdev;
 
if (!fwspec || fwspec->ops != &viommu_ops)
return;
 
vdev = dev_iommu_priv_get(dev);
 
-   iommu_group_remove_device(dev);
-   iommu_device_unlink(&vdev->viommu->iommu, dev);
generic_iommu_put_resv_regions(dev, &vdev->resv_regions);
kfree(vdev);
 }
@@ -960,8 +939,8 @@ static struct iommu_ops viommu_ops = {
.unmap  = viommu_unmap,
.iova_to_phys   = viommu_iova_to_phys,
.iotlb_sync = viommu_iotlb_sync,
-   .add_device = viommu_add_device,
-   .remove_device  = viommu_remove_device,
+   .probe_device   = viommu_probe_device,
+   .release_device = viommu_release_device,
.device_group   = viommu_device_group,
.get_resv_regions   = viommu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
-- 
2.17.1



[PATCH v3 21/34] iommu/msm: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the MSM IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/msm_iommu.c | 34 +++---
 1 file changed, 7 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 94a6df1bddd6..10cd4db0710a 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -388,43 +388,23 @@ static struct msm_iommu_dev *find_iommu_for_dev(struct device *dev)
return ret;
 }
 
-static int msm_iommu_add_device(struct device *dev)
+static struct iommu_device *msm_iommu_probe_device(struct device *dev)
 {
struct msm_iommu_dev *iommu;
-   struct iommu_group *group;
unsigned long flags;
 
spin_lock_irqsave(&msm_iommu_lock, flags);
iommu = find_iommu_for_dev(dev);
spin_unlock_irqrestore(&msm_iommu_lock, flags);
 
-   if (iommu)
-   iommu_device_link(&iommu->iommu, dev);
-   else
-   return -ENODEV;
-
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
+   if (!iommu)
+   return ERR_PTR(-ENODEV);
 
-   return 0;
+   return &iommu->iommu;
 }
 
-static void msm_iommu_remove_device(struct device *dev)
+static void msm_iommu_release_device(struct device *dev)
 {
-   struct msm_iommu_dev *iommu;
-   unsigned long flags;
-
-   spin_lock_irqsave(&msm_iommu_lock, flags);
-   iommu = find_iommu_for_dev(dev);
-   spin_unlock_irqrestore(&msm_iommu_lock, flags);
-
-   if (iommu)
-   iommu_device_unlink(&iommu->iommu, dev);
-
-   iommu_group_remove_device(dev);
 }
 
static int msm_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
@@ -708,8 +688,8 @@ static struct iommu_ops msm_iommu_ops = {
 */
.iotlb_sync = NULL,
.iova_to_phys = msm_iommu_iova_to_phys,
-   .add_device = msm_iommu_add_device,
-   .remove_device = msm_iommu_remove_device,
+   .probe_device = msm_iommu_probe_device,
+   .release_device = msm_iommu_release_device,
.device_group = generic_device_group,
.pgsize_bitmap = MSM_IOMMU_PGSIZES,
.of_xlate = qcom_iommu_of_xlate,
-- 
2.17.1



[PATCH v3 29/34] iommu/omap: Convert to probe/release_device() call-backs

2020-04-29 Thread Joerg Roedel
From: Joerg Roedel 

Convert the OMAP IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/omap-iommu.c | 49 ++
 1 file changed, 13 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index ecc9d0829a91..6699fe6d9e06 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1640,15 +1640,13 @@ static phys_addr_t omap_iommu_iova_to_phys(struct iommu_domain *domain,
return ret;
 }
 
-static int omap_iommu_add_device(struct device *dev)
+static struct iommu_device *omap_iommu_probe_device(struct device *dev)
 {
struct omap_iommu_arch_data *arch_data, *tmp;
+   struct platform_device *pdev;
struct omap_iommu *oiommu;
-   struct iommu_group *group;
struct device_node *np;
-   struct platform_device *pdev;
int num_iommus, i;
-   int ret;
 
/*
 * Allocate the archdata iommu structure for DT-based devices.
@@ -1657,7 +1655,7 @@ static int omap_iommu_add_device(struct device *dev)
 * IOMMU users.
 */
if (!dev->of_node)
-   return 0;
+   return ERR_PTR(-ENODEV);
 
/*
 * retrieve the count of IOMMU nodes using phandle size as element size
@@ -1670,27 +1668,27 @@ static int omap_iommu_add_device(struct device *dev)
 
arch_data = kcalloc(num_iommus + 1, sizeof(*arch_data), GFP_KERNEL);
if (!arch_data)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
 
for (i = 0, tmp = arch_data; i < num_iommus; i++, tmp++) {
np = of_parse_phandle(dev->of_node, "iommus", i);
if (!np) {
kfree(arch_data);
-   return -EINVAL;
+   return ERR_PTR(-EINVAL);
}
 
pdev = of_find_device_by_node(np);
if (!pdev) {
of_node_put(np);
kfree(arch_data);
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
}
 
oiommu = platform_get_drvdata(pdev);
if (!oiommu) {
of_node_put(np);
kfree(arch_data);
-   return -EINVAL;
+   return ERR_PTR(-EINVAL);
}
 
tmp->iommu_dev = oiommu;
@@ -1699,46 +1697,25 @@ static int omap_iommu_add_device(struct device *dev)
of_node_put(np);
}
 
+   dev->archdata.iommu = arch_data;
+
/*
 * use the first IOMMU alone for the sysfs device linking.
 * TODO: Evaluate if a single iommu_group needs to be
 * maintained for both IOMMUs
 */
oiommu = arch_data->iommu_dev;
-   ret = iommu_device_link(&oiommu->iommu, dev);
-   if (ret) {
-   kfree(arch_data);
-   return ret;
-   }
-
-   dev->archdata.iommu = arch_data;
-
-   /*
-* IOMMU group initialization calls into omap_iommu_device_group, which
-* needs a valid dev->archdata.iommu pointer
-*/
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group)) {
-   iommu_device_unlink(&oiommu->iommu, dev);
-   dev->archdata.iommu = NULL;
-   kfree(arch_data);
-   return PTR_ERR(group);
-   }
-   iommu_group_put(group);
 
-   return 0;
+   return &oiommu->iommu;
 }
 
-static void omap_iommu_remove_device(struct device *dev)
+static void omap_iommu_release_device(struct device *dev)
 {
struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
 
if (!dev->of_node || !arch_data)
return;
 
-   iommu_device_unlink(&arch_data->iommu_dev->iommu, dev);
-   iommu_group_remove_device(dev);
-
dev->archdata.iommu = NULL;
kfree(arch_data);
 
@@ -1763,8 +1740,8 @@ static const struct iommu_ops omap_iommu_ops = {
.map= omap_iommu_map,
.unmap  = omap_iommu_unmap,
.iova_to_phys   = omap_iommu_iova_to_phys,
-   .add_device = omap_iommu_add_device,
-   .remove_device  = omap_iommu_remove_device,
+   .probe_device   = omap_iommu_probe_device,
+   .release_device = omap_iommu_release_device,
.device_group   = omap_iommu_device_group,
.pgsize_bitmap  = OMAP_IOMMU_PGSIZES,
 };
-- 
2.17.1



[PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

2020-04-29 Thread Joerg Roedel
Hi,

here is the third version of this patch-set. Older versions can be found
here:

v1: https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/
(Has some more introductory text)

v2: https://lore.kernel.org/lkml/20200414131542.25608-1-j...@8bytes.org/

Changes v2 -> v3:

* Rebased v5.7-rc3

* Added a missing iommu_group_put() as reported by Lu Baolu.

* Added a patch to consolidate more initialization work in
  __iommu_probe_device(), fixing a bug where no 'struct
  device_iommu' was allocated in the hotplug path.

There is also a git-branch available with these patches applied:


https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v3

Please review. If there are no objections I plan to put these patches
into the IOMMU tree early next week.

Thanks,

Joerg

Joerg Roedel (33):
  iommu: Move default domain allocation to separate function
  iommu/amd: Implement iommu_ops->def_domain_type call-back
  iommu/vt-d: Wire up iommu_ops->def_domain_type
  iommu/amd: Remove dma_mask check from check_device()
  iommu/amd: Return -ENODEV in add_device when device is not handled by
IOMMU
  iommu: Add probe_device() and release_device() call-backs
  iommu: Move default domain allocation to iommu_probe_device()
  iommu: Keep a list of allocated groups in __iommu_probe_device()
  iommu: Move new probe_device path to separate function
  iommu: Split off default domain allocation from group assignment
  iommu: Move iommu_group_create_direct_mappings() out of
iommu_group_add_device()
  iommu: Export bus_iommu_probe() and make it safe for re-probing
  iommu/amd: Remove dev_data->passthrough
  iommu/amd: Convert to probe/release_device() call-backs
  iommu/vt-d: Convert to probe/release_device() call-backs
  iommu/arm-smmu: Convert to probe/release_device() call-backs
  iommu/pamu: Convert to probe/release_device() call-backs
  iommu/s390: Convert to probe/release_device() call-backs
  iommu/virtio: Convert to probe/release_device() call-backs
  iommu/msm: Convert to probe/release_device() call-backs
  iommu/mediatek: Convert to probe/release_device() call-backs
  iommu/mediatek-v1: Convert to probe/release_device() call-backs
  iommu/qcom: Convert to probe/release_device() call-backs
  iommu/rockchip: Convert to probe/release_device() call-backs
  iommu/tegra: Convert to probe/release_device() call-backs
  iommu/renesas: Convert to probe/release_device() call-backs
  iommu/omap: Remove orphan_dev tracking
  iommu/omap: Convert to probe/release_device() call-backs
  iommu/exynos: Use first SYSMMU in controllers list for IOMMU core
  iommu/exynos: Convert to probe/release_device() call-backs
  iommu: Remove add_device()/remove_device() code-paths
  iommu: Move more initialization to __iommu_probe_device()
  iommu: Unexport iommu_group_get_for_dev()

Sai Praneeth Prakhya (1):
  iommu: Add def_domain_type() callback in iommu_ops

 drivers/iommu/amd_iommu.c   |  97 
 drivers/iommu/amd_iommu_types.h |   1 -
 drivers/iommu/arm-smmu-v3.c |  38 +---
 drivers/iommu/arm-smmu.c|  39 ++--
 drivers/iommu/exynos-iommu.c|  24 +-
 drivers/iommu/fsl_pamu_domain.c |  22 +-
 drivers/iommu/intel-iommu.c |  68 +-
 drivers/iommu/iommu.c   | 387 +---
 drivers/iommu/ipmmu-vmsa.c  |  60 ++---
 drivers/iommu/msm_iommu.c   |  34 +--
 drivers/iommu/mtk_iommu.c   |  24 +-
 drivers/iommu/mtk_iommu_v1.c|  50 ++---
 drivers/iommu/omap-iommu.c  |  99 ++--
 drivers/iommu/qcom_iommu.c  |  24 +-
 drivers/iommu/rockchip-iommu.c  |  26 +--
 drivers/iommu/s390-iommu.c  |  22 +-
 drivers/iommu/tegra-gart.c  |  24 +-
 drivers/iommu/tegra-smmu.c  |  31 +--
 drivers/iommu/virtio-iommu.c|  41 +---
 include/linux/iommu.h   |  21 +-
 20 files changed, 531 insertions(+), 601 deletions(-)

-- 
2.17.1



Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-29 Thread Joerg Roedel
On Mon, Apr 20, 2020 at 09:26:12AM -0400, Qian Cai wrote:
> No dice. There could be some other races. For example,

Can you please test this branch:


https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=amd-iommu-fixes

It has the previous fix in it and a couple more to make sure the
device-table is updated and flushed before increase_address_space()
updates domain->pt_root.

Thanks,

Joerg


Re: [virtio-dev] Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Jan Kiszka

On 29.04.20 12:45, Michael S. Tsirkin wrote:

On Wed, Apr 29, 2020 at 12:26:43PM +0200, Jan Kiszka wrote:

On 29.04.20 12:20, Michael S. Tsirkin wrote:

On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:

That would still not work I think where swiotlb is used for pass-thr devices
(when private memory is fine) as well as virtio devices (when shared memory is
required).


So that is a separate question. When there are multiple untrusted
devices, at the moment it looks like a single bounce buffer is used.

Which to me seems like a security problem, I think we should protect
untrusted devices from each other.



Definitely. That's the model we have for ivshmem-virtio as well.

Jan


Want to try implementing that?



The desire is definitely there, currently "just" not the time.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 12:26:43PM +0200, Jan Kiszka wrote:
> On 29.04.20 12:20, Michael S. Tsirkin wrote:
> > On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
> > > That would still not work I think where swiotlb is used for pass-thr 
> > > devices
> > > (when private memory is fine) as well as virtio devices (when shared 
> > > memory is
> > > required).
> > 
> > So that is a separate question. When there are multiple untrusted
> > devices, at the moment it looks like a single bounce buffer is used.
> > 
> > Which to me seems like a security problem, I think we should protect
> > untrusted devices from each other.
> > 
> 
> Definitely. That's the model we have for ivshmem-virtio as well.
> 
> Jan

Want to try implementing that?

> -- 
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Srivatsa Vaddagiri
* Michael S. Tsirkin  [2020-04-29 06:20:48]:

> On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
> > That would still not work I think where swiotlb is used for pass-thr devices
> > (when private memory is fine) as well as virtio devices (when shared memory 
> > is
> > required).
> 
> So that is a separate question. When there are multiple untrusted
> devices, at the moment it looks like a single bounce buffer is used.
> 
> Which to me seems like a security problem, I think we should protect
> untrusted devices from each other.

I think as a first step, let me see if we can make the swiotlb driver accept a target
memory segment as its working area. That may suffice for our needs, I think. A
subsequent step could be to make the swiotlb driver recognize multiple pools.
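One possible shape for that first step, sketched as self-contained userspace C: a pool descriptor is handed a caller-supplied segment (base address plus a number of slots) instead of allocating its own, and serves requests from it with slot-granular first-fit allocation. All names here (swiotlb_pool, pool_init, pool_alloc, pool_free, POOL_ALLOC_FAIL) are illustrative assumptions, not an existing kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_MAX_SLOTS	128
#define POOL_SLOT_SIZE	2048		/* IO_TLB-like slot granularity */
#define POOL_ALLOC_FAIL	((uint64_t)-1)	/* stand-in for DMA_MAPPING_ERROR */

struct swiotlb_pool {
	uint64_t base;			/* start of the caller-supplied segment */
	unsigned int nslots;
	bool used[POOL_MAX_SLOTS];
};

/* Adopt a pre-existing memory segment as the pool's working area. */
static void pool_init(struct swiotlb_pool *p, uint64_t base, unsigned int nslots)
{
	assert(nslots <= POOL_MAX_SLOTS);
	p->base = base;
	p->nslots = nslots;
	memset(p->used, 0, sizeof(p->used));
}

/* First-fit search for a run of free slots covering 'size' bytes. */
static uint64_t pool_alloc(struct swiotlb_pool *p, size_t size)
{
	unsigned int need = (size + POOL_SLOT_SIZE - 1) / POOL_SLOT_SIZE;

	for (unsigned int i = 0; i + need <= p->nslots; i++) {
		unsigned int run = 0;

		while (run < need && !p->used[i + run])
			run++;
		if (run == need) {
			for (unsigned int j = 0; j < need; j++)
				p->used[i + j] = true;
			return p->base + (uint64_t)i * POOL_SLOT_SIZE;
		}
	}
	return POOL_ALLOC_FAIL;
}

static void pool_free(struct swiotlb_pool *p, uint64_t addr, size_t size)
{
	unsigned int first = (addr - p->base) / POOL_SLOT_SIZE;
	unsigned int need = (size + POOL_SLOT_SIZE - 1) / POOL_SLOT_SIZE;

	for (unsigned int j = 0; j < need; j++)
		p->used[first + j] = false;
}
```

Multiple pools would then just be multiple such descriptors, one per memory segment or trust boundary.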


-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Jan Kiszka

On 29.04.20 12:20, Michael S. Tsirkin wrote:

On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:

That would still not work I think where swiotlb is used for pass-thr devices
(when private memory is fine) as well as virtio devices (when shared memory is
required).


So that is a separate question. When there are multiple untrusted
devices, at the moment it looks like a single bounce buffer is used.

Which to me seems like a security problem, I think we should protect
untrusted devices from each other.



Definitely. That's the model we have for ivshmem-virtio as well.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
> That would still not work I think where swiotlb is used for pass-thr devices
> (when private memory is fine) as well as virtio devices (when shared memory is
> required).

So that is a separate question. When there are multiple untrusted
devices, at the moment it looks like a single bounce buffer is used.

Which to me seems like a security problem, I think we should protect
untrusted devices from each other.





> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Srivatsa Vaddagiri
* Michael S. Tsirkin  [2020-04-29 05:52:05]:

> > > So it seems that with modern Linux, all one needs
> > > to do on x86 is mark the device as untrusted.
> > > It's already possible to do this with ACPI and with OF - would that be
> > > sufficient for achieving what this patchset is trying to do?
> > 
> > In my case, its not sufficient to just mark virtio device untrusted and thus
> > activate the use of swiotlb. All of the secondary VM memory, including those
> > allocate by swiotlb driver, is private to it.
> 
> So why not make the bounce buffer memory shared then?

It's a limitation of our hypervisor. When a secondary VM is created, two
memory segments are allocated - one private and the other shared. There is no
provision for the secondary VM to make part of its private memory shared after
it boots. I can perhaps consider a change in the swiotlb driver to accept the
second, shared memory segment as its main working area (rather than allocating its own).

That would still not work I think where swiotlb is used for pass-thr devices
(when private memory is fine) as well as virtio devices (when shared memory is
required).

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 03:14:10PM +0530, Srivatsa Vaddagiri wrote:
> * Michael S. Tsirkin  [2020-04-29 02:50:41]:
> 
> > So it seems that with modern Linux, all one needs
> > to do on x86 is mark the device as untrusted.
> > It's already possible to do this with ACPI and with OF - would that be
> > sufficient for achieving what this patchset is trying to do?
> 
> In my case, its not sufficient to just mark virtio device untrusted and thus
> activate the use of swiotlb. All of the secondary VM memory, including those
> allocate by swiotlb driver, is private to it.

So why not make the bounce buffer memory shared then?

> An additional piece of memory is
> available to secondary VM which is shared between VMs and which is where I 
> need
> swiotlb driver to do its work.
> 
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Srivatsa Vaddagiri
* Michael S. Tsirkin  [2020-04-29 02:50:41]:

> So it seems that with modern Linux, all one needs
> to do on x86 is mark the device as untrusted.
> It's already possible to do this with ACPI and with OF - would that be
> sufficient for achieving what this patchset is trying to do?

In my case, it's not sufficient to just mark the virtio device untrusted and thus
activate the use of swiotlb. All of the secondary VM memory, including that
allocated by the swiotlb driver, is private to it. An additional piece of memory is
available to secondary VM which is shared between VMs and which is where I need
swiotlb driver to do its work.

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-29 Thread Joerg Roedel
Hi Qian,

On Mon, Apr 20, 2020 at 09:26:12AM -0400, Qian Cai wrote:
> 
> No dice. There could be some other races. For example,

Okay, I think I know what is happening. The increase_address_space()
function increases the address space, but does not update the
DTE and does not flush the old DTE from the caches. But this needs to
happen before domain->pt_root is updated, because otherwise another CPU
can come along and map something into the increased address-space which
is not yet accessible by the device because the DTE is not updated yet.
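The required ordering can be illustrated with a short, self-contained C11 model (simplified stand-ins, not the AMD IOMMU code): the DTE update and flush must complete before the new root is published, so any CPU that observes the new pt_root knows the device can already reach the enlarged address space.

```c
#include <assert.h>
#include <stdatomic.h>

struct pt_root {
	int mode;			/* paging-mode level of this root */
};

struct protection_domain {
	_Atomic(struct pt_root *) pt_root;	/* walked by fetch_pte() */
	int dte_mode;			/* what the device table advertises */
};

static void flush_dte(struct protection_domain *d)
{
	(void)d;	/* cache/IOTLB flush elided in this model */
}

static void increase_address_space(struct protection_domain *d,
				   struct pt_root *new_root)
{
	/* 1. Teach the device about the new root first ... */
	d->dte_mode = new_root->mode;
	/* 2. ... flush the stale DTE from the caches ... */
	flush_dte(d);
	/* 3. ... and only then publish, with release ordering so that
	 * steps 1-2 are visible to anyone who loads the new pt_root. */
	atomic_store_explicit(&d->pt_root, new_root, memory_order_release);
}
```

Release/acquire pairing here encodes the invariant: a mapper that sees the new root never races ahead of the device-table update.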

Regards,

Joerg


Re: [PATCH] perf/smmuv3: Allow sharing MMIO registers with the SMMU driver

2020-04-29 Thread Jean-Philippe Brucker
On Tue, Apr 28, 2020 at 11:10:09AM -0700, Tuan Phan wrote:
> I tested this patch on HW, however I need to add one more following change to 
> make it works

Thanks for testing. I don't understand why you need the change below
though, do you know which other region is conflicting with the SMMU?
It should be displayed in the error message and /proc/iomem.

Thanks,
Jean

> @@ -2854,7 +2854,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
> }
> ioaddr = res->start;
>  
> -   smmu->base = devm_ioremap_resource(dev, res);
> +   smmu->base = devm_ioremap(dev, res->start, resource_size(res));
> if (IS_ERR(smmu->base))
> return PTR_ERR(smmu->base);
> 
> 
> > } else {
> > smmu_pmu->reloc_base = smmu_pmu->reg_base;
> > }
> > -- 
> > 2.26.0
> > 
> > 
> > ___
> > linux-arm-kernel mailing list
> > linux-arm-ker...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > 
> 


Re: [PATCH] perf/smmuv3: Allow sharing MMIO registers with the SMMU driver

2020-04-29 Thread Tuan Phan


> On Apr 21, 2020, at 8:57 AM, Jean-Philippe Brucker  
> wrote:
> 
> Some Arm SMMUv3 implementations, for example Arm CoreLink MMU-600, embed
> the PMCG registers into the SMMU MMIO regions. It currently causes probe
> failure because the PMU and SMMU drivers request overlapping resources.
> 
> Avoid the conflict by calling devm_ioremap() directly from the PMU
> driver. We lose some sanity-checking of the memory map provided by
> firmware, which doesn't seem catastrophic.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
> 
> So this is the simplest solution, and I don't think we're missing much
> by skipping the resource reservation. I've also been exploring a more
> complex approach [1] which has the SMMU driver perform resource
> reservation on behalf of the PMU driver, but I'm not sure it's
> necessary.
> 
> Please test, I've only tried the RevC FastModel using devicetree so far.
> 
> [1] https://jpbrucker.net/git/linux/log/?h=smmu/pmu
> ---
> drivers/perf/arm_smmuv3_pmu.c | 28 +---
> 1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> index ca183a53a7f10..ad63d1e7f 100644
> --- a/drivers/perf/arm_smmuv3_pmu.c
> +++ b/drivers/perf/arm_smmuv3_pmu.c
> @@ -730,8 +730,8 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu *smmu_pmu)
> 
> static int smmu_pmu_probe(struct platform_device *pdev)
> {
> + struct resource *res_0, *res_1;
>   struct smmu_pmu *smmu_pmu;
> - struct resource *res_0;
>   u32 cfgr, reg_size;
>   u64 ceid_64[2];
>   int irq, err;
> @@ -759,18 +759,32 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
>   };
> 
> + /*
> +  * If the PMCG registers are embedded into the SMMU regions, the
> +  * resources have to be shared with the SMMU driver. Use ioremap()
> +  * rather than ioremap_resource() to avoid conflicts.
> +  */
>   res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> - smmu_pmu->reg_base = devm_ioremap_resource(dev, res_0);
> - if (IS_ERR(smmu_pmu->reg_base))
> - return PTR_ERR(smmu_pmu->reg_base);
> + if (!res_0)
> + return -ENXIO;
> +
> + smmu_pmu->reg_base = devm_ioremap(dev, res_0->start,
> +   resource_size(res_0));
> + if (!smmu_pmu->reg_base)
> + return -ENOMEM;
> 
>   cfgr = readl_relaxed(smmu_pmu->reg_base + SMMU_PMCG_CFGR);
> 
>   /* Determine if page 1 is present */
>   if (cfgr & SMMU_PMCG_CFGR_RELOC_CTRS) {
> - smmu_pmu->reloc_base = devm_platform_ioremap_resource(pdev, 1);
> - if (IS_ERR(smmu_pmu->reloc_base))
> - return PTR_ERR(smmu_pmu->reloc_base);
> + res_1 = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> + if (!res_1)
> + return -ENXIO;
> +
> + smmu_pmu->reloc_base = devm_ioremap(dev, res_1->start,
> + resource_size(res_1));
> + if (!smmu_pmu->reloc_base)
> + return -ENOMEM;

I tested this patch on HW, however I need to add one more following change to 
make it works

@@ -2854,7 +2854,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
}
ioaddr = res->start;
 
-   smmu->base = devm_ioremap_resource(dev, res);
+   smmu->base = devm_ioremap(dev, res->start, resource_size(res));
if (IS_ERR(smmu->base))
return PTR_ERR(smmu->base);


>   } else {
>   smmu_pmu->reloc_base = smmu_pmu->reg_base;
>   }
> -- 
> 2.26.0
> 
> 
> 



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Lu Baolu

On 2020/4/29 14:50, Michael S. Tsirkin wrote:

On Wed, Apr 29, 2020 at 01:42:13PM +0800, Lu Baolu wrote:

On 2020/4/29 12:57, Michael S. Tsirkin wrote:

On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:

On 2020/4/29 4:41, Michael S. Tsirkin wrote:

On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:

* Michael S. Tsirkin   [2020-04-28 12:17:57]:


Okay, but how is all this virtio specific?  For example, why not allow
separate swiotlbs for any type of device?
For example, this might make sense if a given device is from a
different, less trusted vendor.

Is swiotlb commonly used for multiple devices that may be on different trust
boundaries (and not behind a hardware iommu)?

Even a hardware iommu does not imply a 100% security from malicious
hardware. First lots of people use iommu=pt for performance reasons.
Second even without pt, unmaps are often batched, and sub-page buffers
might be used for DMA, so we are not 100% protected at all times.


For untrusted devices, IOMMU is forced on even iommu=pt is used;

I think you are talking about untrusted *drivers* like with VFIO.

No. I am talking about untrusted devices like thunderbolt peripherals.
We always trust drivers hosted in kernel and the DMA APIs are designed
for them, right?

Please refer to this series.

https://lkml.org/lkml/2019/9/6/39

Best regards,
baolu

Oh, thanks for that! I didn't realize Linux is doing this.

So it seems that with modern Linux, all one needs
to do on x86 is mark the device as untrusted.
It's already possible to do this with ACPI and with OF - would that be
sufficient for achieving what this patchset is trying to do?


Yes.



Adding more ways to mark a device as untrusted, and adding
support for more platforms to use bounce buffers
sounds like a reasonable thing to do.



Agreed.

Best regards,
baolu


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 01:42:13PM +0800, Lu Baolu wrote:
> On 2020/4/29 12:57, Michael S. Tsirkin wrote:
> > On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
> > > On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > > > > * Michael S. Tsirkin  [2020-04-28 12:17:57]:
> > > > > 
> > > > > > Okay, but how is all this virtio specific?  For example, why not 
> > > > > > allow
> > > > > > separate swiotlbs for any type of device?
> > > > > > For example, this might make sense if a given device is from a
> > > > > > different, less trusted vendor.
> > > > > Is swiotlb commonly used for multiple devices that may be on 
> > > > > different trust
> > > > > boundaries (and not behind a hardware iommu)?
> > > > Even a hardware iommu does not imply 100% security against malicious
> > > > hardware. First, lots of people use iommu=pt for performance reasons.
> > > > Second, even without pt, unmaps are often batched, and sub-page buffers
> > > > might be used for DMA, so we are not 100% protected at all times.
> > > > 
> > > 
> > > For untrusted devices, the IOMMU is forced on even when iommu=pt is used;
> > 
> > I think you are talking about untrusted *drivers* like with VFIO.
> 
> No. I am talking about untrusted devices like thunderbolt peripherals.
> We always trust drivers hosted in kernel and the DMA APIs are designed
> for them, right?
> 
> Please refer to this series.
> 
> https://lkml.org/lkml/2019/9/6/39
> 
> Best regards,
> baolu

Oh, thanks for that! I didn't realize Linux is doing this.

So it seems that with modern Linux, all one needs
to do on x86 is mark the device as untrusted.
It's already possible to do this with ACPI and with OF - would that be
sufficient for achieving what this patchset is trying to do?

Adding more ways to mark a device as untrusted, and adding
support for more platforms to use bounce buffers
sounds like a reasonable thing to do.

> > 
> > On the other hand, I am talking about things like thunderbolt
> > peripherals being less trusted than on-board ones.
> 
> 
> 
> > 
> > Or possibly even using swiotlb for specific use-cases where
> > speed is less of an issue.
> > 
> > E.g. my wifi is pretty slow anyway, and that card is exposed to
> > malicious actors all the time, put just that behind swiotlb
> > for security, and leave my graphics card with pt since
> > I'm trusting it with secrets anyway.
> > 
> > 
> > > and
> > > iotlb flush is in strict mode (no batched flushes); ATS is also not
> > > allowed. Swiotlb is used to protect sub-page buffers since the IOMMU can
> > > only apply page-granularity protection. Swiotlb is now used for devices
> > > from different trust zones.
> > > 
> > > Best regards,
> > > baolu
> > 



Re: [PATCH v3 3/4] iommu/vt-d: Add page request draining support

2020-04-29 Thread Lu Baolu

Hi Jacob,

On 2020/4/29 11:36, Jacob Pan wrote:

On Wed, 22 Apr 2020 16:06:10 +0800
Lu Baolu  wrote:


When a PASID is stopped or terminated, there can be pending PRQs
(requests that haven't received responses) in remapping hardware.
This adds the interface to drain page requests and call it when a
PASID is terminated.

Signed-off-by: Jacob Pan
Signed-off-by: Liu Yi L
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-svm.c   | 103 ++--
 include/linux/intel-iommu.h |   4 ++
 2 files changed, 102 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 83dc4319f661..2534641ef707 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -23,6 +23,7 @@
  #include "intel-pasid.h"
  
  static irqreturn_t prq_event_thread(int irq, void *d);

+static void intel_svm_drain_prq(struct device *dev, int pasid);
  
  #define PRQ_ORDER 0
  
@@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
 	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
 	dmar_writeq(iommu->reg + DMAR_PQA_REG,
 		    virt_to_phys(iommu->prq) | PRQ_ORDER);
+	init_completion(&iommu->prq_complete);
+
 	return 0;
 }
  
@@ -208,6 +211,7 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	rcu_read_lock();
 	list_for_each_entry_rcu(sdev, &svm->devs, list) {
 		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
+		intel_svm_drain_prq(sdev->dev, svm->pasid);

mmu_notifier release is called in atomic context, but drain_prq needs to
wait for completion. I tested the exit path when a process crashes and got:

[  +0.696214] BUG: sleeping function called from invalid context at 
kernel/sched/completion.c:101
[  +0.68] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3235, 
name: dsatest
[  +0.46] INFO: lockdep is turned off.
[  +0.02] CPU: 1 PID: 3235 Comm: dsatest Not tainted 5.7.0-rc1-z-svmtest+ 
#1637
[  +0.00] Hardware name: Intel Corporation Kabylake Client platform/Skylake 
Halo DDR4 RVP11, BIOS
04.1709050855 09/05/2017
[  +0.01] Call Trace:
[  +0.04]  dump_stack+0x68/0x9b
[  +0.03]  ___might_sleep+0x229/0x250
[  +0.03]  wait_for_completion_timeout+0x3c/0x110
[  +0.03]  intel_svm_drain_prq+0x12f/0x210
[  +0.03]  intel_mm_release+0x73/0x110
[  +0.03]  __mmu_notifier_release+0x94/0x220
[  +0.02]  ? do_munmap+0x10/0x10
[  +0.02]  ? prepare_ftrace_return+0x5c/0x80
[  +0.03]  exit_mmap+0x156/0x1a0
[  +0.02]  ? mmput+0x44/0x120
[  +0.03]  ? exit_mmap+0x5/0x1a0
[  +0.02]  ? ftrace_graph_caller+0xa0/0xa0
[  +0.01]  mmput+0x5e/0x120




Thanks a lot!

Actually, we can't drain page requests in this mm_notifier code path,
right? The assumptions of page request draining are that 1) the device
driver has drained DMA requests on the device end; and 2) the pasid entry
has been marked non-present. So we can only drain page requests in
the unbind path.

Thoughts?

Best regards,
baolu