RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-10 Thread Liu, Yi L
Hi Alex,

> From: Alex Williamson 
> Sent: Friday, April 3, 2020 3:57 AM
> To: Liu, Yi L 
> Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> 
> On Sun, 22 Mar 2020 05:32:03 -0700
> "Liu, Yi L"  wrote:
> 
> > From: Liu Yi L 
> >
> > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
[...]
> > +/**
> > + * Unbind a specific gpasid; the caller of this function is required
> > + * to hold vfio_iommu->lock
> > + */
> > +static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu,
> > +   struct iommu_gpasid_bind_data *gbind_data)
> > +{
> > +   return vfio_iommu_for_each_dev(iommu,
> > +   vfio_unbind_gpasid_fn, gbind_data);
> > +}
> > +
> > +static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu,
> > +   struct iommu_gpasid_bind_data *gbind_data)
> > +{
> > +   int ret = 0;
> > +
> > +   mutex_lock(&iommu->lock);
> > +   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> > +   ret = -EINVAL;
> > +   goto out_unlock;
> > +   }
> > +
> > +   ret = vfio_iommu_for_each_dev(iommu,
> > +   vfio_bind_gpasid_fn, gbind_data);
> > +   /*
> > +* If bind failed, it may not be a total failure. Some devices
> > +* within the iommu group may have bound successfully. Although
> > +* we don't enable pasid capability for non-singleton iommu
> > +* groups, an unbind operation would be helpful to ensure no
> > +* partial binding for an iommu group.
> 
> Where was the non-singleton group restriction done, I missed that.
> 
> > +*/
> > +   if (ret)
> > +   /*
> > +* Undo all binds that already succeeded, no need to
> > +* check the return value here since some devices within
> > +* the group may have had no successful bind when coming
> > +* to this place.
> > +*/
> > +   vfio_iommu_type1_do_guest_unbind(iommu, gbind_data);
> 
> However, the for_each_dev function stops when the callback function
> returns error, are we just assuming we stop at the same device as we
> faulted on the first time and that we traverse the same set of devices
> the second time?  It seems strange to me that unbind should be able to
> fail.

I think the code needs enhancement. Although one group per container
and one device per group is the most typical and desired case, the
code here loops over domains, groups and devices. It should be able
to unwind the prior binds when the loop fails for a device. So I plan
to make the change below for the bind path.

list_for_each_entry(domain, &iommu->domain_list, next) {
	list_for_each_entry(group, &domain->group_list, next) {
		/*
		 * If bind failed on a certain device, the prior
		 * successful binds should be undone.
		 * iommu_group_for_each_dev() should be modified to
		 * take two callbacks, one for the forward loop and
		 * one for the reverse loop on failure.
		 * "return_upon_failure" indicates whether to return
		 * upon failure during the forward loop; if yes,
		 * iommu_group_for_each_dev() should unwind the prior
		 * binds in this iommu group before returning.
		 */
		ret = iommu_group_for_each_dev(group->iommu_group,
				bind_gpasid_fn, unbind_gpasid_fn, data,
				return_upon_failure);
		if (ret)
			break;
	}
	if (ret) {
		/* unwind bindings with prior groups */
		list_for_each_entry_continue_reverse(group,
				&domain->group_list, next) {
			iommu_group_for_each_dev(group->iommu_group,
					unbind_gpasid_fn, NULL, data,
					ignore_intermediate_failure);
		}
		break;
	}
}

if (ret) {
	/* unwind bindings with prior domains */
	list_for_each_entry_continue_reverse(domain,
			&iommu->domain_list, next) {
		iommu_group_for_each_dev(group->iommu_group,
				unbind_gpasid_fn, NULL, data,
				ignore_intermediate_failure);
	}
}

return ret;
Regards,
Yi Liu


RE: [PATCH 20/28] mm: remove the pgprot argument to __vmalloc

2020-04-10 Thread Michael Kelley via iommu
From: Christoph Hellwig  Sent: Wednesday, April 8, 2020 4:59 AM
> 
> The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove
> it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/hyperv/hv_init.c  |  3 +--
>  arch/x86/include/asm/kvm_host.h|  3 +--
>  arch/x86/kvm/svm.c |  3 +--
>  drivers/block/drbd/drbd_bitmap.c   |  4 +---
>  drivers/gpu/drm/etnaviv/etnaviv_dump.c |  4 ++--
>  drivers/lightnvm/pblk-init.c   |  5 ++---
>  drivers/md/dm-bufio.c  |  4 ++--
>  drivers/mtd/ubi/io.c   |  4 ++--
>  drivers/scsi/sd_zbc.c  |  3 +--
>  fs/gfs2/dir.c  |  9 -
>  fs/gfs2/quota.c|  2 +-
>  fs/nfs/blocklayout/extent_tree.c   |  2 +-
>  fs/ntfs/malloc.h   |  2 +-
>  fs/ubifs/debug.c   |  2 +-
>  fs/ubifs/lprops.c  |  2 +-
>  fs/ubifs/lpt_commit.c  |  4 ++--
>  fs/ubifs/orphan.c  |  2 +-
>  fs/xfs/kmem.c  |  2 +-
>  include/linux/vmalloc.h|  2 +-
>  kernel/bpf/core.c  |  6 +++---
>  kernel/groups.c|  2 +-
>  kernel/module.c|  3 +--
>  mm/nommu.c | 15 +++
>  mm/page_alloc.c|  2 +-
>  mm/percpu.c|  2 +-
>  mm/vmalloc.c   |  4 ++--
>  net/bridge/netfilter/ebtables.c|  6 ++
>  sound/core/memalloc.c  |  2 +-
>  sound/core/pcm_memory.c|  2 +-
>  29 files changed, 47 insertions(+), 59 deletions(-)
> 
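
The interface change behind this diffstat is simply dropping the last
argument; a sketch of a typical call-site conversion, assuming the
usual before/after signatures:

#include <linux/vmalloc.h>

/* Before this series (assumed): __vmalloc(size, gfp_mask, pgprot),
 * where pgprot was always PAGE_KERNEL. After: the argument is gone. */
static void *alloc_buffer(unsigned long size)
{
	/* old: return __vmalloc(size, GFP_KERNEL, PAGE_KERNEL); */
	return __vmalloc(size, GFP_KERNEL);
}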

For Hyper-V changes,

Reviewed-by: Michael Kelley 


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-10 Thread Minchan Kim
Hi Sergey,

On Fri, Apr 10, 2020 at 11:38:45AM +0900, Sergey Senozhatsky wrote:
> On (20/04/09 10:08), Minchan Kim wrote:
> > > > Even though I don't know how many usecase we have using zsmalloc as
> > > > module(I heard only once by dumb reason), it could affect existing
> > > > users. Thus, please include concrete explanation in the patch to
> > > > justify when the complain occurs.
> > > 
> > > The justification is 'we can unexport functions that have no sane reason
> > > of being exported in the first place'.
> > > 
> > > The Changelog pretty much says that.
> > 
> > Okay, I hope there is no affected user since this patch.
> > If there are someone, they need to provide sane reason why they want
> > to have zsmalloc as module.
> 
> I'm one of those who use zsmalloc as a module - mainly because I use zram
> as a compressing general purpose block device, not as a swap device.
> I create zram0, mkfs, mount, checkout and compile code, once done -
> umount, rmmod. This reduces the number of writes to SSD. Some people use
> tmpfs, but zram device(-s) can be much larger in size. That's a niche use
> case and I'm not against the patch.

It doesn't mean we can't use zsmalloc as a module any longer. It means
we can't use zsmalloc as a module with pgtable mapping, which was a
little faster in microbenchmarks on some architectures (though I have
often been tempted to remove it since it had several problems). We can
still use zsmalloc as a module with the copy way instead of pgtable
mapping. Thus, if someone really wants to roll back this change, they
should provide a reasonable explanation of why it doesn't work for
them. "A little faster" wouldn't be enough to export such deep
internals to a module.
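
For background, the difference between the two mapping strategies for
an object spanning two non-contiguous pages is roughly the following
(an illustrative sketch, not the actual zsmalloc internals):

	/* pgtable mapping: make the two pages virtually contiguous */
	void *addr = vm_map_ram(pages, 2, -1, PAGE_KERNEL);

	/* copy way: memcpy both halves into a per-cpu buffer instead */
	memcpy(buf, page_address(pages[0]) + off, sz0);
	memcpy(buf + sz0, page_address(pages[1]), size - sz0);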

Thanks.


RE: [PATCH 01/28] x86/hyperv: use vmalloc_exec for the hypercall page

2020-04-10 Thread Michael Kelley via iommu
From: Christoph Hellwig  Sent: Wednesday, April 8, 2020 4:59 AM
> 
> Use the designated helper for allocating executable kernel memory, and
> remove the now unused PAGE_KERNEL_RX define.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/hyperv/hv_init.c| 2 +-
>  arch/x86/include/asm/pgtable_types.h | 2 --
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
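
The change itself is a one-line conversion of the allocation call
(a sketch based on the commit message, not the verbatim diff):

	/* old: hv_hypercall_pg = __vmalloc(PAGE_SIZE, GFP_KERNEL,
	 *				    PAGE_KERNEL_RX); */
	hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);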

Reviewed-by: Michael Kelley 


Re: [PATCH v11 07/10] iommu/vt-d: Add svm/sva invalidate function

2020-04-10 Thread Jacob Pan
On Thu, 9 Apr 2020 10:50:34 +0200
Auger Eric  wrote:

> Hi Jacob,
> 
> On 4/3/20 8:42 PM, Jacob Pan wrote:
> > When Shared Virtual Address (SVA) is enabled for a guest OS via
> > vIOMMU, we need to provide invalidation support at IOMMU API and
> > driver level. This patch adds Intel VT-d specific function to
> > implement iommu passdown invalidate API for shared virtual address.
> > 
> > The use case is for supporting caching structure invalidation
> > of assigned SVM capable devices. Emulated IOMMU exposes queue
> > invalidation capability and passes down all descriptors from the
> > guest to the physical IOMMU.
> > 
> > The assumption is that guest to host device ID mapping should be
> > resolved prior to calling IOMMU driver. Based on the device handle,
> > host IOMMU driver can replace certain fields before submit to the
> > invalidation queue.
> > 
> > ---
> > v11 - Removed 2D map array, use -EINVAL in granularity lookup array.
> >   Fixed devTLB invalidation granularity mapping. Disregard G=1
> > case and use address selective invalidation only.
> > 
> > v7 review fixed in v10
> > ---
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Ashok Raj 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  drivers/iommu/intel-iommu.c | 158 +++++++++++++++++++++++++++++++++++
> >  1 file changed, 158 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 94c7993dac6a..045c5c08d71d 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5594,6 +5594,163 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
> >  	aux_domain_remove_dev(to_dmar_domain(domain), dev);
> >  }
> >  
> > +/*
> > + * 2D array for converting and sanitizing IOMMU generic TLB
> > granularity to
> > + * VT-d granularity. Invalidation is typically included in the
> > unmap operation
> > + * as a result of DMA or VFIO unmap. However, for assigned devices
> > guest
> > + * owns the first level page tables. Invalidations of translation
> > caches in the
> > + * guest are trapped and passed down to the host.
> > + *
> > + * vIOMMU in the guest will only expose first level page tables,
> > therefore
> > + * we do not support IOTLB granularity for request without PASID
> > (second level).
> > + *
> > + * For example, to find the VT-d granularity encoding for IOTLB
> > + * type and page selective granularity within PASID:
> > + * X: indexed by iommu cache type
> > + * Y: indexed by enum iommu_inv_granularity
> > + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> > + */
> > +
> > +const static int
> > inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] =
> > {
> > +   /*
> > +* PASID based IOTLB invalidation: PASID selective (per
> > PASID),
> > +* page selective (address granularity)
> > +*/
> > +   {-EINVAL, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> > +   /* PASID based dev TLBs */
> > +   {-EINVAL, -EINVAL, QI_DEV_IOTLB_GRAN_PASID_SEL},
> > +   /* PASID cache */
> > +   {-EINVAL, -EINVAL, -EINVAL}
> > +};
> > +
> > +static inline int to_vtd_granularity(int type, int granu)
> > +{
> > +   return inv_type_granu_table[type][granu];
> > +}
> > +
> > +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> > +{
> > +   u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> > +
> > +   /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> > +    * IOMMU cache invalidate API passes granu_size in bytes, and number of
> > +    * granu size in contiguous memory.
> > +    */
> > +   return order_base_2(nr_pages);
> > +}
> > +
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> > +   struct device *dev, struct iommu_cache_invalidate_info *inv_info)
> > +{
> > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +   struct device_domain_info *info;
> > +   struct intel_iommu *iommu;
> > +   unsigned long flags;
> > +   int cache_type;
> > +   u8 bus, devfn;
> > +   u16 did, sid;
> > +   int ret = 0;
> > +   u64 size = 0;
> > +
> > +   if (!inv_info || !dmar_domain ||
> > +       inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> > +           return -EINVAL;
> > +
> > +   if (!dev || !dev_is_pci(dev))
> > +   return -ENODEV;  
> 
> Check (domain->flags & DOMAIN_FLAG_NESTING_MODE)?
Good point
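Something like the below should work (a sketch, relying on the
existing DOMAIN_FLAG_NESTING_MODE flag you mention):

	/* Reject passdown invalidation if the domain is not nested */
	if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
		return -EINVAL;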

> > +
> > +   iommu = device_to_iommu(dev, &bus, &devfn);
> > +   if (!iommu)
> > +           return -ENODEV;
> > +
> > +   spin_lock_irqsave(&device_domain_lock, flags);
> > +   spin_lock(&iommu->lock);
> > +   info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
> > +   if (!info) {
> > +           ret = -EINVAL;
> > +           goto out_unlock;
> > +   }
> > +   did = dmar_domain->iommu_did[iommu->seq_id];
> > +   sid = PCI_DEVID(bus, devfn);
> > +
> > +   /* Size is only valid in non-PASID selective invalidation */
> > +   if (inv_info->granularity != IOMMU_INV_GRANU_PASID)
> > +

Re: [PATCH v11 05/10] iommu/vt-d: Add bind guest PASID support

2020-04-10 Thread Jacob Pan
Hi Eric,

Missed a few things in the last reply.

On Thu, 9 Apr 2020 09:41:32 +0200
Auger Eric  wrote:

> > +   intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> intel_svm_unbind_mm() calls intel_flush_svm_range_dev(svm, sdev, 0,
> -1, 0); Don't we need to flush the (DEV-)IOTLBs as well?
Right, PASID tear-down should always include a (DEV-)IOTLB flush; I
initially thought it was taken care of by intel_pasid_tear_down_entry().
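i.e. in the unbind path (a sketch, reusing the flush helper you quote
above with whole-range arguments):

	intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
	/* flush IOTLB and, with ATS, the device TLB for this PASID */
	intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);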

> > +   /* TODO: Drain in flight PRQ for the PASID
> > since it
> > +* may get reused soon, we don't want to
> > +* confuse with its previous life.
> > +* intel_svm_drain_prq(dev, pasid);
> > +*/
> > +   kfree_rcu(sdev, rcu);
> > +
> > +   if (list_empty(&svm->devs)) {
> > +   /*
> > +* We do not free the IOASID here
> > in that
> > +* IOMMU driver did not allocate
> > it.  
> s/in/as?
I meant to say "in that" as "for the reason that"

> > +* Unlike native SVM, IOASID for guest use was
> > +* allocated prior to the bind call.
> > +* In any case, if the free call comes before
> > +* the unbind, IOMMU driver will get notified
> > +* and perform cleanup.
> > +*/
> > +   ioasid_set_data(pasid, NULL);
> > +   kfree(svm);
> > +   }  
> nit: you may use intel_svm_free_if_empty()
True, but I meant to insert ioasid_notifier under the list_empty
condition in the following ioasid patch.


Thanks,

Jacob


Re: [PATCH v11 05/10] iommu/vt-d: Add bind guest PASID support

2020-04-10 Thread Jacob Pan
Hi Eric,

On Thu, 9 Apr 2020 09:41:32 +0200
Auger Eric  wrote:

> Hi Jacob,
> 
> On 4/3/20 8:42 PM, Jacob Pan wrote:
> > When supporting guest SVA with emulated IOMMU, the guest PASID
> > table is shadowed in VMM. Updates to guest vIOMMU PASID table
> > will result in PASID cache flush which will be passed down to
> > the host as bind guest PASID calls.
> > 
> > For the SL page tables, it will be harvested from device's
> > default domain (request w/o PASID), or aux domain in case of
> > mediated device.
> > 
> > .-------------.  .---------------------------.
> > |   vIOMMU    |  | Guest process CR3, FL only|
> > |             |  '---------------------------'
> > .----------------/
> > | PASID Entry |--- PASID cache flush -
> > '-------------'   |
> > |             |   V
> > |             |  CR3 in GPA
> > '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> >       v        v                          v
> > Host
> > .-------------.  .----------------------.
> > |   pIOMMU    |  | Bind FL for GVA-GPA  |
> > |             |  '----------------------'
> > .----------------/  |
> > | PASID Entry |     V (Nested xlate)
> > '----------------\.------------------------------.
> > |             |   |SL for GPA-HPA, default domain|
> > |             |   '------------------------------'
> > '-------------'
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> > 
> > ---
> > v11: Fixed locking, avoid duplicated paging mode check, added
> > helper to free svm if device list is empty. Use rate limited error
> > message since the bind gpasid call comes from user space.
> > ---
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  drivers/iommu/intel-iommu.c |   4 +
> >  drivers/iommu/intel-svm.c   | 206 ++++++++++++++++++++++++++++++++++++
> >  include/linux/intel-iommu.h |   8 +-
> >  include/linux/intel-svm.h   |  17 ++++
> >  4 files changed, 234 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index c0dadec5a6b3..94c7993dac6a 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -6178,6 +6178,10 @@ const struct iommu_ops intel_iommu_ops = {
> >  	.dev_disable_feat	= intel_iommu_dev_disable_feat,
> >  	.is_attach_deferred	= intel_iommu_is_attach_deferred,
> >  	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> > +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> > +#endif
> >  };
> >  
> >  static void quirk_iommu_igfx(struct pci_dev *dev)
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index d7f2a5358900..7cf711318b87 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -226,6 +226,212 @@ static LIST_HEAD(global_svm_list);
> > list_for_each_entry((sdev), &(svm)->devs, list) \
> > if ((d) != (sdev)->dev) {} else
> >  
> > +
> > +static inline void intel_svm_free_if_empty(struct intel_svm *svm, u64 pasid)
> > +{
> > +   if (list_empty(&svm->devs)) {
> > +           ioasid_set_data(pasid, NULL);
> > +           kfree(svm);
> > +   }
> > +}
> > +
> > +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > +   struct device *dev,
> > +   struct iommu_gpasid_bind_data *data)
> > +{
> > +   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > +   struct dmar_domain *dmar_domain;
> > +   struct intel_svm_dev *sdev;
> > +   struct intel_svm *svm;
> > +   int ret = 0;
> > +
> > +   if (WARN_ON(!iommu) || !data)
> > +   return -EINVAL;
> > +
> > +   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > +   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > +   return -EINVAL;
> > +
> > +   if (dev_is_pci(dev)) {
> > +   /* VT-d supports devices with full 20 bit PASIDs
> > only */
> > +   if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> > +   return -EINVAL;
> > +   } else {
> > +   return -ENOTSUPP;
> > +   }
> > +
> > +   /*
> > +* We only check host PASID range, we have no knowledge to
> > check
> > +* guest PASID range.
> > +*/
> > +   if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> > +   return -EINVAL;
> > +
> > +   dmar_domain = to_dmar_domain(domain);
> > +
> > +   mutex_lock(&pasid_mutex);
> > +   svm = ioasid_find(NULL, data->hpasid, NULL);
> > +   if (IS_ERR(svm)) {
> > +   ret = PTR_ERR(svm);
> > +   goto out;
> > +   }
> > +
> > +   if (svm) {
> > +           /*
> > +            * If we found svm for the PASID, there must be at
> > +            * least one device bound; otherwise svm should be freed.
> > +            */
> > +           if 

Re: [rfc v2 3/6] dma-pool: dynamically expanding atomic pools

2020-04-10 Thread David Rientjes via iommu
On Fri, 10 Apr 2020, Hillf Danton wrote:

> 
> On Wed, 8 Apr 2020 14:21:06 -0700 (PDT) David Rientjes wrote:
> > 
> > When an atomic pool becomes fully depleted because it is now relied upon
> > for all non-blocking allocations through the DMA API, allow background
> > expansion of each pool by a kworker.
> > 
> > When an atomic pool has less than the default size of memory left, kick
> > off a kworker to dynamically expand the pool in the background.  The pool
> > is doubled in size, up to MAX_ORDER-1.  If memory cannot be allocated at
> > the requested order, smaller allocation(s) are attempted.
> > 
> What is proposed looks like a single-lane path that does not take into
> account how to dynamically shrink the pool. Thus the risk may rise in
> corner cases where pools are over-expanded in the long run after
> one-off peak allocation requests.
> 

To us, this is actually a benefit: we prefer the peak size to be
maintained so that we do not need to dynamically resize the pool later
at the cost of throughput.  Genpool also does not have great support for
scavenging and freeing unused chunks.

Perhaps we could enforce a maximum size on the pools just as we allow the
default size to be defined by coherent_pool= on the command line.  Our use
case would not set this, however, since we have not seen egregious genpool
sizes as the result of non-blockable DMA allocations (perhaps the drivers
we use just play friendlier and you have seen excessive usage?).

I'll rely on Christoph to determine whether it makes sense to add some
periodic scavenging of the atomic pools, whether that's needed for this to
be merged, or whether we should enforce some maximum pool size.
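
For reference, the expansion trigger is essentially the following
(an illustrative sketch, not the literal patch):

	/* after an atomic allocation: kick background expansion if the
	 * pool has dropped below the default size */
	if (gen_pool_avail(atomic_pool) < atomic_pool_size)
		schedule_work(&atomic_pool_work);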


Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-10 Thread Jacob Pan
On Thu, 9 Apr 2020 16:50:58 +0200
Jean-Philippe Brucker  wrote:

> > So unbind is coming anyway, the difference in handling in mmu
> > release notifier is whether we silently drop DMA fault vs.
> > reporting fault?  
> 
> What I meant is, between mmu release notifier and unbind(), we can't
> print any error from DMA fault on dmesg, because an mm exit is easily
> triggered by userspace. Look at the lifetime of the bond:
> 
> bind()
>  |
>  : Here any DMA fault is handled by mm, and on error we don't print
>  : anything to dmesg. Userspace can easily trigger faults by issuing
>  : DMA on unmapped buffers.
>  |
> mm exit -> clear pgd, invalidate IOTLBs
>  |
>  : Here the PASID descriptor doesn't have the pgd anymore, but we don't
>  : print out any error to dmesg either. DMA is likely still running but
>  : any fault has to be ignored.
>  :
>  : We also can't free the PASID yet, since transactions are still
>  : coming in with this PASID.
>  |
> unbind() -> clear context descriptor, release PASID and mmu notifier
>  |
>  : Here the PASID descriptor is clear. If DMA is still running the
>  : device driver really messed up and we have to print out any fault.
> 
> For that middle state I had to introduce a new pasid descriptor state
> in the SMMU driver, to avoid reporting errors between mm exit and
> unbind().
I must have missed something, but why bother with a state when you can
always check if the mm is dead by mmget_not_zero()? You would not
handle IOPF if the mm is dead anyway, similarly for other DMA errors.
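i.e. something like this in the fault path (a sketch, assuming the
handler has the svm/mm handle at hand; the response code is the
generic one from iommu.h):

	/* silently drop the fault if the mm is already dead */
	if (!mmget_not_zero(svm->mm))
		return IOMMU_PAGE_RESP_INVALID;
	/* ... handle the fault against svm->mm ... */
	mmput(svm->mm);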

Also, since you are not freeing ioasid in mmu_notifier release anymore,
does it mean the IOASID notifier chain can be non-atomic?

Thanks,

Jacob


Re: [PATCH 08/10] iommu/ioasid: Introduce notifier APIs

2020-04-10 Thread Jacob Pan
On Wed, 1 Apr 2020 16:00:06 +0200
Jean-Philippe Brucker  wrote:

> On Wed, Mar 25, 2020 at 10:55:29AM -0700, Jacob Pan wrote:
> > IOASID users fit into the publisher-subscriber pattern; a system-
> > wide blocking notifier chain can be used to inform subscribers of
> > state changes. The notifier mechanism also abstracts the publisher
> > from knowing the private context each subscriber may have.
> > 
> > This patch adds APIs and a global notifier chain; a further
> > optimization might be a per-set notifier for ioasid_set-aware users.
> > 
> > Usage example:
> > KVM register notifier block such that it can keep its guest-host
> > PASID translation table in sync with any IOASID updates.  
> 
> When you talk about KVM, is it for
> 
>   [PATCH 0/7] x86: tag application address space for devices
> 
> or something else as well? (I don't see mentions of KVM in that
> series)
> 
Yes, related to that set, which is for native ENQCMD support. The VMCS
use of the IOASID notifier is for guest SVA + ENQCMD. We need to
maintain a G->H PASID translation table in the VMCS. When the guest
binds a GPASID to a host PASID, this translation table can be updated
such that subsequent ENQCMD in the guest can resolve to a host PASID.

CH 7.3.1 of DSA spec.
https://software.intel.com/sites/default/files/341204-intel-data-streaming-accelerator-spec.pdf
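
For the KVM side, the usage would be roughly (a sketch; the
registration API and event name follow this patch's naming and are
assumptions, details may differ):

static int kvm_ioasid_event(struct notifier_block *nb,
			    unsigned long event, void *data)
{
	ioasid_t ioasid = *(ioasid_t *)data;	/* payload format assumed */

	if (event == IOASID_FREE) {
		/* drop the guest->host PASID translation for @ioasid */
		pr_debug("ioasid %u freed\n", ioasid);
	}
	return NOTIFY_OK;
}

static struct notifier_block kvm_pasid_nb = {
	.notifier_call	= kvm_ioasid_event,
};

/* at init: ioasid_register_notifier(&kvm_pasid_nb); */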
 
> > 
> > VFIO publish IOASID change by performing alloc/free, bind/unbind
> > operations.  
> 
> I was rather seeing IOASID as the end of the VFIO-IOMMU-IOASID chain,
> putting it in the middle complicates locking. If you only need the FREE
> notifier for this case, maybe VFIO could talk directly to the IOMMU
> driver before freeing an IOASID?  gpasid_unbind() should already
> clear the PASID contexts, no?
> 
Yes, VFIO can track all the PASIDs and make sure they are unbound before
being freed. But that might be more complicated in VFIO, whereas here,
when a guest exits, VFIO can just free the entire IOASID set; IOASID
will notify the IOMMU and do all the cleanup.

For maintaining VMCS pasid translation table, KVM still need to know
bind/unbind in addition to free events.

In addition, we also have VDCM (virtual device composition module) that
needs to perform G-H PASID translation and sanity check. VDCM needs the
free event only. This is also in the DSA spec above. The use case is that
when the guest programs a GPASID into a virtual device, VDCM (similar
to SRIOV PDEV driver) needs to intercept (via vfio mdev) and translate
GPASID to HPASID.

> Thanks,
> Jean
> 
> > IOMMU driver gets notified when IOASID is freed by VFIO or core mm
> > code such that PASID context can be cleaned up.
> > 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan   

[Jacob Pan]


Re: [rfc v2 3/6] dma-pool: dynamically expanding atomic pools

2020-04-10 Thread Hillf Danton


On Wed, 8 Apr 2020 14:21:06 -0700 (PDT) David Rientjes wrote:
> 
> When an atomic pool becomes fully depleted because it is now relied upon
> for all non-blocking allocations through the DMA API, allow background
> expansion of each pool by a kworker.
> 
> When an atomic pool has less than the default size of memory left, kick
> off a kworker to dynamically expand the pool in the background.  The pool
> is doubled in size, up to MAX_ORDER-1.  If memory cannot be allocated at
> the requested order, smaller allocation(s) are attempted.
> 
What is proposed looks like a single-lane path that does not take into
account how to dynamically shrink the pool. Thus the risk may rise in
corner cases where pools are over-expanded in the long run after
one-off peak allocation requests.

Is it worth the complexity of expander + shrinker in the first place?

> This allows the default size to be kept quite low when one or more of the
> atomic pools is not used.
> 
> This also allows __dma_atomic_pool_init to return a pointer to the pool
> to make initialization cleaner.
> 
> Also switch over some node ids to the more appropriate NUMA_NO_NODE.



Re: [PATCH] netfilter: nft_fwd_netdev: Fix CONFIG_NET_CLS_ACT=n build

2020-04-10 Thread Geert Uytterhoeven
On Fri, Apr 10, 2020 at 4:26 PM Geert Uytterhoeven  wrote:
> If CONFIG_NET_CLS_ACT=n:
>
> net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
> net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
>   pkt->skb->tc_redirected = 1;
>   ^~
> net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
>   pkt->skb->tc_from_ingress = 1;
>   ^~
>
> Fix this by protecting this code hunk with the appropriate #ifdef.
>
> Reported-by: nore...@ellerman.id.au
> Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
> Signed-off-by: Geert Uytterhoeven 

Please ignore, wrong patch.
Sorry for the mess.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

[PATCH] iommu: Fix MTK_IOMMU dependencies

2020-04-10 Thread Geert Uytterhoeven
If NO_DMA=y (e.g. Sun-3 all{mod,yes}-config):

drivers/iommu/dma-iommu.o: In function `iommu_dma_mmap':
dma-iommu.c:(.text+0x836): undefined reference to `dma_pgprot'

IOMMU_DMA must not be selected, unless HAS_DMA=y.

Hence fix this by making MTK_IOMMU depend on HAS_DMA.
While at it, remove the dependency on ARM || ARM64, as that is already
implied by the dependency on ARCH_MEDIATEK.

Fixes: e93a1695d7fb5513 ("iommu: Enable compile testing for some of drivers")
Signed-off-by: Geert Uytterhoeven 
---
 drivers/iommu/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 58b4a4dbfc78b9a5..bead9aaea8429447 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -457,7 +457,7 @@ config S390_AP_IOMMU
 
 config MTK_IOMMU
bool "MTK IOMMU Support"
-   depends on ARM || ARM64 || COMPILE_TEST
+   depends on HAS_DMA
depends on ARCH_MEDIATEK || COMPILE_TEST
select ARM_DMA_USE_IOMMU
select IOMMU_API
-- 
2.17.1



[PATCH] netfilter: nft_fwd_netdev: Fix CONFIG_NET_CLS_ACT=n build

2020-04-10 Thread Geert Uytterhoeven
If CONFIG_NET_CLS_ACT=n:

net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
  pkt->skb->tc_redirected = 1;
  ^~
net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
  pkt->skb->tc_from_ingress = 1;
  ^~

Fix this by protecting this code hunk with the appropriate #ifdef.

Reported-by: nore...@ellerman.id.au
Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
Signed-off-by: Geert Uytterhoeven 
---
 net/netfilter/nft_fwd_netdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netfilter/nft_fwd_netdev.c b/net/netfilter/nft_fwd_netdev.c
index 74f050ba6badc9dc..ebcaf5c325712f30 100644
--- a/net/netfilter/nft_fwd_netdev.c
+++ b/net/netfilter/nft_fwd_netdev.c
@@ -28,9 +28,11 @@ static void nft_fwd_netdev_eval(const struct nft_expr *expr,
struct nft_fwd_netdev *priv = nft_expr_priv(expr);
int oif = regs->data[priv->sreg_dev];
 
+#ifdef CONFIG_NET_CLS_ACT
/* These are used by ifb only. */
pkt->skb->tc_redirected = 1;
pkt->skb->tc_from_ingress = 1;
+#endif
 
nf_fwd_netdev_egress(pkt, oif);
regs->verdict.code = NF_STOLEN;
-- 
2.17.1


RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

2020-04-10 Thread Liu, Yi L
Hi Jean, Eric,

> From: Liu, Yi L 
> Sent: Thursday, April 9, 2020 8:47 PM
> Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
> 
[...]
> > > >>
> > > >> Yes I don't think an u32 is going to cut it for Arm :( We need to
> > > >> describe all sorts
> > of
> > > >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID 
> > > >> size,
> HW
> > > >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean
> much. I
> > > >> guess we could have a secondary vendor capability for these?
> > > >
> > > > Actually, I'm wondering if we can define some formats to stands for a 
> > > > set of
> > > > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st
> level
> > > > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can 
> > > > parse
> > > > the capabilities.
> > >
> > > But eventually do we really need all those capability getters? I mean
> > > can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
> > > to detect any mismatch? Definitely the error handling may be heavier
> > > on userspace, but can't we manage?
> >
> > I think we need to present these capabilities at boot time, long before
> > the guest triggers a bind(). For example if the host SMMU doesn't support
> > 16-bit ASID, we need to communicate that to the guest using vSMMU ID
> > registers or PROBE properties. Otherwise a bind() will succeed, but if the
> > guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
> > which we'll inject into the guest, for no apparent reason from their
> > perspective.
> >
> > In addition some VMMs may have fallbacks if shared page tables are not
> > available. They could fall back to a MAP/UNMAP interface, or simply not
> > present a vIOMMU to the guest.
> >
> 
> Based on the comments, I think there is a need to report iommu caps
> in detail. So I guess the iommu uapi needs to provide something like
> the vfio cap chain. Please feel free to let me know your thoughts. :-)

Considering it more, I guess it may be better to start simpler. A cap
chain suits the case in which there are multiple caps, e.g. some vendor
iommu driver may want to report its capabilities via multiple caps.
Actually, on the VT-d side, the host IOMMU capability could be reported
in a single cap structure. I'm not sure about the ARM side. Will there
be multiple iommu_info_caps for ARM?

> In vfio, we can define a cap as below:
>
> struct vfio_iommu_type1_info_cap_nesting {
>   struct  vfio_info_cap_header header;
>   __u64   iommu_model;
> #define VFIO_IOMMU_PASID_REQS (1 << 0)
> #define VFIO_IOMMU_BIND_GPASID(1 << 1)
> #define VFIO_IOMMU_CACHE_INV  (1 << 2)
>   __u32   nesting_capabilities;
>   __u32   pasid_bits;
> #define VFIO_IOMMU_VENDOR_SUB_CAP (1 << 3)
>   __u32   flags;
>   __u32   data_size;
>   __u8data[];  /*iommu info caps defined by iommu uapi */
> };
> 

If an iommu vendor driver only needs one cap structure to report hw
capability, then I think we needn't implement a cap chain in the iommu
uapi. The @data[] field could be determined by the @iommu_model
and @flags fields. This would be easier. Thoughts?
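
For completeness, userspace would then walk the info chain the same
way as existing VFIO caps -- a sketch (the cap ID name below is
hypothetical):

	struct vfio_info_cap_header *hdr;
	struct vfio_iommu_type1_info_cap_nesting *nesting;
	__u32 off;

	/* info points at the VFIO_IOMMU_GET_INFO result buffer */
	for (off = info->cap_offset; off; off = hdr->next) {
		hdr = (struct vfio_info_cap_header *)((char *)info + off);
		if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_NESTING) {
			nesting = (void *)hdr;
			/* parse nesting_capabilities, pasid_bits, data[] */
		}
	}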

> VFIO needs new iommu APIs to ask the iommu driver whether PASID alloc/
> bind_gpasid/cache_inv/bind_gpasid_table are available, and also the
> pasid bits. After that VFIO will ask the iommu driver for the
> iommu_cap_info and fill in the @data[] field.
>
> iommu uapi:
> struct iommu_info_cap_header {
>   __u16   id; /* Identifies capability */
>   __u16   version;    /* Version specific to the capability ID */
>   __u32   next;   /* Offset of next capability */
> };
> 
> #define IOMMU_INFO_CAP_INTEL_VTD 1
> struct iommu_info_cap_intel_vtd {
>   struct  iommu_info_cap_header header;
>   __u32   vaddr_width;   /* VA addr_width*/
>   __u32   ipaddr_width; /* IPA addr_width, input of SL page table */
>   /* same definition as @flags in struct iommu_gpasid_bind_data_vtd */
>   __u64   flags;
> };
> 
> #define IOMMU_INFO_CAP_ARM_SMMUv3 2
> struct iommu_info_cap_arm_smmuv3 {
>   struct  iommu_info_cap_header header;
>   ...
> };

Regards,
Yi Liu



Re: [RFC PATCH 33/34] iommu: Remove add_device()/remove_device() code-paths

2020-04-10 Thread Marek Szyprowski
Hi Joerg,

On 07.04.2020 20:37, Joerg Roedel wrote:
> From: Joerg Roedel 
>
> All drivers are converted to use the probe/release_device()
> call-backs, so the add_device/remove_device() pointers are unused and
> the code using them can be removed.
>
> Signed-off-by: Joerg Roedel 
> ---
>   drivers/iommu/iommu.c | 145 --
>   include/linux/iommu.h |   4 --
>   2 files changed, 27 insertions(+), 122 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index cf25c1e48830..d9032f9d597c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -220,7 +220,7 @@ static int __iommu_probe_device(struct device *dev, 
> struct list_head *group_list
>   return ret;
>   }
>   
> -static int __iommu_probe_device_helper(struct device *dev)
> +int iommu_probe_device(struct device *dev)
>   {
>   const struct iommu_ops *ops = dev->bus->iommu_ops;
>   struct iommu_group *group;
> @@ -264,70 +264,17 @@ static int __iommu_probe_device_helper(struct device 
> *dev)
>   
>   }
>   
> -int iommu_probe_device(struct device *dev)
> +void iommu_release_device(struct device *dev)
>   {
>   const struct iommu_ops *ops = dev->bus->iommu_ops;
> - struct iommu_group *group;
> - int ret;
> -
> - WARN_ON(dev->iommu_group);
> -
> - if (!ops)
> - return -EINVAL;
> -
> - if (!dev_iommu_get(dev))
> - return -ENOMEM;
> -
> - if (!try_module_get(ops->owner)) {
> - ret = -EINVAL;
> - goto err_free_dev_param;
> - }
> -
> - if (ops->probe_device)
> - return __iommu_probe_device_helper(dev);
> -
> - ret = ops->add_device(dev);
> - if (ret)
> - goto err_module_put;
>   
> - group = iommu_group_get(dev);
> - iommu_create_device_direct_mappings(group, dev);
> - iommu_group_put(group);
> -
> - if (ops->probe_finalize)
> - ops->probe_finalize(dev);
> -
> - return 0;
> -
> -err_module_put:
> - module_put(ops->owner);
> -err_free_dev_param:
> - dev_iommu_free(dev);
> - return ret;
> -}
> -
> -static void __iommu_release_device(struct device *dev)
> -{
> - const struct iommu_ops *ops = dev->bus->iommu_ops;
> + if (!dev->iommu)
> + return;
>   
>   iommu_device_unlink(dev->iommu->iommu_dev, dev);
> -
>   iommu_group_remove_device(dev);
>   
>   ops->release_device(dev);
> -}
> -
> -void iommu_release_device(struct device *dev)
> -{
> - const struct iommu_ops *ops = dev->bus->iommu_ops;
> -
> - if (!dev->iommu)
> - return;
> -
> - if (ops->release_device)
> - __iommu_release_device(dev);
> - else if (dev->iommu_group)
> - ops->remove_device(dev);
>   
>   module_put(ops->owner);
>   dev_iommu_free(dev);
> @@ -1560,23 +1507,6 @@ struct iommu_group *iommu_group_get_for_dev(struct 
> device *dev)
>   if (ret)
>   goto out_put_group;
>   
> - /*
> -  * Try to allocate a default domain - needs support from the
> -  * IOMMU driver. There are still some drivers which don't support
> -  * default domains, so the return value is not yet checked. Only
> -  * allocate the domain here when the driver still has the
> -  * add_device/remove_device call-backs implemented.
> -  */
> - if (!ops->probe_device) {
> - iommu_alloc_default_domain(dev);
> -
> - if (group->default_domain)
> - ret = __iommu_attach_device(group->default_domain, dev);
> -
> - if (ret)
> - goto out_put_group;
> - }
> -
>   return group;
>   
>   out_put_group:
> @@ -1591,21 +1521,6 @@ struct iommu_domain *iommu_group_default_domain(struct 
> iommu_group *group)
>   return group->default_domain;
>   }
>   
> -static int add_iommu_group(struct device *dev, void *data)
> -{
> - int ret = iommu_probe_device(dev);
> -
> - /*
> -  * We ignore -ENODEV errors for now, as they just mean that the
> -  * device is not translated by an IOMMU. We still care about
> -  * other errors and fail to initialize when they happen.
> -  */
> - if (ret == -ENODEV)
> - ret = 0;
> -
> - return ret;
> -}
> -
>   static int probe_iommu_group(struct device *dev, void *data)
>   {
>   const struct iommu_ops *ops = dev->bus->iommu_ops;
> @@ -1789,45 +1704,39 @@ static int iommu_group_create_direct_mappings(struct 
> iommu_group *group)
>   
>   int bus_iommu_probe(struct bus_type *bus)
>   {
> - const struct iommu_ops *ops = bus->iommu_ops;
> + struct iommu_group *group, *next;
> + LIST_HEAD(group_list);
>   int ret;
>   
> - if (ops->probe_device) {
> - struct iommu_group *group, *next;
> - LIST_HEAD(group_list);
> -
> - /*
> -  * This code-path does not allocate the default domain when
> -  * creating the iommu group, so do it after 

Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-10 Thread Daniel Vetter
On Fri, Apr 10, 2020 at 12:57 AM Benjamin Herrenschmidt
 wrote:
>
> On Thu, 2020-04-09 at 11:41 +0200, Daniel Vetter wrote:
> > Now if these boxes didn't ever have agp then I think we can get away
> > with deleting this, since we've already deleted the legacy radeon
> > driver. And that one used vmalloc for everything. The new kms one does
> > use the dma-api if the gpu isn't connected through agp
>
> Definitely no AGP there.

Ah in that case I think we can be sure that this code is dead.

Acked-by: Daniel Vetter 

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch