RE: [PATCH V9 06/10] iommu/vt-d: Add svm/sva invalidate function

2020-02-20 Thread Liu, Yi L
> From: Jacob Pan 
> Sent: Wednesday, January 29, 2020 2:02 PM
> Subject: [PATCH V9 06/10] iommu/vt-d: Add svm/sva invalidate function
> 
> When Shared Virtual Address (SVA) is enabled for a guest OS via vIOMMU, we
> need to provide invalidation support at IOMMU API and driver level. This patch
> adds Intel VT-d specific function to implement iommu passdown invalidate API
> for shared virtual address.
> 
> The use case is for supporting caching structure invalidation of assigned SVM
> capable devices. Emulated IOMMU exposes queue invalidation capability and
> passes down all descriptors from the guest to the physical IOMMU.
> 
> The assumption is that guest to host device ID mapping should be resolved 
> prior
> to calling IOMMU driver. Based on the device handle, host IOMMU driver can
> replace certain fields before submit to the invalidation queue.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Liu, Yi L 

Maybe update my email to Liu Yi L :-) The @linux.intel.com
account no longer works for me.  :-(

> ---
>  drivers/iommu/intel-iommu.c | 173
> 
>  1 file changed, 173 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
> 8a4136e805ac..b8aa6479b87f 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5605,6 +5605,178 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
> 	aux_domain_remove_dev(to_dmar_domain(domain), dev);
>  }
> 
> +/*
> + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
> + * VT-d granularity. Invalidation is typically included in the unmap operation
> + * as a result of DMA or VFIO unmap. However, for assigned device where guest
> + * could own the first level page tables without being shadowed by QEMU. In
> + * this case there is no pass down unmap to the host IOMMU as a result of unmap
> + * in the guest. Only invalidations are trapped and passed down.
> + * In all cases, only first level TLB invalidation (request with PASID) can be
> + * passed down, therefore we do not include IOTLB granularity for request
> + * without PASID (second level).
> + *
> + * For an example, to find the VT-d granularity encoding for IOTLB
> + * type and page selective granularity within PASID:
> + * X: indexed by iommu cache type
> + * Y: indexed by enum iommu_inv_granularity
> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> + *
> + * Granu_map array indicates validity of the table. 1: valid, 0: invalid
> + *
> + */
> +const static int
> +inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> +	/* PASID based IOTLB, support PASID selective and page selective */
> +	{0, 1, 1},
> +	/* PASID based dev TLBs, only support all PASIDs or single PASID */
> +	{1, 1, 0},
> +	/* PASID cache */
> +	{1, 1, 0}
> +};
> +
> +const static u64
> +inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> +	/* PASID based IOTLB */
> +	{0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> +	/* PASID based dev TLBs */
> +	{QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
> +	/* PASID cache */
> +	{QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
> +};
> +
> +static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
> +{
> +	if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
> +	    !inv_type_granu_map[type][granu])
> +		return -EINVAL;
> +
> +	*vtd_granu = inv_type_granu_table[type][granu];
> +
> +	return 0;
> +}
> +
> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> +{
> +	u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> +
> +	/*
> +	 * VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> +	 * IOMMU cache invalidate API passes granu_size in bytes, and number of
> +	 * granu size in contiguous memory.
> +	 */
> +	return order_base_2(nr_pages);
> +}
> +
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct iommu_cache_invalidate_info *inv_info)
> +{
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct device_domain_info *info;
> + struct intel_iommu *iommu;
> + unsigned long flags;
> + int cache_type;
> + u8 bus, devfn;
> + u16 did, sid;
> + int ret = 0;
> + u64 size;
> +
> +	if (!inv_info || !dmar_domain ||
> +	    inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> +		return -EINVAL;
> +
> + if (!dev || !dev_is_pci(dev))
> + return -ENODEV;
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> + if (!iommu)
> + return -ENODEV;
> +
> +	spin_lock_irqsave(&device_domain_lock, flags);
> +	spin_lock(&iommu->lock);
> +	info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
> +	if (!info) {
> 
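As a quick sanity check of the "2^size of 4K pages" encoding implemented by
to_vtd_size() above, here is a small standalone sketch (not part of the patch)
with a plain C stand-in for the kernel's order_base_2():

	#include <stdio.h>
	#include <stdint.h>

	#define VTD_PAGE_SHIFT	12	/* VT-d invalidation descriptors work on 4K pages */

	/* Minimal stand-in for the kernel's order_base_2(): ceil(log2(n)) */
	static unsigned int order_base_2(uint64_t n)
	{
		unsigned int order = 0;

		while ((1ULL << order) < n)
			order++;
		return order;
	}

	/* Same arithmetic as the quoted to_vtd_size() helper */
	static uint64_t to_vtd_size(uint64_t granu_size, uint64_t nr_granules)
	{
		uint64_t nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;

		return order_base_2(nr_pages);
	}

	int main(void)
	{
		printf("%llu\n", (unsigned long long)to_vtd_size(4096, 1));       /* 4K  -> 0 */
		printf("%llu\n", (unsigned long long)to_vtd_size(4096, 512));     /* 2MB -> 9 */
		printf("%llu\n", (unsigned long long)to_vtd_size(2ULL << 20, 1)); /* 2MB -> 9 */
		return 0;
	}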

Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-20 Thread Jason Wang


On 2020/2/21 12:06 AM, Halil Pasic wrote:

Currently if one intends to run a memory protection enabled VM with
virtio devices and linux as the guest OS, one needs to specify the
VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest
linux use the DMA API, which in turn handles the memory
encryption/protection stuff if the guest decides to turn itself into
a protected one. This however makes no sense due to multiple reasons:
* The device is not changed by the fact that the guest RAM is
protected. The so called IOMMU bypass quirk is not affected.
* This usage is not congruent with the standardised semantics of
VIRTIO_F_IOMMU_PLATFORM. Guest memory protection is an orthogonal reason
for using DMA API in virtio (orthogonal with respect to what is
expressed by VIRTIO_F_IOMMU_PLATFORM).

This series aims to decouple 'have to use DMA API because my (guest) RAM
is protected' and 'have to use DMA API because the device told me
VIRTIO_F_IOMMU_PLATFORM'.

Please find more detailed explanations about the conceptual aspects in
the individual patches. There is however also a very practical problem
that is addressed by this series.

For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
effect: the vhost code assumes that the addresses on the virtio descriptor
ring are not guest physical addresses but IOVAs, and insists on doing a
translation of these regardless of what transport is used (e.g. whether
we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
"vhost: new device IOTLB API".) On s390 this results in severe
performance degradation (c.a. factor 10).



Do you see a consistent degradation in performance, or does it only
happen at the beginning of the test?




BTW with ccw I/O there is
(architecturally) no IOMMU, so the whole address translation makes no
sense in the context of virtio-ccw.



I suspect we can optimize on the QEMU side.

E.g. send memtable entries via the IOTLB API when vIOMMU is not enabled.

If this makes sense, I can draft a patch to see if there's any difference.

Thanks




Halil Pasic (2):
   mm: move force_dma_unencrypted() to mem_encrypt.h
   virtio: let virtio use DMA API when guest RAM is protected

  drivers/virtio/virtio_ring.c |  3 +++
  include/linux/dma-direct.h   |  9 -
  include/linux/mem_encrypt.h  | 10 ++
  3 files changed, 13 insertions(+), 9 deletions(-)


base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2



Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-20 Thread Jason Wang


On 2020/2/21 10:59 AM, David Gibson wrote:

On Thu, Feb 20, 2020 at 05:13:09PM +0100, Christoph Hellwig wrote:

On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote:

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 867c7ebd3f10..fafc8f924955 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
if (!virtio_has_iommu_quirk(vdev))
return true;
  
+	if (force_dma_unencrypted(&vdev->dev))
+		return true;

Hell no.  This is a detail of the platform DMA direct implementation.
Drivers have no business looking at this flag, and virtio finally needs
to be fixed to use the DMA API properly for everything but legacy devices.

So, this patch definitely isn't right as it stands, but I'm struggling
to understand what it is you're saying is the right way.

By "legacy devices" I assume you mean pre-virtio-1.0 devices, that
lack the F_VERSION_1 feature flag.  Is that right?  Because I don't
see how being a legacy device or not relates to use of the DMA API.

I *think* what you are suggesting here is that virtio devices that
have !F_IOMMU_PLATFORM should have their dma_ops set up so that the
DMA API treats IOVA==PA, which will satisfy what the device expects.



Can this work for swiotlb?

Thanks



Then the virtio driver can use the DMA API the same way for both
F_IOMMU_PLATFORM and !F_IOMMU_PLATFORM devices.

But if that works for !F_IOMMU_PLATFORM_DEVICES+F_VERSION_1 devices,
then AFAICT it will work equally well for legacy devices.

Using the DMA API for *everything* in virtio, legacy or not, seems
like a reasonable approach to me.  But, AFAICT, that does require the
DMA layer to have some kind of explicit call to turn on this
behaviour, which the virtio driver would call during initializsation.
I don't think we can do it 100% within the DMA layer, because only the
driver can reasonably know when a device has this weird non-standard
DMA behaviour.




Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-20 Thread David Gibson
On Thu, Feb 20, 2020 at 05:17:48PM -0800, Ram Pai wrote:
> On Thu, Feb 20, 2020 at 03:55:14PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote:
> > > Currently the advanced guest memory protection technologies (AMD SEV,
> > > powerpc secure guest technology and s390 Protected VMs) abuse the
> > > VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which
> > > is in turn necessary, to make IO work with guest memory protection.
> > > 
> > > But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a
> > > different beast: with virtio devices whose implementation runs on an SMP
> > > CPU we are still fine with doing all the usual optimizations, it is just
> > > that we need to make sure that the memory protection mechanism does not
> > > get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the
> > > side of the guest (and possibly the host side as well) than we actually
> > > need.
> > > 
> > > An additional benefit of teaching the guest to make the right decision
> > > (and use DMA API) on its own is: removing the need to mandate special
> > > VM configuration for guests that may run with protection. This is
> > > especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all
> > > the virtio control structures into the first 2G of guest memory:
> > > something we don't necessarily want to do per-default.
> > > 
> > > Signed-off-by: Halil Pasic 
> > > Tested-by: Ram Pai 
> > > Tested-by: Michael Mueller 
> > 
> > This might work for you but it's fragile, since without
> > VIRTIO_F_ACCESS_PLATFORM hypervisor assumes it gets
> > GPA's, not DMA addresses.
> > 
> > 
> > 
> > IOW this looks like another iteration of:
> > 
> > virtio: Support encrypted memory on powerpc secure guests
> > 
> > which I was under the impression was abandoned as unnecessary.
> 
> It has been abandoned on powerpc. We enabled the VIRTIO_F_ACCESS_PLATFORM
> flag by default on powerpc.

Uh... we haven't yet, though we're working on it.

> We would like to enable secure guests on powerpc without this flag
> enabled as well, but past experience has taught us that it's not an easy
> path.  However, if Halil makes some inroads on this path for s390, we
> would like to support him.
> 
> 
> RP
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

2020-02-20 Thread David Gibson
On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > From a user's perspective it makes absolutely perfect sense to use the
> > bounce buffers when they are NEEDED.
> > Forcing the user to specify iommu_platform just because you need bounce 
> > buffers
> > really feels wrong. And obviously we have a severe performance issue
> > because of the indirections.
> 
> The point is that the user should not have to specify iommu_platform.
> We need to make sure any new hypervisor (especially one that might require
> bounce buffering) always sets it,

So, I have draft qemu patches which enable iommu_platform by default.
But that's really because of other problems with !iommu_platform, not
anything to do with bounce buffering or secure VMs.

The thing is that the hypervisor *doesn't* require bounce buffering.
In the POWER (and maybe s390 as well) models for Secure VMs, it's the
*guest*'s choice to enter secure mode, so the hypervisor has no reason
to know whether the guest needs bounce buffering.  As far as the
hypervisor and qemu are concerned that's a guest internal detail, it
just expects to get addresses it can access whether those are GPAs
(iommu_platform=off) or IOVAs (iommu_platform=on).

> as was a rather bogus legacy hack

It was certainly a bad idea, but it was a bad idea that went into a
public spec and has been widely deployed for many years.  We can't
just pretend it didn't happen and move on.

Turning iommu_platform=on by default breaks old guests, some of which
we still care about.  We can't (automatically) do it only for guests
that need bounce buffering, because the hypervisor doesn't know that
ahead of time.

> that isn't extensible for cases that, for example, require bounce buffering.

In fact bounce buffering isn't really the issue from the hypervisor
(or spec's) point of view.  It's the fact that not all of guest memory
is accessible to the hypervisor.  Bounce buffering is just one way the
guest might deal with that.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-20 Thread David Gibson
On Thu, Feb 20, 2020 at 05:13:09PM +0100, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote:
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 867c7ebd3f10..fafc8f924955 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device 
> > *vdev)
> > if (!virtio_has_iommu_quirk(vdev))
> > return true;
> >  
> > +	if (force_dma_unencrypted(&vdev->dev))
> > +		return true;
> 
> Hell no.  This is a detail of the platform DMA direct implementation.
> Drivers have no business looking at this flag, and virtio finally needs
> to be fixed to use the DMA API properly for everything but legacy devices.

So, this patch definitely isn't right as it stands, but I'm struggling
to understand what it is you're saying is the right way.

By "legacy devices" I assume you mean pre-virtio-1.0 devices, that
lack the F_VERSION_1 feature flag.  Is that right?  Because I don't
see how being a legacy device or not relates to use of the DMA API.

I *think* what you are suggesting here is that virtio devices that
have !F_IOMMU_PLATFORM should have their dma_ops set up so that the
DMA API treats IOVA==PA, which will satisfy what the device expects.
Then the virtio driver can use the DMA API the same way for both
F_IOMMU_PLATFORM and !F_IOMMU_PLATFORM devices.

But if that works for !F_IOMMU_PLATFORM_DEVICES+F_VERSION_1 devices,
then AFAICT it will work equally well for legacy devices.

Using the DMA API for *everything* in virtio, legacy or not, seems
like a reasonable approach to me.  But, AFAICT, that does require the
DMA layer to have some kind of explicit call to turn on this
behaviour, which the virtio driver would call during initializsation.
I don't think we can do it 100% within the DMA layer, because only the
driver can reasonably know when a device has this weird non-standard
DMA behaviour.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



RE: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-20 Thread Ram Pai
On Thu, Feb 20, 2020 at 03:55:14PM -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote:
> > Currently the advanced guest memory protection technologies (AMD SEV,
> > powerpc secure guest technology and s390 Protected VMs) abuse the
> > VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which
> > is in turn necessary, to make IO work with guest memory protection.
> > 
> > But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a
> > different beast: with virtio devices whose implementation runs on an SMP
> > CPU we are still fine with doing all the usual optimizations, it is just
> > that we need to make sure that the memory protection mechanism does not
> > get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the
> > side of the guest (and possibly the host side as well) than we actually
> > need.
> > 
> > An additional benefit of teaching the guest to make the right decision
> > (and use DMA API) on its own is: removing the need to mandate special
> > VM configuration for guests that may run with protection. This is
> > especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all
> > the virtio control structures into the first 2G of guest memory:
> > something we don't necessarily want to do per-default.
> > 
> > Signed-off-by: Halil Pasic 
> > Tested-by: Ram Pai 
> > Tested-by: Michael Mueller 
> 
> This might work for you but it's fragile, since without
> VIRTIO_F_ACCESS_PLATFORM hypervisor assumes it gets
> GPA's, not DMA addresses.
> 
> 
> 
> IOW this looks like another iteration of:
> 
>   virtio: Support encrypted memory on powerpc secure guests
> 
> which I was under the impression was abandoned as unnecessary.

It has been abandoned on powerpc. We enabled the VIRTIO_F_ACCESS_PLATFORM
flag by default on powerpc.

We would like to enable secure guests on powerpc without this flag
enabled as well, but past experience has taught us that it's not an easy
path.  However, if Halil makes some inroads on this path for s390, we
would like to support him.


RP



Re: [PATCH] iommu/vt-d: fix the wrong printing in RHSA parsing

2020-02-20 Thread Lu Baolu

Hi,

On 2020/1/15 18:28, Zhenzhong Duan wrote:

When base address in RHSA structure doesn't match base address in
each DRHD structure, the base address in last DRHD is printed out.

This doesn't make sense when there are multiple DRHD units, fix it
by printing the buggy RHSA's base address.

Signed-off-by: Zhenzhong Duan
Cc: David Woodhouse
Cc: Lu Baolu
Cc: Joerg Roedel
Cc:iommu@lists.linux-foundation.org


Queued for v5.7. Thanks!

Best regards,
baolu


Re: [RFC PATCH] iommu/iova: Support limiting IOVA alignment

2020-02-20 Thread Liam Mark
On Thu, 20 Feb 2020, Robin Murphy wrote:

> > > > > Add CONFIG_IOMMU_LIMIT_IOVA_ALIGNMENT to limit the alignment of
> > > > > IOVAs to some desired PAGE_SIZE order, specified by
> > > > > CONFIG_IOMMU_IOVA_ALIGNMENT. This helps reduce the impact of
> > > > > fragmentation caused by the current IOVA alignment scheme, and
> > > > > gives better IOVA space utilization.
> > > > 
> > > > Even if the general change did prove reasonable, this IOVA allocator is
> > > > not
> > > > owned by the DMA API, so entirely removing the option of strict
> > > > size-alignment feels a bit uncomfortable. Personally I'd replace the
> > > > bool
> > > > argument with an actual alignment value to at least hand the authority
> > > > out
> > > > to individual callers.
> > > > 
> > > > Furthermore, even in DMA API terms, is anyone really ever going to
> > > > bother
> > > > tuning that config? Since iommu-dma is supposed to be a transparent
> > > > layer,
> > > > arguably it shouldn't behave unnecessarily differently from CMA, so
> > > > simply
> > > > piggy-backing off CONFIG_CMA_ALIGNMENT would seem logical.
> > > 
> > > Agreed, reusing CONFIG_CMA_ALIGNMENT makes a lot of sense here as callers
> > > relying on natural alignment of DMA buffer allocations already have to
> > > deal with that limitation. We could fix it as an optional parameter at
> > > init time (init_iova_domain()), and have the DMA IOMMU implementation
> > > pass it in there.
> > > 
> > 
> > My concern with using CONFIG_CMA_ALIGNMENT alignment is that for us this
> > would either involve further fragmenting our CMA regions (moving our CMA
> > max alignment from 1MB to max 2MB) or losing some of our 2MB IOVA block
> > mappings (changing our IOVA max alignment from 2MB to 1MB).
> > 
> > At least for us CMA allocations are often not DMA mapped into stage 1 page
> > tables so moving the CMA max alignment to 2MB in our case would, I think,
> > only provide the disadvantage of having to increase the size of our CMA
> > regions to accommodate this large alignment (which isn’t optimal for
> > memory utilization since CMA regions can't satisfy unmovable page
> > allocations).
> > 
> > As an alternative would it be possible for the dma-iommu layer to use the
> > size of the allocation and the domain pgsize_bitmap field to pick a max
> > IOVA alignment, which it can pass in for that IOVA allocation, which will
> > maximize block mappings but not waste IOVA space?
> 
> Given that we already have DMA_ATTR_ALLOC_SINGLE_PAGES for video drivers and
> suchlike that know enough to know they want "large buffer" allocation
> behaviour, would it suffice to have a similar attribute that says "I'm not too
> fussed about alignment"? That way there's no visible change for anyone who
> doesn't opt in and might be relying on the existing behaviour, intentionally
> or otherwise.
> 
> Then if necessary, the implementation can consider both flags together to
> decide whether to try to round down to the next block size or just shove it in
> anywhere.
> 

This should work for us.
My only concern is that many of our users would be using DMA-Buf memory, 
so DMA mapping would be done using dma_buf_map_attachment which I believe 
still doesn't support specifying any DMA attributes. 

I had previously tried to get support added upstream but wasn't 
successful.
https://lkml.org/lkml/2019/1/18/826
https://lkml.org/lkml/2019/1/18/827

But perhaps this new attribute will provide enough justification for DMA 
attributes (in some form, either explicitly or via flags) to be supported 
via dma_buf_map_attachment.

Liam

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote:
> For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
> effect: the vhost code assumes that the addresses on the virtio descriptor
> ring are not guest physical addresses but IOVAs, and insists on doing a
> translation of these regardless of what transport is used (e.g. whether
> we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
> "vhost: new device IOTLB API".) On s390 this results in severe
> performance degradation (c.a. factor 10). BTW with ccw I/O there is
> (architecturally) no IOMMU, so the whole address translation makes no
> sense in the context of virtio-ccw.

So it sounds like a host issue: the emulation of s390 is unnecessarily complicated.
Working around it in the guest looks wrong ...

> Halil Pasic (2):
>   mm: move force_dma_unencrypted() to mem_encrypt.h
>   virtio: let virtio use DMA API when guest RAM is protected
> 
>  drivers/virtio/virtio_ring.c |  3 +++
>  include/linux/dma-direct.h   |  9 -
>  include/linux/mem_encrypt.h  | 10 ++
>  3 files changed, 13 insertions(+), 9 deletions(-)
> 
> 
> base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2
> -- 
> 2.17.1



Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote:
> * This usage is not congruent with the standardised semantics of
> VIRTIO_F_IOMMU_PLATFORM. Guest memory protection is an orthogonal reason
> for using DMA API in virtio (orthogonal with respect to what is
> expressed by VIRTIO_F_IOMMU_PLATFORM). 

Quoting the spec:

  \item[VIRTIO_F_ACCESS_PLATFORM(33)] This feature indicates that
  the device can be used on a platform where device access to data
  in memory is limited and/or translated. E.g. this is the case if the device 
can be located
  behind an IOMMU that translates bus addresses from the device into physical
  addresses in memory, if the device can be limited to only access
  certain memory addresses or if special commands such as
  a cache flush can be needed to synchronise data in memory with
  the device. Whether accesses are actually limited or translated
  is described by platform-specific means.
  If this feature bit is set to 0, then the device
  has same access to memory addresses supplied to it as the
  driver has.
  In particular, the device will always use physical addresses
  matching addresses used by the driver (typically meaning
  physical addresses used by the CPU)
  and not translated further, and can access any address supplied to it by
  the driver. When clear, this overrides any platform-specific description of
  whether device access is limited or translated in any way, e.g.
  whether an IOMMU may be present.

since device can't access encrypted memory,
this seems to match your case reasonably well.

-- 
MST



Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote:
> Currently the advanced guest memory protection technologies (AMD SEV,
> powerpc secure guest technology and s390 Protected VMs) abuse the
> VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which
> is in turn necessary, to make IO work with guest memory protection.
> 
> But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a
> different beast: with virtio devices whose implementation runs on an SMP
> CPU we are still fine with doing all the usual optimizations, it is just
> that we need to make sure that the memory protection mechanism does not
> get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the
> side of the guest (and possibly the host side as well) than we actually
> need.
> 
> An additional benefit of teaching the guest to make the right decision
> (and use DMA API) on its own is: removing the need to mandate special
> VM configuration for guests that may run with protection. This is
> especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all
> the virtio control structures into the first 2G of guest memory:
> something we don't necessarily want to do per-default.
> 
> Signed-off-by: Halil Pasic 
> Tested-by: Ram Pai 
> Tested-by: Michael Mueller 

This might work for you but it's fragile, since without
VIRTIO_F_ACCESS_PLATFORM hypervisor assumes it gets
GPA's, not DMA addresses.



IOW this looks like another iteration of:

virtio: Support encrypted memory on powerpc secure guests

which I was under the impression was abandoned as unnecessary.


To summarize, the necessary conditions for a hack along these lines
(using DMA API without VIRTIO_F_ACCESS_PLATFORM) are that we detect that:

  - secure guest mode is enabled - so we know that since we don't share
most memory regular virtio code won't
work, even though the buggy hypervisor didn't set VIRTIO_F_ACCESS_PLATFORM
  - DMA API is giving us addresses that are actually also physical
addresses
  - Hypervisor is buggy and didn't enable VIRTIO_F_ACCESS_PLATFORM
 
I don't see how this patch does this.
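Purely to illustrate the conditions listed above (this is not code from the
series, and the identity-mapping check is a hypothetical helper that does not
exist upstream), a guest-side check along those lines might look like:

	/*
	 * Sketch only: dma_api_returns_identity_addrs() is a hypothetical name
	 * standing in for "the DMA API hands back guest physical addresses";
	 * it is not an upstream API.
	 */
	static bool vring_use_dma_api_sketch(struct virtio_device *vdev)
	{
		/* Hypervisor set VIRTIO_F_ACCESS_PLATFORM: use the DMA API anyway */
		if (!virtio_has_iommu_quirk(vdev))
			return true;

		/* (1) not a protected guest: keep the legacy GPA behaviour */
		if (!force_dma_unencrypted(&vdev->dev))
			return false;

		/*
		 * (2) + (3): the (buggy) hypervisor did not set the flag, so the
		 * DMA API may only be used if it is known to return addresses
		 * that are also guest physical addresses.
		 */
		return dma_api_returns_identity_addrs(&vdev->dev);
	}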


> ---
>  drivers/virtio/virtio_ring.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 867c7ebd3f10..fafc8f924955 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>   if (!virtio_has_iommu_quirk(vdev))
>   return true;
>  
> +	if (force_dma_unencrypted(&vdev->dev))
> +		return true;
> +
>   /* Otherwise, we are left to guess. */
>   /*
>* In theory, it's possible to have a buggy QEMU-supposed
> -- 
> 2.17.1



Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote:
> Currently if one intends to run a memory protection enabled VM with
> virtio devices and linux as the guest OS, one needs to specify the
> VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest
> linux use the DMA API, which in turn handles the memory
> encryption/protection stuff if the guest decides to turn itself into
> a protected one. This however makes no sense due to multiple reasons:
> * The device is not changed by the fact that the guest RAM is
> protected. The so called IOMMU bypass quirk is not affected.
> * This usage is not congruent with the standardised semantics of
> VIRTIO_F_IOMMU_PLATFORM. Guest memory protection is an orthogonal reason
> for using DMA API in virtio (orthogonal with respect to what is
> expressed by VIRTIO_F_IOMMU_PLATFORM). 
> 
> This series aims to decouple 'have to use DMA API because my (guest) RAM
> is protected' and 'have to use DMA API because the device told me
> VIRTIO_F_IOMMU_PLATFORM'.
> 
> Please find more detailed explanations about the conceptual aspects in
> the individual patches. There is however also a very practical problem
> that is addressed by this series. 
> 
> For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
> effect: the vhost code assumes that the addresses on the virtio descriptor
> ring are not guest physical addresses but IOVAs, and insists on doing a
> translation of these regardless of what transport is used (e.g. whether
> we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
> "vhost: new device IOTLB API".) On s390 this results in severe
> performance degradation (c.a. factor 10). BTW with ccw I/O there is
> (architecturally) no IOMMU, so the whole address translation makes no
> sense in the context of virtio-ccw.

That's just a QEMU thing. It uses the same flag for VIRTIO_F_ACCESS_PLATFORM
and vhost IOTLB. QEMU can separate them, no need to change linux.


> Halil Pasic (2):
>   mm: move force_dma_unencrypted() to mem_encrypt.h
>   virtio: let virtio use DMA API when guest RAM is protected
> 
>  drivers/virtio/virtio_ring.c |  3 +++
>  include/linux/dma-direct.h   |  9 -
>  include/linux/mem_encrypt.h  | 10 ++
>  3 files changed, 13 insertions(+), 9 deletions(-)
> 
> 
> base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2
> -- 
> 2.17.1



[PATCH v2] iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page

2020-02-20 Thread Yonghyun Hwang via iommu
intel_iommu_iova_to_phys() has a bug when it translates an IOVA for a huge
page onto its corresponding physical address. This commit fixes the bug by
accommodating the level of the page entry for the IOVA and adding the IOVA's
lower address to the physical address.

Signed-off-by: Yonghyun Hwang 
---

Changes from v1:
- level cannot be 0. So, the condition, "if (level > 1)", is removed, which 
results in simpler code.
- a macro, BIT_MASK, is used to have a bit mask

---
 drivers/iommu/intel-iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 932267f49f9a..4fd5c6287b6d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5554,7 +5554,9 @@ static phys_addr_t intel_iommu_iova_to_phys(struct 
iommu_domain *domain,
 
	pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &level);
if (pte)
-   phys = dma_pte_addr(pte);
+   phys = dma_pte_addr(pte) +
+   (iova & (BIT_MASK(level_to_offset_bits(level) +
+   VTD_PAGE_SHIFT) - 1));
 
return phys;
 }
-- 
2.25.0.265.gbab2e86ba0-goog
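To see what the new expression computes, here is a small standalone sketch of
the arithmetic, assuming (as in intel-iommu.c) that VTD_PAGE_SHIFT is 12 and
level_to_offset_bits(level) is (level - 1) * 9, for a 2MB huge page mapped at
level 2:

	#include <stdio.h>
	#include <stdint.h>

	#define VTD_PAGE_SHIFT	12	/* assumption: 4K base pages, as in the driver */
	#define LEVEL_STRIDE	9	/* assumption: 9 bits of IOVA per table level */

	static int level_to_offset_bits(int level)
	{
		return (level - 1) * LEVEL_STRIDE;
	}

	int main(void)
	{
		uint64_t pte_addr = 0x80000000ULL;	/* example 2MB-aligned page frame */
		uint64_t iova     = 0x12345678ULL;	/* example IOVA within that page */
		int level = 2;				/* a 2MB huge page lives at level 2 */

		/* Mask covering the offset bits below the huge-page size (2MB - 1) */
		uint64_t mask = (1ULL << (level_to_offset_bits(level) + VTD_PAGE_SHIFT)) - 1;

		printf("old phys = 0x%llx\n", (unsigned long long)pte_addr);
		printf("new phys = 0x%llx\n",
		       (unsigned long long)(pte_addr + (iova & mask)));
		return 0;
	}

With these values the old code returned only the page-frame address
(0x80000000 here) for every IOVA inside the 2MB page, while the fixed code
returns 0x80145678, i.e. the in-page offset is preserved.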



Re: [PATCH v2] iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page

2020-02-20 Thread Moritz Fischer
Hi Yonghyun,

On Thu, Feb 20, 2020 at 11:44:31AM -0800, Yonghyun Hwang wrote:
> intel_iommu_iova_to_phys() has a bug when it translates an IOVA for a huge
> page onto its corresponding physical address. This commit fixes the bug by
> accommodating the level of the page entry for the IOVA and adding the IOVA's
> lower address to the physical address.
> 
D'oh I meant to add a Cc: sta...@vger.kernel.org here ... :)
> Signed-off-by: Yonghyun Hwang 
> ---
> 
> Changes from v1:
> - level cannot be 0. So, the condition, "if (level > 1)", is removed, which 
> results in simpler code.
> - a macro, BIT_MASK, is used to have a bit mask
> 
> ---
>  drivers/iommu/intel-iommu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 932267f49f9a..4fd5c6287b6d 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5554,7 +5554,9 @@ static phys_addr_t intel_iommu_iova_to_phys(struct 
> iommu_domain *domain,
>  
> 	pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &level);
>   if (pte)
> - phys = dma_pte_addr(pte);
> + phys = dma_pte_addr(pte) +
> + (iova & (BIT_MASK(level_to_offset_bits(level) +
> + VTD_PAGE_SHIFT) - 1));
>  
>   return phys;
>  }
> -- 
> 2.25.0.265.gbab2e86ba0-goog
> 

Cheers,
Moritz


Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range

2020-02-20 Thread Pratik Patel
On 02/20  08:45 am, Will Deacon wrote:
> On Wed, Feb 19, 2020 at 12:06:28PM -0800, isa...@codeaurora.org wrote:
> > On 2020-02-19 03:15, Will Deacon wrote:
> > > On Tue, Feb 18, 2020 at 05:57:18PM -0800, isa...@codeaurora.org wrote:
> > > > Does this mean that the driver should be managing the IOVA space and
> > > > mappings for this device using the IOMMU API? If so, is the
> > > > rationale for
> > > > this because the device driver can have the information of what IOVA
> > > > ranges
> > > > can and cannot be used? Shouldn't there be a generic way of
> > > > informing an
> > > > IOMMU driver about these reserved ranges? Perhaps through a device
> > > > tree
> > > > property, instead of deferring this type of management to the driver?
> > > 
> > > Before we dive into designing that, can you please clarify whether the
> > > reserved IOVA range applies to all DMA masters mastering through a
> > > particular SMMU, or whether it's just about one specific master? I was
> > > assuming the former, but wanted to be sure.
> > > 
> > This situation currently applies to one master.
> 
> Interesting. Is it problematic if the range is reserved for all masters
> sharing that SMMU?
> 
Yes, that would be overkill. It certainly is useful and in
some cases quite helpful to not waste that range of IOVA space for
other masters on the same SMMU that can use it.

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project 


Re: [RFC PATCH] iommu/iova: Add a best-fit algorithm

2020-02-20 Thread Robin Murphy

On 20/02/2020 6:38 am, isa...@codeaurora.org wrote:

On 2020-02-17 08:03, Robin Murphy wrote:

On 14/02/2020 11:06 pm, Isaac J. Manjarres wrote:

From: Liam Mark 

Using the best-fit algorithm, instead of the first-fit
algorithm, may reduce fragmentation when allocating
IOVAs.


What kind of pathological allocation patterns make that a serious
problem? Is there any scope for simply changing the order of things in
the callers? Do these drivers also run under other DMA API backends
(e.g. 32-bit Arm)?

The usecases where the IOVA space has been fragmented have non-deterministic
allocation patterns, and thus, it's not feasible to change the allocation
order to avoid fragmenting the IOVA space.


What about combining smaller buffers into larger individual allocations; 
any scope for that sort of thing? Certainly if you're consistently 
allocating small things less than PAGE_SIZE then DMA pools would be 
useful to avoid wanton memory wastage in general.
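For reference, the dma_pool pattern being alluded to looks roughly like this
(the device pointer and the 2 KiB element size are placeholders for
illustration):

	#include <linux/dmapool.h>
	#include <linux/sizes.h>

	static struct dma_pool *small_buf_pool;

	static int example_init(struct device *dev)
	{
		/* One pool for the frequently used sub-PAGE_SIZE buffers */
		small_buf_pool = dma_pool_create("small-bufs", dev,
						 SZ_2K,	/* element size */
						 SZ_2K,	/* alignment */
						 0);	/* no boundary restriction */
		return small_buf_pool ? 0 : -ENOMEM;
	}

	static void *example_alloc(dma_addr_t *dma)
	{
		/* Sub-allocates from coherent pages rather than mapping each small buffer */
		return dma_pool_alloc(small_buf_pool, GFP_KERNEL, dma);
	}

	static void example_free(void *vaddr, dma_addr_t dma)
	{
		dma_pool_free(small_buf_pool, vaddr, dma);
	}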



From what we've observed, the usecases involve allocations of two types of
buffers: one type of buffer between 1 KB to 4 MB in size, and another type
of buffer between 1 KB to 400 MB in size.

The pathological scenarios seem to arise when there are many (100+) randomly
distributed non-power-of-two allocations, which in some cases leaves behind
holes of up to 100+ MB in the IOVA space.

Here are some examples that show the state of the IOVA space under which
failure to allocate an IOVA was observed:

Instance 1:
 Currently mapped total size : ~1.3GB
 Free space available : ~2GB
 Map for ~162MB fails.
     Max contiguous space available : < 162MB

Instance 2:
 Currently mapped total size : ~950MB
 Free space available : ~2.3GB
 Map for ~320MB fails.
 Max contiguous space available : ~189MB

Instance 3:
 Currently mapped total size : ~1.2GB
 Free space available : ~2.7GB
 Map for ~162MB fails.
 Max contiguous space available : <162MB

We are still in the process of collecting data with the best-fit algorithm
enabled to provide some numbers to show that it results in less IOVA space
fragmentation.


Thanks for those examples, and I'd definitely like to see the 
comparative figures. To dig a bit further, at the point where things 
start failing, where are the cached nodes pointing? IIRC there is still 
a pathological condition where empty space between limit_pfn and 
cached32_node gets 'lost' if nothing in between is freed, so the bigger 
the range of allocation sizes, the worse the effect, e.g.:


(considering an empty domain, pfn 0 *not* reserved, 32-bit limit_pfn)

alloc 4K, succeeds, cached32_node now at 4G-4K
alloc 2G, succeeds, cached32_node now at 0
alloc 4K, fails despite almost 2G contiguous free space within limit_pfn
	(and max32_alloc_size==1 now fast-forwards *any* further allocation 
attempt to failure)


If you're falling foul of this case (I was never sure how realistic a 
problem it would be in practice), there are at least a couple of much 
less invasive tweaks I can think of that would be worth exploring.


To answer your question about whether this driver runs under other DMA API
backends: yes, such as 32-bit Arm.


OK, that's what I suspected :)

AFAICS arch/arm's __alloc_iova() is also a first-fit algorithm, so if 
you get better behaviour there then it would suggest that this aspect 
isn't really the most important issue. Certainly, the fact that the 
"best fit" logic here also happens to ignore the cached nodes does start 
drawing me back to the point above.


Robin.

[PATCH v2 4/4] drm/sun4i: mixer: Call of_dma_configure if there's an IOMMU

2020-02-20 Thread Maxime Ripard
The main DRM device is actually a virtual device so it doesn't have the
iommus property, which is instead on the DMA masters, in this case the
mixers.

Add a call to of_dma_configure with the mixer's DT node but on the DRM
virtual device to configure it in the same way as the mixers.

Signed-off-by: Maxime Ripard 
---
 drivers/gpu/drm/sun4i/sun8i_mixer.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/sun4i/sun8i_mixer.c 
b/drivers/gpu/drm/sun4i/sun8i_mixer.c
index 7c24f8f832a5..85b8930e334c 100644
--- a/drivers/gpu/drm/sun4i/sun8i_mixer.c
+++ b/drivers/gpu/drm/sun4i/sun8i_mixer.c
@@ -372,6 +372,19 @@ static int sun8i_mixer_bind(struct device *dev, struct 
device *master,
	mixer->engine.ops = &sun8i_engine_ops;
mixer->engine.node = dev->of_node;
 
+   if (of_find_property(dev->of_node, "iommus", NULL)) {
+   /*
+* This assume we have the same DMA constraints for
+* all our the mixers in our pipeline. This sounds
+* bad, but it has always been the case for us, and
+* DRM doesn't do per-device allocation either, so we
+* would need to fix DRM first...
+*/
+   ret = of_dma_configure(drm->dev, dev->of_node, true);
+   if (ret)
+   return ret;
+   }
+
/*
 * While this function can fail, we shouldn't do anything
 * if this happens. Some early DE2 DT entries don't provide
-- 
git-series 0.9.1


[PATCH v2 1/4] dt-bindings: iommu: Add Allwinner H6 IOMMU bindings

2020-02-20 Thread Maxime Ripard
The Allwinner H6 has introduced an IOMMU. Let's add a device tree binding
for it.

Reviewed-by: Rob Herring 
Signed-off-by: Maxime Ripard 
---
 Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml | 61 
+
 1 file changed, 61 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml

diff --git 
a/Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml 
b/Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml
new file mode 100644
index ..5e125cf2a88b
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml
@@ -0,0 +1,61 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/allwinner,sun50i-h6-iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Allwinner H6 IOMMU Device Tree Bindings
+
+maintainers:
+  - Chen-Yu Tsai 
+  - Maxime Ripard 
+
+properties:
+  "#iommu-cells":
+const: 1
+description:
+  The content of the cell is the master ID.
+
+  compatible:
+const: allwinner,sun50i-h6-iommu
+
+  reg:
+maxItems: 1
+
+  interrupts:
+maxItems: 1
+
+  clocks:
+maxItems: 1
+
+  resets:
+maxItems: 1
+
+required:
+  - "#iommu-cells"
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - resets
+
+additionalProperties: false
+
+examples:
+  - |
+  #include 
+  #include 
+
+  #include 
+  #include 
+
+  iommu: iommu@30f {
+  compatible = "allwinner,sun50i-h6-iommu";
+  reg = <0x030f 0x1>;
+  interrupts = ;
+  clocks = < CLK_BUS_IOMMU>;
+  resets = < RST_BUS_IOMMU>;
+  #iommu-cells = <1>;
+  };
+
+...
-- 
git-series 0.9.1


[PATCH v2 0/4] iommu: Add Allwinner H6 IOMMU driver

2020-02-20 Thread Maxime Ripard
Hi,

Here's a series adding support for the IOMMU introduced in the Allwinner
H6. The driver from Allwinner hints at more SoCs using it in the future
(with more masters), so we can bet on that IOMMU becoming pretty much
standard in new SoCs from Allwinner.

One thing I wasn't really sure about was how to expose the statistics
reported by the IOMMU PMU (TLB hit rates, latencies, and so on). The
Allwinner driver exposes them through custom sysfs files, while they would
be best represented through perf I guess? Anyway, I'm planning to support
them later on.

Let me know what you think,
Maxime

Changes from v1:
  - Add a patch to configure the IOMMU on the virtual DRM device
  - Rework the domain allocation / freeing
  - Remove the runtime_pm handling to power up the device and rely on
refcounting
  - use map gfp argument for kmem cache allocation
  - Removed unused macros
  - Switched from BIT(0) to 1 for the page table entry valid flag to make
it more obvious that it's over multiple bits.
  - Switch to module_initcall
  - Make accesses to the fwspec more consistant
  - Removed dev_info logs
  - Reworked invalidation / flushing
  - Allow for compilation with COMPILE_TEST

Maxime Ripard (4):
  dt-bindings: iommu: Add Allwinner H6 IOMMU bindings
  iommu: Add Allwinner H6 IOMMU driver
  arm64: dts: allwinner: h6: Add IOMMU
  drm/sun4i: mixer: Call of_dma_configure if there's an IOMMU

 Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml |   61 
-
 arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi   |   10 
+-
 drivers/gpu/drm/sun4i/sun8i_mixer.c|   13 
+-
 drivers/iommu/Kconfig  |9 
+-
 drivers/iommu/Makefile |1 
+-
 drivers/iommu/sun50i-iommu.c   | 1072 
-
 6 files changed, 1166 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml
 create mode 100644 drivers/iommu/sun50i-iommu.c

base-commit: bb6d3fb354c5ee8d6bde2d576eb7220ea09862b9
-- 
git-series 0.9.1


[PATCH v2 3/4] arm64: dts: allwinner: h6: Add IOMMU

2020-02-20 Thread Maxime Ripard
Now that we have a driver for the IOMMU, let's start using it.

Signed-off-by: Maxime Ripard 
---
 arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi 
b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
index 3329283e38ab..2d0777ad39aa 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
@@ -131,6 +131,7 @@
clock-names = "bus",
  "mod";
resets = <_clocks RST_MIXER0>;
+   iommus = < 0>;
 
ports {
#address-cells = <1>;
@@ -370,6 +371,15 @@
#interrupt-cells = <3>;
};
 
+   iommu: iommu@30f {
+   compatible = "allwinner,sun50i-h6-iommu";
+   reg = <0x030f 0x1>;
+   interrupts = ;
+   clocks = < CLK_BUS_IOMMU>;
+   resets = < RST_BUS_IOMMU>;
+   #iommu-cells = <1>;
+   };
+
mmc0: mmc@402 {
compatible = "allwinner,sun50i-h6-mmc",
 "allwinner,sun50i-a64-mmc";
-- 
git-series 0.9.1


[PATCH v2 2/4] iommu: Add Allwinner H6 IOMMU driver

2020-02-20 Thread Maxime Ripard
The Allwinner H6 has introduced an IOMMU for a few DMA controllers, mostly
video related: the display engine, the video decoders / encoders, the
camera capture controller, etc.

The design is pretty simple compared to other IOMMUs found in SoCs: there's
a single instance, controlling all the masters, with a single address
space.

It also features a performance monitoring unit that allows retrieving various
information (per-master and global TLB accesses, hits and misses, access
latency, etc.), which isn't supported at the moment.

Signed-off-by: Maxime Ripard 
---
 drivers/iommu/Kconfig|9 +-
 drivers/iommu/Makefile   |1 +-
 drivers/iommu/sun50i-iommu.c | 1072 +++-
 3 files changed, 1082 insertions(+)
 create mode 100644 drivers/iommu/sun50i-iommu.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d2fade984999..87677ea98427 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -302,6 +302,15 @@ config ROCKCHIP_IOMMU
  Say Y here if you are using a Rockchip SoC that includes an IOMMU
  device.
 
+config SUN50I_IOMMU
+   bool "Allwinner H6 IOMMU Support"
+   depends on ARCH_SUNXI || COMPILE_TEST
+   select ARM_DMA_USE_IOMMU
+   select IOMMU_API
+   select IOMMU_DMA
+   help
+ Support for the IOMMU introduced in the Allwinner H6 SoCs.
+
 config TEGRA_IOMMU_GART
bool "Tegra GART IOMMU Support"
depends on ARCH_TEGRA_2x_SOC
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 2104fb8afc06..dd1ff336b9b9 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_MTK_IOMMU_V1) += mtk_iommu_v1.o
 obj-$(CONFIG_OMAP_IOMMU) += omap-iommu.o
 obj-$(CONFIG_OMAP_IOMMU_DEBUG) += omap-iommu-debug.o
 obj-$(CONFIG_ROCKCHIP_IOMMU) += rockchip-iommu.o
+obj-$(CONFIG_SUN50I_IOMMU) += sun50i-iommu.o
 obj-$(CONFIG_TEGRA_IOMMU_GART) += tegra-gart.o
 obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
 obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
new file mode 100644
index ..81ba5f562bd2
--- /dev/null
+++ b/drivers/iommu/sun50i-iommu.c
@@ -0,0 +1,1072 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+// Copyright (C) 2016-2018, Allwinner Technology CO., LTD.
+// Copyright (C) 2019-2020, Cerno
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define IOMMU_RESET_REG0x010
+#define IOMMU_ENABLE_REG   0x020
+#define IOMMU_ENABLE_ENABLEBIT(0)
+
+#define IOMMU_BYPASS_REG   0x030
+#define IOMMU_AUTO_GATING_REG  0x040
+#define IOMMU_AUTO_GATING_ENABLE   BIT(0)
+
+#define IOMMU_WBUF_CTRL_REG0x044
+#define IOMMU_OOO_CTRL_REG 0x048
+#define IOMMU_4KB_BDY_PRT_CTRL_REG 0x04c
+#define IOMMU_TTB_REG  0x050
+#define IOMMU_TLB_ENABLE_REG   0x060
+#define IOMMU_TLB_PREFETCH_REG 0x070
+#define IOMMU_TLB_PREFETCH_MASTER_ENABLE(m)BIT(m)
+
+#define IOMMU_TLB_FLUSH_REG0x080
+#define IOMMU_TLB_FLUSH_PTW_CACHE  BIT(17)
+#define IOMMU_TLB_FLUSH_MACRO_TLB  BIT(16)
+#define IOMMU_TLB_FLUSH_MICRO_TLB(i)   (BIT(i) & GENMASK(5, 0))
+
+#define IOMMU_TLB_IVLD_ADDR_REG0x090
+#define IOMMU_TLB_IVLD_ADDR_MASK_REG   0x094
+#define IOMMU_TLB_IVLD_ENABLE_REG  0x098
+#define IOMMU_TLB_IVLD_ENABLE_ENABLE   BIT(0)
+
+#define IOMMU_PC_IVLD_ADDR_REG 0x0a0
+#define IOMMU_PC_IVLD_ENABLE_REG   0x0a8
+#define IOMMU_PC_IVLD_ENABLE_ENABLEBIT(0)
+
+#define IOMMU_DM_AUT_CTRL_REG(d)   (0x0b0 + ((d) / 2) * 4)
+#define IOMMU_DM_AUT_CTRL_RD_UNAVAIL(d, m) (1 << (((d & 1) * 16) + ((m) * 
2)))
+#define IOMMU_DM_AUT_CTRL_WR_UNAVAIL(d, m) (1 << (((d & 1) * 16) + ((m) * 
2) + 1))
+
+#define IOMMU_DM_AUT_OVWT_REG  0x0d0
+#define IOMMU_INT_ENABLE_REG   0x100
+#define IOMMU_INT_CLR_REG  0x104
+#define IOMMU_INT_STA_REG  0x108
+#define IOMMU_INT_ERR_ADDR_REG(i)  (0x110 + (i) * 4)
+#define IOMMU_INT_ERR_ADDR_L1_REG  0x130
+#define IOMMU_INT_ERR_ADDR_L2_REG  0x134
+#define IOMMU_INT_ERR_DATA_REG(i)  (0x150 + (i) * 4)
+#define IOMMU_L1PG_INT_REG 0x0180
+#define IOMMU_L2PG_INT_REG 0x0184
+
+#define IOMMU_INT_INVALID_L2PG BIT(17)
+#define IOMMU_INT_INVALID_L1PG BIT(16)
+#define IOMMU_INT_MASTER_PERMISSION(m) BIT(m)
+#define IOMMU_INT_MASTER_MASK  (IOMMU_INT_MASTER_PERMISSION(0) 
| \
+IOMMU_INT_MASTER_PERMISSION(1) 
| \
+

Re: [PATCH v2] iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support

2020-02-20 Thread Rob Herring
On Mon, Feb 17, 2020 at 1:17 PM Robin Murphy  wrote:
>
> On 13/02/2020 9:49 pm, Rob Herring wrote:
> > On Thu, Jan 30, 2020 at 11:34 AM Robin Murphy  wrote:
> >>
> >> On 30/01/2020 3:06 pm, Auger Eric wrote:
> >>> Hi Rob,
> >>> On 1/17/20 10:16 PM, Rob Herring wrote:
>  Arm SMMUv3.2 adds support for TLB range invalidate operations.
>  Support for range invalidate is determined by the RIL bit in the IDR3
>  register.
> 
>  The range invalidate is in units of the leaf page size and operates on
>  1-32 chunks of a power of 2 multiple pages. First, we determine from the
>  size what power of 2 multiple we can use. Then we calculate how many
>  chunks (1-31) of the power of 2 size for the range on the iteration. On
>  each iteration, we move up in size by at least 5 bits.
> 
>  Cc: Eric Auger 
>  Cc: Jean-Philippe Brucker 
>  Cc: Will Deacon 
>  Cc: Robin Murphy 
>  Cc: Joerg Roedel 
>  Signed-off-by: Rob Herring 
> >
> >
>  @@ -2003,7 +2024,7 @@ static void arm_smmu_tlb_inv_range(unsigned long 
>  iova, size_t size,
> {
>    u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
>    struct arm_smmu_device *smmu = smmu_domain->smmu;
>  -unsigned long start = iova, end = iova + size;
>  +unsigned long start = iova, end = iova + size, num_pages = 0, tg = 
>  0;
>    int i = 0;
>    struct arm_smmu_cmdq_ent cmd = {
>    .tlbi = {
>  @@ -2022,12 +2043,50 @@ static void arm_smmu_tlb_inv_range(unsigned long 
>  iova, size_t size,
>    cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>    }
> 
>  +if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>  +/* Get the leaf page size */
>  +tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>  +
>  +/* Convert page size of 12,14,16 (log2) to 1,2,3 */
>  +cmd.tlbi.tg = ((tg - ilog2(SZ_4K)) / 2) + 1;
> >>
> >> Given the comment, I think "(tg - 10) / 2" would suffice ;)
> >
> > Well, duh...
> >
> >>
>  +
>  +/* Determine what level the granule is at */
>  +cmd.tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
> >>
> >> Is it possible to rephrase that with logs and shifts to avoid the division?
> >
> > Well, with a loop is the only other way I came up with:
> >
> > bpl = tg - 3;
> > ttl = 3;
> > mask = BIT(bpl) - 1;
> > while ((granule & (mask << ((4 - ttl) * bpl + 3))) == 0)
> >  ttl--;
> >
> > Doesn't seem like much improvement to me given we have h/w divide...
>
> Sure, it doesn't take too many extra instructions to start matching
> typical IDIV latency, so there's no point being silly if there really
> isn't a clean alternative.
>
> Some quick scribbling suggests "4 - ilog2(granule) / 10" might actually
> be close enough, but perhaps that's a bit too cheeky.

How does divide by 10 save a divide?

>  +num_pages = size / (1UL << tg);
> >>
> >> Similarly, in my experience GCC has always seemed too cautious to elide
> >> non-constant division even in a seemingly-obvious case like this, so
> >> explicit "size >> tg" might be preferable.
> >
> > My experience has been gcc is smart enough. I checked and there's only
> > one divide and it is the prior one. But I'll change it anyways.
>
> Now that I think about it, the case that frustrated me may have had the
> shift and divide in separate statements - that's probably a lot harder
> to analyse than a single expression. Either way, the simple right shift
> is easier to read IMO.
>
>  +}
>  +
>    while (iova < end) {
>    if (i == CMDQ_BATCH_ENTRIES) {
>    arm_smmu_cmdq_issue_cmdlist(smmu, cmds, i, false);
>    i = 0;
>    }
> 
>  +if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>  +/*
>  + * On each iteration of the loop, the range is 5 
>  bits
>  + * worth of the aligned size remaining.
>  + * The range in pages is:
>  + *
>  + * range = (num_pages & (0x1f << __ffs(num_pages)))
>  + */
>  +unsigned long scale, num;
>  +
>  +/* Determine the power of 2 multiple number of 
>  pages */
>  +scale = __ffs(num_pages);
>  +cmd.tlbi.scale = scale;
>  +
>  +/* Determine how many chunks of 2^scale size we 
>  have */
>  +num = (num_pages >> scale) & 
>  CMDQ_TLBI_RANGE_NUM_MAX;
>  +cmd.tlbi.num = num - 1;
>  +
>  +/* range is num * 2^scale * pgsize */
>  +granule = num << (scale + tg);
>  +
>  +/* 
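As an aside for anyone following the SCALE/NUM arithmetic being discussed,
here is a standalone sketch of the per-command range computation, assuming a
5-bit NUM field (CMDQ_TLBI_RANGE_NUM_MAX == 31) and 4K leaf pages (tg = 12):

	#include <stdio.h>
	#include <stdint.h>

	#define CMDQ_TLBI_RANGE_NUM_MAX	31	/* assumption: 5-bit NUM field */

	int main(void)
	{
		unsigned int tg = 12;			/* log2 of the 4K leaf page size */
		uint64_t size = (32ULL << 20) + 0x3000;	/* invalidate 32MB + 12KB */
		uint64_t num_pages = size >> tg;	/* 0x2003 pages */
		uint64_t iova = 0;

		while (iova < size) {
			/* power-of-two multiple of pages: lowest set bit of num_pages */
			unsigned int scale = __builtin_ctzll(num_pages);
			/* how many 2^scale chunks fit in the 5-bit NUM field */
			uint64_t num = (num_pages >> scale) & CMDQ_TLBI_RANGE_NUM_MAX;
			/* bytes covered by this command: num * 2^scale * pgsize */
			uint64_t range = num << (scale + tg);

			printf("TLBI range: iova=0x%llx scale=%u num=%llu (%llu KiB)\n",
			       (unsigned long long)iova, scale,
			       (unsigned long long)num,
			       (unsigned long long)(range >> 10));

			num_pages -= num << scale;
			iova += range;
		}
		return 0;
	}

For this size the loop emits two commands, one covering 12 KiB (scale=0,
num=3) and one covering 32 MiB (scale=13, num=1), which is the "at least 5
bits per iteration" behaviour described in the commit message.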

Re: [RFC PATCH] iommu/iova: Support limiting IOVA alignment

2020-02-20 Thread Robin Murphy

On 19/02/2020 11:22 pm, Liam Mark wrote:

On Wed, 19 Feb 2020, Will Deacon wrote:


On Mon, Feb 17, 2020 at 04:46:14PM +, Robin Murphy wrote:

On 14/02/2020 8:30 pm, Liam Mark wrote:


When the IOVA framework applies IOVA alignment it aligns all
IOVAs to the smallest PAGE_SIZE order which is greater than or
equal to the requested IOVA size.

We support use cases that require large buffers (> 64 MB in
size) to be allocated and mapped in their stage 1 page tables.
However, with this alignment scheme we find ourselves running
out of IOVA space for 32 bit devices, so we are proposing this
config, along the similar vein as CONFIG_CMA_ALIGNMENT for CMA
allocations.


As per [1], I'd really like to better understand the allocation patterns
that lead to such a sparsely-occupied address space to begin with, given
that the rbtree allocator is supposed to try to maintain locality as far as
possible, and the rcaches should further improve on that. Are you also
frequently cycling intermediate-sized buffers which are smaller than 64MB
but still too big to be cached?  Are there a lot of non-power-of-two
allocations?


Right, information on the allocation pattern would help with this change
and also the choice of IOVA allocation algorithm. Without it, we're just
shooting in the dark.



Thanks for the responses.

I am looking into how much of our allocation pattern details I can share.

My general understanding is that this issue occurs on 32-bit devices
which have additional restrictions on the IOVA range they can use within
those 32 bits.

An example is a use case which involves allocating a lot of buffers ~80MB
in size; the current algorithm will require an alignment of 128MB for
those buffers. My understanding is that it simply can't accommodate the
number of 80MB buffers that are required because of the amount of IOVA
space which can't be used due to the 128MB alignment requirement.
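
A rough sketch of the alignment arithmetic being described (illustrative only, not the iova.c code): the IOVA is aligned to the smallest power-of-two order that is >= the buffer size, so each ~80MB buffer effectively consumes a 128MB-aligned slot:

#include <linux/log2.h>
#include <linux/types.h>

/* Current order-based scheme: align to the next power of two of the size. */
static unsigned long current_iova_alignment(size_t size)
{
	return 1UL << order_base_2(size);	/* 80MB -> 128MB */
}

/* With 128MB alignment a 1GB 32-bit window holds only 8 such buffers,
 * versus 12 if they could be packed at 2MB block-mapping alignment. */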


OK, that's a case I can understand :)


Add CONFIG_IOMMU_LIMIT_IOVA_ALIGNMENT to limit the alignment of
IOVAs to some desired PAGE_SIZE order, specified by
CONFIG_IOMMU_IOVA_ALIGNMENT. This helps reduce the impact of
fragmentation caused by the current IOVA alignment scheme, and
gives better IOVA space utilization.


Even if the general change did prove reasonable, this IOVA allocator is not
owned by the DMA API, so entirely removing the option of strict
size-alignment feels a bit uncomfortable. Personally I'd replace the bool
argument with an actual alignment value to at least hand the authority out
to individual callers.

Furthermore, even in DMA API terms, is anyone really ever going to bother
tuning that config? Since iommu-dma is supposed to be a transparent layer,
arguably it shouldn't behave unnecessarily differently from CMA, so simply
piggy-backing off CONFIG_CMA_ALIGNMENT would seem logical.


Agreed, reusing CONFIG_CMA_ALIGNMENT makes a lot of sense here as callers
relying on natural alignment of DMA buffer allocations already have to
deal with that limitation. We could fix it as an optional parameter at
init time (init_iova_domain()), and have the DMA IOMMU implementation
pass it in there.



My concern with using CONFIG_CMA_ALIGNMENT alignment is that for us this
would either involve further fragmenting our CMA regions (moving our CMA
max alignment from 1MB to max 2MB) or losing some of our 2MB IOVA block
mappings (changing our IOVA max alignment from 2MB to 1MB).

At least for us CMA allocations are often not DMA mapped into stage 1 page
tables so moving the CMA max alignment to 2MB in our case would, I think,
only provide the disadvantage of having to increase the size our CMA
regions to accommodate this large alignment (which isn’t optimal for
memory utilization since CMA regions can't satisfy unmovable page
allocations).

As an alternative would it be possible for the dma-iommu layer to use the
size of the allocation and the domain pgsize_bitmap field to pick a max
IOVA alignment, which it can pass in for that IOVA allocation, which will
maximize block mappings but not waste IOVA space?
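
A minimal sketch of the idea Liam floats here (the helper name is made up, and it assumes the allocation is at least as large as the smallest supported page size): derive the alignment from the largest IOMMU block size that still fits in the allocation, rather than from the allocation size itself:

#include <linux/bitops.h>
#include <linux/iommu.h>

static unsigned long pick_iova_alignment(struct iommu_domain *domain,
					 size_t size)
{
	/* keep only the supported page/block sizes that are <= size */
	unsigned long pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);

	/* align to the largest remaining block size (2MB for an 80MB buffer) */
	return 1UL << __fls(pgsizes);
}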


Given that we already have DMA_ATTR_ALLOC_SINGLE_PAGES for video drivers 
and suchlike that know enough to know they want "large buffer" 
allocation behaviour, would it suffice to have a similar attribute that 
says "I'm not too fussed about alignment"? That way there's no visible 
change for anyone who doesn't opt in and might be relying on the 
existing behaviour, intentionally or otherwise.


Then if necessary, the implementation can consider both flags together 
to decide whether to try to round down to the next block size or just 
shove it in anywhere.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: question about iommu_need_mapping

2020-02-20 Thread Jerry Snitselaar

On Thu Feb 20 20, Jerry Snitselaar wrote:

On Thu Feb 20 20, Lu Baolu wrote:

Hi Jerry,

On 2020/2/20 7:55, Jerry Snitselaar wrote:

Is it possible for a device to end up with dev->archdata.iommu == NULL
on iommu_need_mapping in the following instance:

1. iommu_group has dma domain for default
2. device gets private identity domain in intel_iommu_add_device
3. iommu_need_mapping gets called with that device.
4. dmar_remove_one_dev_info sets dev->archdata.iommu = NULL via
   unlink_domain_info.
5. request_default_domain_for_dev exits after checking that
   group->default_domain exists, and group->default_domain->type is dma.
6. iommu_request_dma_domain_for_dev returns 0 from
   request_default_domain_for_dev, and a private dma domain isn't created
   for the device.



Yes. It's possible.


The case I was seeing went away with commit 9235cb13d7d1 ("iommu/vt-d:
Allow devices with RMRRs to use identity domain"), because it changed
which domain the group and devices were using, but it seems like it is
still a possibility with the code. Baolu, you mentioned possibly
removing the domain switch. Commit 98b2fffb5e27 ("iommu/vt-d: Handle
32bit device with identity default domain") makes it sound like the
domain switch is required.


It's more "nice to have" than "required" if the iommu driver doesn't
disable swiotlb explicitly. The device access of system memory higher
than the device's addressing capability could go through the bounced
buffer implemented in swiotlb.
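
A conceptual sketch of the point above (hypothetical helper, not the actual intel-iommu/swiotlb call chain): with an identity default domain, buffers the 32-bit device can address directly need no extra work, while anything above its DMA mask is copied through a low swiotlb bounce buffer by the DMA mapping layer:

#include <linux/device.h>
#include <linux/types.h>

/* Hypothetical check, for illustration only. */
static bool needs_bounce(struct device *dev, phys_addr_t phys, size_t size)
{
	return phys + size - 1 > *dev->dma_mask;
}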

Best regards,
baolu


Hi Baolu,

Would this mean switching to bounce_dma_ops instead?



Never mind. I see that it would go into the dma_direct code.


Regards,
Jerry


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu



Re: [RFC PATCH 04/11] cpufreq: Remove Calxeda driver

2020-02-20 Thread Mark Langsdorf

On 2/18/20 11:13 AM, Rob Herring wrote:

Cc: "Rafael J. Wysocki" 
Cc: Viresh Kumar 
Cc: linux...@vger.kernel.org
Signed-off-by: Rob Herring 
---


Acked-by: Mark Langsdorf 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 02/11] ata: Remove Calxeda AHCI driver

2020-02-20 Thread Mark Langsdorf

On 2/18/20 11:13 AM, Rob Herring wrote:

Cc: Jens Axboe 
Cc: linux-...@vger.kernel.org
Signed-off-by: Rob Herring 
---


Acked-by: Mark Langsdorf 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] dma-mapping: support setting memory uncached in place

2020-02-20 Thread Robin Murphy

On 20/02/2020 5:01 pm, Christoph Hellwig wrote:

We currently only support remapping memory as uncached through vmap
or a magic uncached segment provided by some architectures.  But there
is a simpler and much better way available on some architectures where
we can just remap the memory in place.  The advantages are:

  1) no aliasing is possible, which prevents speculating into the
 cached alias
  2) there is no need to allocate new ptes and thus no need for a special
 pre-allocated pool of memory that can be used with GFP_ATOMIC DMA
 allocations

The downside is that architectures must provide a way to set arbitrary
pages uncached in the kernel mapping, which might not be possible on
architecture that have a special implicit kernel mapping, and requires
splitting of huge page kernel mappings where they exist.

Signed-off-by: Christoph Hellwig 
---
  include/linux/dma-noncoherent.h |  3 +++
  kernel/dma/Kconfig  |  8 
  kernel/dma/direct.c | 28 ++--
  3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index ca9b5770caee..0820ec58f119 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -111,4 +111,7 @@ static inline void arch_dma_prep_coherent(struct page 
*page, size_t size)
  void *uncached_kernel_address(void *addr);
  void *cached_kernel_address(void *addr);
  
+int arch_dma_set_uncached(void *cpu_addr, size_t size);

+void arch_dma_clear_uncached(void *cpu_addr, size_t size);
+
  #endif /* _LINUX_DMA_NONCOHERENT_H */
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 4c103a24e380..7bc0b77f1243 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -83,6 +83,14 @@ config DMA_DIRECT_REMAP
bool
select DMA_REMAP
  
+#

+# Should be selected if the architecture can remap memory from the page
+# allocator and CMA as uncached and provides the arch_dma_set_uncached and
+# arch_dma_clear_uncached helpers
+#
+config ARCH_HAS_DMA_SET_UNCACHED
+   bool
+
  config DMA_CMA
bool "DMA Contiguous Memory Allocator"
depends on HAVE_DMA_CONTIGUOUS && CMA
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 6af7ae83c4ad..73fe65a4cbc0 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -169,11 +169,8 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
ret = dma_common_contiguous_remap(page, PAGE_ALIGN(size),
dma_pgprot(dev, PAGE_KERNEL, attrs),
__builtin_return_address(0));
-   if (!ret) {
-   dma_free_contiguous(dev, page, size);
-   return ret;
-   }
-
+   if (!ret)
+   goto out_free_pages;
memset(ret, 0, size);
goto done;
}
@@ -186,8 +183,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
 * so log an error and fail.
 */
dev_info(dev, "Rejecting highmem page from CMA.\n");
-   dma_free_contiguous(dev, page, size);
-   return NULL;
+   goto out_free_pages;
}
  
  	ret = page_address(page);

@@ -196,10 +192,15 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
  
  	memset(ret, 0, size);
  
-	if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) &&

-   dma_alloc_need_uncached(dev, attrs)) {
+   if (dma_alloc_need_uncached(dev, attrs)) {
arch_dma_prep_coherent(page, size);
-   ret = uncached_kernel_address(ret);
+
+   if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED)) {
+   if (!arch_dma_set_uncached(ret, size))
+   goto out_free_pages;
+   } else if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT)) {
+   ret = uncached_kernel_address(ret);


Hmm, would we actually need to keep ARCH_HAS_UNCACHED_SEGMENT? If 
arch_dma_set_uncached() returned void*/ERR_PTR instead, then it could 
work for both cases (with arch_dma_clear_uncached() being a no-op for 
segments).


Robin.


+   }
}
  done:
if (force_dma_unencrypted(dev))
@@ -207,6 +208,9 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
else
*dma_handle = phys_to_dma(dev, page_to_phys(page));
return ret;
+out_free_pages:
+   dma_free_contiguous(dev, page, size);
+   return NULL;
  }
  
  void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,

@@ -230,6 +234,8 @@ void dma_direct_free_pages(struct device *dev, size_t size, 
void *cpu_addr,
  
  	if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr))

vunmap(cpu_addr);
+   else if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED))
+   arch_dma_clear_uncached(cpu_addr, size);
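
For reference, a rough sketch of the alternative Robin floats above (illustrative only, not what this patch implements): if arch_dma_set_uncached() handed back the uncached address or an ERR_PTR, the in-place and uncached-segment cases could share one path in dma_direct_alloc_pages():

void *arch_dma_set_uncached(void *cpu_addr, size_t size);	/* hypothetical */

	if (dma_alloc_need_uncached(dev, attrs) &&
	    IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED)) {
		arch_dma_prep_coherent(page, size);
		ret = arch_dma_set_uncached(ret, size);
		if (IS_ERR(ret))
			goto out_free_pages;
	}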
  

provide in-place uncached remapping for dma-direct (resend)

2020-02-20 Thread Christoph Hellwig
Hi all,

this series provides support for remapping pages uncached in place in
the generic dma-direct code, and moves openrisc over from its own
in-place remapping scheme.  The arm64 folks also had interest in such
a scheme to avoid problems with speculating into cache aliases.

Also all architectures that always use small page mappings for the
kernel and have non-coherent DMA should look into enabling this
scheme, as it is much more efficient than the vmap remapping.
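
As a rough sketch of what opting in looks like for such an architecture (modelled on the openrisc conversion in patch 2/2; the architecture additionally selects ARCH_HAS_DMA_SET_UNCACHED in its Kconfig):

#include <linux/dma-noncoherent.h>

/* Mark the kernel mapping of the range uncached; return 0 or a -errno. */
int arch_dma_set_uncached(void *cpu_addr, size_t size)
{
	/* arch-specific: flush the range and set the uncached attribute */
	return 0;
}

/* Restore the cacheable attribute when the buffer is freed. */
void arch_dma_clear_uncached(void *cpu_addr, size_t size)
{
	/* arch-specific */
}
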
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] openrisc: use the generic in-place uncached DMA allocator

2020-02-20 Thread Christoph Hellwig
Switch openrisc to use the dma-direct allocator and just provide the
hooks for setting memory uncached or cached.

Signed-off-by: Christoph Hellwig 
---
 arch/openrisc/Kconfig  |  1 +
 arch/openrisc/kernel/dma.c | 51 +-
 2 files changed, 7 insertions(+), 45 deletions(-)

diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index 1928e061ff96..041fff4326dc 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -7,6 +7,7 @@
 config OPENRISC
def_bool y
select ARCH_32BIT_OFF_T
+   select ARCH_HAS_DMA_SET_UNCACHED
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select OF
select OF_EARLY_FLATTREE
diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index adec711ad39d..c73d2b3ae267 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -11,8 +11,6 @@
  * Copyright (C) 2010-2011 Jonas Bonn 
  *
  * DMA mapping callbacks...
- * As alloc_coherent is the only DMA callback being used currently, that's
- * the only thing implemented properly.  The rest need looking into...
  */
 
 #include 
@@ -67,62 +65,25 @@ static const struct mm_walk_ops clear_nocache_walk_ops = {
.pte_entry  = page_clear_nocache,
 };
 
-/*
- * Alloc "coherent" memory, which for OpenRISC means simply uncached.
- *
- * This function effectively just calls __get_free_pages, sets the
- * cache-inhibit bit on those pages, and makes sure that the pages are
- * flushed out of the cache before they are used.
- *
- * If the NON_CONSISTENT attribute is set, then this function just
- * returns "normal", cachable memory.
- *
- * There are additional flags WEAK_ORDERING and WRITE_COMBINE to take
- * into consideration here, too.  All current known implementations of
- * the OR1K support only strongly ordered memory accesses, so that flag
- * is being ignored for now; uncached but write-combined memory is a
- * missing feature of the OR1K.
- */
-void *
-arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
-   gfp_t gfp, unsigned long attrs)
+int arch_dma_set_uncached(void *cpu_addr, size_t size)
 {
-   unsigned long va;
-   void *page;
-
-   page = alloc_pages_exact(size, gfp | __GFP_ZERO);
-   if (!page)
-   return NULL;
-
-   /* This gives us the real physical address of the first page. */
-   *dma_handle = __pa(page);
-
-   va = (unsigned long)page;
+   unsigned long va = (unsigned long)cpu_addr;
 
/*
 * We need to iterate through the pages, clearing the dcache for
 * them and setting the cache-inhibit bit.
 */
-   if (walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
-   NULL)) {
-   free_pages_exact(page, size);
-   return NULL;
-   }
-
-   return (void *)va;
+   return walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
+   NULL);
 }
 
-void
-arch_dma_free(struct device *dev, size_t size, void *vaddr,
-   dma_addr_t dma_handle, unsigned long attrs)
+void arch_dma_clear_uncached(void *cpu_addr, size_t size)
 {
-   unsigned long va = (unsigned long)vaddr;
+   unsigned long va = (unsigned long)cpu_addr;
 
/* walk_page_range shouldn't be able to fail here */
WARN_ON(walk_page_range(&init_mm, va, va + size,
&clear_nocache_walk_ops, NULL));
-
-   free_pages_exact(vaddr, size);
 }
 
 void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
-- 
2.24.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] dma-mapping: support setting memory uncached in place

2020-02-20 Thread Christoph Hellwig
We currently only support remapping memory as uncached through vmap
or a magic uncached segment provided by some architectures.  But there
is a simpler and much better way available on some architectures where
we can just remap the memory in place.  The advantages are:

 1) no aliasing is possible, which prevents speculating into the
cached alias
 2) there is no need to allocate new ptes and thus no need for a special
pre-allocated pool of memory that can be used with GFP_ATOMIC DMA
allocations

The downside is that architectures must provide a way to set arbitrary
pages uncached in the kernel mapping, which might not be possible on
architectures that have a special implicit kernel mapping, and requires
splitting of huge page kernel mappings where they exist.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-noncoherent.h |  3 +++
 kernel/dma/Kconfig  |  8 
 kernel/dma/direct.c | 28 ++--
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index ca9b5770caee..0820ec58f119 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -111,4 +111,7 @@ static inline void arch_dma_prep_coherent(struct page 
*page, size_t size)
 void *uncached_kernel_address(void *addr);
 void *cached_kernel_address(void *addr);
 
+int arch_dma_set_uncached(void *cpu_addr, size_t size);
+void arch_dma_clear_uncached(void *cpu_addr, size_t size);
+
 #endif /* _LINUX_DMA_NONCOHERENT_H */
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 4c103a24e380..7bc0b77f1243 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -83,6 +83,14 @@ config DMA_DIRECT_REMAP
bool
select DMA_REMAP
 
+#
+# Should be selected if the architecture can remap memory from the page
+# allocator and CMA as uncached and provides the arch_dma_set_uncached and
+# arch_dma_clear_uncached helpers
+#
+config ARCH_HAS_DMA_SET_UNCACHED
+   bool
+
 config DMA_CMA
bool "DMA Contiguous Memory Allocator"
depends on HAVE_DMA_CONTIGUOUS && CMA
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 6af7ae83c4ad..73fe65a4cbc0 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -169,11 +169,8 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
ret = dma_common_contiguous_remap(page, PAGE_ALIGN(size),
dma_pgprot(dev, PAGE_KERNEL, attrs),
__builtin_return_address(0));
-   if (!ret) {
-   dma_free_contiguous(dev, page, size);
-   return ret;
-   }
-
+   if (!ret)
+   goto out_free_pages;
memset(ret, 0, size);
goto done;
}
@@ -186,8 +183,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
 * so log an error and fail.
 */
dev_info(dev, "Rejecting highmem page from CMA.\n");
-   dma_free_contiguous(dev, page, size);
-   return NULL;
+   goto out_free_pages;
}
 
ret = page_address(page);
@@ -196,10 +192,15 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
 
memset(ret, 0, size);
 
-   if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) &&
-   dma_alloc_need_uncached(dev, attrs)) {
+   if (dma_alloc_need_uncached(dev, attrs)) {
arch_dma_prep_coherent(page, size);
-   ret = uncached_kernel_address(ret);
+
+   if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED)) {
+   if (!arch_dma_set_uncached(ret, size))
+   goto out_free_pages;
+   } else if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT)) {
+   ret = uncached_kernel_address(ret);
+   }
}
 done:
if (force_dma_unencrypted(dev))
@@ -207,6 +208,9 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
else
*dma_handle = phys_to_dma(dev, page_to_phys(page));
return ret;
+out_free_pages:
+   dma_free_contiguous(dev, page, size);
+   return NULL;
 }
 
 void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
@@ -230,6 +234,8 @@ void dma_direct_free_pages(struct device *dev, size_t size, 
void *cpu_addr,
 
if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr))
vunmap(cpu_addr);
+   else if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED))
+   arch_dma_clear_uncached(cpu_addr, size);
 
dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
 }
@@ -238,6 +244,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
if (!IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) &&
+ 

Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

2020-02-20 Thread Christian Borntraeger



On 20.02.20 17:31, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
>> From a user's perspective it makes absolutely perfect sense to use the
>> bounce buffers when they are NEEDED. 
>> Forcing the user to specify iommu_platform just because you need bounce 
>> buffers
>> really feels wrong. And obviously we have a severe performance issue
>> because of the indirections.
> 
> The point is that the user should not have to specify iommu_platform.

I (and Halil) agree on that. From a user's perspective we want to
have the right thing in the guest without any command option. The iommu_platform
thing was indeed a first step to make things work.

I might misread Halil's patch, but isn't one effect of this patch set
that things WILL work without iommu_platform?


> We need to make sure any new hypervisor (especially one that might require
> bounce buffering) always sets it, as was a rather bogus legacy hack
> that isn't extensible for cases that for example require bounce buffering.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

2020-02-20 Thread Christoph Hellwig
On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> From a user's perspective it makes absolutely perfect sense to use the
> bounce buffers when they are NEEDED. 
> Forcing the user to specify iommu_platform just because you need bounce 
> buffers
> really feels wrong. And obviously we have a severe performance issue
> because of the indirections.

The point is that the user should not have to specify iommu_platform.
We need to make sure any new hypervisor (especially one that might require
bounce buffering) always sets it, as was a rather bogus legacy hack
that isn't extensible for cases that for example require bounce buffering.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: question about iommu_need_mapping

2020-02-20 Thread Jerry Snitselaar

On Thu Feb 20 20, Lu Baolu wrote:

Hi Jerry,

On 2020/2/20 7:55, Jerry Snitselaar wrote:

Is it possible for a device to end up with dev->archdata.iommu == NULL
on iommu_need_mapping in the following instance:

1. iommu_group has dma domain for default
2. device gets private identity domain in intel_iommu_add_device
3. iommu_need_mapping gets called with that device.
4. dmar_remove_one_dev_info sets dev->archdata.iommu = NULL via
   unlink_domain_info.
5. request_default_domain_for_dev exits after checking that
   group->default_domain exists, and group->default_domain->type is dma.
6. iommu_request_dma_domain_for_dev returns 0 from
   request_default_domain_for_dev, and a private dma domain isn't created
   for the device.



Yes. It's possible.


The case I was seeing went away with commit 9235cb13d7d1 ("iommu/vt-d:
Allow devices with RMRRs to use identity domain"), because it changed
which domain the group and devices were using, but it seems like it is
still a possibility with the code. Baolu, you mentioned possibly
removing the domain switch. Commit 98b2fffb5e27 ("iommu/vt-d: Handle
32bit device with identity default domain") makes it sound like the
domain switch is required.


It's more "nice to have" than "required" if the iommu driver doesn't
disable swiotlb explicitly. The device access of system memory higher
than the device's addressing capability could go through the bounced
buffer implemented in swiotlb.

Best regards,
baolu


Hi Baolu,

Would this mean switching to bounce_dma_ops instead?

Regards,
Jerry


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu



Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

2020-02-20 Thread Christian Borntraeger



On 20.02.20 17:11, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 05:06:05PM +0100, Halil Pasic wrote:
>> Currently force_dma_unencrypted() is only used by the direct
>> implementation of the DMA API, and thus resides in dma-direct.h. But
>> there is nothing dma-direct specific about it: if one was -- for
>> whatever reason -- to implement custom DMA ops that have to in the
>> encrypted/protected scenarios dma-direct currently deals with, one would
>> need exactly this kind of information.
> 
> I really don't think it has business being anywhre else, and your completely
> bogus second patch just proves the point.

From a user's perspective it makes absolutely perfect sense to use the
bounce buffers when they are NEEDED. 
Forcing the user to specify iommu_platform just because you need bounce buffers
really feels wrong. And obviously we have a severe performance issue
because of the indirections.

Now: I understand that you want to get this fixed differently, but maybe you
could help to outline how this could be fixed properly.

Christian

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-20 Thread Christoph Hellwig
On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote:
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 867c7ebd3f10..fafc8f924955 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>   if (!virtio_has_iommu_quirk(vdev))
>   return true;
>  
> + if (force_dma_unencrypted(&vdev->dev))
> + return true;

Hell no.  This is a detail of the platform DMA direct implementation.
Drivers have no business looking at this flag, and virtio finally needs
to be fixed to use the DMA API properly for everything but legacy devices.

No amount of desparate hacks is going to fix that fundamental problem,
and I'm starting to get really sick and tired of all the crap patches
published in this area.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

2020-02-20 Thread Christoph Hellwig
On Thu, Feb 20, 2020 at 05:06:05PM +0100, Halil Pasic wrote:
> Currently force_dma_unencrypted() is only used by the direct
> implementation of the DMA API, and thus resides in dma-direct.h. But
> there is nothing dma-direct specific about it: if one was -- for
> whatever reason -- to implement custom DMA ops that have to in the
> encrypted/protected scenarios dma-direct currently deals with, one would
> need exactly this kind of information.

I really don't think it has business being anywhre else, and your completely
bogus second patch just proves the point.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-20 Thread Halil Pasic
Currently if one intends to run a memory protection enabled VM with
virtio devices and linux as the guest OS, one needs to specify the
VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest
linux use the DMA API, which in turn handles the memory
encryption/protection stuff if the guest decides to turn itself into
a protected one. This however makes no sense due to multiple reasons:
* The device is not changed by the fact that the guest RAM is
protected. The so called IOMMU bypass quirk is not affected.
* This usage is not congruent with the standardised semantics of
VIRTIO_F_IOMMU_PLATFORM. Guest memory protection is an orthogonal reason
for using DMA API in virtio (orthogonal with respect to what is
expressed by VIRTIO_F_IOMMU_PLATFORM). 

This series aims to decouple 'have to use DMA API because my (guest) RAM
is protected' and 'have to use DMA API because the device told me
VIRTIO_F_IOMMU_PLATFORM'.

Please find more detailed explanations about the conceptual aspects in
the individual patches. There is however also a very practical problem
that is addressed by this series. 

For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
effect: the vhost code assumes that the addresses on the virtio descriptor
ring are not guest physical addresses but IOVAs, and insists on translating
these regardless of what transport is used (e.g. whether
we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
"vhost: new device IOTLB API".) On s390 this results in severe
performance degradation (roughly a factor of 10). BTW, with CCW I/O there is
(architecturally) no IOMMU, so the whole address translation makes no
sense in the context of virtio-ccw.

Halil Pasic (2):
  mm: move force_dma_unencrypted() to mem_encrypt.h
  virtio: let virtio use DMA API when guest RAM is protected

 drivers/virtio/virtio_ring.c |  3 +++
 include/linux/dma-direct.h   |  9 -
 include/linux/mem_encrypt.h  | 10 ++
 3 files changed, 13 insertions(+), 9 deletions(-)


base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

2020-02-20 Thread Halil Pasic
Currently force_dma_unencrypted() is only used by the direct
implementation of the DMA API, and thus resides in dma-direct.h. But
there is nothing dma-direct specific about it: if one was -- for
whatever reason -- to implement custom DMA ops that have to in the
encrypted/protected scenarios dma-direct currently deals with, one would
need exactly this kind of information.

More importantly, virtio has to use DMA API (so that the memory
encryption (x86) or protection (power, s390) is handled) under the very
same circumstances in which force_dma_unencrypted() returns true. Furthermore,
in these cases the reason to go via the DMA API is distinct from
the reason indicated by VIRTIO_F_IOMMU_PLATFORM: we need to
use the DMA API independently of the device's properties with regard to
access to memory. I.e. the addresses in the descriptors are still guest
physical addresses, the device may still be implemented by an SMP CPU,
and thus the device shall use those without any further translation. See
[1].

Let's move force_dma_unencrypted() so that virtio, or other
implementations of DMA ops, can make the right decisions.

[1] 
https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-416
(In the spec VIRTIO_F_IOMMU_PLATFORM is called
VIRTIO_F_ACCESS_PLATFORM).

Signed-off-by: Halil Pasic 
Tested-by: Ram Pai 
Tested-by: Michael Mueller 
---
 include/linux/dma-direct.h  |  9 -
 include/linux/mem_encrypt.h | 10 ++
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 24b8684aa21d..590b15d881b0 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -26,15 +26,6 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, 
dma_addr_t dev_addr)
 }
 #endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */
 
-#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
-bool force_dma_unencrypted(struct device *dev);
-#else
-static inline bool force_dma_unencrypted(struct device *dev)
-{
-   return false;
-}
-#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
-
 /*
  * If memory encryption is supported, phys_to_dma will set the memory 
encryption
  * bit in the DMA address, and dma_to_phys will clear it.  The raw 
__phys_to_dma
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 5c4a18a91f89..64a48c4b01a2 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -22,6 +22,16 @@ static inline bool mem_encrypt_active(void) { return false; }
 
 #endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
 
+struct device;
+#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
+bool force_dma_unencrypted(struct device *dev);
+#else
+static inline bool force_dma_unencrypted(struct device *dev)
+{
+   return false;
+}
+#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 /*
  * The __sme_set() and __sme_clr() macros are useful for adding or removing
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-20 Thread Halil Pasic
Currently the advanced guest memory protection technologies (AMD SEV,
powerpc secure guest technology and s390 Protected VMs) abuse the
VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which
is in turn necessary, to make IO work with guest memory protection.

But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a
different beast: with virtio devices whose implementation runs on an SMP
CPU we are still fine with doing all the usual optimizations, it is just
that we need to make sure that the memory protection mechanism does not
get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the
side of the guest (and possibly the host side as well) than we actually
need.

An additional benefit of teaching the guest to make the right decision
(and use the DMA API) on its own is that it removes the need to mandate special
VM configuration for guests that may run with protection. This is
especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all
the virtio control structures into the first 2G of guest memory:
something we don't necessarily want to do per-default.

Signed-off-by: Halil Pasic 
Tested-by: Ram Pai 
Tested-by: Michael Mueller 
---
 drivers/virtio/virtio_ring.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 867c7ebd3f10..fafc8f924955 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
if (!virtio_has_iommu_quirk(vdev))
return true;
 
+   if (force_dma_unencrypted(&vdev->dev))
+   return true;
+
/* Otherwise, we are left to guess. */
/*
 * In theory, it's possible to have a buggy QEMU-supposed
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4] iommu/vt-d: consider real PCI device when checking if mapping is needed

2020-02-20 Thread Lu Baolu

Hi,

On 2020/2/20 18:06, Daniel Drake wrote:

On Wed, Feb 19, 2020 at 11:40 AM Lu Baolu  wrote:

With respect, this is problematical. The parent and all subdevices share
a single translation entry. The DMA mask should be consistent.

Otherwise, for example, subdevice A has 64-bit DMA capability and uses
an identity domain for DMA translation. While subdevice B has 32-bit DMA
capability and is forced to switch to DMA domain. Subdevice A will be
impacted without any notification.

Looking closer, this problematic codepath may already be happening for VMD,
under intel_iommu_add_device(). Consider this function running for a VMD
subdevice, we hit:

     domain = iommu_get_domain_for_dev(dev);

I can't quite grasp the code flow here, but domain->type now always seems
to return the domain type of the real DMA device, which seems like pretty
reasonable behaviour.

     if (domain->type == IOMMU_DOMAIN_DMA) {

and as detailed in previous mails, the real VMD device seems to be in a DMA
domain by default, so we continue.

         if (device_def_domain_type(dev) == IOMMU_DOMAIN_IDENTITY) {

Now we checked the default domain type of the subdevice. This seems rather
likely to return IDENTITY because that's effectively the default type...

             ret = iommu_request_dm_for_dev(dev);
             if (ret) {
                 dmar_remove_one_dev_info(dev);
                 dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
                 domain_add_dev_info(si_domain, dev);
                 dev_info(dev,
                      "Device uses a private identity domain.\n");
             }
         }

and now we're doing the bad stuff that Lu pointed out: we only have one
mapping shared for all the subdevices, so if we end up changing it for one
subdevice, we're likely to be breaking another.


Yes. Agreed.

By the way, do all subdevices stay in a same iommu group?

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4] iommu/vt-d: consider real PCI device when checking if mapping is needed

2020-02-20 Thread Daniel Drake
> On Wed, Feb 19, 2020 at 11:40 AM Lu Baolu  wrote:
> > With respect, this is problematical. The parent and all subdevices share
> > a single translation entry. The DMA mask should be consistent.
> >
> > Otherwise, for example, subdevice A has 64-bit DMA capability and uses
> > an identity domain for DMA translation. While subdevice B has 32-bit DMA
> > capability and is forced to switch to DMA domain. Subdevice A will be
> > impacted without any notification.

Looking closer, this problematic codepath may already be happening for VMD,
under intel_iommu_add_device(). Consider this function running for a VMD
subdevice, we hit:

    domain = iommu_get_domain_for_dev(dev);

I can't quite grasp the code flow here, but domain->type now always seems
to return the domain type of the real DMA device, which seems like pretty
reasonable behaviour.

    if (domain->type == IOMMU_DOMAIN_DMA) {

and as detailed in previous mails, the real VMD device seems to be in a DMA
domain by default, so we continue.

        if (device_def_domain_type(dev) == IOMMU_DOMAIN_IDENTITY) {

Now we checked the default domain type of the subdevice. This seems rather
likely to return IDENTITY because that's effectively the default type...

            ret = iommu_request_dm_for_dev(dev);
            if (ret) {
                dmar_remove_one_dev_info(dev);
                dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
                domain_add_dev_info(si_domain, dev);
                dev_info(dev,
                     "Device uses a private identity domain.\n");
            }
        }

and now we're doing the bad stuff that Lu pointed out: we only have one
mapping shared for all the subdevices, so if we end up changing it for one
subdevice, we're likely to be breaking another.
In this case iommu_request_dm_for_dev() returns -EBUSY without doing anything
and the following private identity code fortunately seems to have no
consequential effects - the real DMA device continues to operate in the DMA
domain, and all subdevice DMA requests go through the DMA mapping codepath.
That's probably why VMD appears to be working fine anyway, but this seems
fragile.

The following changes enforce that the real DMA device is in the DMA domain,
and avoid the intel_iommu_add_device() codepaths that would try to change
it to a different domain type. Let me know if I'm on the right lines...
---
 drivers/iommu/intel-iommu.c   | 16 
 drivers/pci/controller/intel-nvme-remap.c |  6 ++
 2 files changed, 22 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9644a5b3e0ae..8872b8d1780d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2911,6 +2911,9 @@ static int device_def_domain_type(struct device *dev)
if (dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(dev);
 
+   if (pci_real_dma_dev(pdev) != pdev)
+   return IOMMU_DOMAIN_DMA;
+
if (device_is_rmrr_locked(dev))
return IOMMU_DOMAIN_DMA;
 
@@ -5580,6 +5583,7 @@ static bool intel_iommu_capable(enum iommu_cap cap)
 
 static int intel_iommu_add_device(struct device *dev)
 {
+   struct device *real_dev = dev;
struct dmar_domain *dmar_domain;
struct iommu_domain *domain;
struct intel_iommu *iommu;
@@ -5591,6 +5595,17 @@ static int intel_iommu_add_device(struct device *dev)
if (!iommu)
return -ENODEV;
 
+   if (dev_is_pci(dev))
+   real_dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
+
+   if (real_dev != dev) {
+   domain = iommu_get_domain_for_dev(real_dev);
+   if (domain->type != IOMMU_DOMAIN_DMA) {
+   dev_err(dev, "Real DMA device not in DMA domain; can't handle DMA\n");
+   return -ENODEV;
+   }
+   }
+
iommu_device_link(&iommu->iommu, dev);
 
if (translation_pre_enabled(iommu))

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range

2020-02-20 Thread Will Deacon
On Wed, Feb 19, 2020 at 12:06:28PM -0800, isa...@codeaurora.org wrote:
> On 2020-02-19 03:15, Will Deacon wrote:
> > On Tue, Feb 18, 2020 at 05:57:18PM -0800, isa...@codeaurora.org wrote:
> > > Does this mean that the driver should be managing the IOVA space and
> > > mappings for this device using the IOMMU API? If so, is the
> > > rationale for
> > > this because the device driver can have the information of what IOVA
> > > ranges
> > > can and cannot be used? Shouldn't there be a generic way of
> > > informing an
> > > IOMMU driver about these reserved ranges? Perhaps through a device
> > > tree
> > > property, instead of deferring this type of management to the driver?
> > 
> > Before we dive into designing that, can you please clarify whether the
> > reserved IOVA range applies to all DMA masters mastering through a
> > particular SMMU, or whether it's just about one specific master? I was
> > assuming the former, but wanted to be sure.
> > 
> This situation currently applies to one master.

Interesting. Is it problematic if the range is reserved for all masters
sharing that SMMU?

Will
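
For context, one existing way to carve a fixed range out of an iova_domain is reserve_iova(); a minimal sketch (assuming the caller already has hold of the domain's struct iova_domain, which is the crux of the dma-iommu vs. driver ownership question above):

#include <linux/iova.h>

static int reserve_fixed_range(struct iova_domain *iovad,
			       dma_addr_t start, dma_addr_t end)
{
	struct iova *iova;

	/* make [start, end] unavailable to future IOVA allocations */
	iova = reserve_iova(iovad, iova_pfn(iovad, start),
			    iova_pfn(iovad, end));
	return iova ? 0 : -ENOMEM;
}
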
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: sytem reboots when initializing Edirol FA-101 firewire audio interface

2020-02-20 Thread Takashi Sakamoto
Hi,

On Wed, Feb 19, 2020 at 06:19:56AM -0800, Steve Morris wrote:
> I'm running Arch Linux x86_64
> 
> My system consistently reboots when I power on my FA-101 when running
> kernels 5.5.1-4. Downgrading to 5.4.15 allows everything to work
> properly.
> 
> Here's the output of:
> journalctl  | grep -E 'Reboot|firewire|fw|bebob|alsa|jack'
> 
> This is a good init under 5.4.15:
> -- Reboot --
> powered on interface while running 5.4.15, initialized properly
> Feb 18 18:35:37 hostname kernel: firewire_ohci :05:00.0: enabling device 
> (0080 -> 0083)
> Feb 18 18:35:37 hostname kernel: firewire_ohci :05:00.0: added OHCI v1.10 
> device as card 0, 4 IR + 8 IT contexts, quirks 0x11
> Feb 18 18:35:37 hostname kernel: firewire_core :05:00.0: created device 
> fw0: GUID 00112277, S400
> Feb 18 18:35:37 hostname systemd-udevd[541]: controlC0: Process 
> '/usr/bin/alsactl restore 0' failed with exit code 99.
> Feb 18 18:35:37 hostname systemd-udevd[541]: controlC1: Process 
> '/usr/bin/alsactl restore 1' failed with exit code 99.
> Feb 18 18:35:38 hostname kernel: amdgpu: [powerplay] smu driver if version = 
> 0x0033, smu fw if version = 0x0035, smu fw version = 0x002a3200 
> (42.50.0)
> Feb 18 18:36:08 hostname kernel: firewire_ohci :05:00.0: isochronous 
> cycle inconsistent
> Feb 18 18:36:08 hostname kernel: firewire_core :05:00.0: created device 
> fw1: GUID 0040abc20bc1, S400
> Feb 18 18:36:08 hostname kernel: firewire_core :05:00.0: phy config: new 
> root=ffc1, gap_count=5
> Feb 18 18:36:10 hostname kernel: firewire_core :05:00.0: BM lock failed 
> (timeout), making local node (ffc0) root
> Feb 18 18:36:10 hostname kernel: firewire_core :05:00.0: phy config: new 
> root=ffc0, gap_count=5
> Feb 18 18:36:12 hostname systemd-udevd[1532]: controlC2: Process 
> '/usr/bin/alsactl restore 2' failed with exit code 99.
> Feb 18 18:36:15 hostname kernel: snd-bebob fw1.0: transaction failed: no ack
> Feb 18 18:36:15 hostname kernel: snd-bebob fw1.0: fail to get an input for 
> MSU in plug 7: -5
> Feb 18 18:36:15 hostname kernel: snd-bebob fw1.0: transaction failed: no ack
> Feb 18 18:36:15 hostname kernel: snd-bebob fw1.0: fail to get an input for 
> MSU in plug 7: -5
> 
> This is a bad init under 5.5.4:
> -- Reboot --
> Powered on interface when machine was running.
> Feb 18 18:13:37 hostname kernel: firewire_ohci :05:00.0: enabling device 
> (0080 -> 0083)
> Feb 18 18:13:37 hostname kernel: firewire_ohci :05:00.0: added OHCI v1.10 
> device as card 0, 4 IR + 8 IT contexts, quirks 0x11
> Feb 18 18:13:37 hostname kernel: firewire_core :05:00.0: created device 
> fw0: GUID 00112277, S400
> Feb 18 18:13:37 hostname kernel: audit: type=1130 audit(1582078417.360:3): 
> pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=ufw comm="systemd" 
> exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> Feb 18 18:13:37 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 
> ses=4294967295 msg='unit=ufw comm="systemd" exe="/usr/lib/systemd/systemd" 
> hostname=? addr=? terminal=? res=success'
> Feb 18 18:13:37 hostname systemd-udevd[539]: controlC0: Process 
> '/usr/bin/alsactl restore 0' failed with exit code 99.
> Feb 18 18:13:37 hostname systemd-udevd[530]: controlC1: Process 
> '/usr/bin/alsactl restore 1' failed with exit code 99.
> Feb 18 18:13:38 hostname kernel: amdgpu: [powerplay] smu driver if version = 
> 0x0033, smu fw if version = 0x0035, smu fw version = 0x002a3200 
> (42.50.0)
> Feb 18 18:19:45 hostname kernel: firewire_core :05:00.0: phy config: new 
> root=ffc1, gap_count=5
> Feb 18 18:19:48 hostname kernel: firewire_core :05:00.0: phy config: new 
> root=ffc1, gap_count=5
> Feb 18 18:19:48 hostname kernel: firewire_core :05:00.0: created device 
> fw1: GUID 0040abc20bc1, S400
> Feb 18 18:19:52 hostname systemd-udevd[1682]: controlC2: Process 
> '/usr/bin/alsactl restore 2' failed with exit code 99.
> Feb 18 18:20:08 hostname kernel: snd-bebob fw1.0: transaction failed: no ack
> Feb 18 18:20:08 hostname kernel: snd-bebob fw1.0: fail to get an input for 
> MSU in plug 7: -5
> Feb 18 18:20:08 hostname kernel: snd-bebob fw1.0: transaction failed: no ack
> Feb 18 18:20:08 hostname kernel: snd-bebob fw1.0: fail to get an input for 
> MSU in plug 7: -5
> Feb 18 18:20:12 hostname kernel: firewire_core :05:00.0: phy config: new 
> root=ffc1, gap_count=5
> Feb 18 18:21:24 hostname kernel: firewire_ohci :05:00.0: AMD-Vi: Event 
> logged [IO_PAGE_FAULT domain=0x0001 address=0xd508 flags=0x]
> Followed by 125 more IO_PAGE_FAULT events and then the reboot.
> 
> I'll be happy to provide additional information if needed.

Hm. I think your system has run into quite a complicated issue. At least three
software stacks are related to your issue:
 - ALSA bebob driver (snd-bebob) and helper module (snd-firewire-lib)
 - Linux firewire core (firewire-core) and 1394 OHCI controller driver
   (firewire-ohci)
 - Driver