Re: [RFC v2 1/3] vfio: VFIO_IOMMU_CACHE_INVALIDATE

2019-11-12 Thread Auger Eric
Hi Yi,

On 11/6/19 2:31 AM, Liu, Yi L wrote:
>> From: Alex Williamson [mailto:alex.william...@redhat.com]
>> Sent: Wednesday, November 6, 2019 6:42 AM
>> To: Liu, Yi L 
>> Subject: Re: [RFC v2 1/3] vfio: VFIO_IOMMU_CACHE_INVALIDATE
>>
>> On Fri, 25 Oct 2019 11:20:40 +
>> "Liu, Yi L"  wrote:
>>
>>> Hi Kevin,
>>>
 From: Tian, Kevin
 Sent: Friday, October 25, 2019 5:14 PM
 To: Liu, Yi L ; alex.william...@redhat.com;
 Subject: RE: [RFC v2 1/3] vfio: VFIO_IOMMU_CACHE_INVALIDATE

> From: Liu, Yi L
> Sent: Thursday, October 24, 2019 8:26 PM
>
> From: Liu Yi L 
>
> When the guest "owns" the stage 1 translation structures, the
> host IOMMU driver has no knowledge of caching structure updates
> unless the guest invalidation requests are trapped and passed down to the
> host.
>
> This patch adds the VFIO_IOMMU_CACHE_INVALIDATE ioctl, which aims at
> propagating guest stage 1 IOMMU cache invalidations to the host.
>
> Cc: Kevin Tian 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Eric Auger 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 55 +++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h   | 13 +++++++++++++
>  2 files changed, 68 insertions(+)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 96fddc1d..cd8d3a5 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -124,6 +124,34 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&iommu->domain_list))
>
> +struct domain_capsule {
> + struct iommu_domain *domain;
> + void *data;
> +};
> +
> +/* iommu->lock must be held */
> +static int
> +vfio_iommu_lookup_dev(struct vfio_iommu *iommu,
> +   int (*fn)(struct device *dev, void *data),
> +   void *data)

 'lookup' usually means finding a device and then returning. But the real
 purpose here is to loop over all the devices within this container and
 do something on each. Does it make more sense to name it vfio_iommu_for_each_dev?
>>
>> +1
>>
>>> yep, I can replace it.
>>>

> +{
> + struct domain_capsule dc = {.data = data};
> + struct vfio_domain *d;
>>> [...]
> @@ -2315,6 +2352,24 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>
>   return copy_to_user((void __user *)arg, &info, minsz) ?
>   -EFAULT : 0;
> + } else if (cmd == VFIO_IOMMU_CACHE_INVALIDATE) {
> + struct vfio_iommu_type1_cache_invalidate ustruct;

 it's weird to call a variable 'ustruct'.
>>>
>>> Will fix it.
>>>
> + int ret;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_cache_invalidate,
> + info);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> + ret = vfio_iommu_lookup_dev(iommu, vfio_cache_inv_fn,
> + &ustruct);
> + mutex_unlock(&iommu->lock);
> + return ret;
>   }
>
>   return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9e843a1..ccf60a2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -794,6 +794,19 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE   _IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE   _IO(VFIO_TYPE, VFIO_BASE + 16)
>
> +/**
> + * VFIO_IOMMU_CACHE_INVALIDATE - _IOWR(VFIO_TYPE, VFIO_BASE + 24,
>>
>> What's going on with these ioctl numbers?  AFAICT[1] we've used up through
>> VFIO_BASE + 21, this jumps to 24, the next patch skips to 27, then the last
>> patch fills in 28 & 29.  Thanks,
> 
> Hi Alex,
> 
> I rebased my patches on Eric's nested stage translation series, which also
> introduces new IOCTLs. I should have handled this better; I'll sync with
> Eric to serialize the IOCTL numbers.
> 
> [PATCH v6 00/22] SMMUv3 Nested Stage Setup by Eric Auger
> https://lkml.org/lkml/2019/3/17/124

Feel free to choose your IOCTL numbers without taking care of my series.
I will adapt to yours if my work gets unblocked at some point.

Thanks

Eric
> 
> Thanks,
> Yi Liu
> 
>> Alex
>>
>> [1] git grep -h VFIO_BASE | grep "VFIO_BASE +" | grep -e ^#define | \
>> awk '{print $NF}' | tr -d ')' | sort -u -n
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) to host

2019-11-12 Thread Liu, Yi L
> From: Alex Williamson 
> Sent: Wednesday, November 13, 2019 1:26 AM
> To: Liu, Yi L 
> Subject: Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) to 
> host
> 
> On Tue, 12 Nov 2019 11:21:40 +
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson < alex.william...@redhat.com >
> > > Sent: Friday, November 8, 2019 7:21 AM
> > > To: Liu, Yi L 
> > > Subject: Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page 
> > > tables) to host
> > >
> > > On Thu, 24 Oct 2019 08:26:23 -0400
> > > Liu Yi L  wrote:
> > >
> > > > This patch adds vfio support for binding a guest translation structure
> > > > to the host iommu. VFIO exposes iommu programming capability to user-
> > > > space. Under KVM, the guest runs as a user-space application on the
> > > > host. For SVA usage in a virtual machine, the guest owns the GVA->GPA
> > > > translation structure, which should be passed down to the host to
> > > > enable nested translation (also called two-stage translation). This
> > > > patch reuses the VFIO_IOMMU_BIND proposal from Jean-Philippe Brucker,
> > > > and adds a new bind type for binding a guest-owned translation
> > > > structure to the host.
> > > >
> > > > *) Add two new ioctls for VFIO containers.
> > > >
> > > >   - VFIO_IOMMU_BIND: for a bind request from userspace; it could
> > > >bind a process to a pasid or bind a guest pasid
> > > >to a device, as indicated by type
> > > >   - VFIO_IOMMU_UNBIND: for an unbind request from userspace; it could
> > > >unbind a process from a pasid or unbind a guest pasid
> > > >from a device, also indicated by type
> > > >   - Bind type:
> > > > VFIO_IOMMU_BIND_PROCESS: user-space request to bind a process
> > > >to a device
> > > > VFIO_IOMMU_BIND_GUEST_PASID: bind guest owned translation
> > > >structure to host iommu. e.g. guest page table
> > > >
> > > > *) Code logic in vfio_iommu_type1_ioctl() to handle
> VFIO_IOMMU_BIND/UNBIND
> > > >
[...]
> > > > +static long vfio_iommu_type1_unbind_gpasid(struct vfio_iommu *iommu,
> > > > +   void __user *arg,
> > > > +   struct vfio_iommu_type1_bind *bind)
> > > > +{
> > > > +   struct iommu_gpasid_bind_data gbind_data;
> > > > +   unsigned long minsz;
> > > > +   int ret = 0;
> > > > +
> > > > +   minsz = sizeof(*bind) + sizeof(gbind_data);
> > > > +   if (bind->argsz < minsz)
> > > > +   return -EINVAL;
> > >
> > > But gbind_data can change size if new vendor specific data is added to
> > > the union, so kernel updates break existing userspace.  Fail.
> >
> > Yes, we have a version field in struct iommu_gpasid_bind_data. How
> > about doing the sanity check per version? The kernel knows the
> > gbind_data size of each specific version. Does that make sense? If yes,
> > I'll also apply it to the other sanity checks in this series to avoid
> > userspace failing after a kernel update.
> 
> Has it already been decided that the version field will be updated for
> every addition to the union?

No, that was just my proposal. Jacob may help explain the purpose of the
version field. But we may end up updating the uapi version number too
"frequently" if we increment it for each change in the union part. I may
vote for the second option from you below.

> It seems there are two options, either
> the version definition includes the possible contents of the union,
> which means we need to support multiple versions concurrently in the
> kernel to maintain compatibility with userspace and follow deprecation
> protocols for removing that support, or we need to consider version to
> be the general form of the structure and interpret the format field to
> determine necessary length to copy from the user.

As I mentioned above, it may be better to let the @version field cover only
the general fields and let @format cover the possible changes in the union.
e.g. IOMMU_PASID_FORMAT_INTEL_VTD2 could mean version 2 of the Intel
VT-d bind data. But either way, I think we need to let the kernel maintain
multiple versions to stay compatible with userspace. e.g. we may have
multiple versions of the iommu_gpasid_bind_data_vtd struct in the union part.

> 
> > > > +
> > > > +   if (copy_from_user(&gbind_data, arg, sizeof(gbind_data)))
> > > > +   return -EFAULT;
> > > > +
> > > > +   mutex_lock(&iommu->lock);
> > > > +   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> > > > +   ret = -EINVAL;
> > > > +   goto out_unlock;
> > > > +   }
> > > > +
> > > > +   ret = vfio_iommu_type1_do_guest_unbind(iommu, &gbind_data);
> > > > +
> > > > +out_unlock:
> > > > +   mutex_unlock(&iommu->lock);
> > > > +   return ret;
> > > > +}
> > > > +
> > > >  static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > >unsigned int cmd, unsigned long arg)
> > > >  {
> > > > @@ -2484,6 +2582,44 @@ static long 

[PATCH 1/3] dma-direct: unify the dma_capable definitions

2019-11-12 Thread Christoph Hellwig
Currently each architecture that wants to override dma_to_phys and
phys_to_dma also has to provide dma_capable.  But there isn't really
any good reason for that.  powerpc and mips just have copies of the
generic one minus the latest fix, and the arm one was the inspiration
for said fix, but misses the bus_dma_mask handling.
Make all architectures use the generic version instead.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/dma-direct.h | 19 ---
 arch/mips/include/asm/dma-direct.h|  8 
 arch/powerpc/include/asm/dma-direct.h |  9 -
 include/linux/dma-direct.h|  2 +-
 4 files changed, 1 insertion(+), 37 deletions(-)

diff --git a/arch/arm/include/asm/dma-direct.h 
b/arch/arm/include/asm/dma-direct.h
index b67e5fc1fe43..7c3001a6a775 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -14,23 +14,4 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, 
dma_addr_t dev_addr)
return __pfn_to_phys(dma_to_pfn(dev, dev_addr)) + offset;
 }
 
-static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t 
size)
-{
-   u64 limit, mask;
-
-   if (!dev->dma_mask)
-   return 0;
-
-   mask = *dev->dma_mask;
-
-   limit = (mask + 1) & ~mask;
-   if (limit && size > limit)
-   return 0;
-
-   if ((addr | (addr + size - 1)) & ~mask)
-   return 0;
-
-   return 1;
-}
-
 #endif /* ASM_ARM_DMA_DIRECT_H */
diff --git a/arch/mips/include/asm/dma-direct.h 
b/arch/mips/include/asm/dma-direct.h
index b5c240806e1b..14e352651ce9 100644
--- a/arch/mips/include/asm/dma-direct.h
+++ b/arch/mips/include/asm/dma-direct.h
@@ -2,14 +2,6 @@
 #ifndef _MIPS_DMA_DIRECT_H
 #define _MIPS_DMA_DIRECT_H 1
 
-static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t 
size)
-{
-   if (!dev->dma_mask)
-   return false;
-
-   return addr + size - 1 <= *dev->dma_mask;
-}
-
 dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr);
 phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr);
 
diff --git a/arch/powerpc/include/asm/dma-direct.h 
b/arch/powerpc/include/asm/dma-direct.h
index a2912b47102c..e29e8a236b8d 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -2,15 +2,6 @@
 #ifndef ASM_POWERPC_DMA_DIRECT_H
 #define ASM_POWERPC_DMA_DIRECT_H 1
 
-static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t 
size)
-{
-   if (!dev->dma_mask)
-   return false;
-
-   return addr + size - 1 <=
-   min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
-}
-
 static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
if (!dev)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 6db863c3eb93..991f8aa2676e 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -24,6 +24,7 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, 
dma_addr_t dev_addr)
 
return paddr + ((phys_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
 }
+#endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */
 
 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t 
size)
 {
@@ -38,7 +39,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t 
addr, size_t size)
 
return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
 }
-#endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */
 
 #ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
 bool force_dma_unencrypted(struct device *dev);
-- 
2.20.1



unify the dma_capable definition

2019-11-12 Thread Christoph Hellwig
Hi all,

there is no good reason to have different versions of dma_capable.
This series removes the arch overrides and also adds a few cleanups
in the same area.


[PATCH 2/3] dma-direct: avoid a forward declaration for phys_to_dma

2019-11-12 Thread Christoph Hellwig
Move dma_capable down a bit so that we don't need a forward declaration
for phys_to_dma.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-direct.h | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 991f8aa2676e..f8959f75e496 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -6,8 +6,6 @@
 #include  /* for min_low_pfn */
 #include 
 
-static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr);
-
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include 
 #else
@@ -26,20 +24,6 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, 
dma_addr_t dev_addr)
 }
 #endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */
 
-static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t 
size)
-{
-   dma_addr_t end = addr + size - 1;
-
-   if (!dev->dma_mask)
-   return false;
-
-   if (!IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
-   min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
-   return false;
-
-   return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
-}
-
 #ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
 bool force_dma_unencrypted(struct device *dev);
 #else
@@ -65,6 +49,20 @@ static inline phys_addr_t dma_to_phys(struct device *dev, 
dma_addr_t daddr)
return __sme_clr(__dma_to_phys(dev, daddr));
 }
 
+static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t 
size)
+{
+   dma_addr_t end = addr + size - 1;
+
+   if (!dev->dma_mask)
+   return false;
+
+   if (!IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
+   min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
+   return false;
+
+   return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
+}
+
 u64 dma_direct_get_required_mask(struct device *dev);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
-- 
2.20.1



[PATCH 3/3] powerpc: remove support for NULL dev in __phys_to_dma / __dma_to_phys

2019-11-12 Thread Christoph Hellwig
Support for calling the DMA API functions without a valid device pointer
was removed a while ago, so remove the stale support for that from the
powerpc __phys_to_dma / __dma_to_phys helpers.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/dma-direct.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-direct.h 
b/arch/powerpc/include/asm/dma-direct.h
index e29e8a236b8d..abc154d784b0 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -4,15 +4,11 @@
 
 static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-   if (!dev)
-   return paddr + PCI_DRAM_OFFSET;
return paddr + dev->archdata.dma_offset;
 }
 
 static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
-   if (!dev)
-   return daddr - PCI_DRAM_OFFSET;
return daddr - dev->archdata.dma_offset;
 }
 #endif /* ASM_POWERPC_DMA_DIRECT_H */
-- 
2.20.1



remove DMA_ATTR_WRITE_BARRIER

2019-11-12 Thread Christoph Hellwig
There is no implementation of the DMA_ATTR_WRITE_BARRIER flag left
now that the ia64 sn2 code has been removed.  Drop the flag and
the calling convention to set it in the RDMA code.


[PATCH 1/2] dma-mapping: remove the DMA_ATTR_WRITE_BARRIER flag

2019-11-12 Thread Christoph Hellwig
This flag is not implemented by any backend and only set by the ib_umem
module in a single instance.

Signed-off-by: Christoph Hellwig 
---
 Documentation/DMA-attributes.txt | 18 --
 drivers/infiniband/core/umem.c   |  9 ++---
 include/linux/dma-mapping.h  |  5 +
 3 files changed, 3 insertions(+), 29 deletions(-)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index 8f8d97f65d73..29dcbe8826e8 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -5,24 +5,6 @@ DMA attributes
 This document describes the semantics of the DMA attributes that are
 defined in linux/dma-mapping.h.
 
-DMA_ATTR_WRITE_BARRIER
---
-
-DMA_ATTR_WRITE_BARRIER is a (write) barrier attribute for DMA.  DMA
-to a memory region with the DMA_ATTR_WRITE_BARRIER attribute forces
-all pending DMA writes to complete, and thus provides a mechanism to
-strictly order DMA from a device across all intervening busses and
-bridges.  This barrier is not specific to a particular type of
-interconnect, it applies to the system as a whole, and so its
-implementation must account for the idiosyncrasies of the system all
-the way from the DMA device to memory.
-
-As an example of a situation where DMA_ATTR_WRITE_BARRIER would be
-useful, suppose that a device does a DMA write to indicate that data is
-ready and available in memory.  The DMA of the "completion indication"
-could race with data DMA.  Mapping the memory used for completion
-indications with DMA_ATTR_WRITE_BARRIER would prevent the race.
-
 DMA_ATTR_WEAK_ORDERING
 --
 
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 24244a2f68cc..66148739b00f 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -199,7 +199,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, 
unsigned long addr,
struct mm_struct *mm;
unsigned long npages;
int ret;
-   unsigned long dma_attrs = 0;
struct scatterlist *sg;
unsigned int gup_flags = FOLL_WRITE;
 
@@ -211,9 +210,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, 
unsigned long addr,
if (!context)
return ERR_PTR(-EIO);
 
-   if (dmasync)
-   dma_attrs |= DMA_ATTR_WRITE_BARRIER;
-
/*
 * If the combination of the addr and size requested for this memory
 * region causes an integer overflow, return error.
@@ -294,11 +290,10 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, 
unsigned long addr,
 
sg_mark_end(sg);
 
-   umem->nmap = ib_dma_map_sg_attrs(context->device,
+   umem->nmap = ib_dma_map_sg(context->device,
  umem->sg_head.sgl,
  umem->sg_nents,
- DMA_BIDIRECTIONAL,
- dma_attrs);
+ DMA_BIDIRECTIONAL);
 
if (!umem->nmap) {
ret = -ENOMEM;
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4d450672b7d6..a4930310d0c7 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -15,11 +15,8 @@
 /**
  * List of possible attributes associated with a DMA mapping. The semantics
  * of each attribute should be defined in Documentation/DMA-attributes.txt.
- *
- * DMA_ATTR_WRITE_BARRIER: DMA to a memory region with this attribute
- * forces all pending DMA writes to complete.
  */
-#define DMA_ATTR_WRITE_BARRIER (1UL << 0)
+
 /*
  * DMA_ATTR_WEAK_ORDERING: Specifies that reads and writes to the mapping
  * may be weakly ordered, that is that reads and writes may pass each other.
-- 
2.20.1



[PATCH 2/2] IB/umem: remove the dmasync argument to ib_umem_get

2019-11-12 Thread Christoph Hellwig
The argument is always ignored, so remove it.

Signed-off-by: Christoph Hellwig 
---
 drivers/infiniband/core/umem.c|  3 +--
 drivers/infiniband/hw/bnxt_re/ib_verbs.c  | 10 -
 drivers/infiniband/hw/cxgb3/iwch_provider.c   |  2 +-
 drivers/infiniband/hw/cxgb4/mem.c |  2 +-
 drivers/infiniband/hw/efa/efa_verbs.c |  2 +-
 drivers/infiniband/hw/hns/hns_roce_cq.c   |  2 +-
 drivers/infiniband/hw/hns/hns_roce_db.c   |  2 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c   |  4 ++--
 drivers/infiniband/hw/hns/hns_roce_qp.c   |  2 +-
 drivers/infiniband/hw/hns/hns_roce_srq.c  |  4 ++--
 drivers/infiniband/hw/i40iw/i40iw_verbs.c |  2 +-
 drivers/infiniband/hw/mlx4/cq.c   |  2 +-
 drivers/infiniband/hw/mlx4/doorbell.c |  2 +-
 drivers/infiniband/hw/mlx4/mr.c   |  2 +-
 drivers/infiniband/hw/mlx4/qp.c   |  5 ++---
 drivers/infiniband/hw/mlx4/srq.c  |  2 +-
 drivers/infiniband/hw/mlx5/cq.c   |  4 ++--
 drivers/infiniband/hw/mlx5/devx.c |  2 +-
 drivers/infiniband/hw/mlx5/doorbell.c |  2 +-
 drivers/infiniband/hw/mlx5/mr.c   |  2 +-
 drivers/infiniband/hw/mlx5/qp.c   |  4 ++--
 drivers/infiniband/hw/mlx5/srq.c  |  2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c  |  4 +---
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   |  2 +-
 drivers/infiniband/hw/qedr/verbs.c| 21 +--
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c  |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c  |  4 ++--
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c |  2 +-
 drivers/infiniband/sw/rdmavt/mr.c |  2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c|  2 +-
 include/rdma/ib_umem.h|  4 ++--
 32 files changed, 52 insertions(+), 57 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 66148739b00f..7a3b99597ead 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -185,10 +185,9 @@ EXPORT_SYMBOL(ib_umem_find_best_pgsz);
  * @addr: userspace virtual address to start at
  * @size: length of region to pin
  * @access: IB_ACCESS_xxx flags for memory being pinned
- * @dmasync: flush in-flight DMA when the memory region is written
  */
 struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
-   size_t size, int access, int dmasync)
+   size_t size, int access)
 {
struct ib_ucontext *context;
struct ib_umem *umem;
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c 
b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index b4149dc9e824..99a8e0b99e66 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -855,7 +855,7 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, 
struct bnxt_re_pd *pd,
bytes += (qplib_qp->sq.max_wqe * psn_sz);
}
bytes = PAGE_ALIGN(bytes);
-   umem = ib_umem_get(udata, ureq.qpsva, bytes, IB_ACCESS_LOCAL_WRITE, 1);
+   umem = ib_umem_get(udata, ureq.qpsva, bytes, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(umem))
return PTR_ERR(umem);
 
@@ -869,7 +869,7 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, 
struct bnxt_re_pd *pd,
bytes = (qplib_qp->rq.max_wqe * BNXT_QPLIB_MAX_RQE_ENTRY_SIZE);
bytes = PAGE_ALIGN(bytes);
umem = ib_umem_get(udata, ureq.qprva, bytes,
-  IB_ACCESS_LOCAL_WRITE, 1);
+  IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(umem))
goto rqfail;
qp->rumem = umem;
@@ -1322,7 +1322,7 @@ static int bnxt_re_init_user_srq(struct bnxt_re_dev *rdev,
 
bytes = (qplib_srq->max_wqe * BNXT_QPLIB_MAX_RQE_ENTRY_SIZE);
bytes = PAGE_ALIGN(bytes);
-   umem = ib_umem_get(udata, ureq.srqva, bytes, IB_ACCESS_LOCAL_WRITE, 1);
+   umem = ib_umem_get(udata, ureq.srqva, bytes, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(umem))
return PTR_ERR(umem);
 
@@ -2565,7 +2565,7 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct 
ib_cq_init_attr *attr,
 
cq->umem = ib_umem_get(udata, req.cq_va,
   entries * sizeof(struct cq_base),
-  IB_ACCESS_LOCAL_WRITE, 1);
+  IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(cq->umem)) {
rc = PTR_ERR(cq->umem);
goto fail;
@@ -3530,7 +3530,7 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, 
u64 start, u64 length,
/* The fixed portion of the rkey is the same as the lkey */
mr->ib_mr.rkey = mr->qplib_mr.rkey;
 
-   umem = ib_umem_get(udata, start, 

[PATCH 1/3] x86: Remove the calgary IOMMU driver

2019-11-12 Thread Christoph Hellwig
The calgary IOMMU was only used on high-end IBM systems in the early
x86_64 age and has no known users left.  Remove it to avoid having to
touch it for pending changes to the DMA API.

Signed-off-by: Christoph Hellwig 
---
 MAINTAINERS   |   10 -
 arch/x86/Kconfig  |   30 -
 arch/x86/configs/x86_64_defconfig |1 -
 arch/x86/include/asm/calgary.h|   57 --
 arch/x86/include/asm/pci_64.h |   14 -
 arch/x86/include/asm/tce.h|   35 -
 arch/x86/kernel/Makefile  |1 -
 arch/x86/kernel/pci-calgary_64.c  | 1586 -
 arch/x86/kernel/pci-dma.c |6 -
 arch/x86/kernel/tce_64.c  |  177 
 10 files changed, 1917 deletions(-)
 delete mode 100644 arch/x86/include/asm/calgary.h
 delete mode 100644 arch/x86/include/asm/tce.h
 delete mode 100644 arch/x86/kernel/pci-calgary_64.c
 delete mode 100644 arch/x86/kernel/tce_64.c

diff --git a/MAINTAINERS b/MAINTAINERS
index eb19fad370d7..778e8dee3e9d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3627,16 +3627,6 @@ L:   c...@lists.bufferbloat.net (moderated for 
non-subscribers)
 S: Maintained
 F: net/sched/sch_cake.c
 
-CALGARY x86-64 IOMMU
-M: Muli Ben-Yehuda 
-M: Jon Mason 
-L: iommu@lists.linux-foundation.org
-S: Maintained
-F: arch/x86/kernel/pci-calgary_64.c
-F: arch/x86/kernel/tce_64.c
-F: arch/x86/include/asm/calgary.h
-F: arch/x86/include/asm/tce.h
-
 CAN NETWORK DRIVERS
 M: Wolfgang Grandegger 
 M: Marc Kleine-Budde 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d6e1faa28c58..4d27bdc85bf2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -932,36 +932,6 @@ config GART_IOMMU
 
  If unsure, say Y.
 
-config CALGARY_IOMMU
-   bool "IBM Calgary IOMMU support"
-   select IOMMU_HELPER
-   select SWIOTLB
-   depends on X86_64 && PCI
-   ---help---
- Support for hardware IOMMUs in IBM's xSeries x366 and x460
- systems. Needed to run systems with more than 3GB of memory
- properly with 32-bit PCI devices that do not support DAC
- (Double Address Cycle). Calgary also supports bus level
- isolation, where all DMAs pass through the IOMMU.  This
- prevents them from going anywhere except their intended
- destination. This catches hard-to-find kernel bugs and
- mis-behaving drivers and devices that do not use the DMA-API
- properly to set up their DMA buffers.  The IOMMU can be
- turned off at boot time with the iommu=off parameter.
- Normally the kernel will make the right choice by itself.
- If unsure, say Y.
-
-config CALGARY_IOMMU_ENABLED_BY_DEFAULT
-   def_bool y
-   prompt "Should Calgary be enabled by default?"
-   depends on CALGARY_IOMMU
-   ---help---
- Should Calgary be enabled by default? if you choose 'y', Calgary
- will be used (if it exists). If you choose 'n', Calgary will not be
- used even if it exists. If you choose 'n' and would like to use
- Calgary anyway, pass 'iommu=calgary' on the kernel command line.
- If unsure, say Y.
-
 config MAXSMP
bool "Enable Maximum number of SMP Processors and NUMA Nodes"
depends on X86_64 && SMP && DEBUG_KERNEL
diff --git a/arch/x86/configs/x86_64_defconfig 
b/arch/x86/configs/x86_64_defconfig
index d0a5ffeae8df..0b9654c7a05c 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -25,7 +25,6 @@ CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 CONFIG_MODULE_FORCE_UNLOAD=y
 CONFIG_SMP=y
-CONFIG_CALGARY_IOMMU=y
 CONFIG_NR_CPUS=64
 CONFIG_SCHED_SMT=y
 CONFIG_PREEMPT_VOLUNTARY=y
diff --git a/arch/x86/include/asm/calgary.h b/arch/x86/include/asm/calgary.h
deleted file mode 100644
index facd374a1bf7..
--- a/arch/x86/include/asm/calgary.h
+++ /dev/null
@@ -1,57 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * Derived from include/asm-powerpc/iommu.h
- *
- * Copyright IBM Corporation, 2006-2007
- *
- * Author: Jon Mason 
- * Author: Muli Ben-Yehuda 
- */
-
-#ifndef _ASM_X86_CALGARY_H
-#define _ASM_X86_CALGARY_H
-
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct iommu_table {
-   const struct cal_chipset_ops *chip_ops; /* chipset specific funcs */
-   unsigned long  it_base;  /* mapped address of tce table */
-   unsigned long  it_hint;  /* Hint for next alloc */
-   unsigned long *it_map;   /* A simple allocation bitmap for now */
-   void __iomem  *bbar; /* Bridge BAR */
-   u64tar_val;  /* Table Address Register */
-   struct timer_list watchdog_timer;
-   spinlock_t it_lock;  /* Protects it_map */
-   unsigned int   it_size;  /* Size of iommu table in entries */
-   unsigned char  it_busno; /* Bus number this table belongs to */
-};
-
-struct cal_chipset_ops {
-   void (*handle_quirks)(struct iommu_table 

[PATCH 2/3] x86/pci: Remove pci_64.h

2019-11-12 Thread Christoph Hellwig
This file only contains external declarations for two non-existing
function pointers.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/pci.h|  4 
 arch/x86/include/asm/pci_64.h | 14 --
 2 files changed, 18 deletions(-)
 delete mode 100644 arch/x86/include/asm/pci_64.h

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index e662f987dfa2..d9e28aad2738 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -120,10 +120,6 @@ void native_restore_msi_irqs(struct pci_dev *dev);
 #endif
 #endif  /* __KERNEL__ */
 
-#ifdef CONFIG_X86_64
-#include <asm/pci_64.h>
-#endif
-
 /* generic pci stuff */
 #include <asm-generic/pci.h>
 
diff --git a/arch/x86/include/asm/pci_64.h b/arch/x86/include/asm/pci_64.h
deleted file mode 100644
index 4e1aef506aa5..
--- a/arch/x86/include/asm/pci_64.h
+++ /dev/null
@@ -1,14 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_PCI_64_H
-#define _ASM_X86_PCI_64_H
-
-#ifdef __KERNEL__
-
-extern int (*pci_config_read)(int seg, int bus, int dev, int fn,
- int reg, int len, u32 *value);
-extern int (*pci_config_write)(int seg, int bus, int dev, int fn,
-  int reg, int len, u32 value);
-
-#endif /* __KERNEL__ */
-
-#endif /* _ASM_X86_PCI_64_H */
-- 
2.20.1



[PATCH 3/3] x86/pci: Remove #ifdef __KERNEL__ guard from <asm/pci.h>

2019-11-12 Thread Christoph Hellwig
pci.h is not a UAPI header, so the __KERNEL__ ifdef is rather pointless.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/pci.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index d9e28aad2738..90d0731fdcb6 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -12,8 +12,6 @@
 #include 
 #include 
 
-#ifdef __KERNEL__
-
 struct pci_sysdata {
int domain; /* PCI domain */
int node;   /* NUMA node */
@@ -118,7 +116,6 @@ void native_restore_msi_irqs(struct pci_dev *dev);
 #define native_setup_msi_irqs  NULL
 #define native_teardown_msi_irqNULL
 #endif
-#endif  /* __KERNEL__ */
 
 /* generic pci stuff */
 #include <asm-generic/pci.h>
-- 
2.20.1



Remove the calgary iommu driver

2019-11-12 Thread Christoph Hellwig
Hi all,

per the discussion a few months ago with Jon we've agreed that the
calgary iommu driver is obsolete and in the way. Jon said back then
he'd send a patch to remove it, but as that didn't happen I prepared
one, plus two small cleanups touching the same area.


Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops

2019-11-12 Thread Christoph Hellwig
On Wed, Nov 13, 2019 at 10:50:27AM +0800, Lu Baolu wrote:
> Currently, this is a blocking issue for using per-device dma ops in the
> Intel IOMMU driver. Hence this driver is blocked from using the generic
> iommu dma ops.

That is in fact the reason why I bring it up :)

> I'd like to align the Intel IOMMU driver with other vendors: use iommu
> dma ops for devices which have been selected to go through the iommu,
> and use direct dma ops for those selected to bypass it.
>
> One concern with this proposal is that, for devices with limited address
> capability, shall we force them to use the iommu, or alternatively use
> swiotlb if the user decides to let them bypass the iommu?
>
> I understand that using swiotlb will cause some overhead due to the
> bounce buffer, but the Intel IOMMU is on by default, hence any users who
> run a default kernel won't suffer this. We only need to document this so
> that users understand the overhead when they decide to let such devices
> bypass the iommu. This is common to all vendor iommu drivers as far as I
> can see.

Indeed.  And one idea would be to lift the code in the powerpc
dma_iommu_ops that check a flag and use the direct ops to the generic
dma code and a flag in struct device.  We can then switch the intel
iommu ops (and AMD Gart) over to it.


Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops

2019-11-12 Thread Lu Baolu

Hi Christoph,

On 11/12/19 3:16 PM, Christoph Hellwig wrote:

On Fri, Jul 26, 2019 at 09:56:51AM +0800, Lu Baolu wrote:

I think the current code doesn't do the right thing. The user asks the
iommu driver to use the identity domain for a device, but the driver
forces it back to the DMA domain because of the device's address
capability.


expensive.  I don't think that this change is a good idea, and even if
we decide that this is a good idea after all that should be done in a
separate prep patch that explains the rationale.


Yes. Makes sense.


Now that the bounce code has landed it might be good time to revisit
this patch in isolation and with a better explanation.



Yes. Thanks for bringing this up.

Currently, this is a blocking issue for using per-device dma ops in the
Intel IOMMU driver. Hence this driver is blocked from using the generic
iommu dma ops.

I'd like to align the Intel IOMMU driver with other vendors: use iommu
dma ops for devices which have been selected to go through the iommu,
and use direct dma ops for those selected to bypass it.

One concern with this proposal is that, for devices with limited address
capability, shall we force them to use the iommu, or alternatively use
swiotlb if the user decides to let them bypass the iommu?

I understand that using swiotlb will cause some overhead due to the
bounce buffer, but the Intel IOMMU is on by default, hence any users who
run a default kernel won't suffer this. We only need to document this so
that users understand the overhead when they decide to let such devices
bypass the iommu. This is common to all vendor iommu drivers as far as I
can see.

Best regards,
baolu


[PATCH AUTOSEL 4.14 010/115] iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()

2019-11-12 Thread Sasha Levin
From: Robin Murphy 

[ Upstream commit 85c7a0f1ef624ef58173ef52ea77780257bdfe04 ]

In removing the pagetable-wide lock, we gained the possibility of the
vanishingly unlikely case where we have a race between two concurrent
unmappers splitting the same block entry. The logic to handle this is
fairly straightforward - whoever loses the race frees their partial
next-level table and instead dereferences the winner's newly-installed
entry in order to fall back to a regular unmap, which intentionally
echoes the pre-existing case of recursively splitting a 1GB block down
to 4KB pages by installing a full table of 2MB blocks first.

Unfortunately, the chump who implemented that logic failed to update the
condition check for that fallback, meaning that if said race occurs at
the last level (where the loser's unmap_idx is valid) then the unmap
won't actually happen. Fix that to properly account for both the race
and recursive cases.

Fixes: 2c3d273eabe8 ("iommu/io-pgtable-arm: Support lockless operation")
Signed-off-by: Robin Murphy 
[will: re-jig control flow to avoid duplicate cmpxchg test]
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/io-pgtable-arm.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e8018a308868e..17a9225283dd1 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -551,13 +551,12 @@ static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
			return 0;
 
		tablep = iopte_deref(pte, data);
+	} else if (unmap_idx >= 0) {
+		io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
+		return size;
	}
 
-	if (unmap_idx < 0)
-		return __arm_lpae_unmap(data, iova, size, lvl, tablep);
-
-	io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
-	return size;
+	return __arm_lpae_unmap(data, iova, size, lvl, tablep);
 }
 
 static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
-- 
2.20.1



[PATCH AUTOSEL 4.19 022/209] iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout

2019-11-12 Thread Sasha Levin
From: Zhen Lei 

[ Upstream commit 0f02477d16980938a84aba8688a4e3a303306116 ]

The break condition of:

(int)(VAL - sync_idx) >= 0

in the __arm_smmu_sync_poll_msi() polling loop requires that sync_idx
must be increased monotonically according to the sequence of the CMDs in
the cmdq.

However, since the msidata is populated using atomic_inc_return_relaxed()
before taking the command-queue spinlock, then the following scenario
can occur:

CPU0CPU1
msidata=0
msidata=1
insert cmd1
insert cmd0
smmu execute cmd1
smmu execute cmd0
poll timeout, because msidata=1 is overridden by
cmd0, that means VAL=0, sync_idx=1.

This is not a functional problem, since the caller will eventually either
timeout or exit due to another CMD_SYNC, however it's clearly not what
the code is supposed to be doing. Fix it, by incrementing the sequence
count with the command-queue lock held, allowing us to drop the atomic
operations altogether.

Signed-off-by: Zhen Lei 
[will: dropped the specialised cmd building routine for now]
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/arm-smmu-v3.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 40fbf20d69e5a..2ab7100bcff12 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -567,7 +567,7 @@ struct arm_smmu_device {
 
int gerr_irq;
int combined_irq;
-   atomic_tsync_nr;
+   u32 sync_nr;
 
unsigned long   ias; /* IPA */
unsigned long   oas; /* PA */
@@ -964,14 +964,13 @@ static int __arm_smmu_cmdq_issue_sync_msi(struct arm_smmu_device *smmu)
	struct arm_smmu_cmdq_ent ent = {
		.opcode = CMDQ_OP_CMD_SYNC,
		.sync	= {
-			.msidata = atomic_inc_return_relaxed(&smmu->sync_nr),
			.msiaddr = virt_to_phys(&smmu->sync_count),
		},
	};
 
-	arm_smmu_cmdq_build_cmd(cmd, &ent);
-
	spin_lock_irqsave(&smmu->cmdq.lock, flags);
+	ent.sync.msidata = ++smmu->sync_nr;
+	arm_smmu_cmdq_build_cmd(cmd, &ent);
	arm_smmu_cmdq_insert_cmd(smmu, cmd);
	spin_unlock_irqrestore(&smmu->cmdq.lock, flags);
 
@@ -2196,7 +2195,6 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 {
	int ret;
 
-	atomic_set(&smmu->sync_nr, 0);
ret = arm_smmu_init_queues(smmu);
if (ret)
return ret;
-- 
2.20.1



[PATCH AUTOSEL 4.19 021/209] iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()

2019-11-12 Thread Sasha Levin
From: Robin Murphy 

[ Upstream commit 85c7a0f1ef624ef58173ef52ea77780257bdfe04 ]

In removing the pagetable-wide lock, we gained the possibility of the
vanishingly unlikely case where we have a race between two concurrent
unmappers splitting the same block entry. The logic to handle this is
fairly straightforward - whoever loses the race frees their partial
next-level table and instead dereferences the winner's newly-installed
entry in order to fall back to a regular unmap, which intentionally
echoes the pre-existing case of recursively splitting a 1GB block down
to 4KB pages by installing a full table of 2MB blocks first.

Unfortunately, the chump who implemented that logic failed to update the
condition check for that fallback, meaning that if said race occurs at
the last level (where the loser's unmap_idx is valid) then the unmap
won't actually happen. Fix that to properly account for both the race
and recursive cases.

Fixes: 2c3d273eabe8 ("iommu/io-pgtable-arm: Support lockless operation")
Signed-off-by: Robin Murphy 
[will: re-jig control flow to avoid duplicate cmpxchg test]
Signed-off-by: Will Deacon 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/io-pgtable-arm.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 88641b4560bc8..2f79efd16a052 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -574,13 +574,12 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
			return 0;
 
		tablep = iopte_deref(pte, data);
+	} else if (unmap_idx >= 0) {
+		io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
+		return size;
	}
 
-	if (unmap_idx < 0)
-		return __arm_lpae_unmap(data, iova, size, lvl, tablep);
-
-	io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
-	return size;
+	return __arm_lpae_unmap(data, iova, size, lvl, tablep);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
-- 
2.20.1



Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs

2019-11-12 Thread Huang Adrian
On Wed, Nov 13, 2019 at 12:05 AM Joerg Roedel  wrote:
>
> On Tue, Nov 12, 2019 at 05:32:31PM +0800, Huang Adrian wrote:
> > > On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel  wrote:
>
> > > So there are a couple of options here:
> > >
> > > 1) Bail out and disable the IOMMU as the BIOS screwed up
> > >
> > > 2) Treat per-device exclusion ranges just as r/w unity-mapped
> > >regions.
> > >
> > >
> > > I think option 2) is the best fix here.
> >
> > Yes. This is what this patch does (The first exclusion region still
> > uses the exclusion range while the remaining exclusion regions are
> > modified to be r/w unity-mapped regions when detecting multiple
> > exclusion regions).
>
> Yeah, but I think it is better to just stop using the exclusion-range
> feature of the hardware (because it means BIOSes are correct) and just
> treat every exclusion range (also the first one) as an r/w unity range.

Okay, I see. If you're okay with this (treat per-device exclusion
ranges as r/w unity-mapped regions - including the first one), I can
prepare the patch.

>
> > But, I'm guessing you're talking about that BIOS has to define r/w
> > unity-mapped regions instead of the per-device exclusions (Fix from
> > BIOS, not prevent the failure from kernel). Right?
>
> That would be best, but I fear this is too much to wish for.

Totally agree. That's why I submitted this patch. :-)

-- Adrian


Re: Bug 205201 - overflow of DMA mask and bus mask

2019-11-12 Thread Christian Zigotzky

On 12 November 2019 at 3:41 pm, Christoph Hellwig wrote:

On Mon, Nov 11, 2019 at 01:22:55PM +0100, Christian Zigotzky wrote:

Now, I can definitely say that this patch does not solve the issue.

Do you have another patch for testing or shall I bisect?

I'm still interested in the .config and dmesg.  Especially if the
board is using arch/powerpc/sysdev/fsl_pci.c, which is the only
place inside the powerpc arch code doing funny things with the
bus_dma_mask, which Robin pointed out looks fishy.


Here you are:

.config: https://bugzilla.kernel.org/attachment.cgi?id=285815

dmesg: https://bugzilla.kernel.org/attachment.cgi?id=285813

Thanks


Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)

2019-11-12 Thread Auger Eric
Hi Shameer,

On 11/12/19 6:56 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Shameerali Kolothum Thodi
>> Sent: 12 November 2019 14:21
>> To: 'Auger Eric' ; eric.auger@gmail.com;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org;
>> alex.william...@redhat.com; jacob.jun@linux.intel.com;
>> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com;
>> robin.mur...@arm.com
>> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com;
>> marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm
>> ; xuwei (O) 
>> Subject: RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>>
> [...]
>> I am trying to get this running on one of our platform that has smmuv3
>> dual
>> stage support. I am seeing some issues with this when an ixgbe vf dev is
>> made pass-through and is behind a vSMMUv3 in Guest.
>>
>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
>>
>> And this is my Qemu cmd line,
>>
>> ./qemu-system-aarch64
>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host
>> \
>> -kernel Image \
>> -drive if=none,file=ubuntu,id=fs \
>> -device virtio-blk-device,drive=fs \
>> -device vfio-pci,host=:01:10.1 \
>> -bios QEMU_EFI.fd \
>> -net none \
>> -m 4G \
>> -nographic -D -d -enable-kvm \
>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
>>
>> The basic ping from Guest works fine,
>> root@ubuntu:~# ping 10.202.225.185
>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
>> ...
>>
>> But if I increase ping packet size,
>>
>> root@ubuntu:~# ping -s 1024 10.202.225.185
>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
>>
>> And from Host kernel I get,
>> [  819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets
>> detected
>> [  824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets
>> detected
>> [  828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets
>> detected
>> [  830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets
>> detected
>> [  832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets
>> detected
>> [  834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets
>> detected
>>
>> Also noted that iperf cannot work as it fails to establish the
>> connection with iperf server.
>>
>> Please find attached the trace logs (vfio*, smmuv3*) from Qemu for your
>> reference.
>> I haven't debugged this further yet and thought of checking with you if
>> this is something you have seen already or not. Or maybe I am missing
>> something here?
>
> Please can you try to edit and modify hw/vfio/common.c, function
> vfio_iommu_unmap_notify
>
>
> /*
> if (size <= 0x1) {
> ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
> ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
> if (iotlb->leaf) {
> ustruct.info.addr_info.flags |=
> IOMMU_INV_ADDR_FLAGS_LEAF;
> }
> ustruct.info.addr_info.archid = iotlb->arch_id;
> ustruct.info.addr_info.addr = start;
> ustruct.info.addr_info.granule_size = size;
> ustruct.info.addr_info.nb_granules = 1;
> trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
> iotlb->leaf);
> } else {
> */
> ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
> ustruct.info.pasid_info.archid = iotlb->arch_id;
> ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;
> trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
> //}
>
> This modification leads to invalidating the whole ASID each time we get
> a guest TLBI, instead of invalidating the single IOVA. On my end, I saw
> this was the cause of such kinds of issues. Please let me know if it
> fixes your perf issues.

 Yes, this seems to fix the 

RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)

2019-11-12 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: 12 November 2019 14:21
> To: 'Auger Eric' ; eric.auger@gmail.com;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org;
> alex.william...@redhat.com; jacob.jun@linux.intel.com;
> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com;
> robin.mur...@arm.com
> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com;
> marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm
> ; xuwei (O) 
> Subject: RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> 
[...]
> > >>> I am trying to get this running on one of our platform that has smmuv3
> dual
> > >>> stage support. I am seeing some issues with this when an ixgbe vf dev is
> > >>> made pass-through and is behind a vSMMUv3 in Guest.
> > >>>
> > >>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> > >>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
> > >>>
> > >>> And this is my Qemu cmd line,
> > >>>
> > >>> ./qemu-system-aarch64
> > >>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host
> \
> > >>> -kernel Image \
> > >>> -drive if=none,file=ubuntu,id=fs \
> > >>> -device virtio-blk-device,drive=fs \
> > >>> -device vfio-pci,host=:01:10.1 \
> > >>> -bios QEMU_EFI.fd \
> > >>> -net none \
> > >>> -m 4G \
> > >>> -nographic -D -d -enable-kvm \
> > >>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
> > >>>
> > >>> The basic ping from Guest works fine,
> > >>> root@ubuntu:~# ping 10.202.225.185
> > >>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
> > >>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
> > >>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
> > >>> ...
> > >>>
> > >>> But if I increase ping packet size,
> > >>>
> > >>> root@ubuntu:~# ping -s 1024 10.202.225.185
> > >>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
> > >>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
> > >>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
> > >>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
> > >>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
> > >>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
> > >>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
> > >>>
> > >>> And from Host kernel I get,
> > >>> [  819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets
> detected
> > >>> [  824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets
> detected
> > >>> [  828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets
> detected
> > >>> [  830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets
> detected
> > >>> [  832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets
> detected
> > >>> [  834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets
> detected
> > >>>
> > >>> Also noted that iperf cannot work as it fails to establish the
> > >>> connection with iperf server.
> > >>>
> > >>> Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your
> > >> reference.
> > >>> I haven't debugged this further yet and thought of checking with you if
> this
> > is
> > >>> something you have seen already or not. Or maybe I am missing
> something
> > >> here?
> > >>
> > >> Please can you try to edit and modify hw/vfio/common.c, function
> > >> vfio_iommu_unmap_notify
> > >>
> > >>
> > >> /*
> > >> if (size <= 0x1) {
> > >> ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> > >> ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
> > >> ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
> > >> if (iotlb->leaf) {
> > >> ustruct.info.addr_info.flags |=
> > >> IOMMU_INV_ADDR_FLAGS_LEAF;
> > >> }
> > >> ustruct.info.addr_info.archid = iotlb->arch_id;
> > >> ustruct.info.addr_info.addr = start;
> > >> ustruct.info.addr_info.granule_size = size;
> > >> ustruct.info.addr_info.nb_granules = 1;
> > >> trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
> > >> iotlb->leaf);
> > >> } else {
> > >> */
> > >> ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> > >> ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
> > >> ustruct.info.pasid_info.archid = iotlb->arch_id;
> > >> ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;
> > >> trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
> > >> //}
> > >>
> > >> This modification leads to invalidating the whole ASID each time we
> > >> get a guest TLBI, instead of invalidating the single IOVA. On my end,
> > >> I saw this was the cause of such kinds of issues. Please let me know
> > >> if it fixes your perf issues.
> > >
> > > Yes, this seems to fix 

Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) to host

2019-11-12 Thread Alex Williamson
On Tue, 12 Nov 2019 11:21:40 +
"Liu, Yi L"  wrote:

> > From: Alex Williamson < alex.william...@redhat.com >
> > Sent: Friday, November 8, 2019 7:21 AM
> > To: Liu, Yi L 
> > Subject: Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) 
> > to host
> > 
> > On Thu, 24 Oct 2019 08:26:23 -0400
> > Liu Yi L  wrote:
> >   
> > > This patch adds vfio support to bind the guest translation structure
> > > to the host iommu. VFIO exposes iommu programming capability to user-
> > > space. The guest is a user-space application on the host under the KVM
> > > solution. For SVA usage in a virtual machine, the guest owns the
> > > GVA->GPA translation structure, and this part should be passed down to
> > > the host to enable nested translation (or two-stage translation). This
> > > patch reuses the VFIO_IOMMU_BIND proposal from Jean-Philippe Brucker,
> > > and adds a new bind type for binding a guest-owned translation
> > > structure to the host.
> > >
> > > *) Add two new ioctls for VFIO containers.
> > >
> > >   - VFIO_IOMMU_BIND: for a bind request from userspace; it can
> > >bind a process to a pasid or bind a guest pasid
> > >to a device, as indicated by type
> > >   - VFIO_IOMMU_UNBIND: for an unbind request from userspace; it can
> > >unbind a process from a pasid or unbind a guest pasid
> > >from a device, also indicated by type
> > >   - Bind type:
> > >   VFIO_IOMMU_BIND_PROCESS: user-space request to bind a process
> > >to a device
> > >   VFIO_IOMMU_BIND_GUEST_PASID: bind guest owned translation
> > >structure to host iommu. e.g. guest page table
> > >
> > > *) Code logic in vfio_iommu_type1_ioctl() to handle VFIO_IOMMU_BIND/UNBIND
> > >
> > > Cc: Kevin Tian 
> > > Signed-off-by: Jean-Philippe Brucker 
> > > Signed-off-by: Liu Yi L 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  drivers/vfio/vfio_iommu_type1.c | 136  
> >   
> > >  include/uapi/linux/vfio.h   |  44 +
> > >  2 files changed, 180 insertions(+)
> > >
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > > index 3d73a7d..1a27e25 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -2325,6 +2325,104 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> > >   return ret;
> > >  }
> > >
> > > +static int vfio_bind_gpasid_fn(struct device *dev, void *data)
> > > +{
> > > + struct domain_capsule *dc = (struct domain_capsule *)data;
> > > + struct iommu_gpasid_bind_data *ustruct =
> > > + (struct iommu_gpasid_bind_data *) dc->data;
> > > +
> > > + return iommu_sva_bind_gpasid(dc->domain, dev, ustruct);
> > > +}
> > > +
> > > +static int vfio_unbind_gpasid_fn(struct device *dev, void *data)
> > > +{
> > > + struct domain_capsule *dc = (struct domain_capsule *)data;
> > > + struct iommu_gpasid_bind_data *ustruct =
> > > + (struct iommu_gpasid_bind_data *) dc->data;
> > > +
> > > + return iommu_sva_unbind_gpasid(dc->domain, dev,
> > > + ustruct->hpasid);
> > > +}
> > > +
> > > +/*
> > > + * unbind specific gpasid, caller of this function requires hold
> > > + * vfio_iommu->lock
> > > + */
> > > +static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu,
> > > +   struct iommu_gpasid_bind_data *gbind_data)
> > > +{
> > > + return vfio_iommu_lookup_dev(iommu, vfio_unbind_gpasid_fn, gbind_data);
> > > +}
> > > +
> > > +static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu,
> > > + void __user *arg,
> > > + struct vfio_iommu_type1_bind *bind)
> > > +{
> > > + struct iommu_gpasid_bind_data gbind_data;
> > > + unsigned long minsz;
> > > + int ret = 0;
> > > +
> > > + minsz = sizeof(*bind) + sizeof(gbind_data);
> > > + if (bind->argsz < minsz)
> > > + return -EINVAL;
> > > +
> > > + if (copy_from_user(&gbind_data, arg, sizeof(gbind_data)))
> > > + return -EFAULT;
> > > +
> > > + mutex_lock(&iommu->lock);
> > > + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> > > + ret = -EINVAL;
> > > + goto out_unlock;
> > > + }
> > > +
> > > + ret = vfio_iommu_lookup_dev(iommu, vfio_bind_gpasid_fn, &gbind_data);
> > > + /*
> > > +  * If bind failed, it may not be a total failure. Some devices within
> > > +  * the iommu group may have bind successfully. Although we don't enable
> > > +  * pasid capability for non-singletion iommu groups, a unbind operation
> > > +  * would be helpful to ensure no partial binding for an iommu group.
> > > +  */
> > > + if (ret)
> > > + /*
> > > +  * Undo all binds that already succeeded, no need to check the
> > > +  * return value here since some device within the group has no
> > > +  * successful bind when coming to this place switch.
> > > +  */
> > > +   

Re: [PATCH] iommu/rockchip: Don't provoke WARN for harmless IRQs

2019-11-12 Thread Joerg Roedel
On Mon, Nov 11, 2019 at 06:55:18PM +, Robin Murphy wrote:
> Although we don't generally expect IRQs to fire for a suspended IOMMU,
> there are certain situations (particularly with debug options) where
> we might legitimately end up with the pm_runtime_get_if_in_use() call
> from rk_iommu_irq() returning 0. Since this doesn't represent an actual
> error, follow the other parts of the driver and save the WARN_ON()
> condition for a genuine negative value. Even if we do have spurious
> IRQs due to a wedged VOP asserting the shared line, it's not this
> driver's job to try to second-guess the IRQ core to warn about that.
> 
> Reported-by: Vasily Khoruzhick 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/rockchip-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.


Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs

2019-11-12 Thread Joerg Roedel
On Tue, Nov 12, 2019 at 05:32:31PM +0800, Huang Adrian wrote:
> > On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel  wrote:

> > So there are a couple of options here:
> >
> > 1) Bail out and disable the IOMMU as the BIOS screwed up
> >
> > 2) Treat per-device exclusion ranges just as r/w unity-mapped
> >regions.
> >
> >
> > I think option 2) is the best fix here.
> 
> Yes. This is what this patch does (The first exclusion region still
> uses the exclusion range while the remaining exclusion regions are
> modified to be r/w unity-mapped regions when detecting multiple
> exclusion regions).

Yeah, but I think it is better to just stop using the exclusion-range
feature of the hardware (because it means BIOSes are correct) and just
treat every exclusion range (also the first one) as an r/w unity range.

> But, I'm guessing you're talking about that BIOS has to define r/w
> unity-mapped regions instead of the per-device exclusions (Fix from
> BIOS, not prevent the failure from kernel). Right?

That would be best, but I fear this is too much to wish for.

Regards,

Joerg


[PATCH v2 0/6] Raspberry Pi 4 PCIe support

2019-11-12 Thread Nicolas Saenz Julienne
This series aims at providing support for Raspberry Pi 4's PCIe
controller, which is also shared with the Broadcom STB family of
devices.

There was a previous attempt to upstream this some years ago[1], but it
was blocked as most STB PCIe integrations have a sparse DMA mapping[2],
which is something currently not supported by the kernel. Luckily this is not
the case for the Raspberry Pi 4.

Note that the driver code is to be based on top of Rob Herring's series
simplifying inbound and outbound range parsing.

[1] https://patchwork.kernel.org/cover/10605933/
[2] https://patchwork.kernel.org/patch/10605957/

---

Changes since v1:
  - add generic rounddown/roundup_pow_two64() patch
  - Add MAINTAINERS patch
  - Fix Kconfig
  - Cleanup probe, use up to date APIs, exit on MSI failure
  - Get rid of linux,pci-domain and other unused constructs
  - Use edge triggered setup for MSI
  - Cleanup MSI implementation
  - Fix multiple cosmetic issues
  - Remove supend/resume code

Jim Quinlan (3):
  dt-bindings: PCI: Add bindings for brcmstb's PCIe device
  PCI: brcmstb: add Broadcom STB PCIe host controller driver
  PCI: brcmstb: add MSI capability

Nicolas Saenz Julienne (3):
  linux/log2.h: Add roundup/rounddown_pow_two64() family of functions
  ARM: dts: bcm2711: Enable PCIe controller
  MAINTAINERS: Add brcmstb PCIe controller

 .../bindings/pci/brcm,stb-pcie.yaml   |  110 ++
 MAINTAINERS   |4 +
 arch/arm/boot/dts/bcm2711.dtsi|   46 +
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |3 +-
 drivers/pci/controller/Kconfig|9 +
 drivers/pci/controller/Makefile   |1 +
 drivers/pci/controller/pcie-brcmstb.c | 1179 +
 drivers/pci/controller/pcie-cadence-ep.c  |7 +-
 drivers/pci/controller/pcie-cadence.c |7 +-
 drivers/pci/controller/pcie-rockchip-ep.c |9 +-
 include/linux/log2.h  |   52 +
 kernel/dma/direct.c   |3 +-
 12 files changed, 1412 insertions(+), 18 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
 create mode 100644 drivers/pci/controller/pcie-brcmstb.c

-- 
2.24.0



[PATCH v2 1/6] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-12 Thread Nicolas Saenz Julienne
Some users need to make sure their rounding function accepts and returns
64-bit values regardless of the architecture. Sadly,
roundup/rounddown_pow_of_two() take and return unsigned longs. Create a
new generic 64-bit variant of the function and clean up rogue custom
implementations.

Signed-off-by: Nicolas Saenz Julienne 
---
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |  3 +-
 drivers/pci/controller/pcie-cadence-ep.c  |  7 +--
 drivers/pci/controller/pcie-cadence.c |  7 +--
 drivers/pci/controller/pcie-rockchip-ep.c |  9 ++--
 include/linux/log2.h  | 52 +++
 kernel/dma/direct.c   |  3 +-
 6 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 024788549c25..027bd72505e2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -33,6 +33,7 @@
 
 #include 
 #include 
+#include 
 
 #include "mlx4_en.h"
 
@@ -252,7 +253,7 @@ static u32 freq_to_shift(u16 freq)
 {
u32 freq_khz = freq * 1000;
u64 max_val_cycles = freq_khz * 1000 * MLX4_EN_WRAP_AROUND_SEC;
-   u64 max_val_cycles_rounded = 1ULL << fls64(max_val_cycles - 1);
+   u64 max_val_cycles_rounded = roundup_pow_of_two64(max_val_cycles);
/* calculate max possible multiplier in order to fit in 64bit */
u64 max_mul = div64_u64(ULLONG_MAX, max_val_cycles_rounded);
 
diff --git a/drivers/pci/controller/pcie-cadence-ep.c b/drivers/pci/controller/pcie-cadence-ep.c
index def7820cb824..26ff424b16f5 100644
--- a/drivers/pci/controller/pcie-cadence-ep.c
+++ b/drivers/pci/controller/pcie-cadence-ep.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pcie-cadence.h"
 
@@ -90,11 +91,7 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
 
/* BAR size is 2^(aperture + 7) */
sz = max_t(size_t, epf_bar->size, CDNS_PCIE_EP_MIN_APERTURE);
-   /*
-* roundup_pow_of_two() returns an unsigned long, which is not suited
-* for 64bit values.
-*/
-   sz = 1ULL << fls64(sz - 1);
+   sz = roundup_pow_of_two64(sz);
aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
 
if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
diff --git a/drivers/pci/controller/pcie-cadence.c b/drivers/pci/controller/pcie-cadence.c
index cd795f6fc1e2..b2278e6b955c 100644
--- a/drivers/pci/controller/pcie-cadence.c
+++ b/drivers/pci/controller/pcie-cadence.c
@@ -4,6 +4,7 @@
 // Author: Cyrille Pitchen 
 
 #include 
+#include 
 
 #include "pcie-cadence.h"
 
@@ -11,11 +12,7 @@ void cdns_pcie_set_outbound_region(struct cdns_pcie *pcie, u8 fn,
   u32 r, bool is_io,
   u64 cpu_addr, u64 pci_addr, size_t size)
 {
-   /*
-* roundup_pow_of_two() returns an unsigned long, which is not suited
-* for 64bit values.
-*/
-   u64 sz = 1ULL << fls64(size - 1);
+   u64 sz = roundup_pow_of_two64(size);
int nbits = ilog2(sz);
u32 addr0, addr1, desc0, desc1;
 
diff --git a/drivers/pci/controller/pcie-rockchip-ep.c b/drivers/pci/controller/pcie-rockchip-ep.c
index d743b0a48988..ed50aaf27784 100644
--- a/drivers/pci/controller/pcie-rockchip-ep.c
+++ b/drivers/pci/controller/pcie-rockchip-ep.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pcie-rockchip.h"
 
@@ -70,7 +71,7 @@ static void rockchip_pcie_prog_ep_ob_atu(struct rockchip_pcie *rockchip, u8 fn,
 u32 r, u32 type, u64 cpu_addr,
 u64 pci_addr, size_t size)
 {
-   u64 sz = 1ULL << fls64(size - 1);
+   u64 sz = roundup_pow_of_two64(size);
int num_pass_bits = ilog2(sz);
u32 addr0, addr1, desc0, desc1;
bool is_nor_msg = (type == AXI_WRAPPER_NOR_MSG);
@@ -172,11 +173,7 @@ static int rockchip_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
/* BAR size is 2^(aperture + 7) */
sz = max_t(size_t, epf_bar->size, MIN_EP_APERTURE);
 
-   /*
-* roundup_pow_of_two() returns an unsigned long, which is not suited
-* for 64bit values.
-*/
-   sz = 1ULL << fls64(sz - 1);
+   sz = roundup_pow_of_two64(sz);
aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
 
if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
diff --git a/include/linux/log2.h b/include/linux/log2.h
index 83a4a3ca3e8a..db12d92ab6eb 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -67,6 +67,24 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
return 1UL << (fls_long(n) - 1);
 }
 
+/**
+ * __roundup_pow_of_two64() - round 64bit value up to nearest power of two
+ * @n: value to round up
+ */
+static inline 

Re: Bug 205201 - overflow of DMA mask and bus mask

2019-11-12 Thread Christoph Hellwig
On Mon, Nov 11, 2019 at 01:22:55PM +0100, Christian Zigotzky wrote:
> Now, I can definitely say that this patch does not solve the issue.
>
> Do you have another patch for testing or shall I bisect?

I'm still interested in the .config and dmesg.  Especially if the
board is using arch/powerpc/sysdev/fsl_pci.c, which is the only
place inside the powerpc arch code doing funny things with the
bus_dma_mask, which Robin pointed out looks fishy.

>
> Thanks,
> Christian
---end quoted text---


RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)

2019-11-12 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 12 November 2019 13:22
> To: Shameerali Kolothum Thodi ;
> eric.auger@gmail.com; iommu@lists.linux-foundation.org;
> linux-ker...@vger.kernel.org; k...@vger.kernel.org;
> kvm...@lists.cs.columbia.edu; j...@8bytes.org;
> alex.william...@redhat.com; jacob.jun@linux.intel.com;
> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com;
> robin.mur...@arm.com
> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com;
> marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm
> ; xuwei (O) 
> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> 
> Hi Shameer,
> 
> On 11/12/19 2:06 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: Auger Eric [mailto:eric.au...@redhat.com]
> >> Sent: 12 November 2019 11:29
> >> To: Shameerali Kolothum Thodi ;
> >> eric.auger@gmail.com; iommu@lists.linux-foundation.org;
> >> linux-ker...@vger.kernel.org; k...@vger.kernel.org;
> >> kvm...@lists.cs.columbia.edu; j...@8bytes.org;
> >> alex.william...@redhat.com; jacob.jun@linux.intel.com;
> >> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com;
> >> robin.mur...@arm.com
> >> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com;
> >> marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm
> >> ; xuwei (O) 
> >> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> >>
> >> Hi Shameer,
> >> On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote:
> >>> Hi Eric,
> >>>
>  -Original Message-
>  From: kvmarm-boun...@lists.cs.columbia.edu
>  [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Eric Auger
>  Sent: 11 July 2019 14:56
>  To: eric.auger@gmail.com; eric.au...@redhat.com;
>  iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>  k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org;
>  alex.william...@redhat.com; jacob.jun@linux.intel.com;
>  yi.l@intel.com; jean-philippe.bruc...@arm.com;
> will.dea...@arm.com;
>  robin.mur...@arm.com
>  Cc: kevin.t...@intel.com; vincent.ste...@arm.com;
> ashok@intel.com;
>  marc.zyng...@arm.com; tina.zh...@intel.com
>  Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> 
>  This series brings the VFIO part of HW nested paging support
>  in the SMMUv3.
> 
>  The series depends on:
>  [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
>  (https://www.spinics.net/lists/kernel/msg3187714.html)
> 
>  3 new IOCTLs are introduced that allow the userspace to
>  1) pass the guest stage 1 configuration
>  2) pass stage 1 MSI bindings
>  3) invalidate stage 1 related caches
> 
>  They map onto the related new IOMMU API functions.
> 
>  We introduce the capability to register specific interrupt
>  indexes (see [1]). A new DMA_FAULT interrupt index allows to register
>  an eventfd to be signaled whenever a stage 1 related fault
>  is detected at physical level. Also a specific region allows
>  to expose the fault records to the user space.
> >>>
> >>> I am trying to get this running on one of our platform that has smmuv3 
> >>> dual
> >>> stage support. I am seeing some issues with this when an ixgbe vf dev is
> >>> made pass-through and is behind a vSMMUv3 in Guest.
> >>>
> >>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> >>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
> >>>
> >>> And this is my Qemu cmd line,
> >>>
> >>> ./qemu-system-aarch64
> >>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
> >>> -kernel Image \
> >>> -drive if=none,file=ubuntu,id=fs \
> >>> -device virtio-blk-device,drive=fs \
> >>> -device vfio-pci,host=:01:10.1 \
> >>> -bios QEMU_EFI.fd \
> >>> -net none \
> >>> -m 4G \
> >>> -nographic -D -d -enable-kvm \
> >>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
> >>>
> >>> The basic ping from Guest works fine,
> >>> root@ubuntu:~# ping 10.202.225.185
> >>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
> >>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
> >>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
> >>> ...
> >>>
> >>> But if I increase ping packet size,
> >>>
> >>> root@ubuntu:~# ping -s 1024 10.202.225.185
> >>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
> >>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
> >>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
> >>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
> >>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
> >>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
> >>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
> >>>
> >>> And from Host kernel 

Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)

2019-11-12 Thread Auger Eric
Hi Shameer,

On 11/12/19 2:06 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: 12 November 2019 11:29
>> To: Shameerali Kolothum Thodi ;
>> eric.auger@gmail.com; iommu@lists.linux-foundation.org;
>> linux-ker...@vger.kernel.org; k...@vger.kernel.org;
>> kvm...@lists.cs.columbia.edu; j...@8bytes.org;
>> alex.william...@redhat.com; jacob.jun@linux.intel.com;
>> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com;
>> robin.mur...@arm.com
>> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com;
>> marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm
>> ; xuwei (O) 
>> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>>
>> Hi Shameer,
>> On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote:
>>> Hi Eric,
>>>
 -Original Message-
 From: kvmarm-boun...@lists.cs.columbia.edu
 [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Eric Auger
 Sent: 11 July 2019 14:56
 To: eric.auger@gmail.com; eric.au...@redhat.com;
 iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
 k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org;
 alex.william...@redhat.com; jacob.jun@linux.intel.com;
 yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com;
 robin.mur...@arm.com
 Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com;
 marc.zyng...@arm.com; tina.zh...@intel.com
 Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)

 This series brings the VFIO part of HW nested paging support
 in the SMMUv3.

 The series depends on:
 [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
 (https://www.spinics.net/lists/kernel/msg3187714.html)

 3 new IOCTLs are introduced that allow the userspace to
 1) pass the guest stage 1 configuration
 2) pass stage 1 MSI bindings
 3) invalidate stage 1 related caches

 They map onto the related new IOMMU API functions.

 We introduce the capability to register specific interrupt
 indexes (see [1]). A new DMA_FAULT interrupt index allows to register
 an eventfd to be signaled whenever a stage 1 related fault
 is detected at physical level. Also a specific region allows
 to expose the fault records to the user space.
>>>
>>> I am trying to get this running on one of our platform that has smmuv3 dual
>>> stage support. I am seeing some issues with this when an ixgbe vf dev is
>>> made pass-through and is behind a vSMMUv3 in Guest.
>>>
>>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
>>>
>>> And this is my Qemu cmd line,
>>>
>>> ./qemu-system-aarch64
>>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
>>> -kernel Image \
>>> -drive if=none,file=ubuntu,id=fs \
>>> -device virtio-blk-device,drive=fs \
>>> -device vfio-pci,host=:01:10.1 \
>>> -bios QEMU_EFI.fd \
>>> -net none \
>>> -m 4G \
>>> -nographic -D -d -enable-kvm \
>>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
>>>
>>> The basic ping from Guest works fine,
>>> root@ubuntu:~# ping 10.202.225.185
>>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
>>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
>>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
>>> ...
>>>
>>> But if I increase ping packet size,
>>>
>>> root@ubuntu:~# ping -s 1024 10.202.225.185
>>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
>>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
>>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
>>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
>>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
>>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
>>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
>>>
>>> And from Host kernel I get,
>>> [  819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected
>>> [  824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
>>> [  828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
>>> [  830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets detected
>>> [  832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
>>> [  834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected
>>>
>>> Also noted that iperf cannot work as it fails to establish the connection
>>> with iperf server.
>>>
>>> Please find attached the trace logs (vfio*, smmuv3*) from Qemu for your
>>> reference. I haven't debugged this further yet and thought of checking
>>> with you if this is something you have seen already or not. Or maybe I am
>>> missing something here?
>>
>> Please can you try to edit and modify 

RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)

2019-11-12 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 12 November 2019 11:29
> To: Shameerali Kolothum Thodi ;
> eric.auger@gmail.com; iommu@lists.linux-foundation.org;
> linux-ker...@vger.kernel.org; k...@vger.kernel.org;
> kvm...@lists.cs.columbia.edu; j...@8bytes.org;
> alex.william...@redhat.com; jacob.jun@linux.intel.com;
> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com;
> robin.mur...@arm.com
> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com;
> marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm
> ; xuwei (O) 
> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> 
> Hi Shameer,
> On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: kvmarm-boun...@lists.cs.columbia.edu
> >> [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Eric Auger
> >> Sent: 11 July 2019 14:56
> >> To: eric.auger@gmail.com; eric.au...@redhat.com;
> >> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> >> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org;
> >> alex.william...@redhat.com; jacob.jun@linux.intel.com;
> >> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com;
> >> robin.mur...@arm.com
> >> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com;
> >> marc.zyng...@arm.com; tina.zh...@intel.com
> >> Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> >>
> >> This series brings the VFIO part of HW nested paging support
> >> in the SMMUv3.
> >>
> >> The series depends on:
> >> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
> >> (https://www.spinics.net/lists/kernel/msg3187714.html)
> >>
> >> 3 new IOCTLs are introduced that allow the userspace to
> >> 1) pass the guest stage 1 configuration
> >> 2) pass stage 1 MSI bindings
> >> 3) invalidate stage 1 related caches
> >>
> >> They map onto the related new IOMMU API functions.
> >>
> >> We introduce the capability to register specific interrupt
> >> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
> >> an eventfd to be signaled whenever a stage 1 related fault
> >> is detected at physical level. Also a specific region allows
> >> to expose the fault records to the user space.
> >
> > I am trying to get this running on one of our platform that has smmuv3 dual
> > stage support. I am seeing some issues with this when an ixgbe vf dev is
> > made pass-through and is behind a vSMMUv3 in Guest.
> >
> > Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> > Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
> >
> > And this is my Qemu cmd line,
> >
> > ./qemu-system-aarch64
> > -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
> > -kernel Image \
> > -drive if=none,file=ubuntu,id=fs \
> > -device virtio-blk-device,drive=fs \
> > -device vfio-pci,host=:01:10.1 \
> > -bios QEMU_EFI.fd \
> > -net none \
> > -m 4G \
> > -nographic -D -d -enable-kvm \
> > -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
> >
> > The basic ping from Guest works fine,
> > root@ubuntu:~# ping 10.202.225.185
> > PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
> > 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
> > 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
> > ...
> >
> > But if I increase ping packet size,
> >
> > root@ubuntu:~# ping -s 1024 10.202.225.185
> > PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
> > 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
> > 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
> > From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
> > From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
> > From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
> > From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
> >
> > And from Host kernel I get,
> > [  819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected
> > [  824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
> > [  828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
> > [  830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets detected
> > [  832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
> > [  834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected
> >
> > Also noted that iperf cannot work as it fails to establish the connection
> > with iperf server.
> >
> > Please find attached the trace logs (vfio*, smmuv3*) from Qemu for your
> > reference. I haven't debugged this further yet and thought of checking
> > with you if this is something you have seen already or not. Or maybe I am
> > missing something here?
> 
> Please can you try to edit and modify hw/vfio/common.c, function
> vfio_iommu_unmap_notify
> 
> 
> /*
> if (size <= 0x1) {
> 

Re: [PATCH] iommu/rockchip: Don't provoke WARN for harmless IRQs

2019-11-12 Thread Marc Zyngier

On 2019-11-11 20:04, Robin Murphy wrote:
Although we don't generally expect IRQs to fire for a suspended IOMMU,
there are certain situations (particularly with debug options) where
we might legitimately end up with the pm_runtime_get_if_in_use() call
from rk_iommu_irq() returning 0. Since this doesn't represent an actual
error, follow the other parts of the driver and save the WARN_ON()
condition for a genuine negative value. Even if we do have spurious
IRQs due to a wedged VOP asserting the shared line, it's not this
driver's job to try to second-guess the IRQ core to warn about that.

Reported-by: Vasily Khoruzhick 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/rockchip-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 4dcbf68dfda4..bd7e9b1e40ac 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -527,7 +527,7 @@ static irqreturn_t rk_iommu_irq(int irq, void *dev_id)

int i, err;

err = pm_runtime_get_if_in_use(iommu->dev);
-   if (WARN_ON_ONCE(err <= 0))
+   if (!err || WARN_ON_ONCE(err < 0))
return ret;

if (WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)))


Acked-by: Marc Zyngier 

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)

2019-11-12 Thread Auger Eric
Hi Shameer,
On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: kvmarm-boun...@lists.cs.columbia.edu
>> [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Eric Auger
>> Sent: 11 July 2019 14:56
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org;
>> alex.william...@redhat.com; jacob.jun@linux.intel.com;
>> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com;
>> robin.mur...@arm.com
>> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com;
>> marc.zyng...@arm.com; tina.zh...@intel.com
>> Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>>
>> This series brings the VFIO part of HW nested paging support
>> in the SMMUv3.
>>
>> The series depends on:
>> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
>> (https://www.spinics.net/lists/kernel/msg3187714.html)
>>
>> 3 new IOCTLs are introduced that allow the userspace to
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>> 3) invalidate stage 1 related caches
>>
>> They map onto the related new IOMMU API functions.
>>
>> We introduce the capability to register specific interrupt
>> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
>> an eventfd to be signaled whenever a stage 1 related fault
>> is detected at physical level. Also a specific region allows
>> to expose the fault records to the user space.
> 
> I am trying to get this running on one of our platform that has smmuv3 dual
> stage support. I am seeing some issues with this when an ixgbe vf dev is 
> made pass-through and is behind a vSMMUv3 in Guest.
> 
> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
> 
> And this is my Qemu cmd line,
> 
> ./qemu-system-aarch64
> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
> -kernel Image \
> -drive if=none,file=ubuntu,id=fs \
> -device virtio-blk-device,drive=fs \
> -device vfio-pci,host=:01:10.1 \
> -bios QEMU_EFI.fd \
> -net none \
> -m 4G \
> -nographic -D -d -enable-kvm \
> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
> 
> The basic ping from Guest works fine,
> root@ubuntu:~# ping 10.202.225.185
> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
> ...
> 
> But if I increase ping packet size, 
> 
> root@ubuntu:~# ping -s 1024 10.202.225.185
> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
> 
> And from Host kernel I get,
> [  819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected
> [  824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
> [  828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
> [  830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets detected
> [  832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
> [  834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected
> 
> Also noted that iperf cannot work as it fails to establish the connection 
> with iperf
> server. 
> 
> Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your 
> reference.
> I haven't debugged this further yet and thought of checking with you if this is
> something you have seen already or not. Or maybe I am missing something here?

Please can you try to edit and modify hw/vfio/common.c, function
vfio_iommu_unmap_notify


/*
if (size <= 0x1) {
ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
if (iotlb->leaf) {
ustruct.info.addr_info.flags |= IOMMU_INV_ADDR_FLAGS_LEAF;
}
ustruct.info.addr_info.archid = iotlb->arch_id;
ustruct.info.addr_info.addr = start;
ustruct.info.addr_info.granule_size = size;
ustruct.info.addr_info.nb_granules = 1;
trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
iotlb->leaf);
} else {
*/
ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
ustruct.info.pasid_info.archid = iotlb->arch_id;
ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;

RE: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) to host

2019-11-12 Thread Liu, Yi L
> From: Alex Williamson < alex.william...@redhat.com >
> Sent: Friday, November 8, 2019 7:21 AM
> To: Liu, Yi L 
> Subject: Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) to 
> host
> 
> On Thu, 24 Oct 2019 08:26:23 -0400
> Liu Yi L  wrote:
> 
> > This patch adds vfio support to bind guest translation structure
> > to host iommu. VFIO exposes iommu programming capability to user-
> > space. Guest is a user-space application in host under KVM solution.
> > For SVA usage in a Virtual Machine, the guest owns the GVA->GPA translation
> > structure, and this part should be passed down to the host to enable nested
> > translation (or, say, two-stage translation). This patch reuses the
> > VFIO_IOMMU_BIND proposal from Jean-Philippe Brucker, and adds a new
> > bind type for binding a guest-owned translation structure to the host.
> >
> > *) Add two new ioctls for VFIO containers.
> >
> >   - VFIO_IOMMU_BIND: for bind requests from userspace; it can either
> >bind a process to a pasid or bind a guest pasid
> >to a device, as indicated by the type field
> >   - VFIO_IOMMU_UNBIND: for unbind requests from userspace; it can either
> >unbind a process from a pasid or unbind a guest pasid
> >from a device, also indicated by the type field
> >   - Bind type:
> > VFIO_IOMMU_BIND_PROCESS: user-space request to bind a process
> >to a device
> > VFIO_IOMMU_BIND_GUEST_PASID: bind guest owned translation
> >structure to host iommu. e.g. guest page table
> >
> > *) Code logic in vfio_iommu_type1_ioctl() to handle VFIO_IOMMU_BIND/UNBIND
> >
> > Cc: Kevin Tian 
> > Signed-off-by: Jean-Philippe Brucker 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 136 
> >  include/uapi/linux/vfio.h   |  44 +
> >  2 files changed, 180 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 3d73a7d..1a27e25 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -2325,6 +2325,104 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> > return ret;
> >  }
> >
> > +static int vfio_bind_gpasid_fn(struct device *dev, void *data)
> > +{
> > +   struct domain_capsule *dc = (struct domain_capsule *)data;
> > +   struct iommu_gpasid_bind_data *ustruct =
> > +   (struct iommu_gpasid_bind_data *) dc->data;
> > +
> > +   return iommu_sva_bind_gpasid(dc->domain, dev, ustruct);
> > +}
> > +
> > +static int vfio_unbind_gpasid_fn(struct device *dev, void *data)
> > +{
> > +   struct domain_capsule *dc = (struct domain_capsule *)data;
> > +   struct iommu_gpasid_bind_data *ustruct =
> > +   (struct iommu_gpasid_bind_data *) dc->data;
> > +
> > +   return iommu_sva_unbind_gpasid(dc->domain, dev,
> > +   ustruct->hpasid);
> > +}
> > +
> > +/*
> > + * unbind specific gpasid, caller of this function requires hold
> > + * vfio_iommu->lock
> > + */
> > +static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu,
> > + struct iommu_gpasid_bind_data *gbind_data)
> > +{
> > +   return vfio_iommu_lookup_dev(iommu, vfio_unbind_gpasid_fn, gbind_data);
> > +}
> > +
> > +static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu,
> > +   void __user *arg,
> > +   struct vfio_iommu_type1_bind *bind)
> > +{
> > +   struct iommu_gpasid_bind_data gbind_data;
> > +   unsigned long minsz;
> > +   int ret = 0;
> > +
> > +   minsz = sizeof(*bind) + sizeof(gbind_data);
> > +   if (bind->argsz < minsz)
> > +   return -EINVAL;
> > +
> > +   if (copy_from_user(&gbind_data, arg, sizeof(gbind_data)))
> > +   return -EFAULT;
> > +
> > +   mutex_lock(&iommu->lock);
> > +   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> > +   ret = -EINVAL;
> > +   goto out_unlock;
> > +   }
> > +
> > +   ret = vfio_iommu_lookup_dev(iommu, vfio_bind_gpasid_fn, &gbind_data);
> > +   /*
> > +* If bind failed, it may not be a total failure. Some devices within
> > +* the iommu group may have bound successfully. Although we don't enable
> > +* pasid capability for non-singleton iommu groups, an unbind operation
> > +* would be helpful to ensure no partial binding for an iommu group.
> > +*/
> > +   if (ret)
> > +   /*
> > +* Undo all binds that already succeeded, no need to check the
> > +* return value here since some device within the group has no
> > +* successful bind when coming to this point.
> > +*/
> > +   vfio_iommu_type1_do_guest_unbind(iommu, &gbind_data);
> > +
> > +out_unlock:
> > +   mutex_unlock(&iommu->lock);
> > +   return ret;
> > +}
> > +
> > +static long vfio_iommu_type1_unbind_gpasid(struct vfio_iommu *iommu,
> > +  

RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)

2019-11-12 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: kvmarm-boun...@lists.cs.columbia.edu
> [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Eric Auger
> Sent: 11 July 2019 14:56
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org;
> alex.william...@redhat.com; jacob.jun@linux.intel.com;
> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com;
> robin.mur...@arm.com
> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com;
> marc.zyng...@arm.com; tina.zh...@intel.com
> Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> 
> This series brings the VFIO part of HW nested paging support
> in the SMMUv3.
> 
> The series depends on:
> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
> (https://www.spinics.net/lists/kernel/msg3187714.html)
> 
> 3 new IOCTLs are introduced that allow the userspace to
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
> 3) invalidate stage 1 related caches
> 
> They map onto the related new IOMMU API functions.
> 
> We introduce the capability to register specific interrupt
> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
> an eventfd to be signaled whenever a stage 1 related fault
> is detected at physical level. Also a specific region allows
> to expose the fault records to the user space.

I am trying to get this running on one of our platforms that has smmuv3 dual
stage support. I am seeing some issues with this when an ixgbe vf dev is
made pass-through and is behind a vSMMUv3 in Guest.

Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5

And this is my Qemu cmd line,

./qemu-system-aarch64
-machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
-kernel Image \
-drive if=none,file=ubuntu,id=fs \
-device virtio-blk-device,drive=fs \
-device vfio-pci,host=:01:10.1 \
-bios QEMU_EFI.fd \
-net none \
-m 4G \
-nographic -D -d -enable-kvm \
-append "console=ttyAMA0 root=/dev/vda rw acpi=force"

The basic ping from Guest works fine,
root@ubuntu:~# ping 10.202.225.185
PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
...

But if I increase the ping packet size,

root@ubuntu:~# ping -s 1024 10.202.225.185
PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
From 10.202.225.169 icmp_seq=69 Destination Host Unreachable

And from Host kernel I get,
[  819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected
[  824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
[  828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
[  830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets detected
[  832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected
[  834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected

Also noted that iperf cannot work, as it fails to establish the connection with
the iperf server.

Please find attached the trace logs (vfio*, smmuv3*) from Qemu for your reference.
I haven't debugged this further yet and thought of checking with you whether this
is something you have seen already or not. Or maybe I am missing something here?

Please let me know.

Thanks,
Shameer

> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> 
> This series includes Tina's patch stemming from
> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
> specific irq", plus patches originally contributed by Yi.
> 
> History:
> 
> v8 -> v9:
> - introduce specific irq framework
> - single fault region
> - iommu_unregister_device_fault_handler failure case not handled
>   yet.
> 
> v7 -> v8:
> - rebase on top of v5.2-rc1 and especially
>   8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain pointer
> - dynamic alloc of s1_cfg/s2_cfg
> - __arm_smmu_tlb_inv_asid/s1_range_nosync
> - check there is no HW MSI regions
> - asid invalidation using pasid extended struct (change in the uapi)
> - add s1_live/s2_live checks
> - move check about support of nested stages in domain finalise
> - fixes in error reporting according to the discussion with Robin
> - reordered the patches to have first iommu/smmuv3 patches and then
>   VFIO patches
> 
> v6 -> v7:
> - removed device handle from bind/unbind_guest_msi
> - added "iommu/smmuv3: Nested mode single MSI doorbell per domain
>   enforcement"
> 

Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

2019-11-12 Thread Auger Eric
Hi Jacob,

On 10/24/19 9:55 PM, Jacob Pan wrote:
> When Shared Virtual Address (SVA) is enabled for a guest OS via
> vIOMMU, we need to provide invalidation support at IOMMU API and driver
> level. This patch adds Intel VT-d specific function to implement
> iommu passdown invalidate API for shared virtual address.
> 
> The use case is for supporting caching structure invalidation
> of assigned SVM capable devices. Emulated IOMMU exposes queue
> invalidation capability and passes down all descriptors from the guest
> to the physical IOMMU.
> 
> The assumption is that guest to host device ID mapping should be
> resolved prior to calling IOMMU driver. Based on the device handle,
> host IOMMU driver can replace certain fields before submit to the
> invalidation queue.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Liu, Yi L 
> ---
>  drivers/iommu/intel-iommu.c | 170 
> 
>  1 file changed, 170 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5fab32fbc4b4..a73e76d6457a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5491,6 +5491,175 @@ static void intel_iommu_aux_detach_device(struct 
> iommu_domain *domain,
>   aux_domain_remove_dev(to_dmar_domain(domain), dev);
>  }
>  
> +/*
> + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
> + * VT-d granularity. Invalidation is typically included in the unmap 
> operation
> + * as a result of DMA or VFIO unmap. However, for assigned device where guest
> + * could own the first level page tables without being shadowed by QEMU. In
above sentence needs to be rephrased.
> + * this case there is no pass down unmap to the host IOMMU as a result of 
> unmap
> + * in the guest. Only invalidations are trapped and passed down.
> + * In all cases, only first level TLB invalidation (request with PASID) can 
> be
> + * passed down, therefore we do not include IOTLB granularity for request
> + * without PASID (second level).
> + *
> + * For an example, to find the VT-d granularity encoding for IOTLB
for example
> + * type and page selective granularity within PASID:
> + * X: indexed by iommu cache type
> + * Y: indexed by enum iommu_inv_granularity
> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> + *
> + * Granu_map array indicates validity of the table. 1: valid, 0: invalid
> + *
> + */
> +const static int 
> inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> + /* PASID based IOTLB, support PASID selective and page selective */
I would rather use the generic terminology, ie. IOTLB invalidation
supports PASID and ADDR granularity
> + {0, 1, 1},
> + /* PASID based dev TLBs, only support all PASIDs or single PASID */
Device IOTLB invalidation supports DOMAIN and PASID granularities
> + {1, 1, 0},
> + /* PASID cache */
PASID cache invalidation supports DOMAIN and PASID granularities
> + {1, 1, 0}
> +};
> +
> +const static u64 
> inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> + /* PASID based IOTLB */
> + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> + /* PASID based dev TLBs */
> + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
> + /* PASID cache */
> + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
> +};
> +
> +static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
nit: it looks a bit weird to me to manipulate a u64 here. Why not use
an int?
> +{
> + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
> + !inv_type_granu_map[type][granu])
> + return -EINVAL;
> +
> + *vtd_granu = inv_type_granu_table[type][granu];
> +
> + return 0;
> +}
> +
> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> +{
> + u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> +
> + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> +  * IOMMU cache invalidate API passes granu_size in bytes, and number of
> +  * granu size in contiguous memory.
> +  */
> + return order_base_2(nr_pages);
> +}
> +
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> + struct device *dev, struct iommu_cache_invalidate_info 
> *inv_info)
> +{
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct device_domain_info *info;
> + struct intel_iommu *iommu;
> + unsigned long flags;
> + int cache_type;
> + u8 bus, devfn;
> + u16 did, sid;
> + int ret = 0;
> + u64 size;
> +
> + if (!inv_info || !dmar_domain ||
> + inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> + return -EINVAL;
> +
> + if (!dev || !dev_is_pci(dev))
> + return -ENODEV;
> +
> + iommu = device_to_iommu(dev, , );
> + if (!iommu)
> + return -ENODEV;
> +
> 

Re: [PATCH v2 0/8] iommu: Add PASID support to Arm SMMUv3

2019-11-12 Thread zhangfei



On 2019/11/8 at 11:25 PM, Jean-Philippe Brucker wrote:

This is version 2 of the series I sent a while ago [1], adding PASID
support to the Arm SMMUv3 driver.

Changes since v1:
* Dropped the patch adding auxiliary domain support. It's an easy way to
   test PASID, by populating PASID contexts using iommu_map/unmap(), but
   I don't know if it will ever have real users.

   Since v1 I changed my testing gear, and am using the zip accelerator
   [2] instead of a software model. It only uses SVA and testing
   auxiliary domains would require additional changes that would never go
   upstream. SVA requires another 20 patches (including I/O page faults)
   that I will send later, but at least I know that this will get used.

* ioasid patch has been carried by Jacob and should be merged for v5.5 [3]

* Split patch "Add support for Substream IDs" into patches 4 and 5.

* Added IORT support (patch 3) and addressed other comments.

[1] 
https://lore.kernel.org/linux-iommu/20190610184714.6786-1-jean-philippe.bruc...@arm.com/
[2] 
https://lore.kernel.org/linux-iommu/1572331216-9503-1-git-send-email-zhangfei@linaro.org/
[3] 
https://lore.kernel.org/linux-iommu/1570045363-24856-1-git-send-email-jacob.jun@linux.intel.com/

Jean-Philippe Brucker (8):
   dt-bindings: document PASID property for IOMMU masters
   iommu/arm-smmu-v3: Support platform SSID
   ACPI/IORT: Support PASID for platform devices
   iommu/arm-smmu-v3: Prepare for SSID support
   iommu/arm-smmu-v3: Add support for Substream IDs
   iommu/arm-smmu-v3: Add second level of context descriptor table
   iommu/arm-smmu-v3: Improve add_device() error handling
   iommu/arm-smmu-v3: Add support for PCI PASID

Thanks Jean for the patch
The series tested well on the Hisilicon KunPeng920 platform.
Tested-by: Zhangfei Gao 


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

2019-11-12 Thread Auger Eric
Hi Jacob,

On 11/8/19 11:55 PM, Jacob Pan wrote:
> On Fri, 8 Nov 2019 12:30:31 +0100
> Auger Eric  wrote:
> 
>> Hi Jacob,
>>
>> On 10/24/19 9:54 PM, Jacob Pan wrote:
>>> Make use of generic IOASID code to manage PASID allocation,
>>> free, and lookup. Replace Intel specific code.
>>>
>>> Signed-off-by: Jacob Pan 
>>> ---
>>>  drivers/iommu/intel-iommu.c | 12 ++--
>>>  drivers/iommu/intel-pasid.c | 36
>>>  drivers/iommu/intel-svm.c   |
>>> 39 +++ 3 files changed, 29
>>> insertions(+), 58 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c
>>> b/drivers/iommu/intel-iommu.c index ced1d89ef977..2ea09b988a23
>>> 100644 --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -5311,7 +5311,7 @@ static void auxiliary_unlink_device(struct
>>> dmar_domain *domain,
>>> domain->auxd_refcnt--;
>>>  
>>> if (!domain->auxd_refcnt && domain->default_pasid > 0)
>>> -   intel_pasid_free_id(domain->default_pasid);
>>> +   ioasid_free(domain->default_pasid);
>>>  }
>>>  
>>>  static int aux_domain_add_dev(struct dmar_domain *domain,
>>> @@ -5329,10 +5329,10 @@ static int aux_domain_add_dev(struct
>>> dmar_domain *domain,
>>> if (domain->default_pasid <= 0) {
>>> int pasid;
>>>  
>>> -   pasid = intel_pasid_alloc_id(domain, PASID_MIN,
>>> -
>>> pci_max_pasids(to_pci_dev(dev)),
>>> -GFP_KERNEL);
>>> -   if (pasid <= 0) {
>>> +   /* No private data needed for the default pasid */
>>> +   pasid = ioasid_alloc(NULL, PASID_MIN,
>>> pci_max_pasids(to_pci_dev(dev)) - 1,
>>> +   NULL);
>>> +   if (pasid == INVALID_IOASID) {
>>> pr_err("Can't allocate default pasid\n");
>>> return -ENODEV;
>>> }
>>> @@ -5368,7 +5368,7 @@ static int aux_domain_add_dev(struct
>>> dmar_domain *domain,
>>> spin_unlock(>lock);
>>> spin_unlock_irqrestore(_domain_lock, flags);
>>> if (!domain->auxd_refcnt && domain->default_pasid > 0)
>>> -   intel_pasid_free_id(domain->default_pasid);
>>> +   ioasid_free(domain->default_pasid);
>>>  
>>> return ret;
>>>  }
>>> diff --git a/drivers/iommu/intel-pasid.c
>>> b/drivers/iommu/intel-pasid.c index d81e857d2b25..e79d680fe300
>>> 100644 --- a/drivers/iommu/intel-pasid.c
>>> +++ b/drivers/iommu/intel-pasid.c
>>> @@ -26,42 +26,6 @@
>>>   */
>>>  static DEFINE_SPINLOCK(pasid_lock);
>>>  u32 intel_pasid_max_id = PASID_MAX;
>>> -static DEFINE_IDR(pasid_idr);
>>> -
>>> -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
>>> -{
>>> -   int ret, min, max;
>>> -
>>> -   min = max_t(int, start, PASID_MIN);
>>> -   max = min_t(int, end, intel_pasid_max_id);
>>> -
>>> -   WARN_ON(in_interrupt());
>>> -   idr_preload(gfp);
>>> -   spin_lock(_lock);
>>> -   ret = idr_alloc(_idr, ptr, min, max, GFP_ATOMIC);
>>> -   spin_unlock(_lock);
>>> -   idr_preload_end();
>>> -
>>> -   return ret;
>>> -}
>>> -
>>> -void intel_pasid_free_id(int pasid)
>>> -{
>>> -   spin_lock(_lock);
>>> -   idr_remove(_idr, pasid);
>>> -   spin_unlock(_lock);
>>> -}
>>> -
>>> -void *intel_pasid_lookup_id(int pasid)
>>> -{
>>> -   void *p;
>>> -
>>> -   spin_lock(_lock);
>>> -   p = idr_find(_idr, pasid);
>>> -   spin_unlock(_lock);
>>> -
>>> -   return p;
>>> -}
>>>  
>>>  int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
>>> *pasid) {
>>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>>> index 9b159132405d..a9a7f85a09bc 100644
>>> --- a/drivers/iommu/intel-svm.c
>>> +++ b/drivers/iommu/intel-svm.c
>>> @@ -17,6 +17,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>  
>>>  #include "intel-pasid.h"
>>> @@ -318,16 +319,15 @@ int intel_svm_bind_mm(struct device *dev, int
>>> *pasid, int flags, struct svm_dev_ if (pasid_max >
>>> intel_pasid_max_id) pasid_max = intel_pasid_max_id;
>>>  
>>> -   /* Do not use PASID 0 in caching mode (virtualised
>>> IOMMU) */
>>> -   ret = intel_pasid_alloc_id(svm,
>>> -  !!cap_caching_mode(iommu->cap),
>>> -  pasid_max - 1,
>>> GFP_KERNEL);
>>> -   if (ret < 0) {
>>> +   /* Do not use PASID 0, reserved for RID to PASID */
>>> +   svm->pasid = ioasid_alloc(NULL, PASID_MIN,
>>> +   pasid_max - 1, svm);  
>> pasid_max - 1 is inclusive, whereas the max param in intel_pasid_alloc_id()
>> is exclusive, right? If you fixed an issue, you can mention it in the
>> commit message.
> Yes, I should mention that. intel_pasid_alloc_id() uses an IDR, which is
> end-exclusive; ioasid uses an XArray, which is inclusive.
>>> +   if (svm->pasid == INVALID_IOASID) {  
>>> kfree(svm);
>>> kfree(sdev);  
>>> +   ret = ENOSPC;  
>> -ENOSPC.
>> Nit: in 2/11 vcmd_alloc_pasid returned -ENOMEM

Re: [PATCH v7 06/11] iommu/vt-d: Avoid duplicated code for PASID setup

2019-11-12 Thread Auger Eric
Hi Jacob,

On 10/24/19 9:54 PM, Jacob Pan wrote:
> After each setup for PASID entry, related translation caches must be flushed.
> We can combine duplicated code into one function which is less error prone.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel-pasid.c | 48 
> +
>  1 file changed, 18 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index e79d680fe300..ffbd416ed3b8 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -485,6 +485,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu 
> *iommu,
>   devtlb_invalidation_with_pasid(iommu, dev, pasid);
>  }
>  
> +static void pasid_flush_caches(struct intel_iommu *iommu,
> + struct pasid_entry *pte,
> + int pasid, u16 did)
> +{
> + if (!ecap_coherent(iommu->ecap))
> + clflush_cache_range(pte, sizeof(*pte));
> +
> + if (cap_caching_mode(iommu->cap)) {
> + pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> + iotlb_invalidation_with_pasid(iommu, did, pasid);
> + } else {
> + iommu_flush_write_buffer(iommu);
> + }
> +}
> +
>  /*
>   * Set up the scalable mode pasid table entry for first only
>   * translation type.
> @@ -530,16 +545,7 @@ int intel_pasid_setup_first_level(struct intel_iommu 
> *iommu,
>   /* Setup Present and PASID Granular Transfer Type: */
>   pasid_set_translation_type(pte, 1);
>   pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>  
>   return 0;
>  }
> @@ -603,16 +609,7 @@ int intel_pasid_setup_second_level(struct intel_iommu 
> *iommu,
>*/
>   pasid_set_sre(pte);
>   pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>  
>   return 0;
>  }
> @@ -646,16 +643,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu 
> *iommu,
>*/
>   pasid_set_sre(pte);
>   pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>  
>   return 0;
>  }
> 
Reviewed-by: Eric Auger 

Thanks

Eric



Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs

2019-11-12 Thread Huang Adrian
Hi Joerg,

> On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel  wrote:
>
> Hi Adrian,
>
> On Mon, Nov 04, 2019 at 01:58:52PM +0800, Adrian Huang wrote:
> >   2) When set_device_exclusion_range() parses the IVMD of devce id
> >  '4200', the exclusion range of the amd_iommu struct becomes:
> >
> >   iommu->exclusion_start = 0x9F58D000;
> >   iommu->exclusion_length = 0x804;
> >
> >   3) When parsing the second IVMD (device id '4300') in
> >  set_device_exclusion_range(), the exclusion range of the
> >  amd_iommu struct becomes:
> >
> >   iommu->exclusion_start = 0x9754D000;
> >   iommu->exclusion_length = 0x804;
> >
> >  This overwrites the first IVMD configuration, which leads to
> >  the failure of the first RAID controller.
>
> Okay, I think this is clearly a BIOS bug as the BIOS should not define
> two different exclusion ranges in the IVRS table.

Thanks for the review.

Yes, I understand this is a BIOS bug. The purpose of this patch is to
prevent the failure even though the system boots with the buggy BIOS.

> So there are a couple of options here:
>
> 1) Bail out and disable the IOMMU as the BIOS screwed up
>
> 2) Treat per-device exclusion ranges just as r/w unity-mapped
>regions.
>
>
> I think option 2) is the best fix here.

Yes, this is what this patch does: the first exclusion region still
uses the exclusion range, while the remaining exclusion regions are
modified to be r/w unity-mapped regions when multiple exclusion
regions are detected.

But I'm guessing you mean that the BIOS has to define r/w unity-mapped
regions instead of per-device exclusion ranges (i.e. a fix in the BIOS,
rather than preventing the failure in the kernel). Right?

-- Adrian