Re: [RFC v2 1/3] vfio: VFIO_IOMMU_CACHE_INVALIDATE
Hi Yi,

On 11/6/19 2:31 AM, Liu, Yi L wrote:
>> From: Alex Williamson [mailto:alex.william...@redhat.com]
>> Sent: Wednesday, November 6, 2019 6:42 AM
>> To: Liu, Yi L
>> Subject: Re: [RFC v2 1/3] vfio: VFIO_IOMMU_CACHE_INVALIDATE
>>
>> On Fri, 25 Oct 2019 11:20:40 +
>> "Liu, Yi L" wrote:
>>
>>> Hi Kevin,
>>>
>>>> From: Tian, Kevin
>>>> Sent: Friday, October 25, 2019 5:14 PM
>>>> To: Liu, Yi L ; alex.william...@redhat.com;
>>>> Subject: RE: [RFC v2 1/3] vfio: VFIO_IOMMU_CACHE_INVALIDATE
>>>>
> From: Liu, Yi L
> Sent: Thursday, October 24, 2019 8:26 PM
>
> From: Liu Yi L
>
> When the guest "owns" the stage 1 translation structures, the
> host IOMMU driver has no knowledge of caching structure updates
> unless the guest invalidation requests are trapped and passed down to the
> host.
>
> This patch adds the VFIO_IOMMU_CACHE_INVALIDATE ioctl, which aims at
> propagating guest stage 1 IOMMU cache invalidations to the host.
>
> Cc: Kevin Tian
> Signed-off-by: Liu Yi L
> Signed-off-by: Eric Auger
> Signed-off-by: Jacob Pan
> ---
>  drivers/vfio/vfio_iommu_type1.c | 55 +
>  include/uapi/linux/vfio.h       | 13 ++
>  2 files changed, 68 insertions(+)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 96fddc1d..cd8d3a5 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -124,6 +124,34 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
>  			(!list_empty(&iommu->domain_list))
>
> +struct domain_capsule {
> +	struct iommu_domain *domain;
> +	void *data;
> +};
> +
> +/* iommu->lock must be held */
> +static int
> +vfio_iommu_lookup_dev(struct vfio_iommu *iommu,
> +		      int (*fn)(struct device *dev, void *data),
> +		      void *data)

'lookup' usually means find a device and then return. But the real purpose
here is to loop over all the devices within this container and then do
something. Does it make more sense to be vfio_iommu_for_each_dev?

>> +1

>>> yep, I can replace it.
>>>

> +{
> +	struct domain_capsule dc = {.data = data};
> +	struct vfio_domain *d;

>>> [...]

> @@ -2315,6 +2352,24 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>
>  	return copy_to_user((void __user *)arg, &unmap, minsz) ?
>  			-EFAULT : 0;
> +	} else if (cmd == VFIO_IOMMU_CACHE_INVALIDATE) {
> +		struct vfio_iommu_type1_cache_invalidate ustruct;

it's weird to call a variable as struct.

>>> Will fix it.

> +		int ret;
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_cache_invalidate,
> +				    info);
> +
> +		if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (ustruct.argsz < minsz || ustruct.flags)
> +			return -EINVAL;
> +
> +		mutex_lock(&iommu->lock);
> +		ret = vfio_iommu_lookup_dev(iommu, vfio_cache_inv_fn,
> +					    &ustruct);
> +		mutex_unlock(&iommu->lock);
> +		return ret;
>  	}
>
>  	return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9e843a1..ccf60a2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -794,6 +794,19 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>
> +/**
> + * VFIO_IOMMU_CACHE_INVALIDATE - _IOWR(VFIO_TYPE, VFIO_BASE + 24,

>> What's going on with these ioctl numbers? AFAICT[1] we've used up through
>> VFIO_BASE + 21, this jumps to 24, the next patch skips to 27, then the last
>> patch fills in 28 & 29. Thanks,
>
> Hi Alex,
>
> I rebased my patch on Eric's nested stage translation patches. His base
> also introduced IOCTLs. I should have made it better. I'll try to sync
> with Eric to serialize the IOCTLs.
>
> [PATCH v6 00/22] SMMUv3 Nested Stage Setup by Eric Auger
> https://lkml.org/lkml/2019/3/17/124

Feel free to choose your IOCTL numbers without taking care of my series.
I will adapt to yours if my work gets unblocked at some point.

Thanks

Eric

> > Thanks,
> > Yi Liu
> >
> >> Alex
> >>
> >> [1] git grep -h VFIO_BASE | grep "VFIO_BASE +" | grep -e ^#define | \
> >>     awk '{print $NF}' | tr -d ')' | sort -u -n

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
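The argsz/minsz check the patch performs is the standard extensible-ioctl convention used across VFIO: copy only the known prefix of the struct, then validate that userspace claimed at least that much and set no unknown flags. A rough userspace model of that check (the struct and function names here are illustrative, not the real uAPI):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-visible struct following the VFIO argsz/flags
 * convention; not the real vfio_iommu_type1_cache_invalidate layout. */
struct fake_cache_invalidate {
	uint32_t argsz;
	uint32_t flags;
	uint64_t info;
};

/* Same helper the kernel uses: offset of the end of MEMBER. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Validate like the patch does: copy only minsz bytes, reject a short
 * argsz or any set flags. Returns 0 on success, -1 (for -EINVAL). */
static int check_user_struct(const void *user_buf)
{
	struct fake_cache_invalidate ustruct;
	size_t minsz = offsetofend(struct fake_cache_invalidate, info);

	memcpy(&ustruct, user_buf, minsz);  /* stands in for copy_from_user() */
	if (ustruct.argsz < minsz || ustruct.flags)
		return -1;
	return 0;
}
```

Because only `minsz` bytes are copied, a newer userspace passing a grown struct with a larger `argsz` still validates against an older kernel's smaller `minsz`.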
RE: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) to host
> From: Alex Williamson
> Sent: Wednesday, November 13, 2019 1:26 AM
> To: Liu, Yi L
> Subject: Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) to host
>
> On Tue, 12 Nov 2019 11:21:40 +
> "Liu, Yi L" wrote:
>
> > > From: Alex Williamson <alex.william...@redhat.com>
> > > Sent: Friday, November 8, 2019 7:21 AM
> > > To: Liu, Yi L
> > > Subject: Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page
> > > tables) to host
> > >
> > > On Thu, 24 Oct 2019 08:26:23 -0400
> > > Liu Yi L wrote:
> > >
> > > > This patch adds vfio support to bind a guest translation structure
> > > > to the host iommu. VFIO exposes iommu programming capability to
> > > > user-space. The guest is a user-space application in the host under
> > > > the KVM solution. For SVA usage in a Virtual Machine, the guest owns
> > > > the GVA->GPA translation structure. This part should be passed down
> > > > to the host to enable nested translation (or say two-stage
> > > > translation). This patch reuses the VFIO_IOMMU_BIND proposal from
> > > > Jean-Philippe Brucker, and adds a new bind type for binding a guest
> > > > owned translation structure to the host.
> > > >
> > > > *) Add two new ioctls for VFIO containers.
> > > >
> > > >   - VFIO_IOMMU_BIND: for bind requests from userspace; it can bind
> > > >     a process to a pasid or bind a guest pasid to a device, as
> > > >     indicated by type
> > > >   - VFIO_IOMMU_UNBIND: for unbind requests from userspace; it can
> > > >     unbind a process from a pasid or unbind a guest pasid from a
> > > >     device, also indicated by type
> > > >   - Bind type:
> > > >     VFIO_IOMMU_BIND_PROCESS: user-space request to bind a process
> > > >       to a device
> > > >     VFIO_IOMMU_BIND_GUEST_PASID: bind a guest owned translation
> > > >       structure to the host iommu, e.g. a guest page table
> > > >
> > > > *) Code logic in vfio_iommu_type1_ioctl() to handle VFIO_IOMMU_BIND/UNBIND
> > > > [...]
> > > > +static long vfio_iommu_type1_unbind_gpasid(struct vfio_iommu *iommu,
> > > > +					   void __user *arg,
> > > > +					   struct vfio_iommu_type1_bind *bind)
> > > > +{
> > > > +	struct iommu_gpasid_bind_data gbind_data;
> > > > +	unsigned long minsz;
> > > > +	int ret = 0;
> > > > +
> > > > +	minsz = sizeof(*bind) + sizeof(gbind_data);
> > > > +	if (bind->argsz < minsz)
> > > > +		return -EINVAL;
> > >
> > > But gbind_data can change size if new vendor specific data is added to
> > > the union, so kernel updates break existing userspace. Fail.
> >
> > yes, we have a version field in struct iommu_gpasid_bind_data. How
> > about doing the sanity check per version? The kernel knows the
> > gbind_data size of a specific version. Does it make sense? If yes,
> > I'll also apply it to the other sanity checks in this series to avoid
> > userspace failing after a kernel update.
>
> Has it already been decided that the version field will be updated for
> every addition to the union?

No, just my proposal. Jacob may help to explain the purpose of the version
field. But it may be too "frequent" an uapi version number update if we
increment the version for each change in the union part. I may vote for
the second option from you below.

> It seems there are two options, either
> the version definition includes the possible contents of the union,
> which means we need to support multiple versions concurrently in the
> kernel to maintain compatibility with userspace and follow deprecation
> protocols for removing that support, or we need to consider version to
> be the general form of the structure and interpret the format field to
> determine the necessary length to copy from the user.

As I mentioned above, it may be better to let the @version field only
cover the general fields and let format cover the possible changes in the
union, e.g. IOMMU_PASID_FORMAT_INTEL_VTD2 may mean version 2 of the Intel
VT-d bind. But either way, I think we need to let the kernel maintain
multiple versions to support compatible userspace, e.g. we may have
multiple versions of the iommu_gpasid_bind_data_vtd struct in the union
part.

> > > > +
> > > > +	if (copy_from_user(&gbind_data, arg, sizeof(gbind_data)))
> > > > +		return -EFAULT;
> > > > +
> > > > +	mutex_lock(&iommu->lock);
> > > > +	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> > > > +		ret = -EINVAL;
> > > > +		goto out_unlock;
> > > > +	}
> > > > +
> > > > +	ret = vfio_iommu_type1_do_guest_unbind(iommu, &gbind_data);
> > > > +
> > > > +out_unlock:
> > > > +	mutex_unlock(&iommu->lock);
> > > > +	return ret;
> > > > +}
> > > > +
> > > >  static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > >				   unsigned int cmd, unsigned long arg)
> > > >  {
> > > > @@ -2484,6 +2582,44 @@ static long
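The second option Alex describes — a fixed version covering the generic layout, with the format field implying how many union bytes to copy — can be sketched as a simple dispatch. All names here are hypothetical illustrations, not the real uAPI:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical format discriminators; only the dispatch idea matters. */
enum fake_pasid_format { FAKE_FMT_VTD1 = 1, FAKE_FMT_VTD2 = 2 };

struct fake_vtd1_data { uint64_t flags; };                  /* v1 union member  */
struct fake_vtd2_data { uint64_t flags; uint64_t extra; };  /* grown v2 member  */

/*
 * The kernel derives how many vendor-data bytes to copy from @format,
 * so the @version field can keep covering only the generic header
 * fields even as the union grows. Unknown formats are rejected.
 */
static long fake_vendor_data_len(uint32_t format)
{
	switch (format) {
	case FAKE_FMT_VTD1:
		return sizeof(struct fake_vtd1_data);
	case FAKE_FMT_VTD2:
		return sizeof(struct fake_vtd2_data);
	default:
		return -1;  /* unknown format: would be -EINVAL */
	}
}
```

With this shape, adding `FAKE_FMT_VTD2` later is a pure extension: old userspace passing the v1 format still gets exactly `sizeof(struct fake_vtd1_data)` bytes copied.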
[PATCH 1/3] dma-direct: unify the dma_capable definitions
Currently each architecture that wants to override dma_to_phys and
phys_to_dma also has to provide dma_capable. But there isn't really any
good reason for that. powerpc and mips just have copies of the generic one
minus the latest fix, and the arm one was the inspiration for said fix,
but misses the bus_dma_mask handling. Make all architectures use the
generic version instead.

Signed-off-by: Christoph Hellwig
---
 arch/arm/include/asm/dma-direct.h     | 19 ---
 arch/mips/include/asm/dma-direct.h    |  8 ----
 arch/powerpc/include/asm/dma-direct.h |  9 -----
 include/linux/dma-direct.h            |  2 +-
 4 files changed, 1 insertion(+), 37 deletions(-)

diff --git a/arch/arm/include/asm/dma-direct.h b/arch/arm/include/asm/dma-direct.h
index b67e5fc1fe43..7c3001a6a775 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -14,23 +14,4 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 	return __pfn_to_phys(dma_to_pfn(dev, dev_addr)) + offset;
 }

-static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
-{
-	u64 limit, mask;
-
-	if (!dev->dma_mask)
-		return 0;
-
-	mask = *dev->dma_mask;
-
-	limit = (mask + 1) & ~mask;
-	if (limit && size > limit)
-		return 0;
-
-	if ((addr | (addr + size - 1)) & ~mask)
-		return 0;
-
-	return 1;
-}
-
 #endif /* ASM_ARM_DMA_DIRECT_H */
diff --git a/arch/mips/include/asm/dma-direct.h b/arch/mips/include/asm/dma-direct.h
index b5c240806e1b..14e352651ce9 100644
--- a/arch/mips/include/asm/dma-direct.h
+++ b/arch/mips/include/asm/dma-direct.h
@@ -2,14 +2,6 @@
 #ifndef _MIPS_DMA_DIRECT_H
 #define _MIPS_DMA_DIRECT_H 1

-static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
-{
-	if (!dev->dma_mask)
-		return false;
-
-	return addr + size - 1 <= *dev->dma_mask;
-}
-
 dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr);
 phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr);

diff --git a/arch/powerpc/include/asm/dma-direct.h b/arch/powerpc/include/asm/dma-direct.h
index a2912b47102c..e29e8a236b8d 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -2,15 +2,6 @@
 #ifndef ASM_POWERPC_DMA_DIRECT_H
 #define ASM_POWERPC_DMA_DIRECT_H 1

-static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
-{
-	if (!dev->dma_mask)
-		return false;
-
-	return addr + size - 1 <=
-		min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
-}
-
 static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	if (!dev)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 6db863c3eb93..991f8aa2676e 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -24,6 +24,7 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 	return paddr + ((phys_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
 }
+#endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */

 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
 {
@@ -38,7 +39,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
 	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
 }
-#endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */

 #ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
 bool force_dma_unencrypted(struct device *dev);
--
2.20.1
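The generic check that all architectures end up sharing boils down to: compute the last byte of the mapping and compare it against the effective mask, where a zero bus mask is ignored. A userspace model of that logic (masks passed directly instead of through `struct device`, and the `min_low_pfn` corner case omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Like the kernel's min_not_zero(): ignore a zero operand. */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (!a)
		return b;
	if (!b)
		return a;
	return a < b ? a : b;
}

/*
 * Model of the generic dma_capable(): the whole range
 * [addr, addr + size - 1] must fit under both the device mask and
 * (if set) the bus mask — exactly the bus_dma_mask handling the arm
 * copy was missing.
 */
static bool fake_dma_capable(uint64_t dma_mask, uint64_t bus_dma_mask,
			     uint64_t addr, uint64_t size)
{
	uint64_t end = addr + size - 1;

	if (!dma_mask)
		return false;
	return end <= min_not_zero_u64(dma_mask, bus_dma_mask);
}
```

Note the comparison uses the *end* of the range, so an allocation that starts under the mask but spills past it is correctly rejected.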
unify the dma_capable definition
Hi all,

there is no good reason to have different versions of dma_capable. This
series removes the arch overrides and also adds a few cleanups in the
same area.
[PATCH 2/3] dma-direct: avoid a forward declaration for phys_to_dma
Move dma_capable down a bit so that we don't need a forward declaration
for phys_to_dma.

Signed-off-by: Christoph Hellwig
---
 include/linux/dma-direct.h | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 991f8aa2676e..f8959f75e496 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -6,8 +6,6 @@
 #include /* for min_low_pfn */
 #include

-static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr);
-
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include
 #else
@@ -26,20 +24,6 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 }
 #endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */

-static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
-{
-	dma_addr_t end = addr + size - 1;
-
-	if (!dev->dma_mask)
-		return false;
-
-	if (!IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
-	    min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
-		return false;
-
-	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
-}
-
 #ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED
 bool force_dma_unencrypted(struct device *dev);
 #else
@@ -65,6 +49,20 @@ static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 	return __sme_clr(__dma_to_phys(dev, daddr));
 }

+static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
+{
+	dma_addr_t end = addr + size - 1;
+
+	if (!dev->dma_mask)
+		return false;
+
+	if (!IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
+	    min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
+		return false;
+
+	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
+}
+
 u64 dma_direct_get_required_mask(struct device *dev);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
--
2.20.1
[PATCH 3/3] powerpc: remove support for NULL dev in __phys_to_dma / __dma_to_phys
Support for calling the DMA API functions without a valid device pointer
was removed a while ago, so remove the stale support for that from the
powerpc __phys_to_dma / __dma_to_phys helpers.

Signed-off-by: Christoph Hellwig
---
 arch/powerpc/include/asm/dma-direct.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-direct.h b/arch/powerpc/include/asm/dma-direct.h
index e29e8a236b8d..abc154d784b0 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -4,15 +4,11 @@
 static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-	if (!dev)
-		return paddr + PCI_DRAM_OFFSET;
 	return paddr + dev->archdata.dma_offset;
 }

 static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
-	if (!dev)
-		return daddr - PCI_DRAM_OFFSET;
 	return daddr - dev->archdata.dma_offset;
 }
 #endif /* ASM_POWERPC_DMA_DIRECT_H */
--
2.20.1
remove DMA_ATTR_WRITE_BARRIER
There is no implementation of the DMA_ATTR_WRITE_BARRIER flag left now
that the ia64 sn2 code has been removed. Drop the flag and the calling
convention to set it in the RDMA code.
[PATCH 1/2] dma-mapping: remove the DMA_ATTR_WRITE_BARRIER flag
This flag is not implemented by any backend and only set by the ib_umem
module in a single instance.

Signed-off-by: Christoph Hellwig
---
 Documentation/DMA-attributes.txt | 18 ------------------
 drivers/infiniband/core/umem.c   |  9 ++-------
 include/linux/dma-mapping.h      |  5 +----
 3 files changed, 3 insertions(+), 29 deletions(-)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index 8f8d97f65d73..29dcbe8826e8 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -5,24 +5,6 @@ DMA attributes
 This document describes the semantics of the DMA attributes that are
 defined in linux/dma-mapping.h.

-DMA_ATTR_WRITE_BARRIER
-----------------------
-
-DMA_ATTR_WRITE_BARRIER is a (write) barrier attribute for DMA. DMA
-to a memory region with the DMA_ATTR_WRITE_BARRIER attribute forces
-all pending DMA writes to complete, and thus provides a mechanism to
-strictly order DMA from a device across all intervening busses and
-bridges. This barrier is not specific to a particular type of
-interconnect, it applies to the system as a whole, and so its
-implementation must account for the idiosyncrasies of the system all
-the way from the DMA device to memory.
-
-As an example of a situation where DMA_ATTR_WRITE_BARRIER would be
-useful, suppose that a device does a DMA write to indicate that data is
-ready and available in memory. The DMA of the "completion indication"
-could race with data DMA. Mapping the memory used for completion
-indications with DMA_ATTR_WRITE_BARRIER would prevent the race.
-
 DMA_ATTR_WEAK_ORDERING
 ----------------------

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 24244a2f68cc..66148739b00f 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -199,7 +199,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
 	struct mm_struct *mm;
 	unsigned long npages;
 	int ret;
-	unsigned long dma_attrs = 0;
 	struct scatterlist *sg;
 	unsigned int gup_flags = FOLL_WRITE;

@@ -211,9 +210,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
 	if (!context)
 		return ERR_PTR(-EIO);

-	if (dmasync)
-		dma_attrs |= DMA_ATTR_WRITE_BARRIER;
-
 	/*
 	 * If the combination of the addr and size requested for this memory
 	 * region causes an integer overflow, return error.
@@ -294,11 +290,10 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,

 	sg_mark_end(sg);

-	umem->nmap = ib_dma_map_sg_attrs(context->device,
+	umem->nmap = ib_dma_map_sg(context->device,
 				  umem->sg_head.sgl,
 				  umem->sg_nents,
-				  DMA_BIDIRECTIONAL,
-				  dma_attrs);
+				  DMA_BIDIRECTIONAL);

 	if (!umem->nmap) {
 		ret = -ENOMEM;
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4d450672b7d6..a4930310d0c7 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -15,11 +15,8 @@
 /**
  * List of possible attributes associated with a DMA mapping. The semantics
  * of each attribute should be defined in Documentation/DMA-attributes.txt.
- *
- * DMA_ATTR_WRITE_BARRIER: DMA to a memory region with this attribute
- * forces all pending DMA writes to complete.
  */
-#define DMA_ATTR_WRITE_BARRIER		(1UL << 0)
+
 /*
  * DMA_ATTR_WEAK_ORDERING: Specifies that reads and writes to the mapping
  * may be weakly ordered, that is that reads and writes may pass each other.
--
2.20.1
[PATCH 2/2] IB/umem: remove the dmasync argument to ib_umem_get
The argument is always ignored, so remove it.

Signed-off-by: Christoph Hellwig
---
 drivers/infiniband/core/umem.c                |  3 +--
 drivers/infiniband/hw/bnxt_re/ib_verbs.c      | 10 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c   |  2 +-
 drivers/infiniband/hw/cxgb4/mem.c             |  2 +-
 drivers/infiniband/hw/efa/efa_verbs.c         |  2 +-
 drivers/infiniband/hw/hns/hns_roce_cq.c       |  2 +-
 drivers/infiniband/hw/hns/hns_roce_db.c       |  2 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c       |  4 ++--
 drivers/infiniband/hw/hns/hns_roce_qp.c       |  2 +-
 drivers/infiniband/hw/hns/hns_roce_srq.c      |  4 ++--
 drivers/infiniband/hw/i40iw/i40iw_verbs.c     |  2 +-
 drivers/infiniband/hw/mlx4/cq.c               |  2 +-
 drivers/infiniband/hw/mlx4/doorbell.c         |  2 +-
 drivers/infiniband/hw/mlx4/mr.c               |  2 +-
 drivers/infiniband/hw/mlx4/qp.c               |  5 ++---
 drivers/infiniband/hw/mlx4/srq.c              |  2 +-
 drivers/infiniband/hw/mlx5/cq.c               |  4 ++--
 drivers/infiniband/hw/mlx5/devx.c             |  2 +-
 drivers/infiniband/hw/mlx5/doorbell.c         |  2 +-
 drivers/infiniband/hw/mlx5/mr.c               |  2 +-
 drivers/infiniband/hw/mlx5/qp.c               |  4 ++--
 drivers/infiniband/hw/mlx5/srq.c              |  2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c  |  4 +---
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   |  2 +-
 drivers/infiniband/hw/qedr/verbs.c            | 21 +--
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c  |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c  |  4 ++--
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c |  2 +-
 drivers/infiniband/sw/rdmavt/mr.c             |  2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c            |  2 +-
 include/rdma/ib_umem.h                        |  4 ++--
 32 files changed, 52 insertions(+), 57 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 66148739b00f..7a3b99597ead 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -185,10 +185,9 @@ EXPORT_SYMBOL(ib_umem_find_best_pgsz);
  * @addr: userspace virtual address to start at
  * @size: length of region to pin
  * @access: IB_ACCESS_xxx flags for memory being pinned
- * @dmasync: flush in-flight DMA when the memory region is written
  */
 struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
-			    size_t size, int access, int dmasync)
+			    size_t size, int access)
 {
 	struct ib_ucontext *context;
 	struct ib_umem *umem;
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index b4149dc9e824..99a8e0b99e66 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -855,7 +855,7 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
 		bytes += (qplib_qp->sq.max_wqe * psn_sz);
 	}
 	bytes = PAGE_ALIGN(bytes);
-	umem = ib_umem_get(udata, ureq.qpsva, bytes, IB_ACCESS_LOCAL_WRITE, 1);
+	umem = ib_umem_get(udata, ureq.qpsva, bytes, IB_ACCESS_LOCAL_WRITE);
 	if (IS_ERR(umem))
 		return PTR_ERR(umem);

@@ -869,7 +869,7 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
 		bytes = (qplib_qp->rq.max_wqe * BNXT_QPLIB_MAX_RQE_ENTRY_SIZE);
 		bytes = PAGE_ALIGN(bytes);
 		umem = ib_umem_get(udata, ureq.qprva, bytes,
-				   IB_ACCESS_LOCAL_WRITE, 1);
+				   IB_ACCESS_LOCAL_WRITE);
 		if (IS_ERR(umem))
 			goto rqfail;
 		qp->rumem = umem;
@@ -1322,7 +1322,7 @@ static int bnxt_re_init_user_srq(struct bnxt_re_dev *rdev,

 	bytes = (qplib_srq->max_wqe * BNXT_QPLIB_MAX_RQE_ENTRY_SIZE);
 	bytes = PAGE_ALIGN(bytes);
-	umem = ib_umem_get(udata, ureq.srqva, bytes, IB_ACCESS_LOCAL_WRITE, 1);
+	umem = ib_umem_get(udata, ureq.srqva, bytes, IB_ACCESS_LOCAL_WRITE);
 	if (IS_ERR(umem))
 		return PTR_ERR(umem);

@@ -2565,7 +2565,7 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,

 		cq->umem = ib_umem_get(udata, req.cq_va,
 				       entries * sizeof(struct cq_base),
-				       IB_ACCESS_LOCAL_WRITE, 1);
+				       IB_ACCESS_LOCAL_WRITE);
 		if (IS_ERR(cq->umem)) {
 			rc = PTR_ERR(cq->umem);
 			goto fail;
@@ -3530,7 +3530,7 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 	/* The fixed portion of the rkey is the same as the lkey */
 	mr->ib_mr.rkey = mr->qplib_mr.rkey;

-	umem = ib_umem_get(udata, start,
[PATCH 1/3] x86: Remove the calgary IOMMU driver
The calgary IOMMU was only used on high-end IBM systems in the early
x86_64 age and has no known users left. Remove it to avoid having to
touch it for pending changes to the DMA API.

Signed-off-by: Christoph Hellwig
---
 MAINTAINERS                       |   10 -
 arch/x86/Kconfig                  |   30 -
 arch/x86/configs/x86_64_defconfig |    1 -
 arch/x86/include/asm/calgary.h    |   57 --
 arch/x86/include/asm/pci_64.h     |   14 -
 arch/x86/include/asm/tce.h        |   35 -
 arch/x86/kernel/Makefile          |    1 -
 arch/x86/kernel/pci-calgary_64.c  | 1586 -
 arch/x86/kernel/pci-dma.c         |    6 -
 arch/x86/kernel/tce_64.c          |  177
 10 files changed, 1917 deletions(-)
 delete mode 100644 arch/x86/include/asm/calgary.h
 delete mode 100644 arch/x86/include/asm/tce.h
 delete mode 100644 arch/x86/kernel/pci-calgary_64.c
 delete mode 100644 arch/x86/kernel/tce_64.c

diff --git a/MAINTAINERS b/MAINTAINERS
index eb19fad370d7..778e8dee3e9d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3627,16 +3627,6 @@ L:	c...@lists.bufferbloat.net (moderated for non-subscribers)
 S:	Maintained
 F:	net/sched/sch_cake.c

-CALGARY x86-64 IOMMU
-M:	Muli Ben-Yehuda
-M:	Jon Mason
-L:	iommu@lists.linux-foundation.org
-S:	Maintained
-F:	arch/x86/kernel/pci-calgary_64.c
-F:	arch/x86/kernel/tce_64.c
-F:	arch/x86/include/asm/calgary.h
-F:	arch/x86/include/asm/tce.h
-
 CAN NETWORK DRIVERS
 M:	Wolfgang Grandegger
 M:	Marc Kleine-Budde
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d6e1faa28c58..4d27bdc85bf2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -932,36 +932,6 @@ config GART_IOMMU

 	  If unsure, say Y.

-config CALGARY_IOMMU
-	bool "IBM Calgary IOMMU support"
-	select IOMMU_HELPER
-	select SWIOTLB
-	depends on X86_64 && PCI
-	---help---
-	  Support for hardware IOMMUs in IBM's xSeries x366 and x460
-	  systems. Needed to run systems with more than 3GB of memory
-	  properly with 32-bit PCI devices that do not support DAC
-	  (Double Address Cycle). Calgary also supports bus level
-	  isolation, where all DMAs pass through the IOMMU.  This
-	  prevents them from going anywhere except their intended
-	  destination. This catches hard-to-find kernel bugs and
-	  mis-behaving drivers and devices that do not use the DMA-API
-	  properly to set up their DMA buffers.  The IOMMU can be
-	  turned off at boot time with the iommu=off parameter.
-	  Normally the kernel will make the right choice by itself.
-	  If unsure, say Y.
-
-config CALGARY_IOMMU_ENABLED_BY_DEFAULT
-	def_bool y
-	prompt "Should Calgary be enabled by default?"
-	depends on CALGARY_IOMMU
-	---help---
-	  Should Calgary be enabled by default? if you choose 'y', Calgary
-	  will be used (if it exists). If you choose 'n', Calgary will not be
-	  used even if it exists. If you choose 'n' and would like to use
-	  Calgary anyway, pass 'iommu=calgary' on the kernel command line.
-	  If unsure, say Y.
-
 config MAXSMP
 	bool "Enable Maximum number of SMP Processors and NUMA Nodes"
 	depends on X86_64 && SMP && DEBUG_KERNEL
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index d0a5ffeae8df..0b9654c7a05c 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -25,7 +25,6 @@ CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 CONFIG_MODULE_FORCE_UNLOAD=y
 CONFIG_SMP=y
-CONFIG_CALGARY_IOMMU=y
 CONFIG_NR_CPUS=64
 CONFIG_SCHED_SMT=y
 CONFIG_PREEMPT_VOLUNTARY=y
diff --git a/arch/x86/include/asm/calgary.h b/arch/x86/include/asm/calgary.h
deleted file mode 100644
index facd374a1bf7..
--- a/arch/x86/include/asm/calgary.h
+++ /dev/null
@@ -1,57 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * Derived from include/asm-powerpc/iommu.h
- *
- * Copyright IBM Corporation, 2006-2007
- *
- * Author: Jon Mason
- * Author: Muli Ben-Yehuda
- */
-
-#ifndef _ASM_X86_CALGARY_H
-#define _ASM_X86_CALGARY_H
-
-#include
-#include
-#include
-#include
-#include
-
-struct iommu_table {
-	const struct cal_chipset_ops *chip_ops;	/* chipset specific funcs */
-	unsigned long  it_base;		/* mapped address of tce table */
-	unsigned long  it_hint;		/* Hint for next alloc */
-	unsigned long *it_map;		/* A simple allocation bitmap for now */
-	void __iomem  *bbar;		/* Bridge BAR */
-	u64            tar_val;		/* Table Address Register */
-	struct timer_list watchdog_timer;
-	spinlock_t     it_lock;		/* Protects it_map */
-	unsigned int   it_size;		/* Size of iommu table in entries */
-	unsigned char  it_busno;	/* Bus number this table belongs to */
-};
-
-struct cal_chipset_ops {
-	void (*handle_quirks)(struct iommu_table
[PATCH 2/3] x86/pci: Remove pci_64.h
This file only contains external declarations for two non-existing
function pointers.

Signed-off-by: Christoph Hellwig
---
 arch/x86/include/asm/pci.h    |  4 ----
 arch/x86/include/asm/pci_64.h | 14 --------------
 2 files changed, 18 deletions(-)
 delete mode 100644 arch/x86/include/asm/pci_64.h

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index e662f987dfa2..d9e28aad2738 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -120,10 +120,6 @@ void native_restore_msi_irqs(struct pci_dev *dev);
 #endif
 #endif  /* __KERNEL__ */

-#ifdef CONFIG_X86_64
-#include
-#endif
-
 /* generic pci stuff */
 #include

diff --git a/arch/x86/include/asm/pci_64.h b/arch/x86/include/asm/pci_64.h
deleted file mode 100644
index 4e1aef506aa5..
--- a/arch/x86/include/asm/pci_64.h
+++ /dev/null
@@ -1,14 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_PCI_64_H
-#define _ASM_X86_PCI_64_H
-
-#ifdef __KERNEL__
-
-extern int (*pci_config_read)(int seg, int bus, int dev, int fn,
-			      int reg, int len, u32 *value);
-extern int (*pci_config_write)(int seg, int bus, int dev, int fn,
-			       int reg, int len, u32 value);
-
-#endif /* __KERNEL__ */
-
-#endif /* _ASM_X86_PCI_64_H */
--
2.20.1
[PATCH 3/3] x86/pci: Remove #ifdef __KERNEL__ guard from <asm/pci.h>
pci.h is not a UAPI header, so the __KERNEL__ ifdef is rather pointless.

Signed-off-by: Christoph Hellwig
---
 arch/x86/include/asm/pci.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index d9e28aad2738..90d0731fdcb6 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -12,8 +12,6 @@
 #include
 #include

-#ifdef __KERNEL__
-
 struct pci_sysdata {
 	int	domain;		/* PCI domain */
 	int	node;		/* NUMA node */
@@ -118,7 +116,6 @@ void native_restore_msi_irqs(struct pci_dev *dev);
 #define native_setup_msi_irqs		NULL
 #define native_teardown_msi_irq		NULL
 #endif
-#endif /* __KERNEL__ */

 /* generic pci stuff */
 #include
--
2.20.1
Remove the calgary iommu driver
Hi all,

per the discussion a few months ago with Jon we've agreed that the
calgary iommu driver is obsolete and in the way. Jon said back then he'd
send a patch to remove it, but as that didn't happen I prepared one, plus
two small cleanups touching the same area.
Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
On Wed, Nov 13, 2019 at 10:50:27AM +0800, Lu Baolu wrote:
> Currently, this is a block issue for using per-device dma ops in Intel
> IOMMU driver. Hence block this driver from using the generic iommu dma
> ops.

That is in fact the reason why I bring it up :)

> I'd like to align Intel IOMMU driver with other vendors. Use iommu dma
> ops for devices which have been selected to go through iommu. And use
> direct dma ops if selected to by pass.
>
> One concern of this propose is that for devices with limited address
> capability, shall we force it to use iommu or alternatively use swiotlb
> if user decides to let it by pass iommu.
>
> I understand that using swiotlb will cause some overhead due to the
> bounced buffer, but Intel IOMMU is default on hence any users who use a
> default kernel won't suffer this. We only need to document this so that
> users understand this overhead when they decide to let such devices by
> pass iommu. This is common to all vendor iommu drivers as far as I can
> see.

Indeed. And one idea would be to lift the code in the powerpc
dma_iommu_ops that check a flag and use the direct ops to the generic
dma code and a flag in struct device. We can then switch the intel
iommu ops (and AMD Gart) over to it.
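The idea above — a per-device flag consulted by a shared set of ops, falling back to the direct mapping when set — can be modeled in a few lines. This is toy code only; the flag name and the "mapping" arithmetic are invented for illustration, and the real kernel plumbing through struct device and dma_map_ops is more involved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct device carrying the proposed bypass flag. */
struct fake_device {
	bool     dma_ops_bypass;  /* hypothetical flag name */
	uint64_t iova_offset;     /* pretend IOMMU remapping for the demo */
};

/*
 * One shared map routine: if the device was selected to bypass, take
 * the direct (identity) path; otherwise go "through the IOMMU",
 * modeled here as adding a fixed offset.
 */
static uint64_t fake_map(const struct fake_device *dev, uint64_t paddr)
{
	if (dev->dma_ops_bypass)
		return paddr;                 /* direct mapping */
	return paddr + dev->iova_offset;      /* via IOMMU page tables */
}
```

The point of the design is that drivers call one set of ops regardless; the per-device flag, not the ops pointer, selects the path, which avoids swapping ops structures at runtime.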
Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
Hi Christoph, On 11/12/19 3:16 PM, Christoph Hellwig wrote: On Fri, Jul 26, 2019 at 09:56:51AM +0800, Lu Baolu wrote: I think current code doesn't do the right thing. The user asks the iommu driver to use identity domain for a device, but the driver force it back to DMA domain because of the device address capability. expensive. I don't think that this change is a good idea, and even if we decide that this is a good idea after all that should be done in a separate prep patch that explains the rationale. Yes. Make sense. Now that the bounce code has landed it might be good time to revisit this patch in isolation and with a better explanation. Yes. Thanks for bringing this up. Currently, this is a block issue for using per-device dma ops in Intel IOMMU driver. Hence block this driver from using the generic iommu dma ops. I'd like to align Intel IOMMU driver with other vendors. Use iommu dma ops for devices which have been selected to go through iommu. And use direct dma ops if selected to by pass. One concern of this propose is that for devices with limited address capability, shall we force it to use iommu or alternatively use swiotlb if user decides to let it by pass iommu. I understand that using swiotlb will cause some overhead due to the bounced buffer, but Intel IOMMU is default on hence any users who use a default kernel won't suffer this. We only need to document this so that users understand this overhead when they decide to let such devices by pass iommu. This is common to all vendor iommu drivers as far as I can see. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH AUTOSEL 4.14 010/115] iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
From: Robin Murphy

[ Upstream commit 85c7a0f1ef624ef58173ef52ea77780257bdfe04 ]

In removing the pagetable-wide lock, we gained the possibility of the
vanishingly unlikely case where we have a race between two concurrent
unmappers splitting the same block entry. The logic to handle this is
fairly straightforward - whoever loses the race frees their partial
next-level table and instead dereferences the winner's newly-installed
entry in order to fall back to a regular unmap, which intentionally
echoes the pre-existing case of recursively splitting a 1GB block down
to 4KB pages by installing a full table of 2MB blocks first.

Unfortunately, the chump who implemented that logic failed to update the
condition check for that fallback, meaning that if said race occurs at
the last level (where the loser's unmap_idx is valid) then the unmap
won't actually happen. Fix that to properly account for both the race
and recursive cases.

Fixes: 2c3d273eabe8 ("iommu/io-pgtable-arm: Support lockless operation")
Signed-off-by: Robin Murphy
[will: re-jig control flow to avoid duplicate cmpxchg test]
Signed-off-by: Will Deacon
Signed-off-by: Sasha Levin
---
 drivers/iommu/io-pgtable-arm.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e8018a308868e..17a9225283dd1 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -551,13 +551,12 @@ static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 			return 0;

 		tablep = iopte_deref(pte, data);
+	} else if (unmap_idx >= 0) {
+		io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
+		return size;
 	}

-	if (unmap_idx < 0)
-		return __arm_lpae_unmap(data, iova, size, lvl, tablep);
-
-	io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
-	return size;
+	return __arm_lpae_unmap(data, iova, size, lvl, tablep);
 }

 static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
--
2.20.1
[PATCH AUTOSEL 4.19 022/209] iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout
From: Zhen Lei

[ Upstream commit 0f02477d16980938a84aba8688a4e3a303306116 ]

The condition break condition of:

	(int)(VAL - sync_idx) >= 0

in the __arm_smmu_sync_poll_msi() polling loop requires that sync_idx
must be increased monotonically according to the sequence of the CMDs
in the cmdq.

However, since the msidata is populated using
atomic_inc_return_relaxed() before taking the command-queue spinlock,
then the following scenario can occur:

	CPU0			CPU1
	msidata=0
				msidata=1
				insert cmd1
	insert cmd0
				smmu execute cmd1
	smmu execute cmd0
				poll timeout, because msidata=1 is
				overridden by cmd0, that means VAL=0,
				sync_idx=1.

This is not a functional problem, since the caller will eventually
either timeout or exit due to another CMD_SYNC, however it's clearly
not what the code is supposed to be doing. Fix it, by incrementing the
sequence count with the command-queue lock held, allowing us to drop
the atomic operations altogether.

Signed-off-by: Zhen Lei
[will: dropped the specialised cmd building routine for now]
Signed-off-by: Will Deacon
Signed-off-by: Sasha Levin
---
 drivers/iommu/arm-smmu-v3.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 40fbf20d69e5a..2ab7100bcff12 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -567,7 +567,7 @@ struct arm_smmu_device {

 	int				gerr_irq;
 	int				combined_irq;
-	atomic_t			sync_nr;
+	u32				sync_nr;

 	unsigned long			ias; /* IPA */
 	unsigned long			oas; /* PA */
@@ -964,14 +964,13 @@ static int __arm_smmu_cmdq_issue_sync_msi(struct arm_smmu_device *smmu)
 	struct arm_smmu_cmdq_ent ent = {
 		.opcode = CMDQ_OP_CMD_SYNC,
 		.sync	= {
-			.msidata = atomic_inc_return_relaxed(&smmu->sync_nr),
 			.msiaddr = virt_to_phys(&smmu->sync_count),
 		},
 	};

-	arm_smmu_cmdq_build_cmd(cmd, &ent);
-
 	spin_lock_irqsave(&smmu->cmdq.lock, flags);
+	ent.sync.msidata = ++smmu->sync_nr;
+	arm_smmu_cmdq_build_cmd(cmd, &ent);
 	arm_smmu_cmdq_insert_cmd(smmu, cmd);
 	spin_unlock_irqrestore(&smmu->cmdq.lock, flags);

@@ -2196,7 +2195,6 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 {
 	int ret;

-	atomic_set(&smmu->sync_nr, 0);
 	ret = arm_smmu_init_queues(smmu);
 	if (ret)
 		return ret;
--
2.20.1
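The polling condition quoted in the commit message is a wrap-safe sequence comparison, which is exactly why it only works when sync_idx values are handed out in queue order — the property the lock-held increment restores. A small userspace model of just that comparison (this restates the `(int)(VAL - sync_idx) >= 0` check for illustration; it is not the driver code):

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe "has the SMMU completed up to sync_idx?" test, modelling the
 * (int)(VAL - sync_idx) >= 0 break condition from the commit message.
 * The subtraction is done in unsigned 32-bit arithmetic and the result
 * reinterpreted as signed, so the comparison stays correct across the
 * counter wrapping -- provided indices were assigned monotonically in
 * command-queue order. */
static int sync_complete(uint32_t observed_val, uint32_t sync_idx)
{
	return (int32_t)(observed_val - sync_idx) >= 0;
}
```

If indices could be assigned out of queue order (as in the CPU0/CPU1 scenario above), a completed-but-older value would make this check spin until timeout even though nothing is wrong with the hardware.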
[PATCH AUTOSEL 4.19 021/209] iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
From: Robin Murphy

[ Upstream commit 85c7a0f1ef624ef58173ef52ea77780257bdfe04 ]

In removing the pagetable-wide lock, we gained the possibility of the
vanishingly unlikely case where we have a race between two concurrent
unmappers splitting the same block entry. The logic to handle this is
fairly straightforward - whoever loses the race frees their partial
next-level table and instead dereferences the winner's newly-installed
entry in order to fall back to a regular unmap, which intentionally
echoes the pre-existing case of recursively splitting a 1GB block down
to 4KB pages by installing a full table of 2MB blocks first.

Unfortunately, the chump who implemented that logic failed to update the
condition check for that fallback, meaning that if said race occurs at
the last level (where the loser's unmap_idx is valid) then the unmap
won't actually happen. Fix that to properly account for both the race
and recursive cases.

Fixes: 2c3d273eabe8 ("iommu/io-pgtable-arm: Support lockless operation")
Signed-off-by: Robin Murphy
[will: re-jig control flow to avoid duplicate cmpxchg test]
Signed-off-by: Will Deacon
Signed-off-by: Sasha Levin
---
 drivers/iommu/io-pgtable-arm.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 88641b4560bc8..2f79efd16a052 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -574,13 +574,12 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 			return 0;

 		tablep = iopte_deref(pte, data);
+	} else if (unmap_idx >= 0) {
+		io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
+		return size;
 	}

-	if (unmap_idx < 0)
-		return __arm_lpae_unmap(data, iova, size, lvl, tablep);
-
-	io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
-	return size;
+	return __arm_lpae_unmap(data, iova, size, lvl, tablep);
 }

 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
--
2.20.1
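The control flow this fix installs reduces to a three-way decision. The sketch below is a hypothetical userspace model of that decision only (the real function also frees the loser's partial table and dereferences PTEs); it exists to show why the race-at-last-level case must reach the recursive unmap rather than being skipped:

```c
#include <assert.h>

enum unmap_path { PATH_FLUSH, PATH_RECURSE };

/* 'lost_race' models the table installation finding someone else's
 * newly-installed entry instead of our block PTE. */
static enum unmap_path split_blk_unmap_path(int lost_race, int unmap_idx)
{
	if (lost_race) {
		/* fall back to a regular unmap via the winner's table,
		 * even when unmap_idx is valid -- the case the old
		 * condition check got wrong */
		return PATH_RECURSE;
	} else if (unmap_idx >= 0) {
		/* we installed the table and already skipped the target
		 * entry: just flush and report the size */
		return PATH_FLUSH;
	}
	return PATH_RECURSE;	/* recursive 1GB -> 2MB -> 4KB split */
}
```

The old code effectively tested only `unmap_idx`, so the (lost_race, unmap_idx valid) combination took the flush path and the unmap never happened.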
Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
On Wed, Nov 13, 2019 at 12:05 AM Joerg Roedel wrote:
> On Tue, Nov 12, 2019 at 05:32:31PM +0800, Huang Adrian wrote:
> > On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel wrote:
> > > So there are a couple of options here:
> > >
> > > 	1) Bail out and disable the IOMMU as the BIOS screwed up
> > >
> > > 	2) Treat per-device exclusion ranges just as r/w unity-mapped
> > > 	   regions.
> > >
> > > I think option 2) is the best fix here.
> >
> > Yes. This is what this patch does (The first exclusion region still
> > uses the exclusion range while the remaining exclusion regions are
> > modified to be r/w unity-mapped regions when detecting multiple
> > exclusion regions).
>
> Yeah, but I think it is better to just stop using the exclusion-range
> feature of the hardware (because it means BIOSes are correct) and just
> treat every exclusion range (also the first one) as an r/w unity range.

Okay, I see. If you're okay with this (treat per-device exclusion
ranges as r/w unity-mapped regions - including the first one), I can
prepare the patch.

> > But, I'm guessing you're talking about that BIOS has to define r/w
> > unity-mapped regions instead of the per-device exclusions (Fix from
> > BIOS, not prevent the failure from kernel). Right?
>
> That would be best, but I fear this is too much to wish for.

Totally agree. That's why I submit this patch. :-)

--
Adrian
Re: Bug 205201 - overflow of DMA mask and bus mask
On 12 November 2019 at 3:41 pm, Christoph Hellwig wrote:
> On Mon, Nov 11, 2019 at 01:22:55PM +0100, Christian Zigotzky wrote:
>> Now, I can definitely say that this patch does not solve the issue.
>>
>> Do you have another patch for testing or shall I bisect?
>
> I'm still interested in the .config and dmesg. Especially if the board
> is using arch/powerpc/sysdev/fsl_pci.c, which is the only place inside
> the powerpc arch code doing funny things with the bus_dma_mask, which
> Robin pointed out looks fishy.

Here you are:

.config: https://bugzilla.kernel.org/attachment.cgi?id=285815
dmesg: https://bugzilla.kernel.org/attachment.cgi?id=285813

Thanks
Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
Hi Shameer, On 11/12/19 6:56 PM, Shameerali Kolothum Thodi wrote: > Hi Eric, > >> -Original Message- >> From: Shameerali Kolothum Thodi >> Sent: 12 November 2019 14:21 >> To: 'Auger Eric' ; eric.auger@gmail.com; >> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; >> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org; >> alex.william...@redhat.com; jacob.jun@linux.intel.com; >> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com; >> robin.mur...@arm.com >> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com; >> marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm >> ; xuwei (O) >> Subject: RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) >> > [...] >> I am trying to get this running on one of our platform that has smmuv3 >> dual >> stage support. I am seeing some issues with this when an ixgbe vf dev is >> made pass-through and is behind a vSMMUv3 in Guest. >> >> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 >> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5 >> >> And this is my Qemu cmd line, >> >> ./qemu-system-aarch64 >> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host >> \ >> -kernel Image \ >> -drive if=none,file=ubuntu,id=fs \ >> -device virtio-blk-device,drive=fs \ >> -device vfio-pci,host=:01:10.1 \ >> -bios QEMU_EFI.fd \ >> -net none \ >> -m 4G \ >> -nographic -D -d -enable-kvm \ >> -append "console=ttyAMA0 root=/dev/vda rw acpi=force" >> >> The basic ping from Guest works fine, >> root@ubuntu:~# ping 10.202.225.185 >> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data. >> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms >> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms >> ... >> >> But if I increase ping packet size, >> >> root@ubuntu:~# ping -s 1024 10.202.225.185 >> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data. 
>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms >> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms >> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable >> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable >> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable >> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable >> >> And from Host kernel I get, >> [ 819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets >> detected >> [ 824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets >> detected >> [ 828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets >> detected >> [ 830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets >> detected >> [ 832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets >> detected >> [ 834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets >> detected >> >> Also noted that iperf cannot work as it fails to establish the connection >>> with > iperf >> server. >> >> Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your > reference. >> I haven't debugged this further yet and thought of checking with you if >> this >>> is >> something you have seen already or not. Or maybe I am missing >> something > here? 
> > Please can you try to edit and modify hw/vfio/common.c, function > vfio_iommu_unmap_notify > > > /* > if (size <= 0x1) { > ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB; > ustruct.info.granularity = IOMMU_INV_GRANU_ADDR; > ustruct.info.addr_info.flags = >>> IOMMU_INV_ADDR_FLAGS_ARCHID; > if (iotlb->leaf) { > ustruct.info.addr_info.flags |= > IOMMU_INV_ADDR_FLAGS_LEAF; > } > ustruct.info.addr_info.archid = iotlb->arch_id; > ustruct.info.addr_info.addr = start; > ustruct.info.addr_info.granule_size = size; > ustruct.info.addr_info.nb_granules = 1; > trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1, > iotlb->leaf); > } else { > */ > ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB; > ustruct.info.granularity = IOMMU_INV_GRANU_PASID; > ustruct.info.pasid_info.archid = iotlb->arch_id; > ustruct.info.pasid_info.flags = >>> IOMMU_INV_PASID_FLAGS_ARCHID; > trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id); > //} > > This modification leads to invalidate the whole asid each time we get a > guest TLBI instead of invalidating the single IOVA (TLBI). On my end, I > saw this was the cause of such kind of issues. Please let me know if it > fixes your perf issues Yes, this seems to fix the
RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
Hi Eric, > -Original Message- > From: Shameerali Kolothum Thodi > Sent: 12 November 2019 14:21 > To: 'Auger Eric' ; eric.auger@gmail.com; > iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org; > alex.william...@redhat.com; jacob.jun@linux.intel.com; > yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com; > robin.mur...@arm.com > Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com; > marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm > ; xuwei (O) > Subject: RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) > [...] > > >>> I am trying to get this running on one of our platform that has smmuv3 > dual > > >>> stage support. I am seeing some issues with this when an ixgbe vf dev is > > >>> made pass-through and is behind a vSMMUv3 in Guest. > > >>> > > >>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 > > >>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5 > > >>> > > >>> And this is my Qemu cmd line, > > >>> > > >>> ./qemu-system-aarch64 > > >>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host > \ > > >>> -kernel Image \ > > >>> -drive if=none,file=ubuntu,id=fs \ > > >>> -device virtio-blk-device,drive=fs \ > > >>> -device vfio-pci,host=:01:10.1 \ > > >>> -bios QEMU_EFI.fd \ > > >>> -net none \ > > >>> -m 4G \ > > >>> -nographic -D -d -enable-kvm \ > > >>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force" > > >>> > > >>> The basic ping from Guest works fine, > > >>> root@ubuntu:~# ping 10.202.225.185 > > >>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data. > > >>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms > > >>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms > > >>> ... 
> > >>> > > >>> But if I increase ping packet size, > > >>> > > >>> root@ubuntu:~# ping -s 1024 10.202.225.185 > > >>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data. > > >>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms > > >>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms > > >>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable > > >>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable > > >>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable > > >>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable > > >>> > > >>> And from Host kernel I get, > > >>> [ 819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets > detected > > >>> [ 824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets > detected > > >>> [ 828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets > detected > > >>> [ 830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets > detected > > >>> [ 832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets > detected > > >>> [ 834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets > detected > > >>> > > >>> Also noted that iperf cannot work as it fails to establish the > > >>> connection > > with > > >> iperf > > >>> server. > > >>> > > >>> Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your > > >> reference. > > >>> I haven't debugged this further yet and thought of checking with you if > this > > is > > >>> something you have seen already or not. Or maybe I am missing > something > > >> here? 
> > >> > > >> Please can you try to edit and modify hw/vfio/common.c, function > > >> vfio_iommu_unmap_notify > > >> > > >> > > >> /* > > >> if (size <= 0x1) { > > >> ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB; > > >> ustruct.info.granularity = IOMMU_INV_GRANU_ADDR; > > >> ustruct.info.addr_info.flags = > > IOMMU_INV_ADDR_FLAGS_ARCHID; > > >> if (iotlb->leaf) { > > >> ustruct.info.addr_info.flags |= > > >> IOMMU_INV_ADDR_FLAGS_LEAF; > > >> } > > >> ustruct.info.addr_info.archid = iotlb->arch_id; > > >> ustruct.info.addr_info.addr = start; > > >> ustruct.info.addr_info.granule_size = size; > > >> ustruct.info.addr_info.nb_granules = 1; > > >> trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1, > > >> iotlb->leaf); > > >> } else { > > >> */ > > >> ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB; > > >> ustruct.info.granularity = IOMMU_INV_GRANU_PASID; > > >> ustruct.info.pasid_info.archid = iotlb->arch_id; > > >> ustruct.info.pasid_info.flags = > > IOMMU_INV_PASID_FLAGS_ARCHID; > > >> trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id); > > >> //} > > >> > > >> This modification leads to invalidate the whole asid each time we get a > > >> guest TLBI instead of invalidating the single IOVA (TLBI). On my end, I > > >> saw this was the cause of such kind of issues. Please let me know if it > > >> fixes your perf issues > > > > > > Yes, this seems to fix
Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) to host
On Tue, 12 Nov 2019 11:21:40 + "Liu, Yi L" wrote: > > From: Alex Williamson < alex.william...@redhat.com > > > Sent: Friday, November 8, 2019 7:21 AM > > To: Liu, Yi L > > Subject: Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) > > to host > > > > On Thu, 24 Oct 2019 08:26:23 -0400 > > Liu Yi L wrote: > > > > > This patch adds vfio support to bind guest translation structure > > > to host iommu. VFIO exposes iommu programming capability to user- > > > space. Guest is a user-space application in host under KVM solution. > > > For SVA usage in Virtual Machine, guest owns GVA->GPA translation > > > structure. And this part should be passdown to host to enable nested > > > translation (or say two stage translation). This patch reuses the > > > VFIO_IOMMU_BIND proposal from Jean-Philippe Brucker, and adds new > > > bind type for binding guest owned translation structure to host. > > > > > > *) Add two new ioctls for VFIO containers. > > > > > > - VFIO_IOMMU_BIND: for bind request from userspace, it could be > > >bind a process to a pasid or bind a guest pasid > > >to a device, this is indicated by type > > > - VFIO_IOMMU_UNBIND: for unbind request from userspace, it could be > > >unbind a process to a pasid or unbind a guest pasid > > >to a device, also indicated by type > > > - Bind type: > > > VFIO_IOMMU_BIND_PROCESS: user-space request to bind a process > > >to a device > > > VFIO_IOMMU_BIND_GUEST_PASID: bind guest owned translation > > >structure to host iommu. e.g. 
guest page table > > > > > > *) Code logic in vfio_iommu_type1_ioctl() to handle VFIO_IOMMU_BIND/UNBIND > > > > > > Cc: Kevin Tian > > > Signed-off-by: Jean-Philippe Brucker > > > Signed-off-by: Liu Yi L > > > Signed-off-by: Jacob Pan > > > --- > > > drivers/vfio/vfio_iommu_type1.c | 136 > > > > > include/uapi/linux/vfio.h | 44 + > > > 2 files changed, 180 insertions(+) > > > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > > b/drivers/vfio/vfio_iommu_type1.c > > > index 3d73a7d..1a27e25 100644 > > > --- a/drivers/vfio/vfio_iommu_type1.c > > > +++ b/drivers/vfio/vfio_iommu_type1.c > > > @@ -2325,6 +2325,104 @@ static int vfio_iommu_type1_pasid_free(struct > > vfio_iommu *iommu, > > > return ret; > > > } > > > > > > +static int vfio_bind_gpasid_fn(struct device *dev, void *data) > > > +{ > > > + struct domain_capsule *dc = (struct domain_capsule *)data; > > > + struct iommu_gpasid_bind_data *ustruct = > > > + (struct iommu_gpasid_bind_data *) dc->data; > > > + > > > + return iommu_sva_bind_gpasid(dc->domain, dev, ustruct); > > > +} > > > + > > > +static int vfio_unbind_gpasid_fn(struct device *dev, void *data) > > > +{ > > > + struct domain_capsule *dc = (struct domain_capsule *)data; > > > + struct iommu_gpasid_bind_data *ustruct = > > > + (struct iommu_gpasid_bind_data *) dc->data; > > > + > > > + return iommu_sva_unbind_gpasid(dc->domain, dev, > > > + ustruct->hpasid); > > > +} > > > + > > > +/* > > > + * unbind specific gpasid, caller of this function requires hold > > > + * vfio_iommu->lock > > > + */ > > > +static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu, > > > + struct iommu_gpasid_bind_data *gbind_data) > > > +{ > > > + return vfio_iommu_lookup_dev(iommu, vfio_unbind_gpasid_fn, gbind_data); > > > +} > > > + > > > +static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu, > > > + void __user *arg, > > > + struct vfio_iommu_type1_bind *bind) > > > +{ > > > + struct iommu_gpasid_bind_data gbind_data; > > > + unsigned long 
minsz; > > > + int ret = 0; > > > + > > > + minsz = sizeof(*bind) + sizeof(gbind_data); > > > + if (bind->argsz < minsz) > > > + return -EINVAL; > > > + > > > + if (copy_from_user(_data, arg, sizeof(gbind_data))) > > > + return -EFAULT; > > > + > > > + mutex_lock(>lock); > > > + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) { > > > + ret = -EINVAL; > > > + goto out_unlock; > > > + } > > > + > > > + ret = vfio_iommu_lookup_dev(iommu, vfio_bind_gpasid_fn, _data); > > > + /* > > > + * If bind failed, it may not be a total failure. Some devices within > > > + * the iommu group may have bind successfully. Although we don't enable > > > + * pasid capability for non-singletion iommu groups, a unbind operation > > > + * would be helpful to ensure no partial binding for an iommu group. > > > + */ > > > + if (ret) > > > + /* > > > + * Undo all binds that already succeeded, no need to check the > > > + * return value here since some device within the group has no > > > + * successful bind when coming to this place switch. > > > + */ > > > +
Re: [PATCH] iommu/rockchip: Don't provoke WARN for harmless IRQs
On Mon, Nov 11, 2019 at 06:55:18PM +, Robin Murphy wrote:
> Although we don't generally expect IRQs to fire for a suspended IOMMU,
> there are certain situations (particularly with debug options) where
> we might legitimately end up with the pm_runtime_get_if_in_use() call
> from rk_iommu_irq() returning 0. Since this doesn't represent an actual
> error, follow the other parts of the driver and save the WARN_ON()
> condition for a genuine negative value. Even if we do have spurious
> IRQs due to a wedged VOP asserting the shared line, it's not this
> driver's job to try to second-guess the IRQ core to warn about that.
>
> Reported-by: Vasily Khoruzhick
> Signed-off-by: Robin Murphy
> ---
>  drivers/iommu/rockchip-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.
Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
On Tue, Nov 12, 2019 at 05:32:31PM +0800, Huang Adrian wrote:
> > On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel wrote:
> > So there are a couple of options here:
> >
> > 	1) Bail out and disable the IOMMU as the BIOS screwed up
> >
> > 	2) Treat per-device exclusion ranges just as r/w unity-mapped
> > 	   regions.
> >
> > I think option 2) is the best fix here.
>
> Yes. This is what this patch does (The first exclusion region still
> uses the exclusion range while the remaining exclusion regions are
> modified to be r/w unity-mapped regions when detecting multiple
> exclusion regions).

Yeah, but I think it is better to just stop using the exclusion-range
feature of the hardware (because it means BIOSes are correct) and just
treat every exclusion range (also the first one) as an r/w unity range.

> But, I'm guessing you're talking about that BIOS has to define r/w
> unity-mapped regions instead of the per-device exclusions (Fix from
> BIOS, not prevent the failure from kernel). Right?

That would be best, but I fear this is too much to wish for.

Regards,

	Joerg
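"Option 2" as agreed above amounts to translating each IVMD exclusion range into something the driver already handles: a read/write unity-mapped region. A minimal sketch of that translation — the struct and flag names here are invented for illustration and are not the amd_iommu driver's actual types:

```c
#include <assert.h>
#include <stdint.h>

#define IOMMU_PROT_IR 0x1	/* read allowed  */
#define IOMMU_PROT_IW 0x2	/* write allowed */

struct unity_map_entry {	/* hypothetical stand-in */
	uint64_t address_start;
	uint64_t address_end;
	int prot;
};

/* Record an IVMD exclusion range as an r/w unity mapping instead of
 * programming the hardware exclusion-range registers, so multiple
 * ranges simply become multiple list entries rather than overwriting
 * a single register pair. */
static struct unity_map_entry exclusion_to_unity(uint64_t start, uint64_t len)
{
	struct unity_map_entry e = {
		.address_start = start,
		.address_end   = start + len - 1,
		.prot          = IOMMU_PROT_IR | IOMMU_PROT_IW,
	};
	return e;
}
```

Since each range becomes an independent entry, the "second IVMD overwrites the first" failure mode this thread is about disappears by construction.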
[PATCH v2 0/6] Raspberry Pi 4 PCIe support
This series aims at providing support for Raspberry Pi 4's PCIe controller, which is also shared with the Broadcom STB family of devices. There was a previous attempt to upstream this some years ago[1] but was blocked as most STB PCIe integrations have a sparse DMA mapping[2] which is something currently not supported by the kernel. Luckily this is not the case for the Raspberry Pi 4. Note that the driver code is to be based on top of Rob Herring's series simplifying inbound and outbound range parsing. [1] https://patchwork.kernel.org/cover/10605933/ [2] https://patchwork.kernel.org/patch/10605957/ --- Changes since v1: - add generic rounddown/roundup_pow_two64() patch - Add MAINTAINERS patch - Fix Kconfig - Cleanup probe, use up to date APIs, exit on MSI failure - Get rid of linux,pci-domain and other unused constructs - Use edge triggered setup for MSI - Cleanup MSI implementation - Fix multiple cosmetic issues - Remove supend/resume code Jim Quinlan (3): dt-bindings: PCI: Add bindings for brcmstb's PCIe device PCI: brcmstb: add Broadcom STB PCIe host controller driver PCI: brcmstb: add MSI capability Nicolas Saenz Julienne (3): linux/log2.h: Add roundup/rounddown_pow_two64() family of functions ARM: dts: bcm2711: Enable PCIe controller MAINTAINERS: Add brcmstb PCIe controller .../bindings/pci/brcm,stb-pcie.yaml | 110 ++ MAINTAINERS |4 + arch/arm/boot/dts/bcm2711.dtsi| 46 + drivers/net/ethernet/mellanox/mlx4/en_clock.c |3 +- drivers/pci/controller/Kconfig|9 + drivers/pci/controller/Makefile |1 + drivers/pci/controller/pcie-brcmstb.c | 1179 + drivers/pci/controller/pcie-cadence-ep.c |7 +- drivers/pci/controller/pcie-cadence.c |7 +- drivers/pci/controller/pcie-rockchip-ep.c |9 +- include/linux/log2.h | 52 + kernel/dma/direct.c |3 +- 12 files changed, 1412 insertions(+), 18 deletions(-) create mode 100644 Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml create mode 100644 drivers/pci/controller/pcie-brcmstb.c -- 2.24.0 ___ iommu mailing list 
[PATCH v2 1/6] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions
Some users need to make sure their rounding function accepts and
returns 64bit long variables regardless of the architecture. Sadly
roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
new generic 64bit variant of the function and cleanup rogue custom
implementations.

Signed-off-by: Nicolas Saenz Julienne
---
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |  3 +-
 drivers/pci/controller/pcie-cadence-ep.c      |  7 +--
 drivers/pci/controller/pcie-cadence.c         |  7 +--
 drivers/pci/controller/pcie-rockchip-ep.c     |  9 ++--
 include/linux/log2.h                          | 52 +++
 kernel/dma/direct.c                           |  3 +-
 6 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 024788549c25..027bd72505e2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -33,6 +33,7 @@

 #include
 #include
+#include

 #include "mlx4_en.h"

@@ -252,7 +253,7 @@ static u32 freq_to_shift(u16 freq)
 {
 	u32 freq_khz = freq * 1000;
 	u64 max_val_cycles = freq_khz * 1000 * MLX4_EN_WRAP_AROUND_SEC;
-	u64 max_val_cycles_rounded = 1ULL << fls64(max_val_cycles - 1);
+	u64 max_val_cycles_rounded = roundup_pow_of_two64(max_val_cycles);

 	/* calculate max possible multiplier in order to fit in 64bit */
 	u64 max_mul = div64_u64(ULLONG_MAX, max_val_cycles_rounded);

diff --git a/drivers/pci/controller/pcie-cadence-ep.c b/drivers/pci/controller/pcie-cadence-ep.c
index def7820cb824..26ff424b16f5 100644
--- a/drivers/pci/controller/pcie-cadence-ep.c
+++ b/drivers/pci/controller/pcie-cadence-ep.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include

 #include "pcie-cadence.h"

@@ -90,11 +91,7 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
 	/* BAR size is 2^(aperture + 7) */
 	sz = max_t(size_t, epf_bar->size, CDNS_PCIE_EP_MIN_APERTURE);
-	/*
-	 * roundup_pow_of_two() returns an unsigned long, which is not suited
-	 * for 64bit values.
-	 */
-	sz = 1ULL << fls64(sz - 1);
+	sz = roundup_pow_of_two64(sz);
 	aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */

 	if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {

diff --git a/drivers/pci/controller/pcie-cadence.c b/drivers/pci/controller/pcie-cadence.c
index cd795f6fc1e2..b2278e6b955c 100644
--- a/drivers/pci/controller/pcie-cadence.c
+++ b/drivers/pci/controller/pcie-cadence.c
@@ -4,6 +4,7 @@
 // Author: Cyrille Pitchen

 #include
+#include

 #include "pcie-cadence.h"

@@ -11,11 +12,7 @@ void cdns_pcie_set_outbound_region(struct cdns_pcie *pcie, u8 fn,
 				   u32 r, bool is_io,
 				   u64 cpu_addr, u64 pci_addr, size_t size)
 {
-	/*
-	 * roundup_pow_of_two() returns an unsigned long, which is not suited
-	 * for 64bit values.
-	 */
-	u64 sz = 1ULL << fls64(size - 1);
+	u64 sz = roundup_pow_of_two64(size);
 	int nbits = ilog2(sz);
 	u32 addr0, addr1, desc0, desc1;

diff --git a/drivers/pci/controller/pcie-rockchip-ep.c b/drivers/pci/controller/pcie-rockchip-ep.c
index d743b0a48988..ed50aaf27784 100644
--- a/drivers/pci/controller/pcie-rockchip-ep.c
+++ b/drivers/pci/controller/pcie-rockchip-ep.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include

 #include "pcie-rockchip.h"

@@ -70,7 +71,7 @@ static void rockchip_pcie_prog_ep_ob_atu(struct rockchip_pcie *rockchip, u8 fn,
 					 u32 r, u32 type, u64 cpu_addr,
 					 u64 pci_addr, size_t size)
 {
-	u64 sz = 1ULL << fls64(size - 1);
+	u64 sz = roundup_pow_of_two64(size);
 	int num_pass_bits = ilog2(sz);
 	u32 addr0, addr1, desc0, desc1;
 	bool is_nor_msg = (type == AXI_WRAPPER_NOR_MSG);

@@ -172,11 +173,7 @@ static int rockchip_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
 	/* BAR size is 2^(aperture + 7) */
 	sz = max_t(size_t, epf_bar->size, MIN_EP_APERTURE);
-	/*
-	 * roundup_pow_of_two() returns an unsigned long, which is not suited
-	 * for 64bit values.
-	 */
-	sz = 1ULL << fls64(sz - 1);
+	sz = roundup_pow_of_two64(sz);
 	aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */

 	if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {

diff --git a/include/linux/log2.h b/include/linux/log2.h
index 83a4a3ca3e8a..db12d92ab6eb 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -67,6 +67,24 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
 	return 1UL << (fls_long(n) - 1);
 }

+/**
+ * __roundup_pow_of_two64() - round 64bit value up to nearest power of two
+ * @n: value to round up
+ */
+static inline
Re: Bug 205201 - overflow of DMA mask and bus mask
On Mon, Nov 11, 2019 at 01:22:55PM +0100, Christian Zigotzky wrote: > Now, I can definitely say that this patch does not solve the issue. > > Do you have another patch for testing or shall I bisect? I'm still interested in the .config and dmesg. Especially if the board is using arch/powerpc/sysdev/fsl_pci.c, which is the only place inside the powerpc arch code doing funny things with the bus_dma_mask, which Robin pointed out looks fishy. > > Thanks, > Christian ---end quoted text--- ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: 12 November 2019 13:22 > To: Shameerali Kolothum Thodi ; > eric.auger@gmail.com; iommu@lists.linux-foundation.org; > linux-ker...@vger.kernel.org; k...@vger.kernel.org; > kvm...@lists.cs.columbia.edu; j...@8bytes.org; > alex.william...@redhat.com; jacob.jun@linux.intel.com; > yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com; > robin.mur...@arm.com > Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com; > marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm > ; xuwei (O) > Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) > > Hi Shameer, > > On 11/12/19 2:06 PM, Shameerali Kolothum Thodi wrote: > > Hi Eric, > > > >> -Original Message- > >> From: Auger Eric [mailto:eric.au...@redhat.com] > >> Sent: 12 November 2019 11:29 > >> To: Shameerali Kolothum Thodi ; > >> eric.auger@gmail.com; iommu@lists.linux-foundation.org; > >> linux-ker...@vger.kernel.org; k...@vger.kernel.org; > >> kvm...@lists.cs.columbia.edu; j...@8bytes.org; > >> alex.william...@redhat.com; jacob.jun@linux.intel.com; > >> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com; > >> robin.mur...@arm.com > >> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com; > >> marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm > >> ; xuwei (O) > >> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) > >> > >> Hi Shameer, > >> On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote: > >>> Hi Eric, > >>> > -Original Message- > From: kvmarm-boun...@lists.cs.columbia.edu > [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Eric Auger > Sent: 11 July 2019 14:56 > To: eric.auger@gmail.com; eric.au...@redhat.com; > iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org; > alex.william...@redhat.com; jacob.jun@linux.intel.com; > yi.l@intel.com; 
jean-philippe.bruc...@arm.com; > will.dea...@arm.com; > robin.mur...@arm.com > Cc: kevin.t...@intel.com; vincent.ste...@arm.com; > ashok@intel.com; > marc.zyng...@arm.com; tina.zh...@intel.com > Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) > > This series brings the VFIO part of HW nested paging support > in the SMMUv3. > > The series depends on: > [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part) > (https://www.spinics.net/lists/kernel/msg3187714.html) > > 3 new IOCTLs are introduced that allow the userspace to > 1) pass the guest stage 1 configuration > 2) pass stage 1 MSI bindings > 3) invalidate stage 1 related caches > > They map onto the related new IOMMU API functions. > > We introduce the capability to register specific interrupt > indexes (see [1]). A new DMA_FAULT interrupt index allows to register > an eventfd to be signaled whenever a stage 1 related fault > is detected at physical level. Also a specific region allows > to expose the fault records to the user space. > >>> > >>> I am trying to get this running on one of our platform that has smmuv3 > >>> dual > >>> stage support. I am seeing some issues with this when an ixgbe vf dev is > >>> made pass-through and is behind a vSMMUv3 in Guest. 
> >>> > >>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 > >>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5 > >>> > >>> And this is my Qemu cmd line, > >>> > >>> ./qemu-system-aarch64 > >>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \ > >>> -kernel Image \ > >>> -drive if=none,file=ubuntu,id=fs \ > >>> -device virtio-blk-device,drive=fs \ > >>> -device vfio-pci,host=:01:10.1 \ > >>> -bios QEMU_EFI.fd \ > >>> -net none \ > >>> -m 4G \ > >>> -nographic -D -d -enable-kvm \ > >>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force" > >>> > >>> The basic ping from Guest works fine, > >>> root@ubuntu:~# ping 10.202.225.185 > >>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data. > >>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms > >>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms > >>> ... > >>> > >>> But if I increase ping packet size, > >>> > >>> root@ubuntu:~# ping -s 1024 10.202.225.185 > >>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data. > >>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms > >>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms > >>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable > >>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable > >>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable > >>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable > >>> > >>> And from Host kernel
Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
Hi Shameer, On 11/12/19 2:06 PM, Shameerali Kolothum Thodi wrote: > Hi Eric, > >> -Original Message- >> From: Auger Eric [mailto:eric.au...@redhat.com] >> Sent: 12 November 2019 11:29 >> To: Shameerali Kolothum Thodi ; >> eric.auger@gmail.com; iommu@lists.linux-foundation.org; >> linux-ker...@vger.kernel.org; k...@vger.kernel.org; >> kvm...@lists.cs.columbia.edu; j...@8bytes.org; >> alex.william...@redhat.com; jacob.jun@linux.intel.com; >> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com; >> robin.mur...@arm.com >> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com; >> marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm >> ; xuwei (O) >> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) >> >> Hi Shameer, >> On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote: >>> Hi Eric, >>> -Original Message- From: kvmarm-boun...@lists.cs.columbia.edu [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Eric Auger Sent: 11 July 2019 14:56 To: eric.auger@gmail.com; eric.au...@redhat.com; iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org; alex.william...@redhat.com; jacob.jun@linux.intel.com; yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com; robin.mur...@arm.com Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com; marc.zyng...@arm.com; tina.zh...@intel.com Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) This series brings the VFIO part of HW nested paging support in the SMMUv3. The series depends on: [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part) (https://www.spinics.net/lists/kernel/msg3187714.html) 3 new IOCTLs are introduced that allow the userspace to 1) pass the guest stage 1 configuration 2) pass stage 1 MSI bindings 3) invalidate stage 1 related caches They map onto the related new IOMMU API functions. We introduce the capability to register specific interrupt indexes (see [1]). 
A new DMA_FAULT interrupt index allows to register an eventfd to be signaled whenever a stage 1 related fault is detected at physical level. Also a specific region allows to expose the fault records to the user space. >>> >>> I am trying to get this running on one of our platform that has smmuv3 dual >>> stage support. I am seeing some issues with this when an ixgbe vf dev is >>> made pass-through and is behind a vSMMUv3 in Guest. >>> >>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 >>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5 >>> >>> And this is my Qemu cmd line, >>> >>> ./qemu-system-aarch64 >>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \ >>> -kernel Image \ >>> -drive if=none,file=ubuntu,id=fs \ >>> -device virtio-blk-device,drive=fs \ >>> -device vfio-pci,host=:01:10.1 \ >>> -bios QEMU_EFI.fd \ >>> -net none \ >>> -m 4G \ >>> -nographic -D -d -enable-kvm \ >>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force" >>> >>> The basic ping from Guest works fine, >>> root@ubuntu:~# ping 10.202.225.185 >>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data. >>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms >>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms >>> ... >>> >>> But if I increase ping packet size, >>> >>> root@ubuntu:~# ping -s 1024 10.202.225.185 >>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data. 
>>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms >>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms >>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable >>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable >>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable >>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable >>> >>> And from Host kernel I get, >>> [ 819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected >>> [ 824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected >>> [ 828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected >>> [ 830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets detected >>> [ 832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected >>> [ 834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected >>> >>> Also noted that iperf cannot work as it fails to establish the connection >>> with >> iperf >>> server. >>> >>> Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your >> reference. >>> I haven't debugged this further yet and thought of checking with you if >>> this is >>> something you have seen already or not. Or maybe I am missing something >> here? >> >> Please can you try to edit and modify
RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
Hi Eric, > -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: 12 November 2019 11:29 > To: Shameerali Kolothum Thodi ; > eric.auger@gmail.com; iommu@lists.linux-foundation.org; > linux-ker...@vger.kernel.org; k...@vger.kernel.org; > kvm...@lists.cs.columbia.edu; j...@8bytes.org; > alex.william...@redhat.com; jacob.jun@linux.intel.com; > yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com; > robin.mur...@arm.com > Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com; > marc.zyng...@arm.com; tina.zh...@intel.com; Linuxarm > ; xuwei (O) > Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) > > Hi Shameer, > On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote: > > Hi Eric, > > > >> -Original Message- > >> From: kvmarm-boun...@lists.cs.columbia.edu > >> [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Eric Auger > >> Sent: 11 July 2019 14:56 > >> To: eric.auger@gmail.com; eric.au...@redhat.com; > >> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > >> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org; > >> alex.william...@redhat.com; jacob.jun@linux.intel.com; > >> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com; > >> robin.mur...@arm.com > >> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com; > >> marc.zyng...@arm.com; tina.zh...@intel.com > >> Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) > >> > >> This series brings the VFIO part of HW nested paging support > >> in the SMMUv3. > >> > >> The series depends on: > >> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part) > >> (https://www.spinics.net/lists/kernel/msg3187714.html) > >> > >> 3 new IOCTLs are introduced that allow the userspace to > >> 1) pass the guest stage 1 configuration > >> 2) pass stage 1 MSI bindings > >> 3) invalidate stage 1 related caches > >> > >> They map onto the related new IOMMU API functions. 
> >> > >> We introduce the capability to register specific interrupt > >> indexes (see [1]). A new DMA_FAULT interrupt index allows to register > >> an eventfd to be signaled whenever a stage 1 related fault > >> is detected at physical level. Also a specific region allows > >> to expose the fault records to the user space. > > > > I am trying to get this running on one of our platform that has smmuv3 dual > > stage support. I am seeing some issues with this when an ixgbe vf dev is > > made pass-through and is behind a vSMMUv3 in Guest. > > > > Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 > > Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5 > > > > And this is my Qemu cmd line, > > > > ./qemu-system-aarch64 > > -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \ > > -kernel Image \ > > -drive if=none,file=ubuntu,id=fs \ > > -device virtio-blk-device,drive=fs \ > > -device vfio-pci,host=:01:10.1 \ > > -bios QEMU_EFI.fd \ > > -net none \ > > -m 4G \ > > -nographic -D -d -enable-kvm \ > > -append "console=ttyAMA0 root=/dev/vda rw acpi=force" > > > > The basic ping from Guest works fine, > > root@ubuntu:~# ping 10.202.225.185 > > PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data. > > 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms > > 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms > > ... > > > > But if I increase ping packet size, > > > > root@ubuntu:~# ping -s 1024 10.202.225.185 > > PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data. 
> > 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms > > 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms > > From 10.202.225.169 icmp_seq=66 Destination Host Unreachable > > From 10.202.225.169 icmp_seq=67 Destination Host Unreachable > > From 10.202.225.169 icmp_seq=68 Destination Host Unreachable > > From 10.202.225.169 icmp_seq=69 Destination Host Unreachable > > > > And from Host kernel I get, > > [ 819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected > > [ 824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected > > [ 828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected > > [ 830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets detected > > [ 832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected > > [ 834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected > > > > Also noted that iperf cannot work as it fails to establish the connection > > with > iperf > > server. > > > > Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your > reference. > > I haven't debugged this further yet and thought of checking with you if > > this is > > something you have seen already or not. Or maybe I am missing something > here? > > Please can you try to edit and modify hw/vfio/common.c, function > vfio_iommu_unmap_notify > > > /* > if (size <= 0x1) { >
Re: [PATCH] iommu/rockchip: Don't provoke WARN for harmless IRQs
On 2019-11-11 20:04, Robin Murphy wrote: Although we don't generally expect IRQs to fire for a suspended IOMMU, there are certain situations (particularly with debug options) where we might legitimately end up with the pm_runtime_get_if_in_use() call from rk_iommu_irq() returning 0. Since this doesn't represent an actual error, follow the other parts of the driver and save the WARN_ON() condition for a genuine negative value. Even if we do have spurious IRQs due to a wedged VOP asserting the shared line, it's not this driver's job to try to second-guess the IRQ core to warn about that. Reported-by: Vasily Khoruzhick Signed-off-by: Robin Murphy --- drivers/iommu/rockchip-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index 4dcbf68dfda4..bd7e9b1e40ac 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -527,7 +527,7 @@ static irqreturn_t rk_iommu_irq(int irq, void *dev_id) int i, err; err = pm_runtime_get_if_in_use(iommu->dev); - if (WARN_ON_ONCE(err <= 0)) + if (!err || WARN_ON_ONCE(err < 0)) return ret; if (WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks))) Acked-by: Marc Zyngier M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
Hi Shameer, On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote: > Hi Eric, > >> -Original Message- >> From: kvmarm-boun...@lists.cs.columbia.edu >> [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Eric Auger >> Sent: 11 July 2019 14:56 >> To: eric.auger@gmail.com; eric.au...@redhat.com; >> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; >> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org; >> alex.william...@redhat.com; jacob.jun@linux.intel.com; >> yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com; >> robin.mur...@arm.com >> Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com; >> marc.zyng...@arm.com; tina.zh...@intel.com >> Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) >> >> This series brings the VFIO part of HW nested paging support >> in the SMMUv3. >> >> The series depends on: >> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part) >> (https://www.spinics.net/lists/kernel/msg3187714.html) >> >> 3 new IOCTLs are introduced that allow the userspace to >> 1) pass the guest stage 1 configuration >> 2) pass stage 1 MSI bindings >> 3) invalidate stage 1 related caches >> >> They map onto the related new IOMMU API functions. >> >> We introduce the capability to register specific interrupt >> indexes (see [1]). A new DMA_FAULT interrupt index allows to register >> an eventfd to be signaled whenever a stage 1 related fault >> is detected at physical level. Also a specific region allows >> to expose the fault records to the user space. > > I am trying to get this running on one of our platform that has smmuv3 dual > stage support. I am seeing some issues with this when an ixgbe vf dev is > made pass-through and is behind a vSMMUv3 in Guest. 
> > Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 > Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5 > > And this is my Qemu cmd line, > > ./qemu-system-aarch64 > -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \ > -kernel Image \ > -drive if=none,file=ubuntu,id=fs \ > -device virtio-blk-device,drive=fs \ > -device vfio-pci,host=:01:10.1 \ > -bios QEMU_EFI.fd \ > -net none \ > -m 4G \ > -nographic -D -d -enable-kvm \ > -append "console=ttyAMA0 root=/dev/vda rw acpi=force" > > The basic ping from Guest works fine, > root@ubuntu:~# ping 10.202.225.185 > PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data. > 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms > 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms > ... > > But if I increase ping packet size, > > root@ubuntu:~# ping -s 1024 10.202.225.185 > PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data. > 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms > 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms > From 10.202.225.169 icmp_seq=66 Destination Host Unreachable > From 10.202.225.169 icmp_seq=67 Destination Host Unreachable > From 10.202.225.169 icmp_seq=68 Destination Host Unreachable > From 10.202.225.169 icmp_seq=69 Destination Host Unreachable > > And from Host kernel I get, > [ 819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected > [ 824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected > [ 828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected > [ 830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets detected > [ 832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected > [ 834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected > > Also noted that iperf cannot work as it fails to establish the connection > with iperf > server. > > Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your > reference. 
> I haven't debugged this further yet and thought of checking with you if this > is > something you have seen already or not. Or maybe I am missing something here? Please can you try to edit and modify hw/vfio/common.c, function vfio_iommu_unmap_notify /* if (size <= 0x1) { ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB; ustruct.info.granularity = IOMMU_INV_GRANU_ADDR; ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID; if (iotlb->leaf) { ustruct.info.addr_info.flags |= IOMMU_INV_ADDR_FLAGS_LEAF; } ustruct.info.addr_info.archid = iotlb->arch_id; ustruct.info.addr_info.addr = start; ustruct.info.addr_info.granule_size = size; ustruct.info.addr_info.nb_granules = 1; trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1, iotlb->leaf); } else { */ ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB; ustruct.info.granularity = IOMMU_INV_GRANU_PASID; ustruct.info.pasid_info.archid = iotlb->arch_id; ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;
RE: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) to host
> From: Alex Williamson < alex.william...@redhat.com > > Sent: Friday, November 8, 2019 7:21 AM > To: Liu, Yi L > Subject: Re: [RFC v2 3/3] vfio/type1: bind guest pasid (guest page tables) to > host > > On Thu, 24 Oct 2019 08:26:23 -0400 > Liu Yi L wrote: > > > This patch adds vfio support to bind guest translation structure > > to host iommu. VFIO exposes iommu programming capability to user- > > space. Guest is a user-space application in host under KVM solution. > > For SVA usage in Virtual Machine, guest owns GVA->GPA translation > > structure. And this part should be passdown to host to enable nested > > translation (or say two stage translation). This patch reuses the > > VFIO_IOMMU_BIND proposal from Jean-Philippe Brucker, and adds new > > bind type for binding guest owned translation structure to host. > > > > *) Add two new ioctls for VFIO containers. > > > > - VFIO_IOMMU_BIND: for bind request from userspace, it could be > >bind a process to a pasid or bind a guest pasid > >to a device, this is indicated by type > > - VFIO_IOMMU_UNBIND: for unbind request from userspace, it could be > >unbind a process to a pasid or unbind a guest pasid > >to a device, also indicated by type > > - Bind type: > > VFIO_IOMMU_BIND_PROCESS: user-space request to bind a process > >to a device > > VFIO_IOMMU_BIND_GUEST_PASID: bind guest owned translation > >structure to host iommu. e.g. 
guest page table > > > > *) Code logic in vfio_iommu_type1_ioctl() to handle VFIO_IOMMU_BIND/UNBIND > > > > Cc: Kevin Tian > > Signed-off-by: Jean-Philippe Brucker > > Signed-off-by: Liu Yi L > > Signed-off-by: Jacob Pan > > --- > > drivers/vfio/vfio_iommu_type1.c | 136 > > > include/uapi/linux/vfio.h | 44 + > > 2 files changed, 180 insertions(+) > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > b/drivers/vfio/vfio_iommu_type1.c > > index 3d73a7d..1a27e25 100644 > > --- a/drivers/vfio/vfio_iommu_type1.c > > +++ b/drivers/vfio/vfio_iommu_type1.c > > @@ -2325,6 +2325,104 @@ static int vfio_iommu_type1_pasid_free(struct > vfio_iommu *iommu, > > return ret; > > } > > > > +static int vfio_bind_gpasid_fn(struct device *dev, void *data) > > +{ > > + struct domain_capsule *dc = (struct domain_capsule *)data; > > + struct iommu_gpasid_bind_data *ustruct = > > + (struct iommu_gpasid_bind_data *) dc->data; > > + > > + return iommu_sva_bind_gpasid(dc->domain, dev, ustruct); > > +} > > + > > +static int vfio_unbind_gpasid_fn(struct device *dev, void *data) > > +{ > > + struct domain_capsule *dc = (struct domain_capsule *)data; > > + struct iommu_gpasid_bind_data *ustruct = > > + (struct iommu_gpasid_bind_data *) dc->data; > > + > > + return iommu_sva_unbind_gpasid(dc->domain, dev, > > + ustruct->hpasid); > > +} > > + > > +/* > > + * unbind specific gpasid, caller of this function requires hold > > + * vfio_iommu->lock > > + */ > > +static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu, > > + struct iommu_gpasid_bind_data *gbind_data) > > +{ > > + return vfio_iommu_lookup_dev(iommu, vfio_unbind_gpasid_fn, gbind_data); > > +} > > + > > +static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu, > > + void __user *arg, > > + struct vfio_iommu_type1_bind *bind) > > +{ > > + struct iommu_gpasid_bind_data gbind_data; > > + unsigned long minsz; > > + int ret = 0; > > + > > + minsz = sizeof(*bind) + sizeof(gbind_data); > > + if (bind->argsz < minsz) > > + 
return -EINVAL; > > + > > + if (copy_from_user(_data, arg, sizeof(gbind_data))) > > + return -EFAULT; > > + > > + mutex_lock(>lock); > > + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) { > > + ret = -EINVAL; > > + goto out_unlock; > > + } > > + > > + ret = vfio_iommu_lookup_dev(iommu, vfio_bind_gpasid_fn, _data); > > + /* > > +* If bind failed, it may not be a total failure. Some devices within > > +* the iommu group may have bind successfully. Although we don't enable > > +* pasid capability for non-singletion iommu groups, a unbind operation > > +* would be helpful to ensure no partial binding for an iommu group. > > +*/ > > + if (ret) > > + /* > > +* Undo all binds that already succeeded, no need to check the > > +* return value here since some device within the group has no > > +* successful bind when coming to this place switch. > > +*/ > > + vfio_iommu_type1_do_guest_unbind(iommu, _data); > > + > > +out_unlock: > > + mutex_unlock(>lock); > > + return ret; > > +} > > + > > +static long vfio_iommu_type1_unbind_gpasid(struct vfio_iommu *iommu, > > +
RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
Hi Eric, > -Original Message- > From: kvmarm-boun...@lists.cs.columbia.edu > [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Eric Auger > Sent: 11 July 2019 14:56 > To: eric.auger@gmail.com; eric.au...@redhat.com; > iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org; > alex.william...@redhat.com; jacob.jun@linux.intel.com; > yi.l@intel.com; jean-philippe.bruc...@arm.com; will.dea...@arm.com; > robin.mur...@arm.com > Cc: kevin.t...@intel.com; vincent.ste...@arm.com; ashok@intel.com; > marc.zyng...@arm.com; tina.zh...@intel.com > Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part) > > This series brings the VFIO part of HW nested paging support > in the SMMUv3. > > The series depends on: > [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part) > (https://www.spinics.net/lists/kernel/msg3187714.html) > > 3 new IOCTLs are introduced that allow the userspace to > 1) pass the guest stage 1 configuration > 2) pass stage 1 MSI bindings > 3) invalidate stage 1 related caches > > They map onto the related new IOMMU API functions. > > We introduce the capability to register specific interrupt > indexes (see [1]). A new DMA_FAULT interrupt index allows to register > an eventfd to be signaled whenever a stage 1 related fault > is detected at physical level. Also a specific region allows > to expose the fault records to the user space. I am trying to get this running on one of our platform that has smmuv3 dual stage support. I am seeing some issues with this when an ixgbe vf dev is made pass-through and is behind a vSMMUv3 in Guest. 
Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5 And this is my Qemu cmd line, ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \ -kernel Image \ -drive if=none,file=ubuntu,id=fs \ -device virtio-blk-device,drive=fs \ -device vfio-pci,host=:01:10.1 \ -bios QEMU_EFI.fd \ -net none \ -m 4G \ -nographic -D -d -enable-kvm \ -append "console=ttyAMA0 root=/dev/vda rw acpi=force" The basic ping from Guest works fine, root@ubuntu:~# ping 10.202.225.185 PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data. 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms ... But if I increase ping packet size, root@ubuntu:~# ping -s 1024 10.202.225.185 PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data. 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms >From 10.202.225.169 icmp_seq=66 Destination Host Unreachable >From 10.202.225.169 icmp_seq=67 Destination Host Unreachable >From 10.202.225.169 icmp_seq=68 Destination Host Unreachable >From 10.202.225.169 icmp_seq=69 Destination Host Unreachable And from Host kernel I get, [ 819.970742] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected [ 824.002707] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected [ 828.034683] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected [ 830.050673] ixgbe :01:00.1 enp1s0f1: 4 Spoofed packets detected [ 832.066659] ixgbe :01:00.1 enp1s0f1: 1 Spoofed packets detected [ 834.082640] ixgbe :01:00.1 enp1s0f1: 3 Spoofed packets detected Also noted that iperf cannot work as it fails to establish the connection with iperf server. Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your reference. 
I haven't debugged this further yet and thought of checking with you if this is something you have seen already or not. Or maybe I am missing something here? Please let me know. Thanks, Shameer > Best Regards > > Eric > > This series can be found at: > https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 > > It series includes Tina's patch steming from > [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device > specific irq" plus patches originally contributed by Yi. > > History: > > v8 -> v9: > - introduce specific irq framework > - single fault region > - iommu_unregister_device_fault_handler failure case not handled > yet. > > v7 -> v8: > - rebase on top of v5.2-rc1 and especially > 8be39a1a04c1 iommu/arm-smmu-v3: Add a master->domain pointer > - dynamic alloc of s1_cfg/s2_cfg > - __arm_smmu_tlb_inv_asid/s1_range_nosync > - check there is no HW MSI regions > - asid invalidation using pasid extended struct (change in the uapi) > - add s1_live/s2_live checks > - move check about support of nested stages in domain finalise > - fixes in error reporting according to the discussion with Robin > - reordered the patches to have first iommu/smmuv3 patches and then > VFIO patches > > v6 -> v7: > - removed device handle from bind/unbind_guest_msi > - added "iommu/smmuv3: Nested mode single MSI doorbell per domain > enforcement" >
Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function
Hi Jacob, On 10/24/19 9:55 PM, Jacob Pan wrote: > When Shared Virtual Address (SVA) is enabled for a guest OS via > vIOMMU, we need to provide invalidation support at IOMMU API and driver > level. This patch adds Intel VT-d specific function to implement > iommu passdown invalidate API for shared virtual address. > > The use case is for supporting caching structure invalidation > of assigned SVM capable devices. Emulated IOMMU exposes queue > invalidation capability and passes down all descriptors from the guest > to the physical IOMMU. > > The assumption is that guest to host device ID mapping should be > resolved prior to calling IOMMU driver. Based on the device handle, > host IOMMU driver can replace certain fields before submit to the > invalidation queue. > > Signed-off-by: Jacob Pan > Signed-off-by: Ashok Raj > Signed-off-by: Liu, Yi L > --- > drivers/iommu/intel-iommu.c | 170 > > 1 file changed, 170 insertions(+) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 5fab32fbc4b4..a73e76d6457a 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -5491,6 +5491,175 @@ static void intel_iommu_aux_detach_device(struct > iommu_domain *domain, > aux_domain_remove_dev(to_dmar_domain(domain), dev); > } > > +/* > + * 2D array for converting and sanitizing IOMMU generic TLB granularity to > + * VT-d granularity. Invalidation is typically included in the unmap > operation > + * as a result of DMA or VFIO unmap. However, for assigned device where guest > + * could own the first level page tables without being shadowed by QEMU. In above sentence needs to be rephrased. > + * this case there is no pass down unmap to the host IOMMU as a result of > unmap > + * in the guest. Only invalidations are trapped and passed down. > + * In all cases, only first level TLB invalidation (request with PASID) can > be > + * passed down, therefore we do not include IOTLB granularity for request > + * without PASID (second level). 
> + *
> + * For an example, to find the VT-d granularity encoding for IOTLB
> + * type and page selective granularity within PASID:
> + * X: indexed by iommu cache type
> + * Y: indexed by enum iommu_inv_granularity
> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> + *
> + * Granu_map array indicates validity of the table. 1: valid, 0: invalid
> + *
> + */
> +const static int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> +	/* PASID based IOTLB, support PASID selective and page selective */

I would rather use the generic terminology, ie. IOTLB invalidation supports
PASID and ADDR granularities

> +	{0, 1, 1},
> +	/* PASID based dev TLBs, only support all PASIDs or single PASID */

Device IOTLB invalidation supports DOMAIN and PASID granularities

> +	{1, 1, 0},
> +	/* PASID cache */

PASID cache invalidation supports DOMAIN and PASID granularities

> +	{1, 1, 0}
> +};
> +
> +const static u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> +	/* PASID based IOTLB */
> +	{0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> +	/* PASID based dev TLBs */
> +	{QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
> +	/* PASID cache */
> +	{QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
> +};
> +
> +static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)

nit: this looks a bit weird to me to manipulate an u64 here. Why not use an int?

> +{
> +	if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
> +	    !inv_type_granu_map[type][granu])
> +		return -EINVAL;
> +
> +	*vtd_granu = inv_type_granu_table[type][granu];
> +
> +	return 0;
> +}
> +
> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> +{
> +	u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> +
> +	/* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> +	 * IOMMU cache invalidate API passes granu_size in bytes, and number of
> +	 * granu size in contiguous memory.
> +	 */
> +	return order_base_2(nr_pages);
> +}
> +
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct iommu_cache_invalidate_info *inv_info)
> +{
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct device_domain_info *info;
> +	struct intel_iommu *iommu;
> +	unsigned long flags;
> +	int cache_type;
> +	u8 bus, devfn;
> +	u16 did, sid;
> +	int ret = 0;
> +	u64 size;
> +
> +	if (!inv_info || !dmar_domain ||
> +	    inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> +		return -EINVAL;
> +
> +	if (!dev || !dev_is_pci(dev))
> +		return -ENODEV;
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +
Re: [PATCH v2 0/8] iommu: Add PASID support to Arm SMMUv3
On 2019/11/8 11:25 PM, Jean-Philippe Brucker wrote:
> This is version 2 of the series I sent a while ago [1], adding PASID
> support to the Arm SMMUv3 driver.
>
> Changes since v1:
>
> * Dropped the patch adding auxiliary domain support. It's an easy way
>   to test PASID, by populating PASID contexts using iommu_map/unmap(),
>   but I don't know if it will ever have real users.
>
>   Since v1 I changed my testing gear, and am using the zip accelerator
>   [2] instead of a software model. It only uses SVA and testing
>   auxiliary domains would require additional changes that would never
>   go upstream. SVA requires another 20 patches (including I/O page
>   faults) that I will send later, but at least I know that this will
>   get used.
>
> * ioasid patch has been carried by Jacob and should be merged for v5.5 [3]
> * Split patch "Add support for Substream IDs" into patches 4 and 5.
> * Added IORT support (patch 3) and addressed other comments.
>
> [1] https://lore.kernel.org/linux-iommu/20190610184714.6786-1-jean-philippe.bruc...@arm.com/
> [2] https://lore.kernel.org/linux-iommu/1572331216-9503-1-git-send-email-zhangfei@linaro.org/
> [3] https://lore.kernel.org/linux-iommu/1570045363-24856-1-git-send-email-jacob.jun@linux.intel.com/
>
> Jean-Philippe Brucker (8):
>   dt-bindings: document PASID property for IOMMU masters
>   iommu/arm-smmu-v3: Support platform SSID
>   ACPI/IORT: Support PASID for platform devices
>   iommu/arm-smmu-v3: Prepare for SSID support
>   iommu/arm-smmu-v3: Add support for Substream IDs
>   iommu/arm-smmu-v3: Add second level of context descriptor table
>   iommu/arm-smmu-v3: Improve add_device() error handling
>   iommu/arm-smmu-v3: Add support for PCI PASID

Thanks Jean for the patch.

The series tested well on the Hisilicon KunPeng920 platform.

Tested-by: Zhangfei Gao

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID
Hi Jacob,

On 11/8/19 11:55 PM, Jacob Pan wrote:
> On Fri, 8 Nov 2019 12:30:31 +0100
> Auger Eric wrote:
>
>> Hi Jacob,
>>
>> On 10/24/19 9:54 PM, Jacob Pan wrote:
>>> Make use of generic IOASID code to manage PASID allocation,
>>> free, and lookup. Replace Intel specific code.
>>>
>>> Signed-off-by: Jacob Pan
>>> ---
>>>  drivers/iommu/intel-iommu.c | 12 ++--
>>>  drivers/iommu/intel-pasid.c | 36
>>>  drivers/iommu/intel-svm.c   | 39 +++
>>>  3 files changed, 29 insertions(+), 58 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>> index ced1d89ef977..2ea09b988a23 100644
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -5311,7 +5311,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
>>>  	domain->auxd_refcnt--;
>>>
>>>  	if (!domain->auxd_refcnt && domain->default_pasid > 0)
>>> -		intel_pasid_free_id(domain->default_pasid);
>>> +		ioasid_free(domain->default_pasid);
>>>  }
>>>
>>>  static int aux_domain_add_dev(struct dmar_domain *domain,
>>> @@ -5329,10 +5329,10 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
>>>  	if (domain->default_pasid <= 0) {
>>>  		int pasid;
>>>
>>> -		pasid = intel_pasid_alloc_id(domain, PASID_MIN,
>>> -					     pci_max_pasids(to_pci_dev(dev)),
>>> -					     GFP_KERNEL);
>>> -		if (pasid <= 0) {
>>> +		/* No private data needed for the default pasid */
>>> +		pasid = ioasid_alloc(NULL, PASID_MIN,
>>> +				     pci_max_pasids(to_pci_dev(dev)) - 1,
>>> +				     NULL);
>>> +		if (pasid == INVALID_IOASID) {
>>>  			pr_err("Can't allocate default pasid\n");
>>>  			return -ENODEV;
>>>  		}
>>> @@ -5368,7 +5368,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
>>>  	spin_unlock(&iommu->lock);
>>>  	spin_unlock_irqrestore(&device_domain_lock, flags);
>>>  	if (!domain->auxd_refcnt && domain->default_pasid > 0)
>>> -		intel_pasid_free_id(domain->default_pasid);
>>> +		ioasid_free(domain->default_pasid);
>>>
>>>  	return ret;
>>>  }
>>> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
>>> index d81e857d2b25..e79d680fe300 100644
>>> --- a/drivers/iommu/intel-pasid.c
>>> +++ b/drivers/iommu/intel-pasid.c
>>> @@ -26,42 +26,6 @@
>>>   */
>>>  static DEFINE_SPINLOCK(pasid_lock);
>>>  u32 intel_pasid_max_id = PASID_MAX;
>>> -static DEFINE_IDR(pasid_idr);
>>> -
>>> -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
>>> -{
>>> -	int ret, min, max;
>>> -
>>> -	min = max_t(int, start, PASID_MIN);
>>> -	max = min_t(int, end, intel_pasid_max_id);
>>> -
>>> -	WARN_ON(in_interrupt());
>>> -	idr_preload(gfp);
>>> -	spin_lock(&pasid_lock);
>>> -	ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
>>> -	spin_unlock(&pasid_lock);
>>> -	idr_preload_end();
>>> -
>>> -	return ret;
>>> -}
>>> -
>>> -void intel_pasid_free_id(int pasid)
>>> -{
>>> -	spin_lock(&pasid_lock);
>>> -	idr_remove(&pasid_idr, pasid);
>>> -	spin_unlock(&pasid_lock);
>>> -}
>>> -
>>> -void *intel_pasid_lookup_id(int pasid)
>>> -{
>>> -	void *p;
>>> -
>>> -	spin_lock(&pasid_lock);
>>> -	p = idr_find(&pasid_idr, pasid);
>>> -	spin_unlock(&pasid_lock);
>>> -
>>> -	return p;
>>> -}
>>>
>>>  int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
>>>  {
>>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>>> index 9b159132405d..a9a7f85a09bc 100644
>>> --- a/drivers/iommu/intel-svm.c
>>> +++ b/drivers/iommu/intel-svm.c
>>> @@ -17,6 +17,7 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include <linux/ioasid.h>
>>>  #include
>>>
>>>  #include "intel-pasid.h"
>>> @@ -318,16 +319,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>>>  	if (pasid_max > intel_pasid_max_id)
>>>  		pasid_max = intel_pasid_max_id;
>>>
>>> -	/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
>>> -	ret = intel_pasid_alloc_id(svm,
>>> -				   !!cap_caching_mode(iommu->cap),
>>> -				   pasid_max - 1, GFP_KERNEL);
>>> -	if (ret < 0) {
>>> +	/* Do not use PASID 0, reserved for RID to PASID */
>>> +	svm->pasid = ioasid_alloc(NULL, PASID_MIN,
>>> +				  pasid_max - 1, svm);

>> pasid_max - 1 is inclusive,
>> whereas the max param in intel_pasid_alloc_id() is exclusive, right?
>> If you fixed an issue, you can mention it in the commit message.
> yes, i should mention that. intel_pasid_alloc_id() uses IDR which is
> end exclusive. ioasid uses xarray, which is inclusive.
>
>>> +	if (svm->pasid == INVALID_IOASID) {
>>>  		kfree(svm);
>>>  		kfree(sdev);
>>> +		ret = ENOSPC;
>> -ENOSPC.
>> Nit: in 2/11 vcmd_alloc_pasid returned -ENOMEM
Re: [PATCH v7 06/11] iommu/vt-d: Avoid duplicated code for PASID setup
Hi Jacob,

On 10/24/19 9:54 PM, Jacob Pan wrote:
> After each setup for PASID entry, related translation caches must be flushed.
> We can combine duplicated code into one function which is less error prone.
>
> Signed-off-by: Jacob Pan
> ---
>  drivers/iommu/intel-pasid.c | 48 +
>  1 file changed, 18 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index e79d680fe300..ffbd416ed3b8 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -485,6 +485,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
>  	devtlb_invalidation_with_pasid(iommu, dev, pasid);
>  }
>
> +static void pasid_flush_caches(struct intel_iommu *iommu,
> +				struct pasid_entry *pte,
> +				int pasid, u16 did)
> +{
> +	if (!ecap_coherent(iommu->ecap))
> +		clflush_cache_range(pte, sizeof(*pte));
> +
> +	if (cap_caching_mode(iommu->cap)) {
> +		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> +		iotlb_invalidation_with_pasid(iommu, did, pasid);
> +	} else {
> +		iommu_flush_write_buffer(iommu);
> +	}
> +}
> +
>  /*
>   * Set up the scalable mode pasid table entry for first only
>   * translation type.
> @@ -530,16 +545,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
>  	/* Setup Present and PASID Granular Transfer Type: */
>  	pasid_set_translation_type(pte, 1);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>
>  	return 0;
>  }
> @@ -603,16 +609,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>  	 */
>  	pasid_set_sre(pte);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>
>  	return 0;
>  }
> @@ -646,16 +643,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>  	 */
>  	pasid_set_sre(pte);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>
>  	return 0;
>  }

Reviewed-by: Eric Auger

Thanks

Eric
Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
Hi Joerg,

On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel wrote:
> Hi Adrian,
>
> On Mon, Nov 04, 2019 at 01:58:52PM +0800, Adrian Huang wrote:
>> 2) When set_device_exclusion_range() parses the IVMD of device id
>>    '4200', the exclusion range of the amd_iommu struct becomes:
>>
>>    iommu->exclusion_start = 0x9F58D000;
>>    iommu->exclusion_length = 0x804;
>>
>> 3) When parsing the second IVMD (device id '4300') in
>>    set_device_exclusion_range(), the exclusion range of the
>>    amd_iommu struct becomes:
>>
>>    iommu->exclusion_start = 0x9754D000;
>>    iommu->exclusion_length = 0x804;
>>
>>    This overwrites the first IVMD configuration, which leads to
>>    the failure of the first RAID controller.
>
> Okay, I think this is clearly a BIOS bug as the BIOS should not define
> two different exclusion ranges in the IVRS table.

Thanks for the review. Yes, I understand this is a BIOS bug. The purpose
of this patch is to prevent the failure even though the system boots with
the buggy BIOS.

> So there are a couple of options here:
>
> 	1) Bail out and disable the IOMMU as the BIOS screwed up
>
> 	2) Treat per-device exclusion ranges just as r/w unity-mapped
> 	   regions.
>
> I think option 2) is the best fix here.

Yes, this is what this patch does: the first exclusion region still uses
the exclusion range, while the remaining exclusion regions are modified
to be r/w unity-mapped regions when multiple exclusion regions are
detected.

But I'm guessing you're talking about the BIOS having to define r/w
unity-mapped regions instead of the per-device exclusions (a fix from the
BIOS, not preventing the failure in the kernel). Right?

--
Adrian