Re: [EXT] Re: [PATCH RFC] iommu/virtio: Use page size bitmap supported by endpoint
On Mon, Mar 23, 2020 at 06:04:37PM +, Bharat Bhushan wrote:
> > > -static int viommu_domain_finalise(struct viommu_dev *viommu,
> > > +static int viommu_domain_finalise(struct viommu_endpoint *vdev,
> > >				  struct iommu_domain *domain)
> > >  {
> > >	int ret;
> > >	struct viommu_domain *vdomain = to_viommu_domain(domain);
> > > +	struct viommu_dev *viommu = vdev->viommu;
> > >
> > >	vdomain->viommu = viommu;
> > >	vdomain->map_flags = viommu->map_flags;
> > >
> > > -	domain->pgsize_bitmap = viommu->pgsize_bitmap;
> > > +	if (vdev->pgsize_bitmap)
> > > +		domain->pgsize_bitmap = vdev->pgsize_bitmap;
> > > +	else
> > > +		domain->pgsize_bitmap = viommu->pgsize_bitmap;
> > > +
> >
> > nit: it could be nicer to initialize vdev->pgsize_bitmap in add_device(),
>
> To what size we should initialize in add_device, PAGE_SIZE?

No, to viommu->pgsize_bitmap

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
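Jean's suggestion can be sketched outside the kernel with stand-in types. This is only a model of the control flow — the struct layouts and helper names below are illustrative, not the driver's actual code: seed the per-endpoint bitmap from the global one in add_device(), let probe_endpoint() override it, and then finalise can copy it unconditionally with no if/else.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-ins for the kernel structures under discussion. */
struct viommu_dev      { uint64_t pgsize_bitmap; };
struct viommu_endpoint { struct viommu_dev *viommu; uint64_t pgsize_bitmap; };

/* add_device(): default the endpoint bitmap to the global one. */
static void viommu_add_device_init(struct viommu_endpoint *vdev,
				   struct viommu_dev *viommu)
{
	vdev->viommu = viommu;
	vdev->pgsize_bitmap = viommu->pgsize_bitmap;
}

/* probe_endpoint(): override only if the device supplied a mask. */
static void viommu_probe_override(struct viommu_endpoint *vdev, uint64_t mask)
{
	if (mask)
		vdev->pgsize_bitmap = mask;
}

/* finalise: the endpoint value is now always valid, so just copy it. */
static uint64_t viommu_finalise_bitmap(const struct viommu_endpoint *vdev)
{
	return vdev->pgsize_bitmap;
}
```

With this shape, an endpoint without a per-endpoint mask transparently inherits the global bitmap, which is what Jean is pointing at.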
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
Alexey Kardashevskiy writes:

> On 24/03/2020 04:22, Christoph Hellwig wrote:
>> On Mon, Mar 23, 2020 at 09:07:38PM +0530, Aneesh Kumar K.V wrote:
>>>
>>> This is what I was trying, but considering I am new to the DMA
>>> subsystem, I am not sure I got all the details correct. The idea is to
>>> look at the cpu addr and see if that can be used in direct map fashion
>>> (is bus_dma_limit the right restriction here?); if not, fall back to
>>> dynamic IOMMU mapping.
>>
>> I don't think we can throw all these complications into the dma
>> mapping code.  At some point I also wonder what the point is,
>> especially for scatterlist mappings, where the iommu can coalesce.
>
> This is for persistent memory which you can DMA to/from but which does
> not appear in the system as normal memory and therefore requires
> special handling anyway (O_DIRECT or DAX, I do not know the exact
> mechanics). All other devices in the system should just run as usual,
> i.e. use a 1:1 mapping if possible.

This is O_DIRECT with a user buffer that is actually mmap'ed from a
DAX-mounted file system. What we really need is something that will fall
back to iommu_map_page() based on the dma_addr, i.e. something equivalent
to the current dma_direct_map_page(), but falling back to iommu_map_page()
instead of swiotlb_map_page(). Something like?

dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
		unsigned long offset, size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	phys_addr_t phys = page_to_phys(page) + offset;
	dma_addr_t dma_addr = phys_to_dma(dev, phys);

	if (unlikely(!dma_capable(dev, dma_addr, size, true)))
		return iommu_map(dev, phys, size, dir, attrs);
	...

-aneesh
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
On 24/03/2020 14:37, Alexey Kardashevskiy wrote:
> On 24/03/2020 04:20, Christoph Hellwig wrote:
>> On Mon, Mar 23, 2020 at 07:58:01PM +1100, Alexey Kardashevskiy wrote:
>>>>> 0x100.. .. 0x101..
>>>>>
>>>>> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
>>>>> (because of the maximum IOMMU table size) but not the other. And 1:1 on
>>>>> that "pseries" is done with offset=0x0800....
>>>>>
>>>>> So we want to check every bus address against dev->bus_dma_limit, not
>>>>> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
>>>>> 0x0800.0001.. and 1:1 mapping for the second 4GB would not be
>>>>> tried. Does this sound reasonable? Thanks,
>>>>
>>>> bus_dma_limit is just another limiting factor applied on top of
>>>> coherent_dma_mask or dma_mask respectively.
>>>
>>> This is not enough for the task: in my example, I'd set the bus limit to
>>> 0x0800.0001.. but this would disable bypass for all RAM
>>> addresses - the first and the second 4GB blocks.
>>
>> So what about something like the version here:
>>
>> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.3
>
> dma_alloc_direct() and dma_map_direct() do the same thing now which is
> good, did I miss anything else?
>
> This lets us disable bypass automatically if this weird memory appears
> in the system but does not let us have 1:1 after that even for normal
> RAM. Thanks,

Ah no, it does not help much; simply setting dma_ops_bypass will, though.
But eventually, in this function:

static inline bool dma_map_direct(struct device *dev,
				  const struct dma_map_ops *ops)
{
	if (likely(!ops))
		return true;
	if (!dev->dma_ops_bypass)
		return false;

	return min_not_zero(*dev->dma_mask, dev->bus_dma_limit) >=
		dma_direct_get_required_mask(dev);
}

we rather want it to take a dma handle and a size, and add

	if (dev->bus_dma_limit)
		return dev->bus_dma_limit > dma_handle + size;

where dma_handle = phys_to_dma(dev, phys) (I am not doing it here as unmap
needs the same test and it does not receive phys as a parameter).
-- 
Alexey
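The per-mapping test Alexey proposes can be modeled in plain C. The struct below is a stand-in with only the one field the check needs; the names mirror the kernel's but this is a sketch of the logic, not the kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Stand-in for struct device, reduced to the relevant limit field;
 * a value of 0 means "no bus limit set", as in the kernel. */
struct dev_limits {
	dma_addr_t bus_dma_limit;
};

/* Instead of comparing masks once per device, check each mapping's
 * actual dma_handle + size against bus_dma_limit, so only mappings
 * outside the direct window fall back to the IOMMU. */
static bool dma_handle_direct_capable(const struct dev_limits *dev,
				      dma_addr_t dma_handle, size_t size)
{
	if (dev->bus_dma_limit)
		return dev->bus_dma_limit > dma_handle + size;
	return true;
}
```

This captures why the change is per-mapping rather than per-device: with a 4 GiB window, a buffer at 0x10000 passes while a buffer straddling the 4 GiB boundary does not, even though both belong to the same device.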
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
On 24/03/2020 04:20, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 07:58:01PM +1100, Alexey Kardashevskiy wrote:
>>>> 0x100.. .. 0x101..
>>>>
>>>> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
>>>> (because of the maximum IOMMU table size) but not the other. And 1:1 on
>>>> that "pseries" is done with offset=0x0800....
>>>>
>>>> So we want to check every bus address against dev->bus_dma_limit, not
>>>> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
>>>> 0x0800.0001.. and 1:1 mapping for the second 4GB would not be
>>>> tried. Does this sound reasonable? Thanks,
>>>
>>> bus_dma_limit is just another limiting factor applied on top of
>>> coherent_dma_mask or dma_mask respectively.
>>
>> This is not enough for the task: in my example, I'd set the bus limit to
>> 0x0800.0001.. but this would disable bypass for all RAM
>> addresses - the first and the second 4GB blocks.
>
> So what about something like the version here:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.3

dma_alloc_direct() and dma_map_direct() do the same thing now which is
good; did I miss anything else?

This lets us disable bypass automatically if this weird memory appears
in the system but does not let us have 1:1 after that even for normal
RAM. Thanks,

-- 
Alexey
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
On 24/03/2020 04:22, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 09:07:38PM +0530, Aneesh Kumar K.V wrote:
>>
>> This is what I was trying, but considering I am new to the DMA
>> subsystem, I am not sure I got all the details correct. The idea is to
>> look at the cpu addr and see if that can be used in direct map fashion
>> (is bus_dma_limit the right restriction here?); if not, fall back to
>> dynamic IOMMU mapping.
>
> I don't think we can throw all these complications into the dma
> mapping code.  At some point I also wonder what the point is,
> especially for scatterlist mappings, where the iommu can coalesce.

This is for persistent memory which you can DMA to/from but which does
not appear in the system as normal memory and therefore requires
special handling anyway (O_DIRECT or DAX, I do not know the exact
mechanics). All other devices in the system should just run as usual,
i.e. use a 1:1 mapping if possible.

-- 
Alexey
Re: [PATCH V10 02/11] iommu/uapi: Define a mask for bind data
On 2020/3/24 3:37, Jacob Pan wrote:
> On Sun, 22 Mar 2020 09:29:32 +0800, Lu Baolu wrote:
>> On 2020/3/21 7:27, Jacob Pan wrote:
>>> Memory type related flags can be grouped together for one simple
>>> check.
>>>
>>> ---
>>> v9 renamed from EMT to MTS since these are memory type support flags.
>>> ---
>>>
>>> Signed-off-by: Jacob Pan
>>> ---
>>>  include/uapi/linux/iommu.h | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
>>> index 4ad3496e5c43..d7bcbc5f79b0 100644
>>> --- a/include/uapi/linux/iommu.h
>>> +++ b/include/uapi/linux/iommu.h
>>> @@ -284,7 +284,10 @@ struct iommu_gpasid_bind_data_vtd {
>>> 	__u32 pat;
>>> 	__u32 emt;
>>>  };
>>> -
>>> +#define IOMMU_SVA_VTD_GPASID_MTS_MASK	(IOMMU_SVA_VTD_GPASID_CD | \
>>> +					 IOMMU_SVA_VTD_GPASID_EMTE | \
>>> +					 IOMMU_SVA_VTD_GPASID_PCD | \
>>> +					 IOMMU_SVA_VTD_GPASID_PWT)
>>
>> As the name implies, can this move to intel-iommu.h?
>
> I also thought about this but the masks are in the vendor-specific part
> of the UAPI.

I looked through this patch series. It looks good to me. I will do some
code style cleanup and take it to v5.7.

I am not the right person to decide whether include/uapi/linux/iommu.h
is the right place for this, so I will move it to the Intel IOMMU driver
for now.

Best regards,
baolu
Re: [PATCH 2/2] iommu/vt-d: Replace intel SVM APIs with generic SVA APIs
On 2020/3/24 7:01, Raj, Ashok wrote:
> Hi Jean
>
> On Fri, Mar 20, 2020 at 10:29:55AM +0100, Jean-Philippe Brucker wrote:
>>> +#define to_intel_svm_dev(handle) container_of(handle, struct intel_svm_dev, sva)
>>> +struct iommu_sva *
>>> +intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
>>> +{
>>> +	struct iommu_sva *sva = ERR_PTR(-EINVAL);
>>> +	struct intel_svm_dev *sdev = NULL;
>>> +	int flags = 0;
>>> +	int ret;
>>> +
>>> +	/*
>>> +	 * TODO: Consolidate with generic iommu-sva bind after it is merged.
>>> +	 * It will require shared SVM data structures, i.e. combine io_mm
>>> +	 * and intel_svm etc.
>>> +	 */
>>> +	if (drvdata)
>>> +		flags = *(int *)drvdata;
>>
>> drvdata is more for storing device driver contexts that can be passed to
>> iommu_sva_ops, but I get that this is temporary.
>>
>> As usual I'm dreading supervisor mode making it into the common API. What
>> are your plans regarding SUPERVISOR_MODE and PRIVATE_PASID flags? The
>> previous discussion on the subject [1] had me hoping that you could
>> replace supervisor mode with normal mappings (auxiliary domains?)
>> I'm less worried about PRIVATE_PASID, it would just add complexity into
>
> We don't seem to have an immediate need for PRIVATE_PASID. There are
> some talks about potential usages, but nothing concrete. I think it
> might be good to get rid of it now and add it back when we really need
> it.
>
> For SUPERVISOR_MODE, the idea is to have an aux domain. Baolu is
> working on something to replace it. Certainly the entire kernel address
> space is opening up the whole kimono.. so we are looking at dynamically
> creating mappings on demand. It might take away some of the benefits of
> SVA in general with no need to create mappings, but for security
> somebody has to pay the price :-)

My thought is to reuse the aux-domain API below.

	int iommu_aux_attach_device(struct iommu_domain *domain,
				    struct device *dev)

Currently, it asks the vendor iommu driver to allocate a PASID and bind
the domain with it. We can change it to allow the caller to pass in an
existing supervisor PASID.
	int iommu_aux_attach_device(struct iommu_domain *domain,
				    struct device *dev, int *pasid)

In the vendor iommu driver, if (*pasid == INVALID_IOASID) it will
allocate a PASID (the same as the current behavior); otherwise, it will
attach the domain with the passed-in PASID.

Best regards,
baolu
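Baolu's in/out-parameter convention can be sketched with a toy allocator. Everything here is illustrative (the real INVALID_IOASID and allocator live in the kernel's ioasid code); the point is just the two paths the extended signature creates:

```c
#include <assert.h>

/* Toy sentinel and allocator standing in for the kernel's ioasid code. */
#define INVALID_IOASID (-1)
static int next_pasid = 100;

/* Sketch of the proposed iommu_aux_attach_device() extension: if *pasid
 * is INVALID_IOASID the driver allocates one (today's behavior);
 * otherwise it attaches using the caller-supplied, e.g. supervisor,
 * PASID. */
static int aux_attach(int *pasid)
{
	if (*pasid == INVALID_IOASID)
		*pasid = next_pasid++;	/* allocate, as today */
	/* ... program the PASID entry for *pasid here ... */
	return 0;
}
```

A caller that already owns a supervisor PASID passes it in unchanged; a caller that wants the current behavior passes INVALID_IOASID and reads back the allocated value.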
Re: [PATCH 2/2] iommu/vt-d: Replace intel SVM APIs with generic SVA APIs
Hi Jean On Fri, Mar 20, 2020 at 10:29:55AM +0100, Jean-Philippe Brucker wrote: > > +#define to_intel_svm_dev(handle) container_of(handle, struct > > intel_svm_dev, sva) > > +struct iommu_sva * > > +intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata) > > +{ > > + struct iommu_sva *sva = ERR_PTR(-EINVAL); > > + struct intel_svm_dev *sdev = NULL; > > + int flags = 0; > > + int ret; > > + > > + /* > > +* TODO: Consolidate with generic iommu-sva bind after it is merged. > > +* It will require shared SVM data structures, i.e. combine io_mm > > +* and intel_svm etc. > > +*/ > > + if (drvdata) > > + flags = *(int *)drvdata; > > drvdata is more for storing device driver contexts that can be passed to > iommu_sva_ops, but I get that this is temporary. > > As usual I'm dreading supervisor mode making it into the common API. What > are your plans regarding SUPERVISOR_MODE and PRIVATE_PASID flags? The > previous discussion on the subject [1] had me hoping that you could > replace supervisor mode with normal mappings (auxiliary domains?) > I'm less worried about PRIVATE_PASID, it would just add complexity into We don't seem to have an immediate need for PRIVATE_PASID. There are some talks about potential usages, but nothing concrete. I think it might be good to get rid of it now and add when we really need. For SUPERVISOR_MODE, the idea is to have aux domain. Baolu is working on something to replace. Certainly the entire kernel address is opening up the whole kimono.. so we are looking at dynamically creating mappings on demand. It might take some of the benefits of SVA in general with no need to create mappings, but for security somebody has to pay the price :-) Cheers, Ashok > the API and iommu-sva implementation, but doesn't really have security > implications. 
> [1] https://lore.kernel.org/linux-iommu/20190228220449.ga12...@araj-mobl1.jf.intel.com/
[PATCH] iommu: Lower severity of add/remove device messages
These messages are not really informational, but mostly of a debug
nature. Lower their severity.

Signed-off-by: Ezequiel Garcia
---
 drivers/iommu/iommu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3e3528436e0b..1ebd17033714 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -758,7 +758,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 
 	trace_add_device_to_group(group->id, dev);
 
-	dev_info(dev, "Adding to iommu group %d\n", group->id);
+	dev_dbg(dev, "Adding to iommu group %d\n", group->id);
 
 	return 0;
 
@@ -792,7 +792,7 @@ void iommu_group_remove_device(struct device *dev)
 	struct iommu_group *group = dev->iommu_group;
 	struct group_device *tmp_device, *device = NULL;
 
-	dev_info(dev, "Removing from iommu group %d\n", group->id);
+	dev_dbg(dev, "Removing from iommu group %d\n", group->id);
 
 	/* Pre-notify listeners that a device is being removed. */
 	blocking_notifier_call_chain(&group->notifier,
@@ -2337,8 +2337,8 @@ request_default_domain_for_dev(struct device *dev, unsigned long type)
 
 	iommu_group_create_direct_mappings(group, dev);
 
-	dev_info(dev, "Using iommu %s mapping\n",
-		 type == IOMMU_DOMAIN_DMA ? "dma" : "direct");
+	dev_dbg(dev, "Using iommu %s mapping\n",
+		type == IOMMU_DOMAIN_DMA ? "dma" : "direct");
 
 	ret = 0;
 out:
-- 
2.26.0.rc2
Re: [PATCH V10 02/11] iommu/uapi: Define a mask for bind data
On Sun, 22 Mar 2020 09:29:32 +0800
Lu Baolu wrote:

> On 2020/3/21 7:27, Jacob Pan wrote:
>> Memory type related flags can be grouped together for one simple
>> check.
>>
>> ---
>> v9 renamed from EMT to MTS since these are memory type support flags.
>> ---
>>
>> Signed-off-by: Jacob Pan
>> ---
>>  include/uapi/linux/iommu.h | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
>> index 4ad3496e5c43..d7bcbc5f79b0 100644
>> --- a/include/uapi/linux/iommu.h
>> +++ b/include/uapi/linux/iommu.h
>> @@ -284,7 +284,10 @@ struct iommu_gpasid_bind_data_vtd {
>> 	__u32 pat;
>> 	__u32 emt;
>>  };
>> -
>> +#define IOMMU_SVA_VTD_GPASID_MTS_MASK	(IOMMU_SVA_VTD_GPASID_CD | \
>> +					 IOMMU_SVA_VTD_GPASID_EMTE | \
>> +					 IOMMU_SVA_VTD_GPASID_PCD | \
>> +					 IOMMU_SVA_VTD_GPASID_PWT)
>
> As the name implies, can this move to intel-iommu.h?

I also thought about this but the masks are in the vendor-specific part
of the UAPI.

> Best regards,
> baolu
>
>> /**
>>  * struct iommu_gpasid_bind_data - Information about device and guest
>>  * PASID binding
>>  * @version:	Version of this data structure

[Jacob Pan]
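The "one simple check" the commit message describes is a single AND against the grouped mask. The sketch below is self-contained; the flag bit values are an assumption on my part (they are what I believe the v10 series defines in include/uapi/linux/iommu.h, not a quotation of it):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values -- verify against the series' actual UAPI header. */
#define IOMMU_SVA_VTD_GPASID_PCD	(1 << 2) /* page-level cache disable */
#define IOMMU_SVA_VTD_GPASID_PWT	(1 << 3) /* page-level write through */
#define IOMMU_SVA_VTD_GPASID_EMTE	(1 << 4) /* extended mem type enable */
#define IOMMU_SVA_VTD_GPASID_CD		(1 << 5) /* PASID-level cache disable */

/* The grouped memory-type-support mask from the patch. */
#define IOMMU_SVA_VTD_GPASID_MTS_MASK	(IOMMU_SVA_VTD_GPASID_CD | \
					 IOMMU_SVA_VTD_GPASID_EMTE | \
					 IOMMU_SVA_VTD_GPASID_PCD | \
					 IOMMU_SVA_VTD_GPASID_PWT)

/* One check instead of four: did the guest set any memory-type flag? */
static int has_mts_flags(uint64_t flags)
{
	return (flags & IOMMU_SVA_VTD_GPASID_MTS_MASK) != 0;
}
```

The consumer can then validate or program the memory-type fields only when `has_mts_flags()` is true, rather than testing each flag individually.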
RE: [EXT] Re: [PATCH RFC] iommu/virtio: Use page size bitmap supported by endpoint
Hi Jean, > -Original Message- > From: Jean-Philippe Brucker > Sent: Monday, March 23, 2020 3:30 PM > To: Bharat Bhushan > Cc: j...@8bytes.org; m...@redhat.com; jasow...@redhat.com; > virtualizat...@lists.linux-foundation.org; iommu@lists.linux-foundation.org; > eric.au...@redhat.com > Subject: [EXT] Re: [PATCH RFC] iommu/virtio: Use page size bitmap supported by > endpoint > > External Email > > -- > Hi Bharat, > > Please add the IOMMU list on your next posting > > On Mon, Mar 23, 2020 at 02:11:08PM +0530, Bharat Bhushan wrote: > > Different endpoint can support different page size, probe endpoint if > > it supports specific page size otherwise use global page sizes. > > > > Signed-off-by: Bharat Bhushan > > --- > > drivers/iommu/virtio-iommu.c | 24 > > include/uapi/linux/virtio_iommu.h | 6 ++ > > 2 files changed, 26 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/iommu/virtio-iommu.c > > b/drivers/iommu/virtio-iommu.c index cce329d71fba..e69347ca4ee6 100644 > > --- a/drivers/iommu/virtio-iommu.c > > +++ b/drivers/iommu/virtio-iommu.c > > @@ -78,6 +78,7 @@ struct viommu_endpoint { > > struct viommu_dev *viommu; > > struct viommu_domain*vdomain; > > struct list_headresv_regions; > > + u64 pgsize_bitmap; > > }; > > > > struct viommu_request { > > @@ -415,6 +416,14 @@ static int viommu_replay_mappings(struct > viommu_domain *vdomain) > > return ret; > > } > > > > +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev, > > + struct virtio_iommu_probe_pgsize_mask *mask) > > + > > +{ > > + vdev->pgsize_bitmap = mask->pgsize_bitmap; > > We need to read this through le64_to_cpu(). 
Also check that the length of the > field > provided by the device is >= sizeof(mask) (like > viommu_add_resv_mem() does) > > > + return 0; > > +} > > + > > static int viommu_add_resv_mem(struct viommu_endpoint *vdev, > >struct virtio_iommu_probe_resv_mem *mem, > >size_t len) > > @@ -494,11 +503,13 @@ static int viommu_probe_endpoint(struct viommu_dev > *viommu, struct device *dev) > > while (type != VIRTIO_IOMMU_PROBE_T_NONE && > >cur < viommu->probe_size) { > > len = le16_to_cpu(prop->length) + sizeof(*prop); > > - > > switch (type) { > > case VIRTIO_IOMMU_PROBE_T_RESV_MEM: > > ret = viommu_add_resv_mem(vdev, (void *)prop, len); > > break; > > + case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK: > > + ret = viommu_set_pgsize_bitmap(vdev, (void *)prop); > > + break; > > default: > > dev_err(dev, "unknown viommu prop 0x%x\n", type); > > } > > @@ -607,16 +618,21 @@ static struct iommu_domain > *viommu_domain_alloc(unsigned type) > > return &vdomain->domain; > > } > > > > -static int viommu_domain_finalise(struct viommu_dev *viommu, > > +static int viommu_domain_finalise(struct viommu_endpoint *vdev, > > struct iommu_domain *domain) > > { > > int ret; > > struct viommu_domain *vdomain = to_viommu_domain(domain); > > + struct viommu_dev *viommu = vdev->viommu; > > > > vdomain->viommu = viommu; > > vdomain->map_flags = viommu->map_flags; > > > > - domain->pgsize_bitmap = viommu->pgsize_bitmap; > > + if (vdev->pgsize_bitmap) > > + domain->pgsize_bitmap = vdev->pgsize_bitmap; > > + else > > + domain->pgsize_bitmap = viommu->pgsize_bitmap; > > + > > nit: it could be nicer to initialize vdev->pgsize_bitmap in add_device(), To what size we should initialize in add_device, PAGE_SIZE? Thanks -Bharat > override it > in probe_endpoint(), and just copy it here. 
> > > domain->geometry= viommu->geometry; > > > > ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain, @@ > > -657,7 +673,7 @@ static int viommu_attach_dev(struct iommu_domain > *domain, struct device *dev) > > * Properly initialize the domain now that we know which viommu > > * owns it. > > */ > > - ret = viommu_domain_finalise(vdev->viommu, domain); > > + ret = viommu_domain_finalise(vdev, domain); > > Attaching additional endpoints with different masks to the domain should > return > an error > > > } else if (vdomain->viommu != vdev->viommu) { > > dev_err(dev, "cannot attach to foreign vIOMMU\n"); > > ret = -EXDEV; > > diff --git a/include/uapi/linux/virtio_iommu.h > > b/include/uapi/linux/virtio_iommu.h > > index 237e36a280cb..aff3db0ef54b 100644 > > --- a/include/uapi/linux/virtio_iommu.h > > +++ b/include/uapi/linux/virtio_iommu.h > > @@ -111,6 +111,7 @@ struct virtio_iommu_req_un
RE: [EXT] Re: [PATCH RFC] iommu/virtio: Use page size bitmap supported by endpoint
Hi Jean, > -Original Message- > From: Jean-Philippe Brucker > Sent: Monday, March 23, 2020 3:30 PM > To: Bharat Bhushan > Cc: j...@8bytes.org; m...@redhat.com; jasow...@redhat.com; > virtualizat...@lists.linux-foundation.org; iommu@lists.linux-foundation.org; > eric.au...@redhat.com > Subject: [EXT] Re: [PATCH RFC] iommu/virtio: Use page size bitmap supported by > endpoint > > External Email > > -- > Hi Bharat, > > Please add the IOMMU list on your next posting iommu@lists.linux-foundation.org is there, any other mailing list we need to add? > > On Mon, Mar 23, 2020 at 02:11:08PM +0530, Bharat Bhushan wrote: > > Different endpoint can support different page size, probe endpoint if > > it supports specific page size otherwise use global page sizes. > > > > Signed-off-by: Bharat Bhushan > > --- > > drivers/iommu/virtio-iommu.c | 24 > > include/uapi/linux/virtio_iommu.h | 6 ++ > > 2 files changed, 26 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/iommu/virtio-iommu.c > > b/drivers/iommu/virtio-iommu.c index cce329d71fba..e69347ca4ee6 100644 > > --- a/drivers/iommu/virtio-iommu.c > > +++ b/drivers/iommu/virtio-iommu.c > > @@ -78,6 +78,7 @@ struct viommu_endpoint { > > struct viommu_dev *viommu; > > struct viommu_domain*vdomain; > > struct list_headresv_regions; > > + u64 pgsize_bitmap; > > }; > > > > struct viommu_request { > > @@ -415,6 +416,14 @@ static int viommu_replay_mappings(struct > viommu_domain *vdomain) > > return ret; > > } > > > > +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev, > > + struct virtio_iommu_probe_pgsize_mask *mask) > > + > > +{ > > + vdev->pgsize_bitmap = mask->pgsize_bitmap; > > We need to read this through le64_to_cpu(). 
Also check that the length of the > field > provided by the device is >= sizeof(mask) (like > viommu_add_resv_mem() does) Will take care of all the comments in next verions Thank -Bharat > > > + return 0; > > +} > > + > > static int viommu_add_resv_mem(struct viommu_endpoint *vdev, > >struct virtio_iommu_probe_resv_mem *mem, > >size_t len) > > @@ -494,11 +503,13 @@ static int viommu_probe_endpoint(struct viommu_dev > *viommu, struct device *dev) > > while (type != VIRTIO_IOMMU_PROBE_T_NONE && > >cur < viommu->probe_size) { > > len = le16_to_cpu(prop->length) + sizeof(*prop); > > - > > switch (type) { > > case VIRTIO_IOMMU_PROBE_T_RESV_MEM: > > ret = viommu_add_resv_mem(vdev, (void *)prop, len); > > break; > > + case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK: > > + ret = viommu_set_pgsize_bitmap(vdev, (void *)prop); > > + break; > > default: > > dev_err(dev, "unknown viommu prop 0x%x\n", type); > > } > > @@ -607,16 +618,21 @@ static struct iommu_domain > *viommu_domain_alloc(unsigned type) > > return &vdomain->domain; > > } > > > > -static int viommu_domain_finalise(struct viommu_dev *viommu, > > +static int viommu_domain_finalise(struct viommu_endpoint *vdev, > > struct iommu_domain *domain) > > { > > int ret; > > struct viommu_domain *vdomain = to_viommu_domain(domain); > > + struct viommu_dev *viommu = vdev->viommu; > > > > vdomain->viommu = viommu; > > vdomain->map_flags = viommu->map_flags; > > > > - domain->pgsize_bitmap = viommu->pgsize_bitmap; > > + if (vdev->pgsize_bitmap) > > + domain->pgsize_bitmap = vdev->pgsize_bitmap; > > + else > > + domain->pgsize_bitmap = viommu->pgsize_bitmap; > > + > > nit: it could be nicer to initialize vdev->pgsize_bitmap in add_device(), > override it > in probe_endpoint(), and just copy it here. 
> > > domain->geometry= viommu->geometry; > > > > ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain, @@ > > -657,7 +673,7 @@ static int viommu_attach_dev(struct iommu_domain > *domain, struct device *dev) > > * Properly initialize the domain now that we know which viommu > > * owns it. > > */ > > - ret = viommu_domain_finalise(vdev->viommu, domain); > > + ret = viommu_domain_finalise(vdev, domain); > > Attaching additional endpoints with different masks to the domain should > return > an error > > > } else if (vdomain->viommu != vdev->viommu) { > > dev_err(dev, "cannot attach to foreign vIOMMU\n"); > > ret = -EXDEV; > > diff --git a/include/uapi/linux/virtio_iommu.h > > b/include/uapi/linux/virtio_iommu.h > > index 237e36a280cb..aff3db0ef54b 100644 > > --- a/include/uapi/linux/virtio_iommu.h > > +++ b/include/
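Jean's two review points on viommu_set_pgsize_bitmap() — convert the field through le64_to_cpu() and reject properties shorter than the structure, as viommu_add_resv_mem() does — can be sketched in user space. The wire struct and helper names here are stand-ins (the real property carries an __le64), used only to show the shape of the checks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in wire layout: the bitmap as little-endian bytes, so the
 * sketch is host-endian independent. */
struct pgsize_mask_prop {
	uint8_t pgsize_bitmap_le[8];
};

/* Minimal le64_to_cpu() equivalent for the sketch. */
static uint64_t sketch_le64_to_cpu(const uint8_t b[8])
{
	uint64_t v = 0;
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}

/* Both review points: length check first, then endian conversion. */
static int set_pgsize_bitmap(uint64_t *out, const struct pgsize_mask_prop *p,
			     size_t len)
{
	if (len < sizeof(*p))
		return -1;	/* the driver would return -EINVAL */
	*out = sketch_le64_to_cpu(p->pgsize_bitmap_le);
	return 0;
}
```

A device advertising a 4K-and-up bitmap on the wire thus yields the CPU-order value regardless of host endianness, and a truncated property is rejected instead of being read past its end.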
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
On Mon, Mar 23, 2020 at 09:07:38PM +0530, Aneesh Kumar K.V wrote:
>
> This is what I was trying, but considering I am new to the DMA
> subsystem, I am not sure I got all the details correct. The idea is to
> look at the cpu addr and see if that can be used in direct map fashion
> (is bus_dma_limit the right restriction here?); if not, fall back to
> dynamic IOMMU mapping.

I don't think we can throw all these complications into the dma
mapping code.  At some point I also wonder what the point is,
especially for scatterlist mappings, where the iommu can coalesce.
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
On Mon, Mar 23, 2020 at 07:58:01PM +1100, Alexey Kardashevskiy wrote:
> >> 0x100.. .. 0x101..
> >>
> >> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
> >> (because of the maximum IOMMU table size) but not the other. And 1:1 on
> >> that "pseries" is done with offset=0x0800....
> >>
> >> So we want to check every bus address against dev->bus_dma_limit, not
> >> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
> >> 0x0800.0001.. and 1:1 mapping for the second 4GB would not be
> >> tried. Does this sound reasonable? Thanks,
> >
> > bus_dma_limit is just another limiting factor applied on top of
> > coherent_dma_mask or dma_mask respectively.
>
> This is not enough for the task: in my example, I'd set the bus limit to
> 0x0800.0001.. but this would disable bypass for all RAM
> addresses - the first and the second 4GB blocks.

So what about something like the version here:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.3
Re: [PATCH v3 10/15] iommu/arm-smmu: Use accessor functions for iommu private data
Hi Joerg, Thanks for tackling this! On 2020-03-20 9:14 am, Joerg Roedel wrote: From: Joerg Roedel Make use of dev_iommu_priv_set/get() functions and simplify the code where possible with this change. Tested-by: Will Deacon # arm-smmu Signed-off-by: Joerg Roedel --- drivers/iommu/arm-smmu.c | 57 +--- 1 file changed, 30 insertions(+), 27 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 980aae73b45b..7aa36e6c19c0 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -98,12 +98,15 @@ struct arm_smmu_master_cfg { s16 smendx[]; }; #define INVALID_SMENDX-1 -#define __fwspec_cfg(fw) ((struct arm_smmu_master_cfg *)fw->iommu_priv) -#define fwspec_smmu(fw) (__fwspec_cfg(fw)->smmu) -#define fwspec_smendx(fw, i) \ - (i >= fw->num_ids ? INVALID_SMENDX : __fwspec_cfg(fw)->smendx[i]) -#define for_each_cfg_sme(fw, i, idx) \ - for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i) +#define __fwspec_cfg(dev) ((struct arm_smmu_master_cfg *)dev_iommu_priv_get(dev)) +#define fwspec_smmu(dev) (__fwspec_cfg(dev)->smmu) +#define fwspec_smendx(dev, i) \ + (i >= dev_iommu_fwspec_get(dev)->num_ids ?\ + INVALID_SMENDX :\ + __fwspec_cfg(dev)->smendx[i]) +#define for_each_cfg_sme(dev, i, idx) \ + for (i = 0; idx = fwspec_smendx(dev, i), \ +i < dev_iommu_fwspec_get(dev)->num_ids; ++i) Yikes, this ends up pretty ugly, and I'd prefer not have this much complexity hidden in macros that were intended just to be convenient shorthand. Would you mind pulling in the patch below as a precursor? Other than that, the rest of the series looks OK at a glance. We should also move fwspec->ops to dev_iommu, as those are "IOMMU API" data rather than "firmware" data, but let's consider that separately as this series is already long enough. Thanks, Robin. 
->8- Subject: [PATCH] iommu/arm-smmu: Refactor master_cfg/fwspec usage In preparation for restructuring iommu_fwspec, refactor the way we access the arm_smmu_master_cfg private data to be less dependent on the current layout. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 42 +--- 1 file changed, 22 insertions(+), 20 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 16c4b87af42b..b4978f45a7f2 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -98,12 +98,10 @@ struct arm_smmu_master_cfg { s16 smendx[]; }; #define INVALID_SMENDX -1 -#define __fwspec_cfg(fw) ((struct arm_smmu_master_cfg *)fw->iommu_priv) -#define fwspec_smmu(fw) (__fwspec_cfg(fw)->smmu) -#define fwspec_smendx(fw, i) \ - (i >= fw->num_ids ? INVALID_SMENDX : __fwspec_cfg(fw)->smendx[i]) -#define for_each_cfg_sme(fw, i, idx) \ - for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i) +#define cfg_smendx(cfg, fw, i) \ + (i >= fw->num_ids ? INVALID_SMENDX : cfg->smendx[i]) +#define for_each_cfg_sme(cfg, fw, i, idx) \ + for (i = 0; idx = cfg_smendx(cfg, fw, i), i < fw->num_ids; ++i) static bool using_legacy_binding, using_generic_binding; @@ -1069,7 +1067,7 @@ static int arm_smmu_master_alloc_smes(struct device *dev) mutex_lock(&smmu->stream_map_mutex); /* Figure out a viable stream map entry allocation */ - for_each_cfg_sme(fwspec, i, idx) { + for_each_cfg_sme(cfg, fwspec, i, idx) { u16 sid = FIELD_GET(ARM_SMMU_SMR_ID, fwspec->ids[i]); u16 mask = FIELD_GET(ARM_SMMU_SMR_MASK, fwspec->ids[i]); @@ -1100,7 +1098,7 @@ static int arm_smmu_master_alloc_smes(struct device *dev) iommu_group_put(group); /* It worked! 
Now, poke the actual hardware */ - for_each_cfg_sme(fwspec, i, idx) { + for_each_cfg_sme(cfg, fwspec, i, idx) { arm_smmu_write_sme(smmu, idx); smmu->s2crs[idx].group = group; } @@ -1117,14 +1115,14 @@ static int arm_smmu_master_alloc_smes(struct device *dev) return ret; } -static void arm_smmu_master_free_smes(struct iommu_fwspec *fwspec) +static void arm_smmu_master_free_smes(struct arm_smmu_master_cfg *cfg, + struct iommu_fwspec *fwspec) { - struct arm_smmu_device *smmu = fwspec_smmu(fwspec); - struct arm_smmu_master_cfg *cfg = fwspec->iommu_priv; + struct arm_smmu_device *smmu = cfg->smmu; int i, idx; mutex_lock(&smmu->stream_map_mutex); - for_each_cfg_sme(fwspec, i, idx) { + for_each_cfg_sme(cfg, fwspec, i, idx) { if (arm_smmu_free_sme(smmu, idx)) arm_smmu_write_sme(smmu, idx); cfg->smendx[i] = INVALID_SMENDX; @@ -1133,6 +1131,7 @@ static void arm_smmu_m
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
Christoph Hellwig writes:

> On Mon, Mar 23, 2020 at 09:37:05AM +0100, Christoph Hellwig wrote:
>> > > +	/*
>> > > +	 * Allows IOMMU drivers to bypass dynamic translations if the
>> > > +	 * DMA mask is large enough.
>> > > +	 */
>> > > +	if (dev->dma_ops_bypass) {
>> > > +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
>> > > +		    dma_direct_get_required_mask(dev))
>> > > +			return true;
>> > > +	}
>> >
>> > Why not do this in dma_map_direct() as well?
>>
>> Mostly because it is a relatively expensive operation, including a
>> fls64.
>
> Which I guess isn't too bad compared to a dynamic IOMMU mapping.  Can
> you just send a draft patch for what you'd like to see for ppc?

This is what I was trying, but considering I am new to the DMA subsystem
I am not sure I got all the details correct. The idea is to look at the
cpu addr and see if that can be used in direct map fashion (is
bus_dma_limit the right restriction here?); if not, fall back to dynamic
IOMMU mapping.

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de2..bc7e6a8b2caa 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -31,6 +31,87 @@ static inline bool dma_iommu_map_bypass(struct device *dev,
 		(!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
 }
 
+static inline bool __dma_direct_map_capable(struct device *dev, struct page *page,
+					    unsigned long offset, size_t size)
+{
+	phys_addr_t phys = page_to_phys(page) + offset;
+	dma_addr_t dma_addr = phys_to_dma(dev, phys);
+	dma_addr_t end = dma_addr + size - 1;
+
+	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
+}
+
+static inline bool dma_direct_map_capable(struct device *dev, struct page *page,
+					  unsigned long offset, size_t size,
+					  unsigned long attrs)
+{
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	return __dma_direct_map_capable(dev, page, offset, size);
+}
+
+
+static inline bool dma_direct_unmap_capable(struct device *dev, dma_addr_t addr, size_t size,
+					    unsigned long attrs)
+{
+	dma_addr_t end = addr + size - 1;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
+}
+
+static inline bool dma_direct_sg_map_capable(struct device *dev, struct scatterlist *sglist,
+					     int nelems, unsigned long attrs)
+{
+	int i;
+	struct scatterlist *sg;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	for_each_sg(sglist, sg, nelems, i) {
+		if (!__dma_direct_map_capable(dev, sg_page(sg),
+					      sg->offset, sg->length))
+			return false;
+	}
+	return true;
+}
+
+static inline bool dma_direct_sg_unmap_capable(struct device *dev, struct scatterlist *sglist,
+					       int nelems, unsigned long attrs)
+{
+	int i;
+	dma_addr_t end;
+	struct scatterlist *sg;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	for_each_sg(sglist, sg, nelems, i) {
+		end = sg->dma_address + sg_dma_len(sg);
+
+		if (end > min_not_zero(*dev->dma_mask, dev->bus_dma_limit))
+			return false;
+	}
+	return true;
+}
+
+
 /* Allocates a contiguous real buffer and creates mappings over it.
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
@@ -67,7 +148,7 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
 				     enum dma_data_direction direction,
 				     unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
+	if (dma_direct_map_capable(dev, page, offset, size, attrs))
 		return dma_direct_map_page(dev, page, offset, size, direction,
 					   attrs);
 	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
@@ -79,7 +160,7 @@ static void dma_iommu_unmap_page(struct device *dev, dma_addr_t dma_handle,
 				 size_t size, enum dma_data_direction direction,
 				 unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(d
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
On Mon, Mar 23, 2020 at 12:14:08PM +, Robin Murphy wrote:
> On 2020-03-20 2:16 pm, Christoph Hellwig wrote:
>> Several IOMMU drivers have a bypass mode where they can use a direct
>> mapping if the device's DMA mask is large enough.  Add generic support
>> to the core dma-mapping code to do that to switch those drivers to
>> a common solution.
>
> Hmm, this is _almost_, but not quite the same as the case where drivers
> are managing their own IOMMU mappings, but still need to use streaming
> DMA for cache maintenance on the underlying pages.

In that case they should simply not call the DMA API at all.  We'll just
need versions of the cache maintenance APIs that tie in with the raw
IOMMU API.

> For that we need the ops bypass to be a "true" bypass and also avoid
> SWIOTLB regardless of the device's DMA mask. That behaviour should in
> fact be fine for the opportunistic bypass case here as well, since the
> mask being "big enough" implies by definition that this should never
> need to bounce either.

In practice it does.  But that means adding yet another code path vs the
simple direct call to dma_direct_* vs calling the DMA ops, which I'd
rather avoid.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
On 2020-03-20 2:16 pm, Christoph Hellwig wrote:

Several IOMMU drivers have a bypass mode where they can use a direct
mapping if the device's DMA mask is large enough.  Add generic support
to the core dma-mapping code to do that to switch those drivers to
a common solution.

Hmm, this is _almost_, but not quite the same as the case where drivers
are managing their own IOMMU mappings, but still need to use streaming
DMA for cache maintenance on the underlying pages. For that we need the
ops bypass to be a "true" bypass and also avoid SWIOTLB regardless of
the device's DMA mask. That behaviour should in fact be fine for the
opportunistic bypass case here as well, since the mask being "big
enough" implies by definition that this should never need to bounce
either.

For the (hopefully less common) third case where, due to groups or user
overrides, we end up giving an identity DMA domain to a device with
limited DMA masks which _does_ need SWIOTLB, I'd like to think we can
solve that by not giving the device IOMMU DMA ops in the first place,
such that it never needs to engage the bypass mechanism at all.

Thoughts?

Robin.

Signed-off-by: Christoph Hellwig
---
 include/linux/device.h      |  6 ++
 include/linux/dma-mapping.h | 30 ++
 kernel/dma/mapping.c        | 36 +++-
 3 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 0cd7c647c16c..09be8bb2c4a6 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -525,6 +525,11 @@ struct dev_links_info {
  *	sync_state() callback.
  * @dma_coherent: this particular device is dma coherent, even if the
  *	architecture supports non-coherent devices.
+ * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
+ *	streaming DMA operations (->map_* / ->unmap_* / ->sync_*),
+ *	and optionally (if the coherent mask is large enough) also
+ *	for dma allocations.  This flag is managed by the dma ops
+ *	instance from ->dma_supported.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -625,6 +630,7 @@ struct device {
 		defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
 	bool			dma_coherent:1;
 #endif
+	bool			dma_ops_bypass : 1;
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 330ad58fbf4d..c3af0cf5e435 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -188,9 +188,15 @@ static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_DMA_DECLARE_COHERENT */
 
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
+/*
+ * Check if the device uses a direct mapping for streaming DMA operations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_map_direct(struct device *dev,
+		const struct dma_map_ops *ops)
 {
-	return likely(!ops);
+	return likely(!ops) || dev->dma_ops_bypass;
 }
 
 /*
@@ -279,7 +285,7 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -294,7 +300,7 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
@@ -313,7 +319,7 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	int ents;
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
 	else
 		ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -331,7 +337,7 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
 
 	BUG_ON(!valid_dma_direction(dir));
 	debug_dma_unmap_sg(dev, sg, nents, dir);
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
 	else if (ops->unmap_sg)
 		ops->unmap_sg(dev, sg, nents, dir, attrs);
@@ -352,7 +358,7 @@ static inline dm
Re: [PATCH RFC] iommu/virtio: Use page size bitmap supported by endpoint
Hi Bharat,

Please add the IOMMU list on your next posting

On Mon, Mar 23, 2020 at 02:11:08PM +0530, Bharat Bhushan wrote:
> Different endpoint can support different page size, probe
> endpoint if it supports specific page size otherwise use
> global page sizes.
>
> Signed-off-by: Bharat Bhushan
> ---
>  drivers/iommu/virtio-iommu.c      | 24
>  include/uapi/linux/virtio_iommu.h |  6 ++
>  2 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index cce329d71fba..e69347ca4ee6 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -78,6 +78,7 @@ struct viommu_endpoint {
>  	struct viommu_dev	*viommu;
>  	struct viommu_domain	*vdomain;
>  	struct list_head	resv_regions;
> +	u64			pgsize_bitmap;
>  };
>
>  struct viommu_request {
> @@ -415,6 +416,14 @@ static int viommu_replay_mappings(struct viommu_domain *vdomain)
>  	return ret;
>  }
>
> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> +				    struct virtio_iommu_probe_pgsize_mask *mask)
> +
> +{
> +	vdev->pgsize_bitmap = mask->pgsize_bitmap;

We need to read this through le64_to_cpu(). Also check that the length
of the field provided by the device is >= sizeof(mask) (like
viommu_add_resv_mem() does)

> +	return 0;
> +}
> +
>  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
>  			       struct virtio_iommu_probe_resv_mem *mem,
>  			       size_t len)
> @@ -494,11 +503,13 @@ static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
>  	while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
>  	       cur < viommu->probe_size) {
>  		len = le16_to_cpu(prop->length) + sizeof(*prop);
> -
>  		switch (type) {
>  		case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
>  			ret = viommu_add_resv_mem(vdev, (void *)prop, len);
>  			break;
> +		case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
> +			ret = viommu_set_pgsize_bitmap(vdev, (void *)prop);
> +			break;
>  		default:
>  			dev_err(dev, "unknown viommu prop 0x%x\n", type);
>  		}
> @@ -607,16 +618,21 @@ static struct iommu_domain *viommu_domain_alloc(unsigned type)
>  	return &vdomain->domain;
>  }
>
> -static int viommu_domain_finalise(struct viommu_dev *viommu,
> +static int viommu_domain_finalise(struct viommu_endpoint *vdev,
>  				  struct iommu_domain *domain)
>  {
>  	int ret;
>  	struct viommu_domain *vdomain = to_viommu_domain(domain);
> +	struct viommu_dev *viommu = vdev->viommu;
>
>  	vdomain->viommu		= viommu;
>  	vdomain->map_flags	= viommu->map_flags;
>
> -	domain->pgsize_bitmap	= viommu->pgsize_bitmap;
> +	if (vdev->pgsize_bitmap)
> +		domain->pgsize_bitmap = vdev->pgsize_bitmap;
> +	else
> +		domain->pgsize_bitmap = viommu->pgsize_bitmap;
> +

nit: it could be nicer to initialize vdev->pgsize_bitmap in
add_device(), override it in probe_endpoint(), and just copy it here.

>  	domain->geometry	= viommu->geometry;
>
>  	ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
> @@ -657,7 +673,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  		 * Properly initialize the domain now that we know which viommu
>  		 * owns it.
>  		 */
> -		ret = viommu_domain_finalise(vdev->viommu, domain);
> +		ret = viommu_domain_finalise(vdev, domain);

Attaching additional endpoints with different masks to the domain
should return an error

>  	} else if (vdomain->viommu != vdev->viommu) {
>  		dev_err(dev, "cannot attach to foreign vIOMMU\n");
>  		ret = -EXDEV;
> diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
> index 237e36a280cb..aff3db0ef54b 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -111,6 +111,7 @@ struct virtio_iommu_req_unmap {
>
>  #define VIRTIO_IOMMU_PROBE_T_NONE		0
>  #define VIRTIO_IOMMU_PROBE_T_RESV_MEM		1
> +#define VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK	2
>
>  #define VIRTIO_IOMMU_PROBE_T_MASK		0xfff
>
> @@ -119,6 +120,11 @@ struct virtio_iommu_probe_property {
>  	__le16	length;
>  };
>
> +struct virtio_iommu_probe_pgsize_mask {
> +	struct virtio_iommu_probe_property	head;

Compilers will introduce 4 bytes of padding here, to align the next
field. We need to make them explicit by adding a 4-byte 'reserved'
field.

> +	uint64_t				pgsize_bitmap;

__le64

Thanks,
Jean
Re: [PATCH 0/3] Request direct mapping for modem firmware subdevice
Hi Robin,

On 2020-03-12 17:35, Robin Murphy wrote:

On 2020-03-12 6:28 am, Sai Prakash Ranjan wrote:

Hi Robin,

Are you talking about this one for SoC specific change -
https://lore.kernel.org/patchwork/patch/1183530/

Exactly - this particular wheel needs no reinventing at all.

[ I guess I should go review those patches properly... :) ]

It would be great if you could review the patch -
https://lore.kernel.org/patchwork/patch/1183530/

Sibi has posted a v2 of this series based on that patch.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Re: arm-smmu-v3 high cpu usage for NVMe
On 2020-03-23 09:03, John Garry wrote:

On 20/03/2020 16:33, Marc Zyngier wrote:

JFYI, I've been playing with "perf annotate" today and it's giving
strange results for my NVMe testing. So "report" looks somewhat sane,
if not a worryingly high % for arm_smmu_cmdq_issue_cmdlist():

  55.39%  irq/342-nvme0q1  [kernel.kallsyms]  [k] arm_smmu_cmdq_issue_cmdlist
   9.74%  irq/342-nvme0q1  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
   2.02%  irq/342-nvme0q1  [kernel.kallsyms]  [k] nvme_irq
   1.86%  irq/342-nvme0q1  [kernel.kallsyms]  [k] fput_many
   1.73%  irq/342-nvme0q1  [kernel.kallsyms]  [k] arm_smmu_atc_inv_domain.constprop.42
   1.67%  irq/342-nvme0q1  [kernel.kallsyms]  [k] __arm_lpae_unmap
   1.49%  irq/342-nvme0q1  [kernel.kallsyms]  [k] aio_complete_rw

But "annotate" consistently tells me that a specific instruction
consumes ~99% of the load for the enqueue function:

         :	/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
         :	if (sync) {
    0.00 :   80001071c948: ldr   w0, [x29, #108]
         :	int ret = 0;
    0.00 :   80001071c94c: mov   w24, #0x0   // #0
         :	if (sync) {
    0.00 :   80001071c950: cbnz  w0, 80001071c990
         :	arch_local_irq_restore():
    0.00 :   80001071c954: msr   daif, x21
         :	arm_smmu_cmdq_issue_cmdlist():
         :	}
         :	}
         :
         :	local_irq_restore(flags);
         :	return ret;
         :	}
   99.51 :   80001071c958: adrp  x0, 800011909000

Hi Marc,

This is likely the side effect of the re-enabling of interrupts (msr
daif, x21) on the previous instruction which causes the perf interrupt
to fire right after.

ok, makes sense.

Time to enable pseudo-NMIs in the PMUv3 driver...

Do you know if there is any plan for this?

There was. Julien Thierry has a bunch of patches for that [1], but they
need reviving.

In the meantime, maybe I can do some trickery by putting the
local_irq_restore() in a separate function, outside
arm_smmu_cmdq_issue_cmdlist(), to get a fair profile for that same
function.

I don't see how you can improve the profiling without compromising the
locking in this case...

Thanks,

M.

[1] https://patchwork.kernel.org/cover/11047407/

--
Jazz is not dead. It just smells funny...
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
On 23/03/2020 19:37, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 12:28:34PM +1100, Alexey Kardashevskiy wrote:
>
> [full quote deleted, please follow proper quoting rules]
>
>>> +static bool dma_alloc_direct(struct device *dev, const struct dma_map_ops *ops)
>>> +{
>>> +	if (!ops)
>>> +		return true;
>>> +
>>> +	/*
>>> +	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
>>> +	 * is large enough.
>>> +	 */
>>> +	if (dev->dma_ops_bypass) {
>>> +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
>>> +		    dma_direct_get_required_mask(dev))
>>> +			return true;
>>> +	}
>>
>>
>> Why not do this in dma_map_direct() as well?
>
> Mostly because it is a relatively expensive operation, including a
> fls64.

Ah, ok.

>> Or simply have just one dma_map_direct()?
>
> What do you mean with that?

I mean use dma_alloc_direct() instead of dma_map_direct() everywhere;
you explained just above.

>> And one more general question - we need a way to use non-direct IOMMU
>> for RAM above certain limit.
>>
>> Let's say we have a system with:
>> 0 .. 0x1..
>> 0x100.. .. 0x101..
>>
>> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
>> (because of the maximum IOMMU table size) but not the other. And 1:1 on
>> that "pseries" is done with offset=0x0800....
>>
>> So we want to check every bus address against dev->bus_dma_limit, not
>> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
>> 0x0800.0001.. and 1:1 mapping for the second 4GB would not be
>> tried. Does this sound reasonable? Thanks,
>
> bus_dma_limit is just another limiting factor applied on top of
> coherent_dma_mask or dma_mask respectively.

This is not enough for the task: in my example I'd set the bus limit to
0x0800.0001.., but this would disable bypass for all RAM addresses -
both the first and the second 4GB blocks.

--
Alexey
Re: arm-smmu-v3 high cpu usage for NVMe
On 20/03/2020 16:33, Marc Zyngier wrote:

JFYI, I've been playing with "perf annotate" today and it's giving
strange results for my NVMe testing. So "report" looks somewhat sane,
if not a worryingly high % for arm_smmu_cmdq_issue_cmdlist():

  55.39%  irq/342-nvme0q1  [kernel.kallsyms]  [k] arm_smmu_cmdq_issue_cmdlist
   9.74%  irq/342-nvme0q1  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
   2.02%  irq/342-nvme0q1  [kernel.kallsyms]  [k] nvme_irq
   1.86%  irq/342-nvme0q1  [kernel.kallsyms]  [k] fput_many
   1.73%  irq/342-nvme0q1  [kernel.kallsyms]  [k] arm_smmu_atc_inv_domain.constprop.42
   1.67%  irq/342-nvme0q1  [kernel.kallsyms]  [k] __arm_lpae_unmap
   1.49%  irq/342-nvme0q1  [kernel.kallsyms]  [k] aio_complete_rw

But "annotate" consistently tells me that a specific instruction
consumes ~99% of the load for the enqueue function:

         :	/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
         :	if (sync) {
    0.00 :   80001071c948: ldr   w0, [x29, #108]
         :	int ret = 0;
    0.00 :   80001071c94c: mov   w24, #0x0   // #0
         :	if (sync) {
    0.00 :   80001071c950: cbnz  w0, 80001071c990
         :	arch_local_irq_restore():
    0.00 :   80001071c954: msr   daif, x21
         :	arm_smmu_cmdq_issue_cmdlist():
         :	}
         :	}
         :
         :	local_irq_restore(flags);
         :	return ret;
         :	}
   99.51 :   80001071c958: adrp  x0, 800011909000

Hi Marc,

This is likely the side effect of the re-enabling of interrupts (msr
daif, x21) on the previous instruction which causes the perf interrupt
to fire right after.

ok, makes sense.

Time to enable pseudo-NMIs in the PMUv3 driver...

Do you know if there is any plan for this?

In the meantime, maybe I can do some trickery by putting the
local_irq_restore() in a separate function, outside
arm_smmu_cmdq_issue_cmdlist(), to get a fair profile for that same
function.

Cheers,
John
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
On Mon, Mar 23, 2020 at 09:37:05AM +0100, Christoph Hellwig wrote:
> > > +	/*
> > > +	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
> > > +	 * is large enough.
> > > +	 */
> > > +	if (dev->dma_ops_bypass) {
> > > +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
> > > +		    dma_direct_get_required_mask(dev))
> > > +			return true;
> > > +	}
> >
> >
> > Why not do this in dma_map_direct() as well?
>
> Mostly because it is a relatively expensive operation, including a
> fls64.

Which I guess isn't too bad compared to a dynamic IOMMU mapping.  Can
you just send a draft patch for what you'd like to see for ppc?
Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
On Mon, Mar 23, 2020 at 12:28:34PM +1100, Alexey Kardashevskiy wrote:

[full quote deleted, please follow proper quoting rules]

> > +static bool dma_alloc_direct(struct device *dev, const struct dma_map_ops *ops)
> > +{
> > +	if (!ops)
> > +		return true;
> > +
> > +	/*
> > +	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
> > +	 * is large enough.
> > +	 */
> > +	if (dev->dma_ops_bypass) {
> > +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
> > +		    dma_direct_get_required_mask(dev))
> > +			return true;
> > +	}
>
> Why not do this in dma_map_direct() as well?

Mostly because it is a relatively expensive operation, including a
fls64.

> Or simply have just one dma_map_direct()?

What do you mean with that?

> And one more general question - we need a way to use non-direct IOMMU
> for RAM above certain limit.
>
> Let's say we have a system with:
> 0 .. 0x1..
> 0x100.. .. 0x101..
>
> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
> (because of the maximum IOMMU table size) but not the other. And 1:1 on
> that "pseries" is done with offset=0x0800....
>
> So we want to check every bus address against dev->bus_dma_limit, not
> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
> 0x0800.0001.. and 1:1 mapping for the second 4GB would not be
> tried. Does this sound reasonable? Thanks,

bus_dma_limit is just another limiting factor applied on top of
coherent_dma_mask or dma_mask respectively.