RE: [PATCH v4 03/12] iommu/vt-d: Move page table helpers into header
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM
> Subject: [PATCH v4 03/12] iommu/vt-d: Move page table helpers into header
>
> So that they could also be used in other source files.

Just a refine. :) "This patch moves page table helpers to header file, so
that other source files can use them."

Thanks,
Yi Liu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 10/12] iommu/vt-d: Add first level page table interface
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM
> Subject: [PATCH v4 10/12] iommu/vt-d: Add first level page table interface
>
> This adds an interface to setup the PASID entries for first

This patch adds an interface to setup the PASID entries for first. :)

> level page table translation.
>
> Cc: Ashok Raj
> Cc: Jacob Pan
> Cc: Kevin Tian
> Cc: Liu Yi L
> Signed-off-by: Sanjay Kumar
> Signed-off-by: Lu Baolu
> Reviewed-by: Ashok Raj
> ---
>  drivers/iommu/intel-pasid.c | 81 +
>  drivers/iommu/intel-pasid.h | 11 +
>  include/linux/intel-iommu.h |  1 +
>  3 files changed, 93 insertions(+)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 69530317c323..d8ca1e6a8e5e 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -10,6 +10,7 @@
>  #define pr_fmt(fmt)	"DMAR: " fmt
>
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -388,6 +389,26 @@ static inline void pasid_set_page_snoop(struct pasid_entry *pe, bool value)
> 	pasid_set_bits(&pe->val[1], 1 << 23, value);
>  }
>
> +/*
> + * Setup the First Level Page table Pointer field (Bit 140~191)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_flptr(struct pasid_entry *pe, u64 value)
> +{
> +	pasid_set_bits(&pe->val[2], VTD_PAGE_MASK, value);
> +}
> +
> +/*
> + * Setup the First Level Paging Mode field (Bit 130~131) of a
> + * scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_flpm(struct pasid_entry *pe, u64 value)
> +{
> +	pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
> +}
> +
>  static void
>  pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
> 				    u16 did, int pasid)
> @@ -458,6 +479,66 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> 	devtlb_invalidation_with_pasid(iommu, dev, pasid);
>  }
>
> +/*
> + * Set up the scalable mode pasid table entry for first only
> + * translation type.
> + */
> +int intel_pasid_setup_first_level(struct intel_iommu *iommu,
> +				  struct device *dev, pgd_t *pgd,
> +				  int pasid, int flags)
> +{
> +	u16 did = FLPT_DEFAULT_DID;
> +	struct pasid_entry *pte;

aha, same comment with previous patch. may be better to use pt_entry or
pasid_entry.

Regards,
Yi Liu
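Since the field placement in the quoted helpers is easy to get wrong, here is a small user-space sketch of the same bit manipulation. This is not the kernel code: `pasid_set_bits`, `GENMASK_ULL`, and `VTD_PAGE_MASK` are re-created locally to mirror the definitions discussed in the patch, so the layout (FLPTR in bits 140~191 == `val[2]` bits 12~63, FLPM in bits 130~131 == `val[2]` bits 2~3) can be checked in isolation.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Local stand-ins mirroring the kernel definitions quoted above */
#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
#define VTD_PAGE_SHIFT	12
#define VTD_PAGE_MASK	(~((1ULL << VTD_PAGE_SHIFT) - 1))

/* A scalable-mode PASID entry is 512 bits == eight 64-bit words */
struct pasid_entry {
	u64 val[8];
};

/* Read-modify-write of a masked field, as in intel-pasid.c */
static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
{
	*ptr = (*ptr & ~mask) | (bits & mask);
}

/* First Level Page table Pointer: bits 140~191 -> val[2] bits 12..63 */
static inline void pasid_set_flptr(struct pasid_entry *pe, u64 value)
{
	pasid_set_bits(&pe->val[2], VTD_PAGE_MASK, value);
}

/* First Level Paging Mode: bits 130~131 -> val[2] bits 2..3 */
static inline void pasid_set_flpm(struct pasid_entry *pe, u64 value)
{
	pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
}
```

Both fields land in `val[2]` without overlapping, which is why the two helpers can be applied in either order.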
RE: [PATCH v4 08/12] iommu/vt-d: Pass pasid table to context mapping
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM
> Subject: [PATCH v4 08/12] iommu/vt-d: Pass pasid table to context mapping
>
> So that the pasid related info, such as the pasid table and the
> maximum of pasid could be used during setting up scalable mode
> context.

A little bit of refining. Wish it helps. :)

"This patch passes the pasid related info (e.g. the pasid table and the
maximum of pasid) to context mapping, so that pasid related fields can be
set up accordingly in the scalable mode context entry."

Regards,
Yi Liu
RE: [PATCH v4 06/12] iommu/vt-d: Add second level page table interface
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM
>
> This adds the interfaces to setup or tear down the structures
> for second level page table translations. This includes types
> of second level only translation and pass through.

A little bit of refining to the description. :)

"This patch adds interfaces to set up or tear down second level translation
at PASID granularity. Translation type includes second level only type and
pass-through type."

> Cc: Ashok Raj
> Cc: Jacob Pan
> Cc: Kevin Tian
> Cc: Liu Yi L
> Signed-off-by: Sanjay Kumar
[...]
> +
> +void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> +				 struct device *dev, int pasid)
> +{
> +	struct pasid_entry *pte;

pte is confusing as it is similar to the pte in the paging structures. May
use pt_entry or just pasid_entry. This comment applies to the other "pte"s
in this patch.

Thanks,
Yi Liu
RE: [PATCH v4 05/12] iommu/vt-d: Reserve a domain id for FL and PT modes
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM
> Subject: [PATCH v4 05/12] iommu/vt-d: Reserve a domain id for FL and PT modes
>
> Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid
> entry for first-level or pass-through translation should be
> programmed with a domain id different from those used for
> second-level or nested translation. It is recommended that
> software could use a same domain id for all first-only and
> pass-through translations.
>
> This reserves a domain id for first-level and pass-through
> translations.
>
> Cc: Ashok Raj
> Cc: Jacob Pan
> Cc: Kevin Tian
> Cc: Liu Yi L
> Cc: Sanjay Kumar
> Signed-off-by: Lu Baolu
> ---
>  drivers/iommu/intel-iommu.c | 10 ++
>  drivers/iommu/intel-pasid.h |  6 ++
>  2 files changed, 16 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 9331240c70b8..2f7455ee4e7a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1618,6 +1618,16 @@ static int iommu_init_domains(struct intel_iommu *iommu)
>  	 */
>  	set_bit(0, iommu->domain_ids);
>
> +	/*
> +	 * Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid
> +	 * entry for first-level or pass-through translation modes should
> +	 * be programmed with a domain id different from those used for
> +	 * second-level or nested translation. We reserve a domain id for
> +	 * this purpose.
> +	 */
> +	if (sm_supported(iommu))
> +		set_bit(FLPT_DEFAULT_DID, iommu->domain_ids);

"FLPT_DEFAULT_DID" looks very likely for first level translation. How
about "PT_FL_DEFAULT_DID"?

>  	return 0;
>  }
>
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index 12f480c2bb8b..03c1612d173c 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -17,6 +17,12 @@
>  #define PDE_PFN_MASK			PAGE_MASK
>  #define PASID_PDE_SHIFT			6
>
> +/*
> + * Domain ID reserved for pasid entries programmed for first-level
> + * only and pass-through transfer modes.
> + */
> +#define FLPT_DEFAULT_DID		1

It would be helpful to elaborate why DID 1 is selected in the patch
description.

Regards,
Yi Liu
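The reservation scheme being reviewed is simple enough to model outside the kernel. The toy sketch below is an assumption-laden stand-in, not driver code: a 64-bit word stands in for the real `cap_ndoms()`-sized `iommu->domain_ids` bitmap, `init_domains()` mirrors the quoted `iommu_init_domains()` logic (DID 0 always reserved, DID 1 additionally reserved when scalable mode is supported), and `alloc_domain_id()` mimics a `find_first_zero_bit()` + `set_bit()` allocator.

```c
#include <assert.h>
#include <stdint.h>

#define FLPT_DEFAULT_DID 1	/* as in the quoted intel-pasid.h hunk */

static uint64_t domain_ids;	/* toy stand-in for iommu->domain_ids */

static void init_domains(int sm_supported)
{
	domain_ids = 1ULL << 0;			/* DID 0 is never handed out */
	if (sm_supported)
		domain_ids |= 1ULL << FLPT_DEFAULT_DID;	/* reserve DID 1 */
}

/* Mimics find_first_zero_bit() + set_bit() allocation */
static int alloc_domain_id(void)
{
	for (int i = 0; i < 64; i++) {
		if (!(domain_ids & (1ULL << i))) {
			domain_ids |= 1ULL << i;
			return i;
		}
	}
	return -1;	/* bitmap exhausted */
}
```

With scalable mode enabled, the first allocatable DID becomes 2; without it, DID 1 is still handed out as before, which is exactly the behavioral difference the `sm_supported()` check introduces.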
RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM
[...]
> ---
>  drivers/iommu/dmar.c                | 83 +++--
>  drivers/iommu/intel-svm.c           | 76 --
>  drivers/iommu/intel_irq_remapping.c |  6 ++-
>  include/linux/intel-iommu.h         |  9 +++-
>  4 files changed, 115 insertions(+), 59 deletions(-)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index d9c748b6f9e4..ec10427b98ac 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
>  	int head, tail;
>  	struct q_inval *qi = iommu->qi;
>  	int wait_index = (index + 1) % QI_LENGTH;
> +	int shift = qi_shift(iommu);
>
>  	if (qi->desc_status[wait_index] == QI_ABORT)
>  		return -EAGAIN;
> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
>  	 */
>  	if (fault & DMA_FSTS_IQE) {
>  		head = readl(iommu->reg + DMAR_IQH_REG);
> -		if ((head >> DMAR_IQ_SHIFT) == index) {
> +		if ((head >> shift) == index) {
> +			struct qi_desc *desc = qi->desc + head;
> +
>  			pr_err("VT-d detected invalid descriptor: "
>  				"low=%llx, high=%llx\n",
> -				(unsigned long long)qi->desc[index].low,
> -				(unsigned long long)qi->desc[index].high);
> -			memcpy(&qi->desc[index], &qi->desc[wait_index],
> -				sizeof(struct qi_desc));
> +				(unsigned long long)desc->qw0,
> +				(unsigned long long)desc->qw1);

Still missing qw2 and qw3. May make the print differ based on whether smts
is configured.

> +			memcpy(desc, qi->desc + (wait_index << shift),

Would "memcpy(desc, (unsigned long long) (qi->desc + (wait_index << shift),"
be safer?

> +				1 << shift);
>  			writel(DMA_FSTS_IQE, iommu->reg + DMAR_FSTS_REG);
>  			return -EINVAL;
>  		}
> @@ -1191,10 +1194,10 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
>  	 */
>  	if (fault & DMA_FSTS_ITE) {
>  		head = readl(iommu->reg + DMAR_IQH_REG);
> -		head = ((head >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH;
> +		head = ((head >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>  		head |= 1;
>  		tail = readl(iommu->reg + DMAR_IQT_REG);
> -		tail = ((tail >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH;
> +		tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>
>  		writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>
> @@ -1222,15 +1225,14 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  {
>  	int rc;
>  	struct q_inval *qi = iommu->qi;
> -	struct qi_desc *hw, wait_desc;
> +	int offset, shift, length;
> +	struct qi_desc wait_desc;
>  	int wait_index, index;
>  	unsigned long flags;
>
>  	if (!qi)
>  		return 0;
>
> -	hw = qi->desc;
> -
> restart:
>  	rc = 0;
>
> @@ -1243,16 +1245,21 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>
>  	index = qi->free_head;
>  	wait_index = (index + 1) % QI_LENGTH;
> +	shift = qi_shift(iommu);
> +	length = 1 << shift;
>
>  	qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
>
> -	hw[index] = *desc;
> -
> -	wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) |
> +	offset = index << shift;
> +	memcpy(qi->desc + offset, desc, length);
> +	wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
>  			QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
> -	wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);
> +	wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
> +	wait_desc.qw2 = 0;
> +	wait_desc.qw3 = 0;
>
> -	hw[wait_index] = wait_desc;
> +	offset = wait_index << shift;
> +	memcpy(qi->desc + offset, &wait_desc, length);

Same question as the one above.

Thanks,
Yi Liu
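The core change in this patch is that queue slot offsets become `index << qi_shift()` rather than a fixed `index << DMAR_IQ_SHIFT`, because scalable mode doubles the descriptor from 128 to 256 bits. The sketch below is a hedged user-space model of that indexing, not the kernel code: `struct qi_desc` copies the four-quadword layout from the patch, and `qi_shift()` is a stand-in for the helper it adds (log2 of the descriptor size in bytes — 4 without smts, 5 with).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 256-bit descriptor layout from the patch; legacy mode uses qw0/qw1 only */
struct qi_desc {
	uint64_t qw0, qw1, qw2, qw3;
};

/* Stand-in for the kernel's qi_shift(iommu) helper */
static int qi_shift(int smts)
{
	return smts ? 5 : 4;	/* log2(32) vs log2(16) bytes */
}

/* Copy a descriptor into slot @index of a raw queue buffer, as
 * qi_submit_sync() now does with memcpy(qi->desc + offset, ...) */
static void qi_write(void *queue, int smts, int index,
		     const struct qi_desc *desc)
{
	int shift = qi_shift(smts);

	memcpy((char *)queue + ((size_t)index << shift), desc,
	       (size_t)1 << shift);
}
```

The same `shift` also converts the hardware head/tail register values back to slot indices (`head >> shift`), which is why the patch threads one `shift` variable through `qi_check_fault()`.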
Re: [RFC PATCH 2/6] drivers core: Add I/O ASID allocator
Hi,

On 10/20/18 2:11 AM, Jean-Philippe Brucker wrote:
> +static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid)
> +{
> +	return -ESRCH;

return NULL;

Best regards,
Lu Baolu

> +}
> +#endif /* CONFIG_IOASID */
> +#endif /* __LINUX_IOASID_H */
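The review point above is the usual kernel convention: a function that returns a pointer signals failure with `NULL` (or an `ERR_PTR`), never a bare negative errno, which the compiler would just convert into a bogus pointer value. A minimal sketch, with `ioasid_t` and the stub as stand-ins for the RFC's API (not the actual ioasid.h):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int ioasid_t;

struct ioasid_set { int dummy; };

/* Disabled-CONFIG_IOASID stub, written as the review suggests: the
 * "not found" result of a pointer-returning lookup is NULL, not
 * "return -ESRCH;" (an integer silently converted to a pointer). */
static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid)
{
	(void)set;
	(void)ioasid;
	return NULL;
}
```

Callers can then test the stub's result with a plain `if (!p)` check, matching how the enabled implementation would report a missing entry.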
Re: [RFC] iommu/vt-d: Group and domain relationship
Hi,

On 11/6/18 6:40 PM, James Sewart wrote:
> Hey Lu,
>
> Would you be able to go into more detail about the issues with allowing
> IOMMU_DOMAIN_DMA to be allocated via domain_alloc?

This door is closed because the intel iommu driver does everything for
IOMMU_DOMAIN_DMA: allocating a domain and setting up the context entries
for the domain.

Why do we want to open this door? Probably we want the generic iommu layer
to handle these things (it's called default domain).

So we can't just open the door but not clean up the things, right? I
haven't spent time on the details. So I cc'ed Jacob for corrections.

Best regards,
Lu Baolu

> Cheers,
> James.
>
> On Fri, Nov 2, 2018 at 2:43 AM Lu Baolu wrote:
>> Hi,
>>
>> On 10/30/18 10:18 PM, James Sewart via iommu wrote:
>>> Hey,
>>>
>>> I've been investigating the relationship between iommu groups and
>>> domains on our systems and have a few questions.
>>>
>>> Why does the intel iommu code not allow allocating IOMMU_DOMAIN_DMA?
>>> Returning NULL when given this domain type has the side effect that
>>> the default_domain for an iommu group is not set, which, when using
>>> for e.g. dma_map_ops.map_page, means a domain is allocated per device.
>>
>> Intel vt-d driver doesn't implement the default domain and allocates
>> domains only on demand. There are lots of things to do before we allow
>> the iommu API to allocate domains other than IOMMU_DOMAIN_UNMANAGED.
>>
>> Best regards,
>> Lu Baolu
>>
>>> This seems to be the opposite behaviour to the AMD iommu code which
>>> supports allocating an IOMMU_DOMAIN_DMA and will only look to the
>>> iommu group if a domain is not attached to the device rather than
>>> allocating a new one. On AMD every device in an iommu group will share
>>> the same domain.
>>>
>>> Appended is what I think a patch to implement domain_alloc for
>>> IOMMU_DOMAIN_DMA and also IOMMU_DOMAIN_IDENTITY would look like.
>>> Testing shows each device in a group will share a domain by default,
>>> it also allows allocating a new dma domain that can be successfully
>>> attached to a group with iommu_attach_group.
>>> Looking for comment on why the behaviour is how it is currently and
>>> if there are any issues with the solution I've been testing.
>>>
>>> Cheers,
>>> James.
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>> index bff2abd6..3a58389f 100644
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -5170,10 +5170,15 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
>>>  	struct dmar_domain *dmar_domain;
>>>  	struct iommu_domain *domain;
>>>
>>> -	if (type != IOMMU_DOMAIN_UNMANAGED)
>>> +	if (type == IOMMU_DOMAIN_UNMANAGED)
>>> +		dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
>>> +	else if(type == IOMMU_DOMAIN_DMA)
>>> +		dmar_domain = alloc_domain(0);
>>> +	else if(type == IOMMU_DOMAIN_IDENTITY)
>>> +		dmar_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
>>> +	else
>>>  		return NULL;
>>>
>>> -	dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
>>>  	if (!dmar_domain) {
>>>  		pr_err("Can't allocate dmar_domain\n");
>>>  		return NULL;
>>> @@ -5186,9 +5191,12 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
>>>  	domain_update_iommu_cap(dmar_domain);
>>>
>>>  	domain = &dmar_domain->domain;
>>> -	domain->geometry.aperture_start = 0;
>>> -	domain->geometry.aperture_end = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
>>> -	domain->geometry.force_aperture = true;
>>> +
>>> +	if (type == IOMMU_DOMAIN_UNMANAGED) {
>>> +		domain->geometry.aperture_start = 0;
>>> +		domain->geometry.aperture_end = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
>>> +		domain->geometry.force_aperture = true;
>>> +	}
>>>
>>>  	return domain;
>>>  }
Re: [PATCH v4 6/8] vfio/mdev: Add iommu place holders in mdev_device
Hi Alex,

On 11/7/18 7:53 AM, Alex Williamson wrote:
> On Mon, 5 Nov 2018 15:34:06 +0800
> Lu Baolu wrote:
>> A parent device might create different types of mediated devices. For
>> example, a mediated device could be created by the parent device with
>> full isolation and protection provided by the IOMMU. One usage case
>> could be found on Intel platforms where a mediated device is an
>> assignable subset of a PCI, the DMA requests on behalf of it are all
>> tagged with a PASID. Since IOMMU supports PASID-granular translations
>> (scalable mode in vt-d 3.0), this mediated device could be
>> individually protected and isolated by an IOMMU.
>>
>> This patch adds two new members in struct mdev_device:
>> * iommu_device
>>   - This, if set, indicates that the mediated device could
>>     be fully isolated and protected by IOMMU via attaching
>>     an iommu domain to this device. If empty, it indicates
>>     using vendor defined isolation.
>>
>> * iommu_domain
>>   - This is a place holder for an iommu domain. A domain
>>     could be stored here for later use once it has been
>>     attached to the iommu_device of this mdev.
>>
>> Below helpers are added to set and get the above iommu device and
>> iommu domain pointers.
>>
>> * mdev_set/get_iommu_device(dev, iommu_device)
>>   - Set or get the iommu device which represents this mdev
>>     in IOMMU's device scope. Drivers don't need to set the
>>     iommu device if it uses vendor defined isolation.
>>
>> * mdev_set/get_iommu_domain(domain)
>>   - An iommu domain which has been attached to the iommu
>>     device in order to protect and isolate the mediated
>>     device will be kept in the mdev data structure and
>>     could be retrieved later.
Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Cc: Liu Yi L Suggested-by: Kevin Tian Suggested-by: Alex Williamson Signed-off-by: Lu Baolu --- drivers/vfio/mdev/mdev_core.c| 36 drivers/vfio/mdev/mdev_private.h | 2 ++ include/linux/mdev.h | 23 3 files changed, 61 insertions(+) diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index 0212f0ee8aea..5119809225c5 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -390,6 +390,42 @@ int mdev_device_remove(struct device *dev, bool force_remove) return 0; } +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device) +{ + struct mdev_device *mdev = to_mdev_device(dev); + + mdev->iommu_device = iommu_device; + + return 0; +} +EXPORT_SYMBOL(mdev_set_iommu_device); + +struct device *mdev_get_iommu_device(struct device *dev) +{ + struct mdev_device *mdev = to_mdev_device(dev); + + return mdev->iommu_device; +} +EXPORT_SYMBOL(mdev_get_iommu_device); + +int mdev_set_iommu_domain(struct device *dev, void *domain) +{ + struct mdev_device *mdev = to_mdev_device(dev); + + mdev->iommu_domain = domain; + + return 0; +} +EXPORT_SYMBOL(mdev_set_iommu_domain); + +void *mdev_get_iommu_domain(struct device *dev) +{ + struct mdev_device *mdev = to_mdev_device(dev); + + return mdev->iommu_domain; +} +EXPORT_SYMBOL(mdev_get_iommu_domain); + static int __init mdev_init(void) { return mdev_bus_register(); diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h index b5819b7d7ef7..c01518068e84 100644 --- a/drivers/vfio/mdev/mdev_private.h +++ b/drivers/vfio/mdev/mdev_private.h @@ -34,6 +34,8 @@ struct mdev_device { struct list_head next; struct kobject *type_kobj; bool active; + struct device *iommu_device; + void *iommu_domain; }; #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev) diff --git a/include/linux/mdev.h b/include/linux/mdev.h index b6e048e1045f..c46777d3e568 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ 
-14,6 +14,29 @@ #define MDEV_H struct mdev_device; +struct iommu_domain; + +/* + * Called by the parent device driver to set the PCI device which represents s/PCI // There is no requirement or expectation that the device is PCI. Fair enough. + * this mdev in iommu protection scope. By default, the iommu device is NULL, + * that indicates using vendor defined isolation. + * + * @dev: the mediated device that iommu will isolate. + * @iommu_device: a pci device which represents the iommu for @dev. + * + * Return 0 for success, otherwise negative error value. + */ +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device); + +struct device *mdev_get_iommu_device(struct device *dev); + +/* + * Called by vfio iommu modules to save the iommu domain after a domain being + * attached to the mediated device. + */ +int mdev_set_iommu_domain(struct device *dev, void *domain); + +void *mdev_get_iommu_domain(struct device *dev); I can't say I really understand the purpose of this, the cover letter indicates this is a placeholder, should we add it separately when we have a requirement for it? Oh, I am
Re: [PATCH 06/10] swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs
On Mon, Oct 8, 2018 at 1:04 AM Christoph Hellwig wrote:
>
> No need to duplicate the code - map_sg is equivalent to map_page
> for each page in the scatterlist.
>
> Signed-off-by: Christoph Hellwig
> ---
>  kernel/dma/swiotlb.c | 34 --
>  1 file changed, 12 insertions(+), 22 deletions(-)

Hey all,

So, I've found this patch seems to break userspace booting on the HiKey960
board. Initially I thought this was an issue with the mali drivers, and
have worked w/ the mali team to try to find a solution, but I've since
found that booting just the upstream kernel (with no graphics support)
will see userland hang/block unless this patch is reverted.

When I see the hangs, it seems like the filesystems are stuck or
something, as kernel messages still show up and sometimes I can get to a
shell, but commands that I run in that shell (like ls) just hang. I don't
see any other error messages. Reverting this patch then gets it to work.

In order to cleanly revert the patch, I have to revert the following set:
  "arm64: use the generic swiotlb_dma_ops"
  "swiotlb: add support for non-coherent DMA"
  "swiotlb: don't dip into swiotlb pool for coherent allocations"
  "swiotlb: refactor swiotlb_map_page"
  "swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs"

But at that point if I just re-apply "swiotlb: use swiotlb_map_page in
swiotlb_map_sg_attrs", I reproduce the hangs.

Any suggestions for how to further debug what might be going wrong would
be appreciated!

thanks
-john
Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist
Hi Robin,

On Tue, Nov 06, 2018 at 06:27:39PM +, Robin Murphy wrote:
> > I re-ran the test to get some accuracy within the function and got:
> > 1) pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
> >    // reduced from 422 usec to 56 usec == 366 usec less
> > 2) if (!(prot & IOMMU_CACHE)) {...} // flush routine
> >    // reduced from 439 usec to 236 usec == 203 usec less
> > Note: new memset takes about 164 usec, resulting in 400 usec diff
> >       for the entire iommu_dma_alloc() function call.
> >
> > It looks like this might be more than the diff between clear_page
> > and memset, and might be related to mapping and cache. Any idea?
>
> Hmm, I guess it might not be so much clear_page() itself as all the
> gubbins involved in getting there from prep_new_page(). I could perhaps
> make some vague guesses about how the A57 cores might get tickled by the
> different code patterns, but the Denver cores are well beyond my ability
> to reason about. Out of even further curiosity, how does the quick hack
> below compare?

I tried out that change. And the results are as follows:

a. Routine (1) reduced from 422 usec to 55 usec
b. Routine (2) increased from 441 usec to 833 usec
c. Overall, it seems to remain the same: 900+ usec

> > > > @@ -581,6 +593,12 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
> > > >  	if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
> > > >  		goto out_free_iova;
> > > > +	if (gfp_zero) {
> > > > +		/* Now zero all the pages in the scatterlist */
> > > > +		for_each_sg(sgt.sgl, s, sgt.orig_nents, i)
> > > > +			memset(sg_virt(s), 0, s->length);
> > >
> > > What if the pages came from highmem? I know that doesn't happen on
> > > arm64 today, but the point of this code *is* to be generic, and
> > > other users will arrive eventually.
> >
> > Hmm, so it probably should use sg_miter_start/stop() too? Looking
> > at the flush routine doing in PAGE_SIZE for each iteration, would
> > it be possible to map and memset contiguous pages together? Actually
> > the flush routine might be also optimized if we can map contiguous
> > pages.
>
> I suppose the ideal point at which to do it would be after the remapping
> when we have the entire buffer contiguous in vmalloc space and can make
> best use of prefetchers etc. - DMA_ATTR_NO_KERNEL_MAPPING is a bit of a
> spanner in the works, but we could probably accommodate a special case
> for that. As Christoph points out, this isn't really the place to be
> looking for performance anyway (unless it's pathologically bad as per the

I would understand the point. So probably it'd be more plausible to have
the change if it reflects on some practical benchmark. I might need to
re-run some tests with heavier use cases.

> DMA_ATTR_ALLOC_SINGLE_PAGES fun), but if we're looking at pulling the
> remapping out of the arch code, maybe we could aim to rework the zeroing
> completely as part of that.

That'd be nice. I believe it'd be good to have.

Thanks
Nicolin
Re: [PATCH] of/device: Really only set bus DMA mask when appropriate
On Tue, Nov 6, 2018 at 3:54 AM, Robin Murphy wrote:
> of_dma_configure() was *supposed* to be following the same logic as
> acpi_dma_configure() and only setting bus_dma_mask if some range was
> specified by the firmware. However, it seems that subtlety got lost in
> the process of fitting it into the differently-shaped control flow, and
> as a result the force_dma==true case ends up always setting the bus mask
> to the 32-bit default, which is not what anyone wants.
>
> Make sure we only touch it if the DT actually said so.
>
> Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
> Reported-by: Aaro Koskinen
> Reported-by: Jean-Philippe Brucker
> Signed-off-by: Robin Murphy
> ---
>
> Sorry about that... I guess I only have test setups that either have
> dma-ranges or where a 32-bit bus mask goes unnoticed :(
>
> The Octeon and SMMU issues sound like they're purely down to this, and
> it's probably related to at least one of John's Hikey woes.

Yep! This does seem to resolve the mali bifrost dma address warn-ons I was
seeing, and makes the board seem to function more consistently, so that's
great!

Tested-by: John Stultz

Though I still find I have to revert "swiotlb: use swiotlb_map_page in
swiotlb_map_sg_attrs" to boot to UI successfully with AOSP. Still not sure
what's going on there (it's sort of a soft hang where some userland runs
ok, but other bits seem to jam up, even console commands sometimes hang -
almost seems like io stalls).

Anyway, thanks so much again for this one!
-john
Re: [PATCH v4 6/8] vfio/mdev: Add iommu place holders in mdev_device
On Mon, 5 Nov 2018 15:34:06 +0800
Lu Baolu wrote:

> A parent device might create different types of mediated
> devices. For example, a mediated device could be created
> by the parent device with full isolation and protection
> provided by the IOMMU. One usage case could be found on
> Intel platforms where a mediated device is an assignable
> subset of a PCI, the DMA requests on behalf of it are all
> tagged with a PASID. Since IOMMU supports PASID-granular
> translations (scalable mode in vt-d 3.0), this mediated
> device could be individually protected and isolated by an
> IOMMU.
>
> This patch adds two new members in struct mdev_device:
> * iommu_device
>   - This, if set, indicates that the mediated device could
>     be fully isolated and protected by IOMMU via attaching
>     an iommu domain to this device. If empty, it indicates
>     using vendor defined isolation.
>
> * iommu_domain
>   - This is a place holder for an iommu domain. A domain
>     could be stored here for later use once it has been
>     attached to the iommu_device of this mdev.
>
> Below helpers are added to set and get the above iommu device
> and iommu domain pointers.
>
> * mdev_set/get_iommu_device(dev, iommu_device)
>   - Set or get the iommu device which represents this mdev
>     in IOMMU's device scope. Drivers don't need to set the
>     iommu device if it uses vendor defined isolation.
>
> * mdev_set/get_iommu_domain(domain)
>   - An iommu domain which has been attached to the iommu
>     device in order to protect and isolate the mediated
>     device will be kept in the mdev data structure and
>     could be retrieved later.
> > Cc: Ashok Raj > Cc: Jacob Pan > Cc: Kevin Tian > Cc: Liu Yi L > Suggested-by: Kevin Tian > Suggested-by: Alex Williamson > Signed-off-by: Lu Baolu > --- > drivers/vfio/mdev/mdev_core.c| 36 > drivers/vfio/mdev/mdev_private.h | 2 ++ > include/linux/mdev.h | 23 > 3 files changed, 61 insertions(+) > > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c > index 0212f0ee8aea..5119809225c5 100644 > --- a/drivers/vfio/mdev/mdev_core.c > +++ b/drivers/vfio/mdev/mdev_core.c > @@ -390,6 +390,42 @@ int mdev_device_remove(struct device *dev, bool > force_remove) > return 0; > } > > +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device) > +{ > + struct mdev_device *mdev = to_mdev_device(dev); > + > + mdev->iommu_device = iommu_device; > + > + return 0; > +} > +EXPORT_SYMBOL(mdev_set_iommu_device); > + > +struct device *mdev_get_iommu_device(struct device *dev) > +{ > + struct mdev_device *mdev = to_mdev_device(dev); > + > + return mdev->iommu_device; > +} > +EXPORT_SYMBOL(mdev_get_iommu_device); > + > +int mdev_set_iommu_domain(struct device *dev, void *domain) > +{ > + struct mdev_device *mdev = to_mdev_device(dev); > + > + mdev->iommu_domain = domain; > + > + return 0; > +} > +EXPORT_SYMBOL(mdev_set_iommu_domain); > + > +void *mdev_get_iommu_domain(struct device *dev) > +{ > + struct mdev_device *mdev = to_mdev_device(dev); > + > + return mdev->iommu_domain; > +} > +EXPORT_SYMBOL(mdev_get_iommu_domain); > + > static int __init mdev_init(void) > { > return mdev_bus_register(); > diff --git a/drivers/vfio/mdev/mdev_private.h > b/drivers/vfio/mdev/mdev_private.h > index b5819b7d7ef7..c01518068e84 100644 > --- a/drivers/vfio/mdev/mdev_private.h > +++ b/drivers/vfio/mdev/mdev_private.h > @@ -34,6 +34,8 @@ struct mdev_device { > struct list_head next; > struct kobject *type_kobj; > bool active; > + struct device *iommu_device; > + void *iommu_domain; > }; > > #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev) > diff 
--git a/include/linux/mdev.h b/include/linux/mdev.h > index b6e048e1045f..c46777d3e568 100644 > --- a/include/linux/mdev.h > +++ b/include/linux/mdev.h > @@ -14,6 +14,29 @@ > #define MDEV_H > > struct mdev_device; > +struct iommu_domain; > + > +/* > + * Called by the parent device driver to set the PCI device which represents s/PCI // There is no requirement or expectation that the device is PCI. > + * this mdev in iommu protection scope. By default, the iommu device is NULL, > + * that indicates using vendor defined isolation. > + * > + * @dev: the mediated device that iommu will isolate. > + * @iommu_device: a pci device which represents the iommu for @dev. > + * > + * Return 0 for success, otherwise negative error value. > + */ > +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device); > + > +struct device *mdev_get_iommu_device(struct device *dev); > + > +/* > + * Called by vfio iommu modules to save the iommu domain after a domain being > + * attached to the mediated device. > + */ > +int mdev_set_iommu_domain(struct device *dev, void *domain); > + > +void *mdev_get_iommu_domain(struct device *dev); I can't say I really understand
Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist
Hi Christoph,

On Sun, Nov 04, 2018 at 07:50:01AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 01, 2018 at 02:35:00PM -0700, Nicolin Chen wrote:
> > The __GFP_ZERO will be passed down to the generic page allocation
> > routine which zeros everything page by page. This is safe to be a
> > generic way but not efficient for iommu allocation that organizes
> > contiguous pages using scatterlist.
> >
> > So this change drops __GFP_ZERO from the flag, and adds a manual
> > memset after page/sg allocations, using the length of scatterlist.
> >
> > My test result of a 2.5MB size allocation shows iommu_dma_alloc()
> > takes 46% less time, reduced from averagely 925 usec to 500 usec.
>
> And in what case does dma_alloc_* performance even matter?

Honestly, this was amplified by running a local iommu benchmark test.
Practically dma_alloc/free() should not be that stressful, but we cannot
say the performance doesn't matter at all, right? Though many device
drivers pre-allocate memory for DMA usage, it could matter where a driver
dynamically allocates and releases.

And actually I have a related question for you: I saw that
dma_direct_alloc() cancels the __GFP_ZERO flag and does a manual memset()
after allocation. Might that be possibly related to a performance concern?
Though I don't see any performance keyword for that part of code,
especially since the memset() seems to have been there from the beginning.

Thanks
Nicolin
[PATCH v2] iommu/vt-d: respect max guest address width in agaw
Supported Guest Address Width (SGAW) only indicates what the iommu's capabilities are wrt page table levels for second-level page tables. The IOMMU should pick the right level depending on the Maximum Guest Address Width (MGAW). For pass-through translation type, the address width must be programmed with the largest AGAW supported by the HW.

Reported-by: Ramos Falcon, Ernesto R
Signed-off-by: Ashok Raj
Signed-off-by: Jacob Pan
---
 drivers/iommu/intel-iommu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 140d6ab..f16db3b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -719,7 +719,11 @@ int iommu_calculate_max_sagaw(struct intel_iommu *iommu)
  */
 int iommu_calculate_agaw(struct intel_iommu *iommu)
 {
-	return __iommu_calculate_agaw(iommu, DEFAULT_DOMAIN_ADDRESS_WIDTH);
+	unsigned long mgaw;
+
+	/* Respect Max Guest Address Width */
+	mgaw = min(cap_mgaw(iommu->cap), DEFAULT_DOMAIN_ADDRESS_WIDTH);
+	return __iommu_calculate_agaw(iommu, mgaw);
 }

 /* This functionin only returns single iommu in a domain */
--
2.7.4
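The patch boils down to clamping the programmed width to what the hardware's MGAW allows. A rough userspace sketch of that behaviour (not the kernel code — the macro value and helper name here are illustrative stand-ins):

```c
#include <assert.h>

/*
 * Illustrative sketch of the clamp in the patch above; the macro value
 * and helper name are stand-ins, not the kernel's own.
 */
#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48

/* Width to program: never wider than the HW's Max Guest Address Width */
static unsigned long effective_domain_width(unsigned long mgaw)
{
	return mgaw < DEFAULT_DOMAIN_ADDRESS_WIDTH ?
		mgaw : DEFAULT_DOMAIN_ADDRESS_WIDTH;
}
```

A 39-bit MGAW caps the domain at 39 bits; hardware wider than the default leaves the default untouched.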
Re: IOMMU breakage on arm64
Hi Geert,

On 2018-11-06 7:44 pm, Geert Uytterhoeven wrote:
> Hi Christoph et al,
>
> On Tue, Oct 23, 2018 at 1:40 AM Linux Kernel Mailing List wrote:
>> Commit: b4ebe6063204da58e48600b810a97c29ae9e5d12
>> Parent: 7d21ee4c719f00896767ce19c4c01a56374c2ced
>> Refname: refs/heads/master
>> Web: https://git.kernel.org/torvalds/c/b4ebe6063204da58e48600b810a97c29ae9e5d12
>> Author: Christoph Hellwig
>> AuthorDate: Thu Sep 20 14:04:08 2018 +0200
>> Committer: Christoph Hellwig
>> CommitDate: Mon Oct 1 07:28:03 2018 -0700
>>
>> dma-direct: implement complete bus_dma_mask handling
>>
>> Instead of rejecting devices with a too small bus_dma_mask we can handle
>> by taking the bus dma_mask into account for allocations and bounce
>> buffering decisions.
>>
>> Signed-off-by: Christoph Hellwig
>
> I have bisected the following crash to the above commit:

I think that commit mostly just changes the presentation of my underlying cockup - see here for what should fix it:

https://patchwork.kernel.org/patch/10670177/

I have a feeling we've ironed out crash-on-early-domain-free bugs in the SMMU drivers already - arm-smmu certainly has an early return in arm_smmu_destroy_domain_context() which should behave exactly like your diff below, while I think arm-smmu-v3 gets away with it by virtue of smmu_domain->cfg being unset, but I'll double-check that when I'm fresh tomorrow (Jean-Philippe reported SMMUv3 hitting the DMA thing to me internally, but didn't mention any crash).

Thanks for the report,
Robin.
> [quoted crash log, reproduction steps and patch analysis trimmed; see the original report below]
IOMMU breakage on arm64 (was: Re: dma-direct: implement complete bus_dma_mask handling)
Hi Christoph et al,

On Tue, Oct 23, 2018 at 1:40 AM Linux Kernel Mailing List wrote:
> Commit: b4ebe6063204da58e48600b810a97c29ae9e5d12
> Parent: 7d21ee4c719f00896767ce19c4c01a56374c2ced
> Refname: refs/heads/master
> Web: https://git.kernel.org/torvalds/c/b4ebe6063204da58e48600b810a97c29ae9e5d12
> Author: Christoph Hellwig
> AuthorDate: Thu Sep 20 14:04:08 2018 +0200
> Committer: Christoph Hellwig
> CommitDate: Mon Oct 1 07:28:03 2018 -0700
>
> dma-direct: implement complete bus_dma_mask handling
>
> Instead of rejecting devices with a too small bus_dma_mask we can handle
> by taking the bus dma_mask into account for allocations and bounce
> buffering decisions.
>
> Signed-off-by: Christoph Hellwig

I have bisected the following crash to the above commit:

ipmmu-vmsa e67b.mmu: Cannot accommodate DMA translation for IOMMU page tables
sata_rcar ee30.sata: Unable to initialize IPMMU context
iommu: Failed to add device ee30.sata to group 0: -22
Unable to handle kernel NULL pointer dereference at virtual address 0038
Mem abort info:
  ESR = 0x9604
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x0004
  CM = 0, WnR = 0
[0038] user address but active_mm is swapper
Internal error: Oops: 9604 [#1] PREEMPT SMP
CPU: 2 PID: 51 Comm: kworker/2:1 Not tainted 4.20.0-rc1-arm64-renesas-dirty #74
Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
Workqueue: events deferred_probe_work_func
pstate: 6005 (nZCv daif -PAN -UAO)
pc : ipmmu_domain_free+0x1c/0xa0
lr : ipmmu_domain_free+0x14/0xa0
sp : 09c9b990
x29: 09c9b990 x28:
x27: 08dff000 x26:
x25: 08dd9000 x24: 0014
x23: 8006fffdbb20 x22: 08dff000
x21: 8006f898e680 x20: 8006f8fa6c00
x19: 8006f8fa6c08 x18: 0037
x17: 0020 x16: 08a8f780
x15: 8006fb096f10 x14: 8006f93731c8
x13: x12: 8006fb096f10
x11: 8006f9372fe8 x10: 08dff708
x9 : 8006fb096f50 x8 : 08dff708
x7 : 089f1858 x6 :
x5 : x4 : 8006f89a3000
x3 : 8006f7131000 x2 : 8006fb2b5700
x1 : x0 : 0028
Process kworker/2:1 (pid: 51, stack limit = 0x(ptrval))
Call trace:
 ipmmu_domain_free+0x1c/0xa0
 iommu_group_release+0x48/0x68
 kobject_put+0x74/0xe8
 kobject_del.part.0+0x3c/0x50
 kobject_put+0x60/0xe8
 iommu_group_get_for_dev+0xa8/0x1f0
 ipmmu_add_device+0x1c/0x40
 of_iommu_configure+0x118/0x190
 of_dma_configure+0xcc/0x1f0
 platform_dma_configure+0x18/0x28
 really_probe+0x94/0x2a8
 driver_probe_device+0x58/0x100
 __device_attach_driver+0x90/0xd0
 bus_for_each_drv+0x64/0xc8
 __device_attach+0xd8/0x130
 device_initial_probe+0x10/0x18
 bus_probe_device+0x98/0xa0
 deferred_probe_work_func+0x74/0xb0
 process_one_work+0x294/0x6f0
 worker_thread+0x238/0x460
 kthread+0x120/0x128
 ret_from_fork+0x10/0x1c
Code: aa0003f3 97ffeb1a d1002274 f85f8261 (f9401c20)
---[ end trace 4c46c7fd7cd07245 ]---

Reproducing on v4.20-rc1 requires CONFIG_IPMMU_VMSA=y, and whitelisting SATA for IOMMU use (https://patchwork.kernel.org/patch/10544059/). For older versions you also have to backport commit 3a0832d093693ede ("arm64: dts: renesas: salvator-xs: enable SATA").
Actually there are two issues at hand:

1) drivers/iommu/io-pgtable-arm.c:__arm_lpae_alloc_pages() allocates memory (above 4 GiB) and maps it for DMA use, but it is rejected due to:

       dma_map_single(dev, pages, size, DMA_TO_DEVICE) != virt_to_phys(pages)

> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -27,7 +27,8 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
>  	if (!dev->dma_mask)
>  		return false;
>
> -	return addr + size - 1 <= *dev->dma_mask;
> +	return addr + size - 1 <=
> +		min_not_zero(*dev->dma_mask, dev->bus_dma_mask);

   *dev->dma_mask = 0xff (40-bit)
   dev->bus_dma_mask = 0x (32-bit)

Hence before, we had (in __arm_lpae_alloc_pages()):

   arm-lpae io-pgtable: pages = 8006f88f3000
   arm-lpae io-pgtable: dma = 0x0007388f3000
   arm-lpae io-pgtable: virt_to_phys(pages) = 0x0007388f3000

After this change, we have:

   arm-lpae io-pgtable: pages = 8006f882b000
   arm-lpae io-pgtable: dma = 0x74009000
   arm-lpae io-pgtable: virt_to_phys(pages) = 0x00073882b000

And SATA runs without using the IOMMU.

2) The Renesas IPMMU driver doesn't handle the above
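For illustration, the new check behaves like the following userspace sketch (types simplified; `min_not_zero()` mirrors the kernel helper of the same name): a page-table page above 4 GiB passes a 40-bit device mask on its own, but fails once a 32-bit bus mask is also taken into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the kernel's min_not_zero() applied to two DMA masks */
static uint64_t min_not_zero(uint64_t a, uint64_t b)
{
	if (!a)
		return b;
	if (!b)
		return a;
	return a < b ? a : b;
}

/* Simplified dma_capable(): does [addr, addr+size) fit under the masks? */
static bool dma_capable(uint64_t dev_mask, uint64_t bus_mask,
			uint64_t addr, uint64_t size)
{
	return addr + size - 1 <= min_not_zero(dev_mask, bus_mask);
}
```

With a 40-bit device mask and no bus mask, a buffer at physical 0x7388f3000 is accepted; with the 32-bit bus mask the same buffer is rejected, which matches the bounce/IOMMU fallback behaviour reported above.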
Re: [PATCH] of/device: Really only set bus DMA mask when appropriate
Hi,

On Tue, Nov 06, 2018 at 11:54:15AM +, Robin Murphy wrote:
> of_dma_configure() was *supposed* to be following the same logic as
> acpi_dma_configure() and only setting bus_dma_mask if some range was
> specified by the firmware. However, it seems that subtlety got lost in
> the process of fitting it into the differently-shaped control flow, and
> as a result the force_dma==true case ends up always setting the bus mask
> to the 32-bit default, which is not what anyone wants.
>
> Make sure we only touch it if the DT actually said so.
>
> Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
> Reported-by: Aaro Koskinen
> Reported-by: Jean-Philippe Brucker
> Signed-off-by: Robin Murphy

Tested-by: Aaro Koskinen

This fixes the MMC driver DMA mask issue on OCTEON.

Thanks,

A.

> ---
>
> Sorry about that... I guess I only have test setups that either have
> dma-ranges or where a 32-bit bus mask goes unnoticed :(
>
> The Octeon and SMMU issues sound like they're purely down to this, and
> it's probably related to at least one of John's Hikey woes.
>
> Robin.
>
> drivers/of/device.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index 0f27fad9fe94..757ae867674f 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -149,7 +149,8 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
>  	 * set by the driver.
>  	 */
>  	mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
> -	dev->bus_dma_mask = mask;
> +	if (!ret)
> +		dev->bus_dma_mask = mask;
>  	dev->coherent_dma_mask &= mask;
>  	*dev->dma_mask &= mask;
>
> --
> 2.19.1.dirty
Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist
On 02/11/2018 23:36, Nicolin Chen wrote:
> On Fri, Nov 02, 2018 at 04:54:07PM +0000, Robin Murphy wrote:
>> On 01/11/2018 21:35, Nicolin Chen wrote:
>>> The __GFP_ZERO will be passed down to the generic page allocation
>>> routine which zeros everything page by page. This is safe to be a
>>> generic way but not efficient for iommu allocation that organizes
>>> contiguous pages using scatterlist.
>>>
>>> So this change drops __GFP_ZERO from the flag, and adds a manual
>>> memset after page/sg allocations, using the length of scatterlist.
>>>
>>> My test result of a 2.5MB size allocation shows iommu_dma_alloc()
>>> takes 46% less time, reduced from averagely 925 usec to 500 usec.
>>
>> Assuming this is for arm64, I'm somewhat surprised that memset() could
>> be that much faster than clear_page(), since they should effectively
>> amount to the same thing (a DC ZVA loop). What hardware is this on?
>
> I am running with tegra186-p2771-.dtb so it's arm64 yes.
>
>> Profiling to try and see exactly where the extra time goes would be
>> interesting too.
>
> I re-ran the test to get some accuracy within the function and got:
>
> 1) pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
>    // reduced from 422 usec to 56 usec == 366 usec less
> 2) if (!(prot & IOMMU_CACHE)) {...} // flush routine
>    // reduced from 439 usec to 236 usec == 203 usec less
>
> Note: new memset takes about 164 usec, resulting in 400 usec diff for
> the entire iommu_dma_alloc() function call.
>
> It looks like this might be more than the diff between clear_page and
> memset, and might be related to mapping and cache. Any idea?

Hmm, I guess it might not be so much clear_page() itself as all the gubbins involved in getting there from prep_new_page(). I could perhaps make some vague guesses about how the A57 cores might get tickled by the different code patterns, but the Denver cores are well beyond my ability to reason about. Out of even further curiosity, how does the quick hack below compare?
>>> @@ -568,6 +571,15 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
>>>  	if (attrs & DMA_ATTR_ALLOC_SINGLE_PAGES)
>>>  		alloc_sizes = min_size;
>>>
>>> +	/*
>>> +	 * The generic zeroing in a length of one page size is slow,
>>> +	 * so do it manually in a length of scatterlist size instead
>>> +	 */
>>> +	if (gfp & __GFP_ZERO) {
>>> +		gfp &= ~__GFP_ZERO;
>>> +		gfp_zero = true;
>>> +	}
>>
>> Or just mask it out in __iommu_dma_alloc_pages()?
>
> Yea, the change here would be neater then.
>
>>> @@ -581,6 +593,12 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
>>>  	if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
>>>  		goto out_free_iova;
>>>
>>> +	if (gfp_zero) {
>>> +		/* Now zero all the pages in the scatterlist */
>>> +		for_each_sg(sgt.sgl, s, sgt.orig_nents, i)
>>> +			memset(sg_virt(s), 0, s->length);
>>
>> What if the pages came from highmem? I know that doesn't happen on
>> arm64 today, but the point of this code *is* to be generic, and other
>> users will arrive eventually.
>
> Hmm, so it probably should use sg_miter_start/stop() too? Looking at
> the flush routine doing in PAGE_SIZE for each iteration, would it be
> possible to map and memset contiguous pages together? Actually the
> flush routine might be also optimized if we can map contiguous pages.

I suppose the ideal point at which to do it would be after the remapping, when we have the entire buffer contiguous in vmalloc space and can make best use of prefetchers etc. - DMA_ATTR_NO_KERNEL_MAPPING is a bit of a spanner in the works, but we could probably accommodate a special case for that. As Christoph points out, this isn't really the place to be looking for performance anyway (unless it's pathologically bad as per the DMA_ATTR_ALLOC_SINGLE_PAGES fun), but if we're looking at pulling the remapping out of the arch code, maybe we could aim to rework the zeroing completely as part of that.

Robin.
----->8-----
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b04753b204..7d28db3bf4bf 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -569,7 +569,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 		alloc_sizes = min_size;

 	count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
+	pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp & ~__GFP_ZERO);
 	if (!pages)
 		return NULL;

@@ -581,15 +581,18 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 	if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
 		goto out_free_iova;

-	if (!(prot & IOMMU_CACHE)) {
+	{
 		struct sg_mapping_iter miter;

 		/*
 		 * The CPU-centric flushing implied by SG_MITER_TO_SG isn't
 		 *
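The core idea being debated — one memset() per scatterlist segment instead of per-page zeroing — can be sketched in plain C like this (`struct seg` is a stand-in for a scatterlist entry; a real kernel implementation would go through sg_virt() or an sg_miter for highmem, as discussed above):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a scatterlist entry: one virtually-contiguous run */
struct seg {
	void *virt;
	size_t length;
};

/* Zero each segment in one memset() rather than page by page */
static void zero_segments(struct seg *segs, size_t nents)
{
	for (size_t i = 0; i < nents; i++)
		memset(segs[i].virt, 0, segs[i].length);
}
```

The win comes from issuing one large zeroing operation per (possibly multi-page) segment, so the CPU's streaming stores are not restarted at every page boundary.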
Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3
On Mon, Oct 22, 2018 at 12:50:56PM +0100, Robin Murphy wrote:
> To me, that sounds like a very good argument for having separate "attach as
> primary domain" and "attach as aux domain" APIs.

I agree with that, overloading iommu_attach_device() to support aux-domains is not going to be a maintainable solution. I'd like this to be a two-level approach, where the aux-domains are sub-domains of a primary domain (naming is still to-be-improved):

	struct iommu_domain *domain;

	domain = iommu_alloc_aux_domain();
	iommu_attach_device(pdev, domain); /* Fails if device has no support
					      for this domain-type */

	/* Bind a process address space to the aux-domain */
	sva_handle = iommu_sva_bind_mm(domain, current, ...);
	pasid = iommu_sva_get_pasid(sva_handle);

	mdev_handle = iommu_mdev_alloc_domain(domain);
	iommu_mdev_map(mdev_handle, ...);
	iommu_mdev_unmap(mdev_handle, ...);

	/* Do more work */

	iommu_sva_unbind_mm(sva_handle);
	iommu_mdev_free_domain(mdev_handle);
	iommu_detach_device(domain, pdev);

Regards,

Joerg
Re: [PATCH 1/1] iommu/vtd: Fix NULL pointer dereference in prq_event_thread()
On Mon, Nov 05, 2018 at 10:18:58AM +0800, Lu Baolu wrote:
> When handling a page request without pasid event, go to the "no_pasid"
> branch instead of "bad_req". Otherwise, a NULL pointer dereference
> will happen there.
>
> Cc: Ashok Raj
> Cc: Jacob Pan
> Cc: Sohil Mehta
> Signed-off-by: Lu Baolu
> ---
> drivers/iommu/intel-svm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

Applied for v4.20, thanks.
Re: [PATCH v2] iommu/ipmmu-vmsa: Hook up r8a77990 DT matching code
On Wed, Oct 17, 2018 at 11:13:22AM +0200, Simon Horman wrote:
> From: Hai Nguyen Pham
>
> Support the R-Car E3 (r8a77990) IPMMU.
>
> Signed-off-by: Hai Nguyen Pham
> Signed-off-by: Kazuya Mizuguchi
> [simon: rebased; dropped no longer required IOMMU_OF_DECLARE hunk]
> Signed-off-by: Simon Horman

Applied, thanks.
Re: [PATCH v2] iommu: Do physical merging in iommu_map_sg()
On Thu, Oct 11, 2018 at 04:56:42PM +0100, Robin Murphy wrote:
> The original motivation for iommu_map_sg() was to give IOMMU drivers the
> chance to map an IOVA-contiguous scatterlist as efficiently as they
> could. It turns out that there isn't really much driver-specific
> business involved there, so now that the default implementation is
> mandatory let's just improve that - the main thing we're after is to use
> larger pages wherever possible, and as long as domain->pgsize_bitmap
> reflects reality, iommu_map() can already do that in a generic way. All
> we need to do is detect physically-contiguous segments and batch them
> into a single map operation, since whatever we do here is transparent to
> our caller and not bound by any segment-length restrictions on the list
> itself.
>
> Speaking of efficiency, there's really very little point in duplicating
> the checks that iommu_map() is going to do anyway, so those get cleared
> up in the process.
>
> Signed-off-by: Robin Murphy

Looks correct, applied, thanks.
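The merging described in the commit message can be sketched as follows: walk the list, extend a run while the next segment starts exactly where the current one ends, and issue one map operation per run. `struct phys_seg` and the op count are illustrative only, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative segment: physical address and length */
struct phys_seg {
	uint64_t phys;
	size_t len;
};

/* Count the map operations needed after merging contiguous neighbours */
static size_t map_sg_merged(const struct phys_seg *sg, size_t nents)
{
	size_t ops = 0;

	for (size_t i = 0; i < nents; i++) {
		uint64_t end = sg[i].phys + sg[i].len;

		/* Batch segments that are physically contiguous */
		while (i + 1 < nents && sg[i + 1].phys == end) {
			end += sg[i + 1].len;
			i++;
		}
		ops++;	/* one map of [start, end) instead of per-segment */
	}
	return ops;
}
```

With larger merged regions, the underlying iommu_map() can then pick bigger page sizes from domain->pgsize_bitmap, which is where the efficiency gain comes from.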
Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist
On 05/11/2018 14:58, Christoph Hellwig wrote:
> On Fri, Nov 02, 2018 at 04:36:13PM -0700, Nicolin Chen wrote:
>>> What if the pages came from highmem? I know that doesn't happen on
>>> arm64 today, but the point of this code *is* to be generic, and other
>>> users will arrive eventually.
>>
>> Hmm, so it probably should use sg_miter_start/stop() too? Looking at
>> the flush routine doing in PAGE_SIZE for each iteration, would it be
>> possible to map and memset contiguous pages together? Actually the
>> flush routine might be also optimized if we can map contiguous pages.
>
> FYI, I have patches I plan to submit soon that get rid of the struct
> scatterlist use in this code to simplify it:

...and I have some significant objections to that simplification which I plan to respond with ;) (namely that it defeats the whole higher-order page allocation business, which will have varying degrees of performance impact on certain cases)

Robin.
[PATCH] of/device: Really only set bus DMA mask when appropriate
of_dma_configure() was *supposed* to be following the same logic as acpi_dma_configure() and only setting bus_dma_mask if some range was specified by the firmware. However, it seems that subtlety got lost in the process of fitting it into the differently-shaped control flow, and as a result the force_dma==true case ends up always setting the bus mask to the 32-bit default, which is not what anyone wants.

Make sure we only touch it if the DT actually said so.

Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
Reported-by: Aaro Koskinen
Reported-by: Jean-Philippe Brucker
Signed-off-by: Robin Murphy
---

Sorry about that... I guess I only have test setups that either have dma-ranges or where a 32-bit bus mask goes unnoticed :(

The Octeon and SMMU issues sound like they're purely down to this, and it's probably related to at least one of John's Hikey woes.

Robin.

 drivers/of/device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/of/device.c b/drivers/of/device.c
index 0f27fad9fe94..757ae867674f 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -149,7 +149,8 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
 	 * set by the driver.
 	 */
 	mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
-	dev->bus_dma_mask = mask;
+	if (!ret)
+		dev->bus_dma_mask = mask;
 	dev->coherent_dma_mask &= mask;
 	*dev->dma_mask &= mask;

--
2.19.1.dirty
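The mask being (now conditionally) assigned is derived from the end of the firmware-described range via DMA_BIT_MASK(ilog2(end) + 1). A userspace sketch of that computation, with stand-in helpers mirroring the kernel macros:

```c
#include <assert.h>
#include <stdint.h>

/* Integer log2 of a nonzero value, as the kernel's ilog2() computes */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* DMA_BIT_MASK(n): the lowest n bits set */
static uint64_t dma_bit_mask(unsigned int n)
{
	return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Bus mask covering a firmware range [dma_addr, dma_addr + size) */
static uint64_t bus_mask_for_range(uint64_t dma_addr, uint64_t size)
{
	return dma_bit_mask(ilog2_u64(dma_addr + size - 1) + 1);
}
```

The result is the smallest all-ones mask covering the highest addressable byte of the range — e.g. a 4 GiB range starting at 0 yields the 32-bit mask, which is exactly what the buggy path was unconditionally applying even with no dma-ranges present.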
Re: [RFC] iommu/vt-d: Group and domain relationship
Hey Lu,

Would you be able to go into more detail about the issues with allowing IOMMU_DOMAIN_DMA to be allocated via domain_alloc?

Cheers,
James.

On Fri, Nov 2, 2018 at 2:43 AM Lu Baolu wrote:
>
> Hi,
>
> On 10/30/18 10:18 PM, James Sewart via iommu wrote:
> > Hey,
> >
> > I’ve been investigating the relationship between iommu groups and domains
> > on our systems and have a few questions. Why does the intel iommu code not
> > allow allocating IOMMU_DOMAIN_DMA? Returning NULL when given this domain
> > type has the side effect that the default_domain for an iommu group is not
> > set, which, when using for e.g. dma_map_ops.map_page, means a domain is
> > allocated per device.
>
> Intel vt-d driver doesn't implement the default domain and allocates
> domains only on demand. There are lots of things to do before we allow
> the iommu API to allocate domains other than IOMMU_DOMAIN_UNMANAGED.
>
> Best regards,
> Lu Baolu
>
> > This seems to be the opposite behaviour to the AMD iommu code which
> > supports allocating an IOMMU_DOMAIN_DMA and will only look to the iommu
> > group if a domain is not attached to the device rather than allocating a
> > new one. On AMD every device in an iommu group will share the same domain.
> >
> > Appended is what I think a patch to implement domain_alloc for
> > IOMMU_DOMAIN_DMA and also IOMMU_DOMAIN_IDENTITY would look like. Testing
> > shows each device in a group will share a domain by default, it also
> > allows allocating a new dma domain that can be successfully attached to a
> > group with iommu_attach_group.
> >
> > Looking for comment on why the behaviour is how it is currently and if
> > there are any issues with the solution I’ve been testing.
> >
> > Cheers,
> > James.
> >
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index bff2abd6..3a58389f 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5170,10 +5170,15 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
> >  	struct dmar_domain *dmar_domain;
> >  	struct iommu_domain *domain;
> >
> > -	if (type != IOMMU_DOMAIN_UNMANAGED)
> > +	if (type == IOMMU_DOMAIN_UNMANAGED)
> > +		dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
> > +	else if (type == IOMMU_DOMAIN_DMA)
> > +		dmar_domain = alloc_domain(0);
> > +	else if (type == IOMMU_DOMAIN_IDENTITY)
> > +		dmar_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
> > +	else
> >  		return NULL;
> >
> > -	dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
> >  	if (!dmar_domain) {
> >  		pr_err("Can't allocate dmar_domain\n");
> >  		return NULL;
> > @@ -5186,9 +5191,12 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
> >  	domain_update_iommu_cap(dmar_domain);
> >
> >  	domain = &dmar_domain->domain;
> > -	domain->geometry.aperture_start = 0;
> > -	domain->geometry.aperture_end = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
> > -	domain->geometry.force_aperture = true;
> > +
> > +	if (type == IOMMU_DOMAIN_UNMANAGED) {
> > +		domain->geometry.aperture_start = 0;
> > +		domain->geometry.aperture_end = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
> > +		domain->geometry.force_aperture = true;
> > +	}
> >
> >  	return domain;
> > }