RE: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
> -Original Message- > From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > Sent: Monday, March 27, 2017 6:14 PM > To: Liu, Yi L ; Alex Williamson > > Cc: Shanker Donthineni ; k...@vger.kernel.org; > Catalin Marinas ; Sinan Kaya > ; Will Deacon ; > iommu@lists.linux-foundation.org; Harv Abdulhamid ; > linux-...@vger.kernel.org; Bjorn Helgaas ; David > Woodhouse ; linux-arm-ker...@lists.infradead.org; Nate > Watterson ; Tian, Kevin ; > Lan, Tianyu ; Raj, Ashok ; Pan, > Jacob > jun ; Joerg Roedel ; Robin Murphy > > Subject: Re: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory > > On 24/03/17 07:46, Liu, Yi L wrote: > [...] > > So we need some kind of high-level classification that the vIOMMU > must communicate to the physical one. Each IOMMU flavor would get a > unique, global identifier, simply to make sure that vIOMMU and > pIOMMU speak > >> the same language. > For example: > > 0x65776886 "AMDV" AMD IOMMU > 0x73788476 "INTL" Intel IOMMU > 0x83515748 "S390" s390 IOMMU > 0x8385 "SMMU" ARM SMMU > etc. > > It needs to be a global magic number that everyone can recognize. > Could be as simple as 32-bit numbers allocated from 0. Once we have > a global magic number, we can use it to differentiate > architecture-specific > details. > > > > I prefer simple numbers to stand for each vendor. > > Sure, I don't have any preference. Simple numbers could be easier to allocate. > > >>> I may need to think more on this part. > >>> > struct pasid_table_info { > __u64 ptr; > __u64 size; /* Is it number of entry or size in > bytes? */ > >>> > >>> For Intel platform, it's encoded. But I can make it in bytes. Here, > >>> I'd like to check with you if whole guest PASID info is also needed on > >>> ARM? > >> > >> It will be needed on ARM if someone ever emulates the SMMU with SVM. > >> Though I'm not planning on doing that myself, it is unavoidable. And > >> it would be a shame for the next SVM virtualization solution to have > >> to introduce a new flag "VFIO_SVM_BIND_PASIDPT_2" if they could reuse > >> most of the BIND_PASIDPT interface but simply needed to add one or > >> two configuration fields specific to their IOMMU. > > > > So you are totally fine with putting PASID table ptr and size in the > > generic part? Maybe we have different usage for it. For me, it's a > > guest PASID table ptr. For you, it may be different. > > It's the same for SMMU, with some added format specifiers that would go in > 'opaque[]'. I think that table pointer and size (in bytes, or number of > entries) is generic enough for a "bind table" call and can be reused by future > implementations. > > > __u32 model;/* magic number */ > __u32 variant; /* version of the IOMMU architecture, > maybe? IOMMU-specific. */ > > > > For variant, it will be combined with model to do sanity check. Am I right? > > Maybe it could be moved to opaque. > > Yes I guess it could be moved to opaque. It would be a version of the model > used, so > we wouldn't have to allocate a new model number whenever an architecture > updates the fields of its PASID descriptors, but we can let IOMMU drivers > decide if > they need it and what to put in there. > > __u8 opaque[]; /* IOMMU-specific details */ > }; > > [...] > >> > >> Yes, that seems sensible. I could add an explicit VFIO_BIND_PASID > >> flags to make it explicit that data[] is "u32 pasid" and avoid having any > >> default. > > > > Add it in the comment I suppose. The length is 4 byes, it could be deduced > > from > argsz. > > > >> > > > #define VFIO_SVM_PASSDOWN_INVALIDATE(1 << 1) > > Using the vfio_device_svm structure for invalidate operations is a > bit odd, it might be nicer to add a new VFIO_SVM_INVALIDATE ioctl, > that takes the above iommu_svm_tlb_invalidate_info as argument > (with an added argsz.) > >>> > >>> Agree, would add a separate IOCTL for invalidation. > >>> > > > #define VFIO_SVM_PASID_RELEASE_FLUSHED (1 << 2) > > #define VFIO_SVM_PASID_RELEASE_CLEAN (1 << 3) > >__u32 flags; > >__u32 length; > > If length is the size of data[], I guess we can already deduce this info > from argsz. > >>> > >>> yes, it is size of data. Maybe remove argsz. How about your opinion? > >> > >> Argsz as first argument is standard across all VFIO ioctls, it allows > >> the kernel to check that the structure passed by the guest is > >> consistent with the structure it knows (see the comment at the > >> beginning of > >> include/uapi/linux/vfio.h) So argsz would be the preferred way to > >> communicate the size of data[] > > > > yes, it makes sense. BTW. I think it is still possible to leave the > > "length" since there is similar usage.
Re: [RFC PATCH] of: Fix DMA configuration for non-DT masters
On Wed, Mar 29, 2017 at 10:23 AM, Oza Oza wrote: > On Wed, Mar 29, 2017 at 12:27 AM, Robin Murphy wrote: >> For PCI masters not represented in DT, we pass the OF node of their >> associated host bridge to of_dma_configure(), such that they can inherit >> the appropriate DMA configuration from whatever is described there. >> Unfortunately, whilst this has worked for the "dma-coherent" property, >> it turns out to miss the case where the host bridge node has a non-empty >> "dma-ranges", since nobody is expecting the 'device' to be a bus itself. >> >> It transpires, though, that the de-facto interface since the prototype >> change in 1f5c69aa51f9 ("of: Move of_dma_configure() to device.c to help >> re-use") is very clear-cut: either the master_np argument is redundant >> with dev->of_node, or dev->of_node is NULL and master_np is the relevant >> parent bus. Let's ratify that behaviour, then teach the whole >> of_dma_configure() pipeline to cope with both cases properly. >> >> Signed-off-by: Robin Murphy >> --- >> >> This is what I'd consider the better fix - rather than adding yet more >> special cases - which will also make it simple to handle multiple >> "dma-ranges" entries with minimal further disruption. The callers now >> left passing dev->of_node as 'parent' are harmless, but look a bit silly >> and clearly want cleaning up - I'd be partial to renaming the existing >> function and having a single-argument wrapper for the 'normal' case, e.g.: >> >> static inline int of_dma_configure(struct device_node *dev) >> { >> return of_dma_configure_parent(dev, NULL); >> } >> >> Thoughts? >> >> Robin. >> >> drivers/iommu/of_iommu.c | 7 --- >> drivers/of/address.c | 37 + >> drivers/of/device.c| 12 +++- >> include/linux/of_address.h | 7 --- >> include/linux/of_device.h | 4 ++-- >> include/linux/of_iommu.h | 4 ++-- >> 6 files changed, 44 insertions(+), 27 deletions(-) >> >> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c >> index 2683e9fc0dcf..35aff07bb5eb 100644 >> --- a/drivers/iommu/of_iommu.c >> +++ b/drivers/iommu/of_iommu.c >> @@ -138,21 +138,22 @@ static const struct iommu_ops >> } >> >> const struct iommu_ops *of_iommu_configure(struct device *dev, >> - struct device_node *master_np) >> + struct device_node *parent) >> { >> struct of_phandle_args iommu_spec; >> - struct device_node *np; >> + struct device_node *np, *master_np; >> const struct iommu_ops *ops = NULL; >> int idx = 0; >> >> if (dev_is_pci(dev)) >> - return of_pci_iommu_configure(to_pci_dev(dev), master_np); >> + return of_pci_iommu_configure(to_pci_dev(dev), parent); >> >> /* >> * We don't currently walk up the tree looking for a parent IOMMU. >> * See the `Notes:' section of >> * Documentation/devicetree/bindings/iommu/iommu.txt >> */ >> + master_np = dev->of_node ? dev->of_node : parent; >> while (!of_parse_phandle_with_args(master_np, "iommus", >>"#iommu-cells", idx, >>&iommu_spec)) { >> diff --git a/drivers/of/address.c b/drivers/of/address.c >> index 02b2903fe9d2..833bc17f5e55 100644 >> --- a/drivers/of/address.c >> +++ b/drivers/of/address.c >> @@ -808,6 +808,7 @@ EXPORT_SYMBOL(of_io_request_and_map); >> /** >> * of_dma_get_range - Get DMA range info >> * @np:device node to get DMA range info >> + * @parent:node of device's parent bus, if @np is NULL >> * @dma_addr: pointer to store initial DMA address of DMA range >> * @paddr: pointer to store initial CPU address of DMA range >> * @size: pointer to store size of DMA range >> @@ -822,36 +823,48 @@ EXPORT_SYMBOL(of_io_request_and_map); >> * It returns -ENODEV if "dma-ranges" property was not found >> * for this device in DT. >> */ >> -int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 >> *size) >> +int of_dma_get_range(struct device_node *np, struct device_node *parent, >> +u64 *dma_addr, u64 *paddr, u64 *size) >> { >> - struct device_node *node = of_node_get(np); >> + struct device_node *node; >> const __be32 *ranges = NULL; >> int len, naddr, nsize, pna; >> int ret = 0; >> u64 dmaaddr; >> >> - if (!node) >> - return -EINVAL; >> - >> - while (1) { >> + if (np) { >> + node = of_node_get(np); >> naddr = of_n_addr_cells(node); >> nsize = of_n_size_cells(node); >> node = of_get_next_parent(node); >> - if (!node) >> - break; >> + } else if (parent) { >> + node = of_node_get(parent); >> + np = p
Re: [RFC PATCH] of: Fix DMA configuration for non-DT masters
On Wed, Mar 29, 2017 at 12:27 AM, Robin Murphy wrote: > For PCI masters not represented in DT, we pass the OF node of their > associated host bridge to of_dma_configure(), such that they can inherit > the appropriate DMA configuration from whatever is described there. > Unfortunately, whilst this has worked for the "dma-coherent" property, > it turns out to miss the case where the host bridge node has a non-empty > "dma-ranges", since nobody is expecting the 'device' to be a bus itself. > > It transpires, though, that the de-facto interface since the prototype > change in 1f5c69aa51f9 ("of: Move of_dma_configure() to device.c to help > re-use") is very clear-cut: either the master_np argument is redundant > with dev->of_node, or dev->of_node is NULL and master_np is the relevant > parent bus. Let's ratify that behaviour, then teach the whole > of_dma_configure() pipeline to cope with both cases properly. > > Signed-off-by: Robin Murphy > --- > > This is what I'd consider the better fix - rather than adding yet more > special cases - which will also make it simple to handle multiple > "dma-ranges" entries with minimal further disruption. The callers now > left passing dev->of_node as 'parent' are harmless, but look a bit silly > and clearly want cleaning up - I'd be partial to renaming the existing > function and having a single-argument wrapper for the 'normal' case, e.g.: > > static inline int of_dma_configure(struct device_node *dev) > { > return of_dma_configure_parent(dev, NULL); > } > > Thoughts? > > Robin. > > drivers/iommu/of_iommu.c | 7 --- > drivers/of/address.c | 37 + > drivers/of/device.c| 12 +++- > include/linux/of_address.h | 7 --- > include/linux/of_device.h | 4 ++-- > include/linux/of_iommu.h | 4 ++-- > 6 files changed, 44 insertions(+), 27 deletions(-) > > diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c > index 2683e9fc0dcf..35aff07bb5eb 100644 > --- a/drivers/iommu/of_iommu.c > +++ b/drivers/iommu/of_iommu.c > @@ -138,21 +138,22 @@ static const struct iommu_ops > } > > const struct iommu_ops *of_iommu_configure(struct device *dev, > - struct device_node *master_np) > + struct device_node *parent) > { > struct of_phandle_args iommu_spec; > - struct device_node *np; > + struct device_node *np, *master_np; > const struct iommu_ops *ops = NULL; > int idx = 0; > > if (dev_is_pci(dev)) > - return of_pci_iommu_configure(to_pci_dev(dev), master_np); > + return of_pci_iommu_configure(to_pci_dev(dev), parent); > > /* > * We don't currently walk up the tree looking for a parent IOMMU. > * See the `Notes:' section of > * Documentation/devicetree/bindings/iommu/iommu.txt > */ > + master_np = dev->of_node ? dev->of_node : parent; > while (!of_parse_phandle_with_args(master_np, "iommus", >"#iommu-cells", idx, >&iommu_spec)) { > diff --git a/drivers/of/address.c b/drivers/of/address.c > index 02b2903fe9d2..833bc17f5e55 100644 > --- a/drivers/of/address.c > +++ b/drivers/of/address.c > @@ -808,6 +808,7 @@ EXPORT_SYMBOL(of_io_request_and_map); > /** > * of_dma_get_range - Get DMA range info > * @np:device node to get DMA range info > + * @parent:node of device's parent bus, if @np is NULL > * @dma_addr: pointer to store initial DMA address of DMA range > * @paddr: pointer to store initial CPU address of DMA range > * @size: pointer to store size of DMA range > @@ -822,36 +823,48 @@ EXPORT_SYMBOL(of_io_request_and_map); > * It returns -ENODEV if "dma-ranges" property was not found > * for this device in DT. > */ > -int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 > *size) > +int of_dma_get_range(struct device_node *np, struct device_node *parent, > +u64 *dma_addr, u64 *paddr, u64 *size) > { > - struct device_node *node = of_node_get(np); > + struct device_node *node; > const __be32 *ranges = NULL; > int len, naddr, nsize, pna; > int ret = 0; > u64 dmaaddr; > > - if (!node) > - return -EINVAL; > - > - while (1) { > + if (np) { > + node = of_node_get(np); > naddr = of_n_addr_cells(node); > nsize = of_n_size_cells(node); > node = of_get_next_parent(node); > - if (!node) > - break; > + } else if (parent) { > + node = of_node_get(parent); > + np = parent; > + if (of_property_read_u32(node, "#address-cells", &naddr)) > + naddr = of_n_addr_cells(node); > + if (of_
Re: [RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask
On Tue, Mar 28, 2017 at 7:59 PM, Robin Murphy wrote: > On 28/03/17 06:27, Oza Oza wrote: >> On Mon, Mar 27, 2017 at 8:16 PM, Rob Herring wrote: >>> On Sat, Mar 25, 2017 at 12:31 AM, Oza Pawandeep >>> wrote: it is possible that PCI device supports 64-bit DMA addressing, and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host bridge may have limitations on the inbound transaction addressing. As an example, consider NVME SSD device connected to iproc-PCIe controller. Currently, the IOMMU DMA ops only considers PCI device dma_mask when allocating an IOVA. This is particularly problematic on ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions only after PCI Host has forwarded these transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound transactions has to honor the addressing restrictions of the PCI Host. current pcie frmework and of framework integration assumes dma-ranges in a way where memory-mapped devices define their dma-ranges. dma-ranges: (child-bus-address, parent-bus-address, length). but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; >>> >>> If you implement a common function, then I expect to see other users >>> converted to use it. There's also PCI hosts in arch/powerpc that parse >>> dma-ranges. >> >> the common function should be similar to what >> of_pci_get_host_bridge_resources is doing right now. >> it parses ranges property right now. >> >> the new function would look look following. >> >> of_pci_get_dma_ranges(struct device_node *dev, struct list_head *resources) >> where resources would return the dma-ranges. >> >> but right now if you see the patch, of_dma_configure calls the new >> function, which actually returns the largest possible size. >> so this new function has to be generic in a way where other PCI hosts >> can use it. but certainly iproc(Broadcom SOC) , rcar based SOCs can >> use it for sure. >> >> although having powerpc using it; is a separate exercise, since I do >> not have any access to other PCI hosts such as powerpc. but we can >> workout with them on thsi forum if required. >> >> so overall, of_pci_get_dma_ranges has to serve following 2 purposes. >> >> 1) it has to return largest possible size to of_dma_configure to >> generate largest possible dma_mask. >> >> 2) it also has to return resources (dma-ranges) parsed, to the users. >> >> so to address above needs >> >> of_pci_get_dma_ranges(struct device_node *dev, struct list_head >> *resources, u64 *size) >> >> dev -> device node. >> resources -> dma-ranges in allocated list. >> size -> highest possible size to generate possible dma_mask for >> of_dma_configure. >> >> let em know how this sounds. > > Note that the point of passing PCI host bridges into of_dma_configure() > in the first place was to avoid having some separate PCI-specific path > for DMA configuration. I worry that introducing bus-specific dma-ranges > parsing largely defeats that, since we end up with the worst of both > worlds; effectively-duplicated code, and/or a load of extra complexity > to then attempt to reconverge the divergent paths (there really > shouldn't be any need to allocate a list of anything). Given that > of_translate_dma_address() is already bus-agnostic, it hardly seems > justifiable for its caller not to be so as well, especially when it > mostly just comes down to getting the right #address-cells value. > > The patch below is actually enough to make typical cases work, but is > vile, so I'm not seriously considering it (hence I've not bothered > making IOMMU configuration handle all circumstances). What it has served > to do, though, is give me a clear idea of how to properly sort out the > not-quite-right device/parent assumptions between of_dma_configure() and > of_dma_get_range() rather than bodging around them any further - stay tuned. > > Robin. > > ->8- > From: Robin Murphy > Subject: [PATCH] of/pci: Use child node for DMA configuration > > of_dma_configure() expects to be passed an OF node representing the > device being configured - for PCI devices we currently pass the node of > the appropriate host controller, which sort of works for inherited > properties which may appear at any level, like "dma-coherent", but falls > apart for properties which actually care about specific device-parent > relationships, like "dma-ranges". > > Solve this by attempting to find a suitable child node if the PCI > hierarchy is actually represented in DT, and if not then faking one up > as a last resort, to make all of DMA configuration work as expected. > > Signed-off-by: Robin Murphy > --- > drivers/iommu/of_iommu.c | 3 ++- > drivers/pci/of.c | 24 > drivers/pci/probe.c | 14 +- > include/linux/pci.h | 3
[RFC PATCH] of: Fix DMA configuration for non-DT masters
For PCI masters not represented in DT, we pass the OF node of their associated host bridge to of_dma_configure(), such that they can inherit the appropriate DMA configuration from whatever is described there. Unfortunately, whilst this has worked for the "dma-coherent" property, it turns out to miss the case where the host bridge node has a non-empty "dma-ranges", since nobody is expecting the 'device' to be a bus itself. It transpires, though, that the de-facto interface since the prototype change in 1f5c69aa51f9 ("of: Move of_dma_configure() to device.c to help re-use") is very clear-cut: either the master_np argument is redundant with dev->of_node, or dev->of_node is NULL and master_np is the relevant parent bus. Let's ratify that behaviour, then teach the whole of_dma_configure() pipeline to cope with both cases properly. Signed-off-by: Robin Murphy --- This is what I'd consider the better fix - rather than adding yet more special cases - which will also make it simple to handle multiple "dma-ranges" entries with minimal further disruption. The callers now left passing dev->of_node as 'parent' are harmless, but look a bit silly and clearly want cleaning up - I'd be partial to renaming the existing function and having a single-argument wrapper for the 'normal' case, e.g.: static inline int of_dma_configure(struct device_node *dev) { return of_dma_configure_parent(dev, NULL); } Thoughts? Robin. drivers/iommu/of_iommu.c | 7 --- drivers/of/address.c | 37 + drivers/of/device.c| 12 +++- include/linux/of_address.h | 7 --- include/linux/of_device.h | 4 ++-- include/linux/of_iommu.h | 4 ++-- 6 files changed, 44 insertions(+), 27 deletions(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index 2683e9fc0dcf..35aff07bb5eb 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -138,21 +138,22 @@ static const struct iommu_ops } const struct iommu_ops *of_iommu_configure(struct device *dev, - struct device_node *master_np) + struct device_node *parent) { struct of_phandle_args iommu_spec; - struct device_node *np; + struct device_node *np, *master_np; const struct iommu_ops *ops = NULL; int idx = 0; if (dev_is_pci(dev)) - return of_pci_iommu_configure(to_pci_dev(dev), master_np); + return of_pci_iommu_configure(to_pci_dev(dev), parent); /* * We don't currently walk up the tree looking for a parent IOMMU. * See the `Notes:' section of * Documentation/devicetree/bindings/iommu/iommu.txt */ + master_np = dev->of_node ? dev->of_node : parent; while (!of_parse_phandle_with_args(master_np, "iommus", "#iommu-cells", idx, &iommu_spec)) { diff --git a/drivers/of/address.c b/drivers/of/address.c index 02b2903fe9d2..833bc17f5e55 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -808,6 +808,7 @@ EXPORT_SYMBOL(of_io_request_and_map); /** * of_dma_get_range - Get DMA range info * @np:device node to get DMA range info + * @parent:node of device's parent bus, if @np is NULL * @dma_addr: pointer to store initial DMA address of DMA range * @paddr: pointer to store initial CPU address of DMA range * @size: pointer to store size of DMA range @@ -822,36 +823,48 @@ EXPORT_SYMBOL(of_io_request_and_map); * It returns -ENODEV if "dma-ranges" property was not found * for this device in DT. */ -int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size) +int of_dma_get_range(struct device_node *np, struct device_node *parent, +u64 *dma_addr, u64 *paddr, u64 *size) { - struct device_node *node = of_node_get(np); + struct device_node *node; const __be32 *ranges = NULL; int len, naddr, nsize, pna; int ret = 0; u64 dmaaddr; - if (!node) - return -EINVAL; - - while (1) { + if (np) { + node = of_node_get(np); naddr = of_n_addr_cells(node); nsize = of_n_size_cells(node); node = of_get_next_parent(node); - if (!node) - break; + } else if (parent) { + node = of_node_get(parent); + np = parent; + if (of_property_read_u32(node, "#address-cells", &naddr)) + naddr = of_n_addr_cells(node); + if (of_property_read_u32(node, "#size-cells", &nsize)) + nsize = of_n_size_cells(node); + } else { + return -EINVAL; + } + while (node) { ranges = of_get_property(node, "dma-ranges", &len); -
Re: [PATCH V9 00/11] IOMMU probe deferral support
Hi, On 3/28/2017 7:45 PM, Shameerali Kolothum Thodi wrote: -Original Message- From: Sricharan R [mailto:sricha...@codeaurora.org] Sent: Tuesday, March 28, 2017 5:54 AM To: Robin Murphy; Shameerali Kolothum Thodi; Wangzhou (B); will.dea...@arm.com; j...@8bytes.org; lorenzo.pieral...@arm.com; iommu@lists.linux-foundation.org; linux-arm-ker...@lists.infradead.org; linux-arm-...@vger.kernel.org; m.szyprow...@samsung.com; bhelg...@google.com; linux-...@vger.kernel.org; linux- a...@vger.kernel.org; t...@semihalf.com; hanjun@linaro.org; ok...@codeaurora.org Subject: Re: [PATCH V9 00/11] IOMMU probe deferral support Hi, [...] From the logs its clear that when ixgbevf driver originally probes and adds the device to smmu the dma mask is 32, but when it binds to vfio-pci, it becomes 64 bit. Just to add to that, the mask is set to 64 bit in the ixgebvf driver probe[1] Aha, but of course it's still the same struct device getting bound to VFIO later, so whatever mask the first driver set is still in there when we go through of_dma_configure() the second time (and the fact that we go through more than once being the new behaviour). So yes, this is a legitimate problem and we really do need to be robust against size overflow. I reckon the below tweak of your fix is probably the way to go; cleaning up the arch_setup_dma_ops() interface can happen later. ok, i will add this fix separately and also the acpi fix that lorenzo has suggested in patch #8 in to the series after testing confirmation. I can confirm that the patches fixes the issues reported here . Both DT and ACPI works now. Thanks for the testing. Will repost with the fixes. Regards, Sricharan -- "QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [V9, 07/11] iommu: of: Handle IOMMU lookup failure with deferred probing or error
Hi, On 3/28/2017 8:30 PM, Rob Herring wrote: On Fri, Mar 10, 2017 at 12:30:57AM +0530, Sricharan R wrote: From: Laurent Pinchart Failures to look up an IOMMU when parsing the DT iommus property need to be handled separately from the .of_xlate() failures to support deferred probing. The lack of a registered IOMMU can be caused by the lack of a driver for the IOMMU, the IOMMU device probe not having been performed yet, having been deferred, or having failed. The first case occurs when the device tree describes the bus master and IOMMU topology correctly but no device driver exists for the IOMMU yet or the device driver has not been compiled in. Return NULL, the caller will configure the device without an IOMMU. The second and third cases are handled by deferring the probe of the bus master device which will eventually get reprobed after the IOMMU. The last case is currently handled by deferring the probe of the bus master device as well. A mechanism to either configure the bus master device without an IOMMU or to fail the bus master device probe depending on whether the IOMMU is optional or mandatory would be a good enhancement. Tested-by: Marek Szyprowski Signed-off-by: Laurent Pichart Signed-off-by: Sricharan R --- drivers/base/dma-mapping.c | 5 +++-- drivers/iommu/of_iommu.c | 4 ++-- drivers/of/device.c| 7 ++- include/linux/of_device.h | 9 ++--- 4 files changed, 17 insertions(+), 8 deletions(-) Maybe it is the same issue reported for VFIO, but virtio-pci is broken with v8 of this series. Bisecting blames this commit which looks like it hasn't changeed. Ya, as Robin mentioned it was broken at patch #2 and comes out after this patch. I am going to repost this series with couple of more fixes added as well. Rob P.S. Doesn't look like you have copied the DT maintainers nor list for the DT changes. Ha, really sorry about that. will add it. Regards, Sricharan -- "QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [V9, 07/11] iommu: of: Handle IOMMU lookup failure with deferred probing or error
On 28/03/17 16:00, Rob Herring wrote: > On Fri, Mar 10, 2017 at 12:30:57AM +0530, Sricharan R wrote: >> From: Laurent Pinchart >> >> Failures to look up an IOMMU when parsing the DT iommus property need to >> be handled separately from the .of_xlate() failures to support deferred >> probing. >> >> The lack of a registered IOMMU can be caused by the lack of a driver for >> the IOMMU, the IOMMU device probe not having been performed yet, having >> been deferred, or having failed. >> >> The first case occurs when the device tree describes the bus master and >> IOMMU topology correctly but no device driver exists for the IOMMU yet >> or the device driver has not been compiled in. Return NULL, the caller >> will configure the device without an IOMMU. >> >> The second and third cases are handled by deferring the probe of the bus >> master device which will eventually get reprobed after the IOMMU. >> >> The last case is currently handled by deferring the probe of the bus >> master device as well. A mechanism to either configure the bus master >> device without an IOMMU or to fail the bus master device probe depending >> on whether the IOMMU is optional or mandatory would be a good >> enhancement. >> >> Tested-by: Marek Szyprowski >> Signed-off-by: Laurent Pichart >> Signed-off-by: Sricharan R >> --- >> drivers/base/dma-mapping.c | 5 +++-- >> drivers/iommu/of_iommu.c | 4 ++-- >> drivers/of/device.c| 7 ++- >> include/linux/of_device.h | 9 ++--- >> 4 files changed, 17 insertions(+), 8 deletions(-) > > Maybe it is the same issue reported for VFIO, but virtio-pci is broken > with v8 of this series. Bisecting blames this commit which looks like it > hasn't changeed. v8 managed to break *all* PCI devices which didn't have an associated "iommu-map". The problem was actually in patch 2 (and is fixed in this version), but it only hit at this point once we start propagating the erroneous error code back to the driver probe. Robin. > > Rob > > P.S. Doesn't look like you have copied the DT maintainers nor list for > the DT changes. > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [V9, 07/11] iommu: of: Handle IOMMU lookup failure with deferred probing or error
On Fri, Mar 10, 2017 at 12:30:57AM +0530, Sricharan R wrote: > From: Laurent Pinchart > > Failures to look up an IOMMU when parsing the DT iommus property need to > be handled separately from the .of_xlate() failures to support deferred > probing. > > The lack of a registered IOMMU can be caused by the lack of a driver for > the IOMMU, the IOMMU device probe not having been performed yet, having > been deferred, or having failed. > > The first case occurs when the device tree describes the bus master and > IOMMU topology correctly but no device driver exists for the IOMMU yet > or the device driver has not been compiled in. Return NULL, the caller > will configure the device without an IOMMU. > > The second and third cases are handled by deferring the probe of the bus > master device which will eventually get reprobed after the IOMMU. > > The last case is currently handled by deferring the probe of the bus > master device as well. A mechanism to either configure the bus master > device without an IOMMU or to fail the bus master device probe depending > on whether the IOMMU is optional or mandatory would be a good > enhancement. > > Tested-by: Marek Szyprowski > Signed-off-by: Laurent Pichart > Signed-off-by: Sricharan R > --- > drivers/base/dma-mapping.c | 5 +++-- > drivers/iommu/of_iommu.c | 4 ++-- > drivers/of/device.c| 7 ++- > include/linux/of_device.h | 9 ++--- > 4 files changed, 17 insertions(+), 8 deletions(-) Maybe it is the same issue reported for VFIO, but virtio-pci is broken with v8 of this series. Bisecting blames this commit which looks like it hasn't changeed. Rob P.S. Doesn't look like you have copied the DT maintainers nor list for the DT changes. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask
On 28/03/17 06:27, Oza Oza wrote: > On Mon, Mar 27, 2017 at 8:16 PM, Rob Herring wrote: >> On Sat, Mar 25, 2017 at 12:31 AM, Oza Pawandeep wrote: >>> it is possible that PCI device supports 64-bit DMA addressing, >>> and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), >>> however PCI host bridge may have limitations on the inbound >>> transaction addressing. As an example, consider NVME SSD device >>> connected to iproc-PCIe controller. >>> >>> Currently, the IOMMU DMA ops only considers PCI device dma_mask >>> when allocating an IOVA. This is particularly problematic on >>> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to >>> PA for in-bound transactions only after PCI Host has forwarded >>> these transactions on SOC IO bus. This means on such ARM/ARM64 >>> SOCs the IOVA of in-bound transactions has to honor the addressing >>> restrictions of the PCI Host. >>> >>> current pcie frmework and of framework integration assumes dma-ranges >>> in a way where memory-mapped devices define their dma-ranges. >>> dma-ranges: (child-bus-address, parent-bus-address, length). >>> >>> but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges. >>> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; >> >> If you implement a common function, then I expect to see other users >> converted to use it. There's also PCI hosts in arch/powerpc that parse >> dma-ranges. > > the common function should be similar to what > of_pci_get_host_bridge_resources is doing right now. > it parses ranges property right now. > > the new function would look look following. > > of_pci_get_dma_ranges(struct device_node *dev, struct list_head *resources) > where resources would return the dma-ranges. > > but right now if you see the patch, of_dma_configure calls the new > function, which actually returns the largest possible size. > so this new function has to be generic in a way where other PCI hosts > can use it. but certainly iproc(Broadcom SOC) , rcar based SOCs can > use it for sure. > > although having powerpc using it; is a separate exercise, since I do > not have any access to other PCI hosts such as powerpc. but we can > workout with them on thsi forum if required. > > so overall, of_pci_get_dma_ranges has to serve following 2 purposes. > > 1) it has to return largest possible size to of_dma_configure to > generate largest possible dma_mask. > > 2) it also has to return resources (dma-ranges) parsed, to the users. > > so to address above needs > > of_pci_get_dma_ranges(struct device_node *dev, struct list_head > *resources, u64 *size) > > dev -> device node. > resources -> dma-ranges in allocated list. > size -> highest possible size to generate possible dma_mask for > of_dma_configure. > > let em know how this sounds. Note that the point of passing PCI host bridges into of_dma_configure() in the first place was to avoid having some separate PCI-specific path for DMA configuration. I worry that introducing bus-specific dma-ranges parsing largely defeats that, since we end up with the worst of both worlds; effectively-duplicated code, and/or a load of extra complexity to then attempt to reconverge the divergent paths (there really shouldn't be any need to allocate a list of anything). Given that of_translate_dma_address() is already bus-agnostic, it hardly seems justifiable for its caller not to be so as well, especially when it mostly just comes down to getting the right #address-cells value. The patch below is actually enough to make typical cases work, but is vile, so I'm not seriously considering it (hence I've not bothered making IOMMU configuration handle all circumstances). What it has served to do, though, is give me a clear idea of how to properly sort out the not-quite-right device/parent assumptions between of_dma_configure() and of_dma_get_range() rather than bodging around them any further - stay tuned. Robin. ->8- From: Robin Murphy Subject: [PATCH] of/pci: Use child node for DMA configuration of_dma_configure() expects to be passed an OF node representing the device being configured - for PCI devices we currently pass the node of the appropriate host controller, which sort of works for inherited properties which may appear at any level, like "dma-coherent", but falls apart for properties which actually care about specific device-parent relationships, like "dma-ranges". Solve this by attempting to find a suitable child node if the PCI hierarchy is actually represented in DT, and if not then faking one up as a last resort, to make all of DMA configuration work as expected. Signed-off-by: Robin Murphy --- drivers/iommu/of_iommu.c | 3 ++- drivers/pci/of.c | 24 drivers/pci/probe.c | 14 +- include/linux/pci.h | 3 +++ 4 files changed, 42 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index 2683e9fc0dcf..35c97b945c15 100644 --- a/drivers/iommu/of_iommu
RE: [PATCH V9 00/11] IOMMU probe deferral support
> -Original Message- > From: Sricharan R [mailto:sricha...@codeaurora.org] > Sent: Tuesday, March 28, 2017 5:54 AM > To: Robin Murphy; Shameerali Kolothum Thodi; Wangzhou (B); > will.dea...@arm.com; j...@8bytes.org; lorenzo.pieral...@arm.com; > iommu@lists.linux-foundation.org; linux-arm-ker...@lists.infradead.org; > linux-arm-...@vger.kernel.org; m.szyprow...@samsung.com; > bhelg...@google.com; linux-...@vger.kernel.org; linux- > a...@vger.kernel.org; t...@semihalf.com; hanjun@linaro.org; > ok...@codeaurora.org > Subject: Re: [PATCH V9 00/11] IOMMU probe deferral support > > Hi, > [...] > >>> From the logs its clear that when ixgbevf driver originally probes > >>> and adds the device to smmu the dma mask is 32, but when it binds > >>> to vfio-pci, it becomes 64 bit. > >> > >> Just to add to that, the mask is set to 64 bit in the ixgebvf driver > >> probe[1] > > > > Aha, but of course it's still the same struct device getting bound to > > VFIO later, so whatever mask the first driver set is still in there > > when we go through of_dma_configure() the second time (and the fact > > that we go through more than once being the new behaviour). So yes, > > this is a legitimate problem and we really do need to be robust > > against size overflow. I reckon the below tweak of your fix is > > probably the way to go; cleaning up the arch_setup_dma_ops() interface > can happen later. > > > > ok, i will add this fix separately and also the acpi fix that lorenzo has > suggested in patch #8 in to the series after testing confirmation. > I can confirm that the patches fixes the issues reported here . Both DT and ACPI works now. Cheers, Shameer ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask
On Tue, Mar 28, 2017 at 12:27 AM, Oza Oza wrote: > On Mon, Mar 27, 2017 at 8:16 PM, Rob Herring wrote: >> On Sat, Mar 25, 2017 at 12:31 AM, Oza Pawandeep wrote: >>> it is possible that PCI device supports 64-bit DMA addressing, >>> and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), >>> however PCI host bridge may have limitations on the inbound >>> transaction addressing. As an example, consider NVME SSD device >>> connected to iproc-PCIe controller. >>> >>> Currently, the IOMMU DMA ops only considers PCI device dma_mask >>> when allocating an IOVA. This is particularly problematic on >>> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to >>> PA for in-bound transactions only after PCI Host has forwarded >>> these transactions on SOC IO bus. This means on such ARM/ARM64 >>> SOCs the IOVA of in-bound transactions has to honor the addressing >>> restrictions of the PCI Host. >>> >>> current pcie frmework and of framework integration assumes dma-ranges >>> in a way where memory-mapped devices define their dma-ranges. >>> dma-ranges: (child-bus-address, parent-bus-address, length). >>> >>> but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges. >>> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; >> >> If you implement a common function, then I expect to see other users >> converted to use it. There's also PCI hosts in arch/powerpc that parse >> dma-ranges. > > the common function should be similar to what > of_pci_get_host_bridge_resources is doing right now. > it parses ranges property right now. > > the new function would look look following. > > of_pci_get_dma_ranges(struct device_node *dev, struct list_head *resources) > where resources would return the dma-ranges. > > but right now if you see the patch, of_dma_configure calls the new > function, which actually returns the largest possible size. > so this new function has to be generic in a way where other PCI hosts > can use it. but certainly iproc(Broadcom SOC) , rcar based SOCs can > use it for sure. > > although having powerpc using it; is a separate exercise, since I do > not have any access to other PCI hosts such as powerpc. but we can > workout with them on thsi forum if required. You don't need h/w. You can analyze what parts are common, write patches to convert to common implementation, and build test. The PPC and rcar folks can test on h/w. Rob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/arm-smmu: Fix 16bit ASID configuration
From: Sunil Goutham 16bit ASID should be enabled before initializing TTBR0/1, otherwise only LSB 8bit ASID will be considered. Hence moving configuration of TTBCR register ahead of TTBR0/1 while initializing context bank. Signed-off-by: Sunil Goutham --- drivers/iommu/arm-smmu.c | 41 ++--- 1 file changed, 22 insertions(+), 19 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 9b33700..2845d73 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -758,6 +758,28 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, } writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBAR(cfg->cbndx)); + /* TTBCR */ + if (stage1) { + if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) { + reg = pgtbl_cfg->arm_v7s_cfg.tcr; + reg2 = 0; + } else { + reg = pgtbl_cfg->arm_lpae_s1_cfg.tcr; + reg2 = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32; + reg2 |= TTBCR2_SEP_UPSTREAM; + /* 16bit ASID should be enabled before write to TTBR, +* otherwise only LSB 8bit will be considered. +*/ + if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) + reg2 |= TTBCR2_AS; + } + if (smmu->version > ARM_SMMU_V1) + writel_relaxed(reg2, cb_base + ARM_SMMU_CB_TTBCR2); + } else { + reg = pgtbl_cfg->arm_lpae_s2_cfg.vtcr; + } + writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBCR); + /* TTBRs */ if (stage1) { u16 asid = ARM_SMMU_CB_ASID(smmu, cfg); @@ -781,25 +803,6 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, writeq_relaxed(reg64, cb_base + ARM_SMMU_CB_TTBR0); } - /* TTBCR */ - if (stage1) { - if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) { - reg = pgtbl_cfg->arm_v7s_cfg.tcr; - reg2 = 0; - } else { - reg = pgtbl_cfg->arm_lpae_s1_cfg.tcr; - reg2 = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32; - reg2 |= TTBCR2_SEP_UPSTREAM; - if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) - reg2 |= TTBCR2_AS; - } - if (smmu->version > ARM_SMMU_V1) - writel_relaxed(reg2, cb_base + ARM_SMMU_CB_TTBCR2); - } else { - reg = pgtbl_cfg->arm_lpae_s2_cfg.vtcr; - } - writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBCR); - /* MAIRs (stage-1 only) */ if (stage1) { if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) { -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu