RE: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory

2017-03-28 Thread Liu, Yi L
> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Monday, March 27, 2017 6:14 PM
> To: Liu, Yi L ; Alex Williamson 
> 
> Cc: Shanker Donthineni ; k...@vger.kernel.org;
> Catalin Marinas ; Sinan Kaya
> ; Will Deacon ;
> iommu@lists.linux-foundation.org; Harv Abdulhamid ;
> linux-...@vger.kernel.org; Bjorn Helgaas ; David
> Woodhouse ; linux-arm-ker...@lists.infradead.org; Nate
> Watterson ; Tian, Kevin ;
> Lan, Tianyu ; Raj, Ashok ; Pan, 
> Jacob
> jun ; Joerg Roedel ; Robin Murphy
> 
> Subject: Re: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
> 
> On 24/03/17 07:46, Liu, Yi L wrote:
> [...]
> 
>  So we need some kind of high-level classification that the vIOMMU
>  must communicate to the physical one. Each IOMMU flavor would get a
>  unique, global identifier, simply to make sure that vIOMMU and
>  pIOMMU speak
> >> the same language.
>  For example:
> 
>  0x65776886 "AMDV" AMD IOMMU
>  0x73788476 "INTL" Intel IOMMU
>  0x83515748 "S390" s390 IOMMU
>  0x8385 "SMMU" ARM SMMU
>  etc.
> 
>  It needs to be a global magic number that everyone can recognize.
>  Could be as simple as 32-bit numbers allocated from 0. Once we have
>  a global magic number, we can use it to differentiate 
>  architecture-specific
> details.
> >
> > I prefer simple numbers to stand for each vendor.
> 
> Sure, I don't have any preference. Simple numbers could be easier to allocate.
> 
> >>> I may need to think more on this part.
> >>>
>  struct pasid_table_info {
>   __u64 ptr;
>   __u64 size; /* Is it number of entry or size in
>  bytes? */
> >>>
> >>> For Intel platform, it's encoded. But I can make it in bytes. Here,
> >>> I'd like to check with you if whole guest PASID info is also needed on 
> >>> ARM?
> >>
> >> It will be needed on ARM if someone ever emulates the SMMU with SVM.
> >> Though I'm not planning on doing that myself, it is unavoidable. And
> >> it would be a shame for the next SVM virtualization solution to have
> >> to introduce a new flag "VFIO_SVM_BIND_PASIDPT_2" if they could reuse
> >> most of the BIND_PASIDPT interface but simply needed to add one or
> >> two configuration fields specific to their IOMMU.
> >
> > So you are totally fine with putting PASID table ptr and size in the
> > generic part? Maybe we have different usage for it. For me, it's a
> > guest PASID table ptr. For you, it may be different.
> 
> It's the same for SMMU, with some added format specifiers that would go in
> 'opaque[]'. I think that table pointer and size (in bytes, or number of
> entries) is generic enough for a "bind table" call and can be reused by future
> implementations.
> 
> 
>   __u32 model;/* magic number */
>   __u32 variant;  /* version of the IOMMU architecture,
>  maybe? IOMMU-specific. */
> >
> > For variant, it will be combined with model to do sanity check. Am I right?
> > Maybe it could be moved to opaque.
> 
> Yes I guess it could be moved to opaque. It would be a version of the model 
> used, so
> we wouldn't have to allocate a new model number whenever an architecture
> updates the fields of its PASID descriptors, but we can let IOMMU drivers 
> decide if
> they need it and what to put in there.
> 
>   __u8 opaque[];  /* IOMMU-specific details */
>  };
> 
> [...]
> >>
> >> Yes, that seems sensible. I could add an explicit VFIO_BIND_PASID
> >> flags to make it explicit that data[] is "u32 pasid" and avoid having any 
> >> default.
> >
> > Add it in the comment I suppose. The length is 4 byes, it could be deduced 
> > from
> argsz.
> >
> >>
> 
> > #define VFIO_SVM_PASSDOWN_INVALIDATE(1 << 1)
> 
>  Using the vfio_device_svm structure for invalidate operations is a
>  bit odd, it might be nicer to add a new VFIO_SVM_INVALIDATE ioctl,
>  that takes the above iommu_svm_tlb_invalidate_info as argument
>  (with an added argsz.)
> >>>
> >>> Agree, would add a separate IOCTL for invalidation.
> >>>
> 
> > #define VFIO_SVM_PASID_RELEASE_FLUSHED  (1 << 2)
> > #define VFIO_SVM_PASID_RELEASE_CLEAN  (1 << 3)
> >__u32   flags;
> >__u32   length;
> 
>  If length is the size of data[], I guess we can already deduce this info 
>  from argsz.
> >>>
> >>> yes, it is size of data. Maybe remove argsz. How about your opinion?
> >>
> >> Argsz as first argument is standard across all VFIO ioctls, it allows
> >> the kernel to check that the structure passed by the guest is
> >> consistent with the structure it knows (see the comment at the
> >> beginning of
> >> include/uapi/linux/vfio.h) So argsz would be the preferred way to
> >> communicate the size of data[]
> >
> > yes, it makes sense. BTW. I think it is still possible to leave the
> > "length" since there is similar usage. 

Re: [RFC PATCH] of: Fix DMA configuration for non-DT masters

2017-03-28 Thread Oza Oza via iommu
On Wed, Mar 29, 2017 at 10:23 AM, Oza Oza  wrote:
> On Wed, Mar 29, 2017 at 12:27 AM, Robin Murphy  wrote:
>> For PCI masters not represented in DT, we pass the OF node of their
>> associated host bridge to of_dma_configure(), such that they can inherit
>> the appropriate DMA configuration from whatever is described there.
>> Unfortunately, whilst this has worked for the "dma-coherent" property,
>> it turns out to miss the case where the host bridge node has a non-empty
>> "dma-ranges", since nobody is expecting the 'device' to be a bus itself.
>>
>> It transpires, though, that the de-facto interface since the prototype
>> change in 1f5c69aa51f9 ("of: Move of_dma_configure() to device.c to help
>> re-use") is very clear-cut: either the master_np argument is redundant
>> with dev->of_node, or dev->of_node is NULL and master_np is the relevant
>> parent bus. Let's ratify that behaviour, then teach the whole
>> of_dma_configure() pipeline to cope with both cases properly.
>>
>> Signed-off-by: Robin Murphy 
>> ---
>>
>> This is what I'd consider the better fix - rather than adding yet more
>> special cases - which will also make it simple to handle multiple
>> "dma-ranges" entries with minimal further disruption. The callers now
>> left passing dev->of_node as 'parent' are harmless, but look a bit silly
>> and clearly want cleaning up - I'd be partial to renaming the existing
>> function and having a single-argument wrapper for the 'normal' case, e.g.:
>>
>> static inline int of_dma_configure(struct device_node *dev)
>> {
>> return of_dma_configure_parent(dev, NULL);
>> }
>>
>> Thoughts?
>>
>> Robin.
>>
>>  drivers/iommu/of_iommu.c   |  7 ---
>>  drivers/of/address.c   | 37 +
>>  drivers/of/device.c| 12 +++-
>>  include/linux/of_address.h |  7 ---
>>  include/linux/of_device.h  |  4 ++--
>>  include/linux/of_iommu.h   |  4 ++--
>>  6 files changed, 44 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>> index 2683e9fc0dcf..35aff07bb5eb 100644
>> --- a/drivers/iommu/of_iommu.c
>> +++ b/drivers/iommu/of_iommu.c
>> @@ -138,21 +138,22 @@ static const struct iommu_ops
>>  }
>>
>>  const struct iommu_ops *of_iommu_configure(struct device *dev,
>> -  struct device_node *master_np)
>> +  struct device_node *parent)
>>  {
>> struct of_phandle_args iommu_spec;
>> -   struct device_node *np;
>> +   struct device_node *np, *master_np;
>> const struct iommu_ops *ops = NULL;
>> int idx = 0;
>>
>> if (dev_is_pci(dev))
>> -   return of_pci_iommu_configure(to_pci_dev(dev), master_np);
>> +   return of_pci_iommu_configure(to_pci_dev(dev), parent);
>>
>> /*
>>  * We don't currently walk up the tree looking for a parent IOMMU.
>>  * See the `Notes:' section of
>>  * Documentation/devicetree/bindings/iommu/iommu.txt
>>  */
>> +   master_np = dev->of_node ? dev->of_node : parent;
>> while (!of_parse_phandle_with_args(master_np, "iommus",
>>"#iommu-cells", idx,
>>&iommu_spec)) {
>> diff --git a/drivers/of/address.c b/drivers/of/address.c
>> index 02b2903fe9d2..833bc17f5e55 100644
>> --- a/drivers/of/address.c
>> +++ b/drivers/of/address.c
>> @@ -808,6 +808,7 @@ EXPORT_SYMBOL(of_io_request_and_map);
>>  /**
>>   * of_dma_get_range - Get DMA range info
>>   * @np:device node to get DMA range info
>> + * @parent:node of device's parent bus, if @np is NULL
>>   * @dma_addr:  pointer to store initial DMA address of DMA range
>>   * @paddr: pointer to store initial CPU address of DMA range
>>   * @size:  pointer to store size of DMA range
>> @@ -822,36 +823,48 @@ EXPORT_SYMBOL(of_io_request_and_map);
>>   * It returns -ENODEV if "dma-ranges" property was not found
>>   * for this device in DT.
>>   */
>> -int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 
>> *size)
>> +int of_dma_get_range(struct device_node *np, struct device_node *parent,
>> +u64 *dma_addr, u64 *paddr, u64 *size)
>>  {
>> -   struct device_node *node = of_node_get(np);
>> +   struct device_node *node;
>> const __be32 *ranges = NULL;
>> int len, naddr, nsize, pna;
>> int ret = 0;
>> u64 dmaaddr;
>>
>> -   if (!node)
>> -   return -EINVAL;
>> -
>> -   while (1) {
>> +   if (np) {
>> +   node = of_node_get(np);
>> naddr = of_n_addr_cells(node);
>> nsize = of_n_size_cells(node);
>> node = of_get_next_parent(node);
>> -   if (!node)
>> -   break;
>> +   } else if (parent) {
>> +   node = of_node_get(parent);
>> +   np = p

Re: [RFC PATCH] of: Fix DMA configuration for non-DT masters

2017-03-28 Thread Oza Oza via iommu
On Wed, Mar 29, 2017 at 12:27 AM, Robin Murphy  wrote:
> For PCI masters not represented in DT, we pass the OF node of their
> associated host bridge to of_dma_configure(), such that they can inherit
> the appropriate DMA configuration from whatever is described there.
> Unfortunately, whilst this has worked for the "dma-coherent" property,
> it turns out to miss the case where the host bridge node has a non-empty
> "dma-ranges", since nobody is expecting the 'device' to be a bus itself.
>
> It transpires, though, that the de-facto interface since the prototype
> change in 1f5c69aa51f9 ("of: Move of_dma_configure() to device.c to help
> re-use") is very clear-cut: either the master_np argument is redundant
> with dev->of_node, or dev->of_node is NULL and master_np is the relevant
> parent bus. Let's ratify that behaviour, then teach the whole
> of_dma_configure() pipeline to cope with both cases properly.
>
> Signed-off-by: Robin Murphy 
> ---
>
> This is what I'd consider the better fix - rather than adding yet more
> special cases - which will also make it simple to handle multiple
> "dma-ranges" entries with minimal further disruption. The callers now
> left passing dev->of_node as 'parent' are harmless, but look a bit silly
> and clearly want cleaning up - I'd be partial to renaming the existing
> function and having a single-argument wrapper for the 'normal' case, e.g.:
>
> static inline int of_dma_configure(struct device_node *dev)
> {
> return of_dma_configure_parent(dev, NULL);
> }
>
> Thoughts?
>
> Robin.
>
>  drivers/iommu/of_iommu.c   |  7 ---
>  drivers/of/address.c   | 37 +
>  drivers/of/device.c| 12 +++-
>  include/linux/of_address.h |  7 ---
>  include/linux/of_device.h  |  4 ++--
>  include/linux/of_iommu.h   |  4 ++--
>  6 files changed, 44 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 2683e9fc0dcf..35aff07bb5eb 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -138,21 +138,22 @@ static const struct iommu_ops
>  }
>
>  const struct iommu_ops *of_iommu_configure(struct device *dev,
> -  struct device_node *master_np)
> +  struct device_node *parent)
>  {
> struct of_phandle_args iommu_spec;
> -   struct device_node *np;
> +   struct device_node *np, *master_np;
> const struct iommu_ops *ops = NULL;
> int idx = 0;
>
> if (dev_is_pci(dev))
> -   return of_pci_iommu_configure(to_pci_dev(dev), master_np);
> +   return of_pci_iommu_configure(to_pci_dev(dev), parent);
>
> /*
>  * We don't currently walk up the tree looking for a parent IOMMU.
>  * See the `Notes:' section of
>  * Documentation/devicetree/bindings/iommu/iommu.txt
>  */
> +   master_np = dev->of_node ? dev->of_node : parent;
> while (!of_parse_phandle_with_args(master_np, "iommus",
>"#iommu-cells", idx,
>&iommu_spec)) {
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index 02b2903fe9d2..833bc17f5e55 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -808,6 +808,7 @@ EXPORT_SYMBOL(of_io_request_and_map);
>  /**
>   * of_dma_get_range - Get DMA range info
>   * @np:device node to get DMA range info
> + * @parent:node of device's parent bus, if @np is NULL
>   * @dma_addr:  pointer to store initial DMA address of DMA range
>   * @paddr: pointer to store initial CPU address of DMA range
>   * @size:  pointer to store size of DMA range
> @@ -822,36 +823,48 @@ EXPORT_SYMBOL(of_io_request_and_map);
>   * It returns -ENODEV if "dma-ranges" property was not found
>   * for this device in DT.
>   */
> -int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 
> *size)
> +int of_dma_get_range(struct device_node *np, struct device_node *parent,
> +u64 *dma_addr, u64 *paddr, u64 *size)
>  {
> -   struct device_node *node = of_node_get(np);
> +   struct device_node *node;
> const __be32 *ranges = NULL;
> int len, naddr, nsize, pna;
> int ret = 0;
> u64 dmaaddr;
>
> -   if (!node)
> -   return -EINVAL;
> -
> -   while (1) {
> +   if (np) {
> +   node = of_node_get(np);
> naddr = of_n_addr_cells(node);
> nsize = of_n_size_cells(node);
> node = of_get_next_parent(node);
> -   if (!node)
> -   break;
> +   } else if (parent) {
> +   node = of_node_get(parent);
> +   np = parent;
> +   if (of_property_read_u32(node, "#address-cells", &naddr))
> +   naddr = of_n_addr_cells(node);
> +   if (of_

Re: [RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask

2017-03-28 Thread Oza Oza via iommu
On Tue, Mar 28, 2017 at 7:59 PM, Robin Murphy  wrote:
> On 28/03/17 06:27, Oza Oza wrote:
>> On Mon, Mar 27, 2017 at 8:16 PM, Rob Herring  wrote:
>>> On Sat, Mar 25, 2017 at 12:31 AM, Oza Pawandeep  
>>> wrote:
 it is possible that PCI device supports 64-bit DMA addressing,
 and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
 however PCI host bridge may have limitations on the inbound
 transaction addressing. As an example, consider NVME SSD device
 connected to iproc-PCIe controller.

 Currently, the IOMMU DMA ops only considers PCI device dma_mask
 when allocating an IOVA. This is particularly problematic on
 ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
 PA for in-bound transactions only after PCI Host has forwarded
 these transactions on SOC IO bus. This means on such ARM/ARM64
 SOCs the IOVA of in-bound transactions has to honor the addressing
 restrictions of the PCI Host.

 current pcie frmework and of framework integration assumes dma-ranges
 in a way where memory-mapped devices define their dma-ranges.
 dma-ranges: (child-bus-address, parent-bus-address, length).

 but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
 dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
>>>
>>> If you implement a common function, then I expect to see other users
>>> converted to use it. There's also PCI hosts in arch/powerpc that parse
>>> dma-ranges.
>>
>> the common function should be similar to what
>> of_pci_get_host_bridge_resources is doing right now.
>> it parses ranges property right now.
>>
>> the new function would look look following.
>>
>> of_pci_get_dma_ranges(struct device_node *dev, struct list_head *resources)
>> where resources would return the dma-ranges.
>>
>> but right now if you see the patch, of_dma_configure calls the new
>> function, which actually returns the largest possible size.
>> so this new function has to be generic in a way where other PCI hosts
>> can use it. but certainly iproc(Broadcom SOC) , rcar based SOCs can
>> use it for sure.
>>
>> although having powerpc using it;  is a separate exercise, since I do
>> not have any access to other PCI hosts such as powerpc. but we can
>> workout with them on thsi forum if required.
>>
>> so overall, of_pci_get_dma_ranges has to serve following 2 purposes.
>>
>> 1) it has to return largest possible size to of_dma_configure to
>> generate largest possible dma_mask.
>>
>> 2) it also has to return resources (dma-ranges) parsed, to the users.
>>
>> so to address above needs
>>
>> of_pci_get_dma_ranges(struct device_node *dev, struct list_head
>> *resources, u64 *size)
>>
>> dev -> device node.
>> resources -> dma-ranges in allocated list.
>> size -> highest possible size to generate possible dma_mask for
>> of_dma_configure.
>>
>> let em know how this sounds.
>
> Note that the point of passing PCI host bridges into of_dma_configure()
> in the first place was to avoid having some separate PCI-specific path
> for DMA configuration. I worry that introducing bus-specific dma-ranges
> parsing largely defeats that, since we end up with the worst of both
> worlds; effectively-duplicated code, and/or a load of extra complexity
> to then attempt to reconverge the divergent paths (there really
> shouldn't be any need to allocate a list of anything). Given that
> of_translate_dma_address() is already bus-agnostic, it hardly seems
> justifiable for its caller not to be so as well, especially when it
> mostly just comes down to getting the right #address-cells value.
>
> The patch below is actually enough to make typical cases work, but is
> vile, so I'm not seriously considering it (hence I've not bothered
> making IOMMU configuration handle all circumstances). What it has served
> to do, though, is give me a clear idea of how to properly sort out the
> not-quite-right device/parent assumptions between of_dma_configure() and
> of_dma_get_range() rather than bodging around them any further - stay tuned.
>
> Robin.
>
> ->8-
> From: Robin Murphy 
> Subject: [PATCH] of/pci: Use child node for DMA configuration
>
> of_dma_configure() expects to be passed an OF node representing the
> device being configured - for PCI devices we currently pass the node of
> the appropriate host controller, which sort of works for inherited
> properties which may appear at any level, like "dma-coherent", but falls
> apart for properties which actually care about specific device-parent
> relationships, like "dma-ranges".
>
> Solve this by attempting to find a suitable child node if the PCI
> hierarchy is actually represented in DT, and if not then faking one up
> as a last resort, to make all of DMA configuration work as expected.
>
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/of_iommu.c |  3 ++-
>  drivers/pci/of.c | 24 
>  drivers/pci/probe.c  | 14 +-
>  include/linux/pci.h  |  3

[RFC PATCH] of: Fix DMA configuration for non-DT masters

2017-03-28 Thread Robin Murphy
For PCI masters not represented in DT, we pass the OF node of their
associated host bridge to of_dma_configure(), such that they can inherit
the appropriate DMA configuration from whatever is described there.
Unfortunately, whilst this has worked for the "dma-coherent" property,
it turns out to miss the case where the host bridge node has a non-empty
"dma-ranges", since nobody is expecting the 'device' to be a bus itself.

It transpires, though, that the de-facto interface since the prototype
change in 1f5c69aa51f9 ("of: Move of_dma_configure() to device.c to help
re-use") is very clear-cut: either the master_np argument is redundant
with dev->of_node, or dev->of_node is NULL and master_np is the relevant
parent bus. Let's ratify that behaviour, then teach the whole
of_dma_configure() pipeline to cope with both cases properly.

Signed-off-by: Robin Murphy 
---

This is what I'd consider the better fix - rather than adding yet more
special cases - which will also make it simple to handle multiple
"dma-ranges" entries with minimal further disruption. The callers now
left passing dev->of_node as 'parent' are harmless, but look a bit silly
and clearly want cleaning up - I'd be partial to renaming the existing
function and having a single-argument wrapper for the 'normal' case, e.g.:

static inline int of_dma_configure(struct device_node *dev)
{
return of_dma_configure_parent(dev, NULL);
}

Thoughts?

Robin.

 drivers/iommu/of_iommu.c   |  7 ---
 drivers/of/address.c   | 37 +
 drivers/of/device.c| 12 +++-
 include/linux/of_address.h |  7 ---
 include/linux/of_device.h  |  4 ++--
 include/linux/of_iommu.h   |  4 ++--
 6 files changed, 44 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 2683e9fc0dcf..35aff07bb5eb 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -138,21 +138,22 @@ static const struct iommu_ops
 }
 
 const struct iommu_ops *of_iommu_configure(struct device *dev,
-  struct device_node *master_np)
+  struct device_node *parent)
 {
struct of_phandle_args iommu_spec;
-   struct device_node *np;
+   struct device_node *np, *master_np;
const struct iommu_ops *ops = NULL;
int idx = 0;
 
if (dev_is_pci(dev))
-   return of_pci_iommu_configure(to_pci_dev(dev), master_np);
+   return of_pci_iommu_configure(to_pci_dev(dev), parent);
 
/*
 * We don't currently walk up the tree looking for a parent IOMMU.
 * See the `Notes:' section of
 * Documentation/devicetree/bindings/iommu/iommu.txt
 */
+   master_np = dev->of_node ? dev->of_node : parent;
while (!of_parse_phandle_with_args(master_np, "iommus",
   "#iommu-cells", idx,
   &iommu_spec)) {
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903fe9d2..833bc17f5e55 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -808,6 +808,7 @@ EXPORT_SYMBOL(of_io_request_and_map);
 /**
  * of_dma_get_range - Get DMA range info
  * @np:device node to get DMA range info
+ * @parent:node of device's parent bus, if @np is NULL
  * @dma_addr:  pointer to store initial DMA address of DMA range
  * @paddr: pointer to store initial CPU address of DMA range
  * @size:  pointer to store size of DMA range
@@ -822,36 +823,48 @@ EXPORT_SYMBOL(of_io_request_and_map);
  * It returns -ENODEV if "dma-ranges" property was not found
  * for this device in DT.
  */
-int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 
*size)
+int of_dma_get_range(struct device_node *np, struct device_node *parent,
+u64 *dma_addr, u64 *paddr, u64 *size)
 {
-   struct device_node *node = of_node_get(np);
+   struct device_node *node;
const __be32 *ranges = NULL;
int len, naddr, nsize, pna;
int ret = 0;
u64 dmaaddr;
 
-   if (!node)
-   return -EINVAL;
-
-   while (1) {
+   if (np) {
+   node = of_node_get(np);
naddr = of_n_addr_cells(node);
nsize = of_n_size_cells(node);
node = of_get_next_parent(node);
-   if (!node)
-   break;
+   } else if (parent) {
+   node = of_node_get(parent);
+   np = parent;
+   if (of_property_read_u32(node, "#address-cells", &naddr))
+   naddr = of_n_addr_cells(node);
+   if (of_property_read_u32(node, "#size-cells", &nsize))
+   nsize = of_n_size_cells(node);
+   } else {
+   return -EINVAL;
+   }
 
+   while (node) {
ranges = of_get_property(node, "dma-ranges", &len);
 
- 

Re: [PATCH V9 00/11] IOMMU probe deferral support

2017-03-28 Thread Sricharan R

Hi,

On 3/28/2017 7:45 PM, Shameerali Kolothum Thodi wrote:




-Original Message-
From: Sricharan R [mailto:sricha...@codeaurora.org]
Sent: Tuesday, March 28, 2017 5:54 AM
To: Robin Murphy; Shameerali Kolothum Thodi; Wangzhou (B);
will.dea...@arm.com; j...@8bytes.org; lorenzo.pieral...@arm.com;
iommu@lists.linux-foundation.org; linux-arm-ker...@lists.infradead.org;
linux-arm-...@vger.kernel.org; m.szyprow...@samsung.com;
bhelg...@google.com; linux-...@vger.kernel.org; linux-
a...@vger.kernel.org; t...@semihalf.com; hanjun@linaro.org;
ok...@codeaurora.org
Subject: Re: [PATCH V9 00/11] IOMMU probe deferral support

Hi,


[...]


 From the logs its clear that  when ixgbevf driver originally probes
and adds the device  to smmu  the dma mask is 32, but when it binds
to vfio-pci, it becomes 64 bit.


Just to add to that, the mask is set to 64 bit in the ixgebvf driver
probe[1]


Aha, but of course it's still the same struct device getting bound to
VFIO later, so whatever mask the first driver set is still in there
when we go through of_dma_configure() the second time (and the fact
that we go through more than once being the new behaviour). So yes,
this is a legitimate problem and we really do need to be robust
against size overflow. I reckon the below tweak of your fix is
probably the way to go; cleaning up the arch_setup_dma_ops() interface

can happen later.




ok, i will add this fix separately and also the acpi fix that lorenzo has
suggested in patch #8 in to the series after testing confirmation.


I can confirm that the patches fixes the issues reported here . Both
DT and ACPI works now.



Thanks for the testing. Will repost with the fixes.

Regards,
 Sricharan

--
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [V9, 07/11] iommu: of: Handle IOMMU lookup failure with deferred probing or error

2017-03-28 Thread Sricharan R

Hi,

On 3/28/2017 8:30 PM, Rob Herring wrote:

On Fri, Mar 10, 2017 at 12:30:57AM +0530, Sricharan R wrote:

From: Laurent Pinchart 

Failures to look up an IOMMU when parsing the DT iommus property need to
be handled separately from the .of_xlate() failures to support deferred
probing.

The lack of a registered IOMMU can be caused by the lack of a driver for
the IOMMU, the IOMMU device probe not having been performed yet, having
been deferred, or having failed.

The first case occurs when the device tree describes the bus master and
IOMMU topology correctly but no device driver exists for the IOMMU yet
or the device driver has not been compiled in. Return NULL, the caller
will configure the device without an IOMMU.

The second and third cases are handled by deferring the probe of the bus
master device which will eventually get reprobed after the IOMMU.

The last case is currently handled by deferring the probe of the bus
master device as well. A mechanism to either configure the bus master
device without an IOMMU or to fail the bus master device probe depending
on whether the IOMMU is optional or mandatory would be a good
enhancement.

Tested-by: Marek Szyprowski 
Signed-off-by: Laurent Pichart 
Signed-off-by: Sricharan R 
---
 drivers/base/dma-mapping.c | 5 +++--
 drivers/iommu/of_iommu.c   | 4 ++--
 drivers/of/device.c| 7 ++-
 include/linux/of_device.h  | 9 ++---
 4 files changed, 17 insertions(+), 8 deletions(-)


Maybe it is the same issue reported for VFIO, but virtio-pci is broken
with v8 of this series. Bisecting blames this commit which looks like it
hasn't changeed.


Ya, as Robin mentioned it was broken at patch #2 and
comes out after this patch. I am going to repost this series with
couple of more fixes added as well.



Rob

P.S. Doesn't look like you have copied the DT maintainers nor list for
the DT changes.


Ha, really sorry about that. will add it.

Regards,
 Sricharan

--
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [V9, 07/11] iommu: of: Handle IOMMU lookup failure with deferred probing or error

2017-03-28 Thread Robin Murphy
On 28/03/17 16:00, Rob Herring wrote:
> On Fri, Mar 10, 2017 at 12:30:57AM +0530, Sricharan R wrote:
>> From: Laurent Pinchart 
>>
>> Failures to look up an IOMMU when parsing the DT iommus property need to
>> be handled separately from the .of_xlate() failures to support deferred
>> probing.
>>
>> The lack of a registered IOMMU can be caused by the lack of a driver for
>> the IOMMU, the IOMMU device probe not having been performed yet, having
>> been deferred, or having failed.
>>
>> The first case occurs when the device tree describes the bus master and
>> IOMMU topology correctly but no device driver exists for the IOMMU yet
>> or the device driver has not been compiled in. Return NULL, the caller
>> will configure the device without an IOMMU.
>>
>> The second and third cases are handled by deferring the probe of the bus
>> master device which will eventually get reprobed after the IOMMU.
>>
>> The last case is currently handled by deferring the probe of the bus
>> master device as well. A mechanism to either configure the bus master
>> device without an IOMMU or to fail the bus master device probe depending
>> on whether the IOMMU is optional or mandatory would be a good
>> enhancement.
>>
>> Tested-by: Marek Szyprowski 
>> Signed-off-by: Laurent Pichart 
>> Signed-off-by: Sricharan R 
>> ---
>>  drivers/base/dma-mapping.c | 5 +++--
>>  drivers/iommu/of_iommu.c   | 4 ++--
>>  drivers/of/device.c| 7 ++-
>>  include/linux/of_device.h  | 9 ++---
>>  4 files changed, 17 insertions(+), 8 deletions(-)
> 
> Maybe it is the same issue reported for VFIO, but virtio-pci is broken 
> with v8 of this series. Bisecting blames this commit which looks like it 
> hasn't changeed. 

v8 managed to break *all* PCI devices which didn't have an associated
"iommu-map". The problem was actually in patch 2 (and is fixed in this
version), but it only hit at this point once we start propagating the
erroneous error code back to the driver probe.

Robin.

> 
> Rob
> 
> P.S. Doesn't look like you have copied the DT maintainers nor list for 
> the DT changes.
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [V9, 07/11] iommu: of: Handle IOMMU lookup failure with deferred probing or error

2017-03-28 Thread Rob Herring
On Fri, Mar 10, 2017 at 12:30:57AM +0530, Sricharan R wrote:
> From: Laurent Pinchart 
> 
> Failures to look up an IOMMU when parsing the DT iommus property need to
> be handled separately from the .of_xlate() failures to support deferred
> probing.
> 
> The lack of a registered IOMMU can be caused by the lack of a driver for
> the IOMMU, the IOMMU device probe not having been performed yet, having
> been deferred, or having failed.
> 
> The first case occurs when the device tree describes the bus master and
> IOMMU topology correctly but no device driver exists for the IOMMU yet
> or the device driver has not been compiled in. Return NULL, the caller
> will configure the device without an IOMMU.
> 
> The second and third cases are handled by deferring the probe of the bus
> master device which will eventually get reprobed after the IOMMU.
> 
> The last case is currently handled by deferring the probe of the bus
> master device as well. A mechanism to either configure the bus master
> device without an IOMMU or to fail the bus master device probe depending
> on whether the IOMMU is optional or mandatory would be a good
> enhancement.
> 
> Tested-by: Marek Szyprowski 
> Signed-off-by: Laurent Pichart 
> Signed-off-by: Sricharan R 
> ---
>  drivers/base/dma-mapping.c | 5 +++--
>  drivers/iommu/of_iommu.c   | 4 ++--
>  drivers/of/device.c| 7 ++-
>  include/linux/of_device.h  | 9 ++---
>  4 files changed, 17 insertions(+), 8 deletions(-)

Maybe it is the same issue reported for VFIO, but virtio-pci is broken 
with v8 of this series. Bisecting blames this commit which looks like it 
hasn't changeed. 

Rob

P.S. Doesn't look like you have copied the DT maintainers nor list for 
the DT changes.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask

2017-03-28 Thread Robin Murphy
On 28/03/17 06:27, Oza Oza wrote:
> On Mon, Mar 27, 2017 at 8:16 PM, Rob Herring  wrote:
>> On Sat, Mar 25, 2017 at 12:31 AM, Oza Pawandeep  wrote:
>>> it is possible that PCI device supports 64-bit DMA addressing,
>>> and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
>>> however PCI host bridge may have limitations on the inbound
>>> transaction addressing. As an example, consider NVME SSD device
>>> connected to iproc-PCIe controller.
>>>
>>> Currently, the IOMMU DMA ops only considers PCI device dma_mask
>>> when allocating an IOVA. This is particularly problematic on
>>> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
>>> PA for in-bound transactions only after PCI Host has forwarded
>>> these transactions on SOC IO bus. This means on such ARM/ARM64
>>> SOCs the IOVA of in-bound transactions has to honor the addressing
>>> restrictions of the PCI Host.
>>>
>>> current pcie frmework and of framework integration assumes dma-ranges
>>> in a way where memory-mapped devices define their dma-ranges.
>>> dma-ranges: (child-bus-address, parent-bus-address, length).
>>>
>>> but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
>>> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
>>
>> If you implement a common function, then I expect to see other users
>> converted to use it. There's also PCI hosts in arch/powerpc that parse
>> dma-ranges.
> 
> the common function should be similar to what
> of_pci_get_host_bridge_resources is doing right now.
> it parses ranges property right now.
> 
> the new function would look look following.
> 
> of_pci_get_dma_ranges(struct device_node *dev, struct list_head *resources)
> where resources would return the dma-ranges.
> 
> but right now if you see the patch, of_dma_configure calls the new
> function, which actually returns the largest possible size.
> so this new function has to be generic in a way where other PCI hosts
> can use it. but certainly iproc(Broadcom SOC) , rcar based SOCs can
> use it for sure.
> 
> although having powerpc using it;  is a separate exercise, since I do
> not have any access to other PCI hosts such as powerpc. but we can
> workout with them on thsi forum if required.
> 
> so overall, of_pci_get_dma_ranges has to serve following 2 purposes.
> 
> 1) it has to return largest possible size to of_dma_configure to
> generate largest possible dma_mask.
> 
> 2) it also has to return resources (dma-ranges) parsed, to the users.
> 
> so to address above needs
> 
> of_pci_get_dma_ranges(struct device_node *dev, struct list_head
> *resources, u64 *size)
> 
> dev -> device node.
> resources -> dma-ranges in allocated list.
> size -> highest possible size to generate possible dma_mask for
> of_dma_configure.
> 
> let em know how this sounds.

Note that the point of passing PCI host bridges into of_dma_configure()
in the first place was to avoid having some separate PCI-specific path
for DMA configuration. I worry that introducing bus-specific dma-ranges
parsing largely defeats that, since we end up with the worst of both
worlds; effectively-duplicated code, and/or a load of extra complexity
to then attempt to reconverge the divergent paths (there really
shouldn't be any need to allocate a list of anything). Given that
of_translate_dma_address() is already bus-agnostic, it hardly seems
justifiable for its caller not to be so as well, especially when it
mostly just comes down to getting the right #address-cells value.

The patch below is actually enough to make typical cases work, but is
vile, so I'm not seriously considering it (hence I've not bothered
making IOMMU configuration handle all circumstances). What it has served
to do, though, is give me a clear idea of how to properly sort out the
not-quite-right device/parent assumptions between of_dma_configure() and
of_dma_get_range() rather than bodging around them any further - stay tuned.

Robin.

->8-
From: Robin Murphy 
Subject: [PATCH] of/pci: Use child node for DMA configuration

of_dma_configure() expects to be passed an OF node representing the
device being configured - for PCI devices we currently pass the node of
the appropriate host controller, which sort of works for inherited
properties which may appear at any level, like "dma-coherent", but falls
apart for properties which actually care about specific device-parent
relationships, like "dma-ranges".

Solve this by attempting to find a suitable child node if the PCI
hierarchy is actually represented in DT, and if not then faking one up
as a last resort, to make all of DMA configuration work as expected.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/of_iommu.c |  3 ++-
 drivers/pci/of.c | 24 
 drivers/pci/probe.c  | 14 +-
 include/linux/pci.h  |  3 +++
 4 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 2683e9fc0dcf..35c97b945c15 100644
--- a/drivers/iommu/of_iommu

RE: [PATCH V9 00/11] IOMMU probe deferral support

2017-03-28 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Sricharan R [mailto:sricha...@codeaurora.org]
> Sent: Tuesday, March 28, 2017 5:54 AM
> To: Robin Murphy; Shameerali Kolothum Thodi; Wangzhou (B);
> will.dea...@arm.com; j...@8bytes.org; lorenzo.pieral...@arm.com;
> iommu@lists.linux-foundation.org; linux-arm-ker...@lists.infradead.org;
> linux-arm-...@vger.kernel.org; m.szyprow...@samsung.com;
> bhelg...@google.com; linux-...@vger.kernel.org; linux-
> a...@vger.kernel.org; t...@semihalf.com; hanjun@linaro.org;
> ok...@codeaurora.org
> Subject: Re: [PATCH V9 00/11] IOMMU probe deferral support
> 
> Hi,
> 
[...]

> >>>  From the logs its clear that  when ixgbevf driver originally probes
> >>> and adds the device  to smmu  the dma mask is 32, but when it binds
> >>> to vfio-pci, it becomes 64 bit.
> >>
> >> Just to add to that, the mask is set to 64 bit in the ixgebvf driver
> >> probe[1]
> >
> > Aha, but of course it's still the same struct device getting bound to
> > VFIO later, so whatever mask the first driver set is still in there
> > when we go through of_dma_configure() the second time (and the fact
> > that we go through more than once being the new behaviour). So yes,
> > this is a legitimate problem and we really do need to be robust
> > against size overflow. I reckon the below tweak of your fix is
> > probably the way to go; cleaning up the arch_setup_dma_ops() interface
> can happen later.
> >
> 
> ok, i will add this fix separately and also the acpi fix that lorenzo has
> suggested in patch #8 in to the series after testing confirmation.
> 
I can confirm that the patches fixes the issues reported here . Both 
DT and ACPI works now.
 
Cheers,
Shameer 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask

2017-03-28 Thread Rob Herring
On Tue, Mar 28, 2017 at 12:27 AM, Oza Oza  wrote:
> On Mon, Mar 27, 2017 at 8:16 PM, Rob Herring  wrote:
>> On Sat, Mar 25, 2017 at 12:31 AM, Oza Pawandeep  wrote:
>>> it is possible that PCI device supports 64-bit DMA addressing,
>>> and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
>>> however PCI host bridge may have limitations on the inbound
>>> transaction addressing. As an example, consider NVME SSD device
>>> connected to iproc-PCIe controller.
>>>
>>> Currently, the IOMMU DMA ops only considers PCI device dma_mask
>>> when allocating an IOVA. This is particularly problematic on
>>> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
>>> PA for in-bound transactions only after PCI Host has forwarded
>>> these transactions on SOC IO bus. This means on such ARM/ARM64
>>> SOCs the IOVA of in-bound transactions has to honor the addressing
>>> restrictions of the PCI Host.
>>>
>>> current pcie frmework and of framework integration assumes dma-ranges
>>> in a way where memory-mapped devices define their dma-ranges.
>>> dma-ranges: (child-bus-address, parent-bus-address, length).
>>>
>>> but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
>>> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
>>
>> If you implement a common function, then I expect to see other users
>> converted to use it. There's also PCI hosts in arch/powerpc that parse
>> dma-ranges.
>
> the common function should be similar to what
> of_pci_get_host_bridge_resources is doing right now.
> it parses ranges property right now.
>
> the new function would look look following.
>
> of_pci_get_dma_ranges(struct device_node *dev, struct list_head *resources)
> where resources would return the dma-ranges.
>
> but right now if you see the patch, of_dma_configure calls the new
> function, which actually returns the largest possible size.
> so this new function has to be generic in a way where other PCI hosts
> can use it. but certainly iproc(Broadcom SOC) , rcar based SOCs can
> use it for sure.
>
> although having powerpc using it;  is a separate exercise, since I do
> not have any access to other PCI hosts such as powerpc. but we can
> workout with them on thsi forum if required.

You don't need h/w. You can analyze what parts are common, write
patches to convert to common implementation, and build test. The PPC
and rcar folks can test on h/w.

Rob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu: Fix 16bit ASID configuration

2017-03-28 Thread sunil . kovvuri
From: Sunil Goutham 

16bit ASID should be enabled before initializing TTBR0/1,
otherwise only LSB 8bit ASID will be considered. Hence
moving configuration of TTBCR register ahead of TTBR0/1
while initializing context bank.

Signed-off-by: Sunil Goutham 
---
 drivers/iommu/arm-smmu.c | 41 ++---
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 9b33700..2845d73 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -758,6 +758,28 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
}
writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBAR(cfg->cbndx));
 
+   /* TTBCR */
+   if (stage1) {
+   if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
+   reg = pgtbl_cfg->arm_v7s_cfg.tcr;
+   reg2 = 0;
+   } else {
+   reg = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
+   reg2 = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
+   reg2 |= TTBCR2_SEP_UPSTREAM;
+   /* 16bit ASID should be enabled before write to TTBR,
+* otherwise only LSB 8bit will be considered.
+*/
+   if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
+   reg2 |= TTBCR2_AS;
+   }
+   if (smmu->version > ARM_SMMU_V1)
+   writel_relaxed(reg2, cb_base + ARM_SMMU_CB_TTBCR2);
+   } else {
+   reg = pgtbl_cfg->arm_lpae_s2_cfg.vtcr;
+   }
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBCR);
+
/* TTBRs */
if (stage1) {
u16 asid = ARM_SMMU_CB_ASID(smmu, cfg);
@@ -781,25 +803,6 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
writeq_relaxed(reg64, cb_base + ARM_SMMU_CB_TTBR0);
}
 
-   /* TTBCR */
-   if (stage1) {
-   if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
-   reg = pgtbl_cfg->arm_v7s_cfg.tcr;
-   reg2 = 0;
-   } else {
-   reg = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
-   reg2 = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
-   reg2 |= TTBCR2_SEP_UPSTREAM;
-   if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
-   reg2 |= TTBCR2_AS;
-   }
-   if (smmu->version > ARM_SMMU_V1)
-   writel_relaxed(reg2, cb_base + ARM_SMMU_CB_TTBCR2);
-   } else {
-   reg = pgtbl_cfg->arm_lpae_s2_cfg.vtcr;
-   }
-   writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBCR);
-
/* MAIRs (stage-1 only) */
if (stage1) {
if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu