[PATCH 1/1] iommu/vt-d: Use real PASID for flush in caching mode

2018-03-15 Thread Lu Baolu
If caching mode is supported, the hardware will cache
non-present or erroneous translation entries. Hence,
software should explicitly invalidate the PASID cache
after a PASID table entry becomes present. We should
issue such invalidation with the PASID value that we
have changed. PASID 0 is not reserved for this case.

Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Sankaran Rajesh 
Suggested-by: Ashok Raj 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-svm.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 99bc9bd..b7b88b5 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -422,17 +422,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
iommu->pasid_table[svm->pasid].val = pasid_entry_val;
 
wmb();
-   /* In caching mode, we still have to flush with PASID 0 when
-* a PASID table entry becomes present. Not entirely clear
-* *why* that would be the case — surely we could just issue
-* a flush with the PASID value that we've changed? The PASID
-* is the index into the table, after all. It's not like domain
-* IDs in the case of the equivalent context-entry change in
-* caching mode. And for that matter it's not entirely clear why
-* a VMM would be in the business of caching the PASID table
-* anyway. Surely that can be left entirely to the guest? */
+
+   /*
+* Flush PASID cache when a PASID table entry becomes
+* present.
+*/
if (cap_caching_mode(iommu->cap))
-   intel_flush_pasid_dev(svm, sdev, 0);
+   intel_flush_pasid_dev(svm, sdev, svm->pasid);
}
list_add_rcu(&sdev->list, &svm->devs);
 
-- 
2.7.4


Re: [RFC 2/5] dt-bindings: brcm: Add reserved-dma-region for iPROC

2018-03-15 Thread Robin Murphy

On 15/03/18 12:03, Jitendra Bhivare wrote:

On Thu, Mar 15, 2018 at 12:15 AM, Robin Murphy  wrote:

On 12/03/18 07:03, Jitendra Bhivare wrote:


On Tue, Mar 6, 2018 at 5:12 PM, Robin Murphy  wrote:


On 06/03/18 04:59, Jitendra Bhivare wrote:



With a SoC wide DMA mask of 40-bit, the mappings for the entire IOVA space
can't be specified in the PAXBv2 PCIe RC of the SoC. The holes in the IOVA
space need to be reserved to prevent any IOVA allocations in those spaces.




Can you clarify why? If this is the PCI inbound window thing again, let me
say once again that "dma-ranges" is the appropriate way for DT to describe
the hardware.

Robin.


dma-ranges = < \
0x4300 0x00 0x 0x00 0x8000 0x00 0x8000 \
0x4300 0x08 0x 0x08 0x 0x08 0x \
0x4300 0x80 0x 0x80 0x 0x80 0x>

Yes, it's for PCI inbound windows. In our HW, they are limited by sizes
specified in ARM memory maps, which was done for non-IOMMU cases to catch
any transfer outside the ranges. dma-ranges are already being used to
program these inbound windows and a SoC wide DMA mask is already specified,
but IOMMU code can still allocate IOVAs in the gaps, for which translation
will fail in the PCIe RC.



Right, so make iommu-dma reserve the gaps. No need to clutter up the DT with
redundant information which gets handled pretty much identically anyway.

Robin.

There was a mistake in pasting the dma-ranges mentioned. This is what we use
in the IOMMU case.
#define PCIE_DMA_RANGES dma-ranges = < \
0x4300 0x00 0x 0x00 0x 0x04 0x \
0x4300 0x08 0x 0x08 0x 0x08 0x \
0x4300 0x80 0x 0x80 0x 0x80 0x>

#define PCIE_RESERVED_DMA_REGION reserved-dma-region = < \
0x0 0x0 0x04 0x0 0x04 0x0 \
0x0 0x0 0x10 0x0 0x70 0x0>

This is non-iommu case which is per ARM memory map.
#define PCIE_DMA_RANGES dma-ranges = < \
0x4300 0x00 0x8000 0x00 0x8000 0x00 0x8000 \
0x4300 0x08 0x 0x08 0x 0x08 0x \
0x4300 0x80 0x 0x80 0x 0x40 0x>

In the IOMMU case, we had to increase the first in-bound window, i.e. the
first dma-ranges entry, from 2GB-4GB (in the non-iommu case) to 0-16GB,
because the out-bound window specified from 2GB-4GB implicitly gets reserved
by the PCI iommu code. This allowed IOVA allocations from 0-2GB and 4-16GB
mapped through in-bound windows.

Basically, my point is that the dma-ranges specified is being used to
program our in-bound windows and defines the PCI space to CPU address space
mappings supported by the HW. In the same way, it would be better to
explicitly specify reserved-dma-region to mark this as a HW hole than to
implicitly reserve the regions.


Why? If DMA traffic can only pass through inbound windows, and you 
program the inbound windows based on dma-ranges, then by definition 
dma-ranges tells you exactly the set of addresses that are usable for 
DMA, and by extension, IOMMU remapping. I fail to see how it's then 
"better" to add a second slightly different description of the exact 
same information with the bonus maintenance burden of then having to 
ensure the two actually match.


Robin.
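For illustration, the gap reservation being suggested could look roughly like
this (a minimal sketch, not the actual iommu-dma code; 'struct window' and
reserve_inbound_gaps() are hypothetical, while reserve_iova() and iova_pfn()
are the existing iova helpers):

	/*
	 * Hedged sketch: reserve the IOVA ranges that fall between
	 * consecutive inbound windows so the allocator never hands out
	 * an address the root complex cannot translate. Assumes
	 * 'windows' is sorted by start address; 'struct window' is a
	 * stand-in for however the dma-ranges end up being parsed.
	 */
	struct window {
		dma_addr_t start;
		dma_addr_t end;
	};

	static void reserve_inbound_gaps(struct iova_domain *iovad,
					 const struct window *windows, int nr)
	{
		dma_addr_t next = 0;	/* lowest address not yet covered */
		int i;

		for (i = 0; i < nr; i++) {
			if (windows[i].start > next)
				reserve_iova(iovad, iova_pfn(iovad, next),
					     iova_pfn(iovad, windows[i].start - 1));
			next = windows[i].end + 1;
		}
		/* addresses above the last window are bounded by the aperture */
	}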


We can at least have a single property 'reserved-dma-window' for the MSI
region and DMA regions, to help describe the holes in the HW rather than
letting SW compute them.





reserved-dma-region property is added to specify the ranges which should
never be mapped and given to devices sitting behind.

Reviewed-by: Ray Jui 
Reviewed-by: Vikram Prakash 
Reviewed-by: Scott Branden 
Signed-off-by: Jitendra Bhivare 
---
Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt | 3 +++
1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
index b8e48b4..3be0fe3 100644
--- a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
+++ b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
@@ -30,6 +30,9 @@ Optional properties:
 - dma-ranges: Some PAXB-based root complexes do not have inbound mapping done
   by the ASIC after power on reset.  In this case, SW is required to configure
   the mapping, based on inbound memory regions specified by this property.
+- reserved-dma-region: PAXBv2 with IOMMU enabled cannot provide mappings for
+  entire IOVA space specified by DMA mask. Hence this is used to reserve the
+  gaps in dma-ranges.
 - brcm,pcie-ob: Some iProc SoCs do not have the outbound address mapping done
   by the ASIC after power on reset. In this case, SW needs to configure it








Re: [RFC 3/5] dt-bindings: arm-smmu: Add reserved-msi-region

2018-03-15 Thread Robin Murphy

On 12/03/18 07:19, Jitendra Bhivare wrote:

On Tue, Mar 6, 2018 at 5:17 PM, Robin Murphy  wrote:

On 06/03/18 04:59, Jitendra Bhivare wrote:


iPROC SoC has a special device window to treat GICv3 ITS MSI.



Ugh, really? After preferably printing out 100 copies of the SBSA, binding
them all together and dropping the lot onto the hardware designers from a
great height, could you clarify what exactly the special behaviour is here?

Robin.


The special device window needs to be programmed so that writes to the
specified address region enforce specific ordering of the traffic issued
prior to them.


OK, I know of PCIe root complexes having address matching to give the 
MSI write the appropriate AXI memory type attributes when the SMMU is 
bypassed, but ordering is a new one. I guess you have some glue logic on 
the root complex master interface which injects an ACE barrier 
transaction in front of any write to the ITS doorbell address?

Current code maps MSI to IOVA space. Add SMMU node property to use
direct mappings for MSI region.

This property is read and reserved when arm_smmu_get_resv_regions
gets called.
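For context, reserving a direct-mapped MSI window via .get_resv_regions
amounts to something like the following (a hedged sketch; the property
parsing is elided, the function name is illustrative, and the address/size
are taken from the binding example below):

	/* Sketch: advertise a direct-mapped MSI window as a reserved region */
	static void example_get_resv_regions(struct device *dev,
					     struct list_head *head)
	{
		struct iommu_resv_region *region;

		/* values would come from the reserved-msi-region property */
		region = iommu_alloc_resv_region(0x63c3000, 0x8000,
						 IOMMU_WRITE | IOMMU_MMIO,
						 IOMMU_RESV_DIRECT);
		if (region)
			list_add_tail(&region->list, head);
	}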


Either way, this should be a generic, not SMMU-specific, property - 
there are other platforms that would also make use of it to describe a 
similar hardware situation (which we currently only support via ACPI). 
The big question is where to put it: hardware-wise, the property of 
"MSIs won't work properly if the doorbell is remapped to a different 
address" belongs firmly to the root complex, not the IOMMU, while the 
address itself is already a property of the MSI controller. The IOMMU is 
just the innocent guy in the middle who has to discover and respect the 
constraints.


I'd like to think we could just have some flag to say "you can't remap 
my MSIs!" then go and chase the msi-parent/msi-map phandle to the MSI 
controller to get the address from its reg property, which is more or 
less the equivalent of what the current ACPI workaround does - see 
commit 8b4282e6b8e2 ("ACPI/IORT: Add msi address regions reservation 
helper") in linux-next - but I can already think of ways in which that 
might not work. For one, there's not necessarily any guarantee that the 
MSI controller's programming interface and doorbell are in the same 
place, so we probably do need to describe the specific MSI address(es) 
that the root complex cares about, from the root complex end :/


Robin.



Signed-off-by: Jitendra Bhivare 
---
   Documentation/devicetree/bindings/iommu/arm,smmu.txt | 12 
   1 file changed, 12 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 8a6ffce..13fa2b9 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -71,6 +71,15 @@ conditions.
 or using stream matching with #iommu-cells = <2>, and
 may be ignored if present in such cases.
+- reserved-msi-region: MSI region to be reserved with specific prot in IOVA
+  space for direct mapping. The region is specified in a tuple
+  of (busno,prot,bus_addr,size).
+
+- #region-address-cells: specifies number of cells needed to encode bus_addr
+
+- #region-size-cells: specifies number of cells needed to encode size
+
+
   ** Deprecated properties:
 - mmu-masters (deprecated in favour of the generic "iommus" binding) :
@@ -95,6 +104,9 @@ conditions.
<0 36 4>,
<0 37 4>;
   #iommu-cells = <1>;
+   #region-address-cells = <1>;
+   #region-size-cells = <1>;
+   reserved-msi-region = <0x0 0x1a 0x63c3000 0x8000>;
   };
 /* device with two stream IDs, 0 and 7 */






Re: [PATCH 11/14] swiotlb: remove swiotlb_set_mem_attributes

2018-03-15 Thread Konrad Rzeszutek Wilk
On Thu, Mar 15, 2018 at 01:51:59PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 14, 2018 at 06:52:10PM +0100, Christoph Hellwig wrote:
> > Now that set_memory_decrypted is always available we can just call it
> > directly.
> > 
> 
> Won't this break ARM build?

 And of course right after asking that question I went back to the
previous patch; I had it in my mind that it was arch/x86/include, not
the global one.

So pls ignore my query and instead:

Reviewed-by: Konrad Rzeszutek Wilk 


Re: [PATCH 11/14] swiotlb: remove swiotlb_set_mem_attributes

2018-03-15 Thread Konrad Rzeszutek Wilk
On Wed, Mar 14, 2018 at 06:52:10PM +0100, Christoph Hellwig wrote:
> Now that set_memory_decrypted is always available we can just call it
> directly.
> 

Won't this break ARM build?


Re: [PATCH 10/14] set_memory.h: provide set_memory_{en,de}crypted stubs

2018-03-15 Thread Konrad Rzeszutek Wilk
On Wed, Mar 14, 2018 at 06:52:09PM +0100, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
Reviewed-by: Konrad Rzeszutek Wilk 

Thank you!
> ---
>  include/linux/set_memory.h | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
> index e5140648f638..da5178216da5 100644
> --- a/include/linux/set_memory.h
> +++ b/include/linux/set_memory.h
> @@ -17,4 +17,16 @@ static inline int set_memory_x(unsigned long addr, int numpages) { return 0; }
>  static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
>  #endif
>  
> +#ifndef CONFIG_ARCH_HAS_MEM_ENCRYPT
> +static inline int set_memory_encrypted(unsigned long addr, int numpages)
> +{
> + return 0;
> +}
> +
> +static inline int set_memory_decrypted(unsigned long addr, int numpages)
> +{
> + return 0;
> +}
> +#endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
> +
>  #endif /* _LINUX_SET_MEMORY_H_ */
> -- 
> 2.14.2
> 


Re: [PATCH 05/14] x86/amd_gart: look at coherent_dma_mask instead of GFP_DMA

2018-03-15 Thread Konrad Rzeszutek Wilk
On Wed, Mar 14, 2018 at 06:52:04PM +0100, Christoph Hellwig wrote:
> We want to phase out looking at the magic GFP_DMA flag in the dma mapping
> routines, so switch the gart driver to use the coherent_dma_mask instead,
> which is used to select the GFP_DMA flag in the caller.
> 
Reviewed-by: Konrad Rzeszutek Wilk 

> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/kernel/amd_gart_64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
> index 52e3abcf3e70..79ac6cbb 100644
> --- a/arch/x86/kernel/amd_gart_64.c
> +++ b/arch/x86/kernel/amd_gart_64.c
> @@ -484,7 +484,7 @@ gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr,
>   unsigned long align_mask;
>   struct page *page;
>  
> - if (force_iommu && !(flag & GFP_DMA)) {
> + if (force_iommu && dev->coherent_dma_mask > DMA_BIT_MASK(24)) {
>   flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
>   page = alloc_pages(flag | __GFP_ZERO, get_order(size));
>   if (!page)
> -- 
> 2.14.2
> 


[PATCH v5 6/7] vfio/type1: remove duplicate retrieval of reserved regions

2018-03-15 Thread Shameer Kolothum
As we now already have the reserved regions list, just pass that into
vfio_iommu_has_sw_msi() fn.

Signed-off-by: Shameer Kolothum 
---
 drivers/vfio/vfio_iommu_type1.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 90f195d..65c13e7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1204,15 +1204,13 @@ static struct vfio_group *find_iommu_group(struct vfio_domain *domain,
return NULL;
 }
 
-static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
+static bool vfio_iommu_has_sw_msi(struct list_head *group_resv_regions,
+   phys_addr_t *base)
 {
-   struct list_head group_resv_regions;
-   struct iommu_resv_region *region, *next;
+   struct iommu_resv_region *region;
bool ret = false;
 
-   INIT_LIST_HEAD(&group_resv_regions);
-   iommu_get_group_resv_regions(group, &group_resv_regions);
-   list_for_each_entry(region, &group_resv_regions, list) {
+   list_for_each_entry(region, group_resv_regions, list) {
/*
 * The presence of any 'real' MSI regions should take
 * precedence over the software-managed one if the
@@ -1228,8 +1226,7 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
ret = true;
}
}
-   list_for_each_entry_safe(region, next, &group_resv_regions, list)
-   kfree(region);
+
return ret;
 }
 
@@ -1564,7 +1561,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (ret)
goto out_detach;
 
-   resv_msi = vfio_iommu_has_sw_msi(iommu_group, &resv_msi_base);
+   resv_msi = vfio_iommu_has_sw_msi(&group_resv_regions, &resv_msi_base);
 
INIT_LIST_HEAD(&domain->group_list);
list_add(&group->next, &domain->group_list);
-- 
2.7.4




[PATCH v5 4/7] vfio/type1: check dma map request is within a valid iova range

2018-03-15 Thread Shameer Kolothum
This checks and rejects any dma map request outside valid iova
range.

Signed-off-by: Shameer Kolothum 
---
 drivers/vfio/vfio_iommu_type1.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 25e6920..d59db31 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -982,6 +982,23 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
return ret;
 }
 
+/*
+ * Check dma map request is within a valid iova range
+ */
+static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu,
+   dma_addr_t start, dma_addr_t end)
+{
+   struct list_head *iova = &iommu->iova_list;
+   struct vfio_iova *node;
+
+   list_for_each_entry(node, iova, list) {
+   if ((start >= node->start) && (end <= node->end))
+   return true;
+   }
+
+   return false;
+}
+
 static int vfio_dma_do_map(struct vfio_iommu *iommu,
   struct vfio_iommu_type1_dma_map *map)
 {
@@ -1020,6 +1037,11 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
goto out_unlock;
}
 
+   if (!vfio_iommu_iova_dma_valid(iommu, iova, iova + size - 1)) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
dma = kzalloc(sizeof(*dma), GFP_KERNEL);
if (!dma) {
ret = -ENOMEM;
-- 
2.7.4




[PATCH v5 2/7] vfio/type1: Check reserve region conflict and update iova list

2018-03-15 Thread Shameer Kolothum
This retrieves the reserved regions associated with dev group and
checks for conflicts with any existing dma mappings. Also update
the iova list excluding the reserved regions.

Signed-off-by: Shameer Kolothum 
---
 drivers/vfio/vfio_iommu_type1.c | 90 +
 1 file changed, 90 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1123c74..cfe2bb2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1313,6 +1313,82 @@ static int vfio_iommu_aper_resize(struct list_head *iova,
return 0;
 }
 
+/*
+ * Check reserved region conflicts with existing dma mappings
+ */
+static bool vfio_iommu_resv_conflict(struct vfio_iommu *iommu,
+   struct list_head *resv_regions)
+{
+   struct iommu_resv_region *region;
+
+   /* Check for conflict with existing dma mappings */
+   list_for_each_entry(region, resv_regions, list) {
+   if (vfio_find_dma(iommu, region->start, region->length))
+   return true;
+   }
+
+   return false;
+}
+
+/*
+ * Check iova region overlap with  reserved regions and
+ * exclude them from the iommu iova range
+ */
+static int vfio_iommu_resv_exclude(struct list_head *iova,
+   struct list_head *resv_regions)
+{
+   struct iommu_resv_region *resv;
+   struct vfio_iova *n, *next;
+
+   list_for_each_entry(resv, resv_regions, list) {
+   phys_addr_t start, end;
+
+   start = resv->start;
+   end = resv->start + resv->length - 1;
+
+   list_for_each_entry_safe(n, next, iova, list) {
+   int ret = 0;
+
+   /* No overlap */
+   if ((start > n->end) || (end < n->start))
+   continue;
+   /*
+* Insert a new node if current node overlaps with the
+* reserve region to exclude that from valid iova range.
+* Note that, new node is inserted before the current
+* node and finally the current node is deleted keeping
+* the list updated and sorted.
+*/
+   if (start > n->start)
+   ret = vfio_iommu_iova_insert(&n->list,
+   n->start, start - 1);
+   if (!ret && end < n->end)
+   ret = vfio_iommu_iova_insert(&n->list,
+   end + 1, n->end);
+   if (ret)
+   return ret;
+
+   list_del(&n->list);
+   kfree(n);
+   }
+   }
+
+   if (list_empty(iova))
+   return -EINVAL;
+
+   return 0;
+}
+
+static void vfio_iommu_resv_free(struct list_head *resv_regions)
+{
+   struct iommu_resv_region *n, *next;
+
+   list_for_each_entry_safe(n, next, resv_regions, list) {
+   list_del(&n->list);
+   kfree(n);
+   }
+}
+
 static void vfio_iommu_iova_free(struct list_head *iova)
 {
struct vfio_iova *n, *next;
@@ -1366,6 +1442,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
phys_addr_t resv_msi_base;
struct iommu_domain_geometry geo;
LIST_HEAD(iova_copy);
+   LIST_HEAD(group_resv_regions);
 
mutex_lock(&iommu->lock);
 
@@ -1444,6 +1521,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
goto out_detach;
}
 
+   iommu_get_group_resv_regions(iommu_group, &group_resv_regions);
+
+   if (vfio_iommu_resv_conflict(iommu, &group_resv_regions)) {
+   ret = -EINVAL;
+   goto out_detach;
+   }
+
/* Get a copy of the current iova list and work on it */
ret = vfio_iommu_iova_get_copy(iommu, &iova_copy);
if (ret)
@@ -1454,6 +1538,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (ret)
goto out_detach;
 
+   ret = vfio_iommu_resv_exclude(&iova_copy, &group_resv_regions);
+   if (ret)
+   goto out_detach;
+
resv_msi = vfio_iommu_has_sw_msi(iommu_group, &resv_msi_base);
 
INIT_LIST_HEAD(&domain->group_list);
@@ -1514,6 +1602,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
/* Delete the old one and insert new iova list */
vfio_iommu_iova_insert_copy(iommu, &iova_copy);
mutex_unlock(&iommu->lock);
+   vfio_iommu_resv_free(&group_resv_regions);
 
return 0;
 
@@ -1522,6 +1611,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 out_domain:
iommu_domain_free(domain->domain);
vfio_iommu_iova_free(&iova_copy);
+   vfio_iommu_resv_free(&group_resv_regions);
 out_free:
kfree(domain);

[PATCH v5 7/7] iommu/dma: Move PCI window region reservation back into dma specific path.

2018-03-15 Thread Shameer Kolothum
This pretty much reverts commit 273df9635385 ("iommu/dma: Make PCI
window reservation generic") by moving the PCI window region
reservation back into the dma specific path so that these regions
don't get exposed via the IOMMU API interface. With this change,
the vfio interface will report only iommu specific reserved regions
to the user space.

Cc: Robin Murphy 
Cc: Joerg Roedel 
Signed-off-by: Shameer Kolothum 
---
 drivers/iommu/dma-iommu.c | 54 ++-
 1 file changed, 25 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f05f3cf..ddcbbdb 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -167,40 +167,16 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
  * @list: Reserved region list from iommu_get_resv_regions()
  *
  * IOMMU drivers can use this to implement their .get_resv_regions callback
- * for general non-IOMMU-specific reservations. Currently, this covers host
- * bridge windows for PCI devices and GICv3 ITS region reservation on ACPI
- * based ARM platforms that may require HW MSI reservation.
+ * for general non-IOMMU-specific reservations. Currently, this covers GICv3
+ * ITS region reservation on ACPI based ARM platforms that may require HW MSI
+ * reservation.
  */
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 {
-   struct pci_host_bridge *bridge;
-   struct resource_entry *window;
-
-   if (!is_of_node(dev->iommu_fwspec->iommu_fwnode) &&
-   iort_iommu_msi_get_resv_regions(dev, list) < 0)
-   return;
-
-   if (!dev_is_pci(dev))
-   return;
-
-   bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
-   resource_list_for_each_entry(window, &bridge->windows) {
-   struct iommu_resv_region *region;
-   phys_addr_t start;
-   size_t length;
-
-   if (resource_type(window->res) != IORESOURCE_MEM)
-   continue;
 
-   start = window->res->start - window->offset;
-   length = window->res->end - window->res->start + 1;
-   region = iommu_alloc_resv_region(start, length, 0,
-   IOMMU_RESV_RESERVED);
-   if (!region)
-   return;
+   if (!is_of_node(dev->iommu_fwspec->iommu_fwnode))
+   iort_iommu_msi_get_resv_regions(dev, list);
 
-   list_add_tail(&region->list, list);
-   }
 }
 EXPORT_SYMBOL(iommu_dma_get_resv_regions);
 
@@ -229,6 +205,23 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
return 0;
 }
 
+static void iova_reserve_pci_windows(struct pci_dev *dev,
+   struct iova_domain *iovad)
+{
+   struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+   struct resource_entry *window;
+   unsigned long lo, hi;
+
+   resource_list_for_each_entry(window, &bridge->windows) {
+   if (resource_type(window->res) != IORESOURCE_MEM)
+   continue;
+
+   lo = iova_pfn(iovad, window->res->start - window->offset);
+   hi = iova_pfn(iovad, window->res->end - window->offset);
+   reserve_iova(iovad, lo, hi);
+   }
+}
+
 static int iova_reserve_iommu_regions(struct device *dev,
struct iommu_domain *domain)
 {
@@ -238,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
LIST_HEAD(resv_regions);
int ret = 0;
 
+   if (dev_is_pci(dev))
+   iova_reserve_pci_windows(to_pci_dev(dev), iovad);
+
iommu_get_resv_regions(dev, &resv_regions);
list_for_each_entry(region, &resv_regions, list) {
unsigned long lo, hi;
-- 
2.7.4




[PATCH v5 5/7] vfio/type1: Add IOVA range capability support

2018-03-15 Thread Shameer Kolothum
This  allows the user-space to retrieve the supported IOVA
range(s), excluding any reserved regions. The implementation
is based on capability chains, added to VFIO_IOMMU_GET_INFO ioctl.
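For illustration, a user-space consumer of this capability might look roughly
like the following (a hedged sketch assuming the uapi additions in this
patch; error handling is elided):

	/* Hypothetical user-space sketch: dump the valid IOVA ranges. */
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/ioctl.h>
	#include <linux/vfio.h>

	static void dump_iova_ranges(int container_fd)
	{
		struct vfio_iommu_type1_info probe = { .argsz = sizeof(probe) };
		struct vfio_iommu_type1_info *info;
		struct vfio_info_cap_header *hdr;
		unsigned int i;

		/* first call reports the argsz needed for the capability chain */
		ioctl(container_fd, VFIO_IOMMU_GET_INFO, &probe);

		info = calloc(1, probe.argsz);
		info->argsz = probe.argsz;
		ioctl(container_fd, VFIO_IOMMU_GET_INFO, info);

		if (!(info->flags & VFIO_IOMMU_INFO_CAPS) || !info->cap_offset)
			goto out;

		/* walk the chain; a 'next' offset of 0 terminates it */
		for (hdr = (void *)info + info->cap_offset; ;
		     hdr = (void *)info + hdr->next) {
			if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE) {
				struct vfio_iommu_type1_info_cap_iova_range *r =
					(void *)hdr;

				for (i = 0; i < r->nr_iovas; i++)
					printf("iova range %u: 0x%llx - 0x%llx\n", i,
					       (unsigned long long)r->iova_ranges[i].start,
					       (unsigned long long)r->iova_ranges[i].end);
			}
			if (!hdr->next)
				break;
		}
	out:
		free(info);
	}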

Signed-off-by: Shameer Kolothum 
---
 drivers/vfio/vfio_iommu_type1.c | 96 +
 include/uapi/linux/vfio.h   | 23 ++
 2 files changed, 119 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d59db31..90f195d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1929,6 +1929,68 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
return ret;
 }
 
+static int vfio_iommu_iova_add_cap(struct vfio_info_cap *caps,
+struct vfio_iommu_type1_info_cap_iova_range *cap_iovas,
+size_t size)
+{
+   struct vfio_info_cap_header *header;
+   struct vfio_iommu_type1_info_cap_iova_range *iova_cap;
+
+   header = vfio_info_cap_add(caps, size,
+   VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE, 1);
+   if (IS_ERR(header))
+   return PTR_ERR(header);
+
+   iova_cap = container_of(header,
+   struct vfio_iommu_type1_info_cap_iova_range, header);
+   iova_cap->nr_iovas = cap_iovas->nr_iovas;
+   memcpy(iova_cap->iova_ranges, cap_iovas->iova_ranges,
+   cap_iovas->nr_iovas * sizeof(*cap_iovas->iova_ranges));
+   return 0;
+}
+
+static int vfio_iommu_iova_build_caps(struct vfio_iommu *iommu,
+   struct vfio_info_cap *caps)
+{
+   struct vfio_iommu_type1_info_cap_iova_range *cap_iovas;
+   struct vfio_iova *iova;
+   size_t size;
+   int iovas = 0, i = 0, ret;
+
+   mutex_lock(&iommu->lock);
+
+   list_for_each_entry(iova, &iommu->iova_list, list)
+   iovas++;
+
+   if (!iovas) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   size = sizeof(*cap_iovas) + (iovas * sizeof(*cap_iovas->iova_ranges));
+
+   cap_iovas = kzalloc(size, GFP_KERNEL);
+   if (!cap_iovas) {
+   ret = -ENOMEM;
+   goto out_unlock;
+   }
+
+   cap_iovas->nr_iovas = iovas;
+
+   list_for_each_entry(iova, &iommu->iova_list, list) {
+   cap_iovas->iova_ranges[i].start = iova->start;
+   cap_iovas->iova_ranges[i].end = iova->end;
+   i++;
+   }
+
+   ret = vfio_iommu_iova_add_cap(caps, cap_iovas, size);
+
+   kfree(cap_iovas);
+out_unlock:
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -1950,19 +2012,53 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
}
} else if (cmd == VFIO_IOMMU_GET_INFO) {
struct vfio_iommu_type1_info info;
+   struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+   unsigned long capsz;
+   int ret;
 
minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
 
+   /* For backward compatibility, cannot require this */
+   capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
+
if (copy_from_user(&info, (void __user *)arg, minsz))
return -EFAULT;
 
if (info.argsz < minsz)
return -EINVAL;
 
+   if (info.argsz >= capsz) {
+   minsz = capsz;
+   info.cap_offset = 0; /* output, no-recopy necessary */
+   }
+
info.flags = VFIO_IOMMU_INFO_PGSIZES;
 
info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
+   ret = vfio_iommu_iova_build_caps(iommu, &caps);
+   if (ret)
+   return ret;
+
+   if (caps.size) {
+   info.flags |= VFIO_IOMMU_INFO_CAPS;
+
+   if (info.argsz < sizeof(info) + caps.size) {
+   info.argsz = sizeof(info) + caps.size;
+   } else {
+   vfio_info_cap_shift(&caps, sizeof(info));
+   if (copy_to_user((void __user *)arg +
+   sizeof(info), caps.buf,
+   caps.size)) {
+   kfree(caps.buf);
+   return -EFAULT;
+   }
+   info.cap_offset = sizeof(info);
+   }
+
+   kfree(caps.buf);
+   }
+
return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index c743721..46b49e9 

[PATCH v5 0/7] vfio/type1: Add support for valid iova list management

2018-03-15 Thread Shameer Kolothum
This series introduces an iova list associated with a vfio 
iommu. The list is kept updated taking care of iommu apertures,
and reserved regions. Also this series adds checks for any conflict
with existing dma mappings whenever a new device group is attached to
the domain.

User-space can retrieve valid iova ranges using VFIO_IOMMU_GET_INFO
ioctl capability chains. Any dma map request outside the valid iova
range will be rejected.


v4 --> v5
Rebased to next-20180315.
 
 -Incorporated the corner case bug fix suggested by Alex to patch #5.
 -Based on suggestions by Alex and Robin, added patch #7. This
  moves the PCI window reservation back into the DMA specific path.
  This is to fix the issue reported by Eric[1].

Note:
Patch #7 has a dependency on [2][3]

1. https://patchwork.kernel.org/patch/10232043/
2. https://patchwork.kernel.org/patch/10216553/
3. https://patchwork.kernel.org/patch/10216555/

v3 --> v4
 Addressed comments received for v3.
 -dma_addr_t instead of phys_addr_t
 -LIST_HEAD() usage.
 -Free up iova_copy list in case of error.
 -updated logic in filling the iova caps info(patch #5)

RFCv2 --> v3
 Removed RFC tag.
 Addressed comments from Alex and Eric:
 - Added comments to make iova list management logic more clear.
 - Use of iova list copy so that original is not altered in
   case of failure.

RFCv1 --> RFCv2
 Addressed comments from Alex:
-Introduced IOVA list management and added checks for conflicts with 
 existing dma map entries during attach/detach.

Shameer Kolothum (2):
  vfio/type1: Add IOVA range capability support
  iommu/dma: Move PCI window region reservation back into dma specific
path.

Shameerali Kolothum Thodi (5):
  vfio/type1: Introduce iova list and add iommu aperture validity check
  vfio/type1: Check reserve region conflict and update iova list
  vfio/type1: Update iova list on detach
  vfio/type1: check dma map request is within a valid iova range
  vfio/type1: remove duplicate retrieval of reserved regions

 drivers/iommu/dma-iommu.c   |  54 ++---
 drivers/vfio/vfio_iommu_type1.c | 497 +++-
 include/uapi/linux/vfio.h   |  23 ++
 3 files changed, 533 insertions(+), 41 deletions(-)

-- 
2.7.4




[PATCH v5 3/7] vfio/type1: Update iova list on detach

2018-03-15 Thread Shameer Kolothum
Get a copy of iova list on _group_detach and try to update the list.
On success replace the current one with the copy. Leave the list as
it is if update fails.

Signed-off-by: Shameer Kolothum 
---
 drivers/vfio/vfio_iommu_type1.c | 91 +
 1 file changed, 91 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index cfe2bb2..25e6920 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1667,12 +1667,88 @@ static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
WARN_ON(iommu->notifier.head);
 }
 
+/*
+ * Called when a domain is removed in detach. It is possible that
+ * the removed domain decided the iova aperture window. Modify the
+ * iova aperture with the smallest window among existing domains.
+ */
+static void vfio_iommu_aper_expand(struct vfio_iommu *iommu,
+  struct list_head *iova_copy)
+{
+   struct vfio_domain *domain;
+   struct iommu_domain_geometry geo;
+   struct vfio_iova *node;
+   dma_addr_t start = 0;
+   dma_addr_t end = (dma_addr_t)~0;
+
+   list_for_each_entry(domain, &iommu->domain_list, next) {
+   iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
+ &geo);
+   if (geo.aperture_start > start)
+   start = geo.aperture_start;
+   if (geo.aperture_end < end)
+   end = geo.aperture_end;
+   }
+
+   /* Modify aperture limits. The new aper is either same or bigger */
+   node = list_first_entry(iova_copy, struct vfio_iova, list);
+   node->start = start;
+   node = list_last_entry(iova_copy, struct vfio_iova, list);
+   node->end = end;
+}
+
+/*
+ * Called when a group is detached. The reserved regions for that
+ * group can be part of valid iova now. But since reserved regions
+ * may be duplicated among groups, populate the iova valid regions
+ * list again.
+ */
+static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
+  struct list_head *iova_copy)
+{
+   struct vfio_domain *d;
+   struct vfio_group *g;
+   struct vfio_iova *node;
+   dma_addr_t start, end;
+   LIST_HEAD(resv_regions);
+   int ret;
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   list_for_each_entry(g, &d->group_list, next)
+   iommu_get_group_resv_regions(g->iommu_group,
+&resv_regions);
+   }
+
+   if (list_empty(&resv_regions))
+   return 0;
+
+   node = list_first_entry(iova_copy, struct vfio_iova, list);
+   start = node->start;
+   node = list_last_entry(iova_copy, struct vfio_iova, list);
+   end = node->end;
+
+   /* purge the iova list and create new one */
+   vfio_iommu_iova_free(iova_copy);
+
+   ret = vfio_iommu_aper_resize(iova_copy, start, end);
+   if (ret)
+   goto done;
+
+   /* Exclude current reserved regions from iova ranges */
+   ret = vfio_iommu_resv_exclude(iova_copy, &resv_regions);
+done:
+   vfio_iommu_resv_free(&resv_regions);
+   return ret;
+}
+
 static void vfio_iommu_type1_detach_group(void *iommu_data,
  struct iommu_group *iommu_group)
 {
struct vfio_iommu *iommu = iommu_data;
struct vfio_domain *domain;
struct vfio_group *group;
+   bool iova_copy_fail;
+   LIST_HEAD(iova_copy);
 
mutex_lock(&iommu->lock);
 
@@ -1695,6 +1771,12 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
}
}
 
+   /*
+* Get a copy of iova list. If success, use copy to update the
+* list and to replace the current one.
+*/
+   iova_copy_fail = !!vfio_iommu_iova_get_copy(iommu, &iova_copy);
+
list_for_each_entry(domain, &iommu->domain_list, next) {
group = find_iommu_group(domain, iommu_group);
if (!group)
@@ -1720,10 +1802,19 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
iommu_domain_free(domain->domain);
list_del(&domain->next);
kfree(domain);
+   if (!iova_copy_fail && !list_empty(&iommu->domain_list))
+   vfio_iommu_aper_expand(iommu, &iova_copy);
}
break;
}
 
+   if (!iova_copy_fail && !list_empty(&iommu->domain_list)) {
+   if (!vfio_iommu_resv_refresh(iommu, &iova_copy))
+   vfio_iommu_iova_insert_copy(iommu, &iova_copy);
+   else
+   vfio_iommu_iova_free(&iova_copy);
+   }
+
 detach_group_done:
mutex_unlock(&iommu->lock);
 }
-- 
2.7.4



[PATCH v5 1/7] vfio/type1: Introduce iova list and add iommu aperture validity check

2018-03-15 Thread Shameer Kolothum
This introduces an iova list that is valid for dma mappings. Make
sure the new iommu aperture window doesn't conflict with the current
one or with any existing dma mappings during attach.

Signed-off-by: Shameer Kolothum 
---
 drivers/vfio/vfio_iommu_type1.c | 183 +++-
 1 file changed, 180 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 45657e2..1123c74 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -60,6 +60,7 @@ MODULE_PARM_DESC(disable_hugepages,
 
 struct vfio_iommu {
struct list_head domain_list;
+   struct list_head iova_list;
struct vfio_domain  *external_domain; /* domain for external user */
struct mutex lock;
struct rb_root  dma_list;
@@ -92,6 +93,12 @@ struct vfio_group {
struct list_head next;
 };
 
+struct vfio_iova {
+   struct list_head list;
+   dma_addr_t  start;
+   dma_addr_t  end;
+};
+
 /*
  * Guest RAM pinning working set or DMA target
  */
@@ -1204,6 +1211,149 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
return ret;
 }
 
+/*
+ * This is a helper function to insert an address range to iova list.
+ * The list starts with a single entry corresponding to the IOMMU
+ * domain geometry to which the device group is attached. The list
+ * aperture gets modified when a new domain is added to the container
+ * if the new aperture doesn't conflict with the current one or with
+ * any existing dma mappings. The list is also modified to exclude
+ * any reserved regions associated with the device group.
+ */
+static int vfio_iommu_iova_insert(struct list_head *head,
+ dma_addr_t start, dma_addr_t end)
+{
+   struct vfio_iova *region;
+
+   region = kmalloc(sizeof(*region), GFP_KERNEL);
+   if (!region)
+   return -ENOMEM;
+
+   INIT_LIST_HEAD(&region->list);
+   region->start = start;
+   region->end = end;
+
+   list_add_tail(&region->list, head);
+   return 0;
+}
+
+/*
+ * Check the new iommu aperture conflicts with existing aper or with any
+ * existing dma mappings.
+ */
+static bool vfio_iommu_aper_conflict(struct vfio_iommu *iommu,
+dma_addr_t start, dma_addr_t end)
+{
+   struct vfio_iova *first, *last;
+   struct list_head *iova = &iommu->iova_list;
+
+   if (list_empty(iova))
+   return false;
+
+   /* Disjoint sets, return conflict */
+   first = list_first_entry(iova, struct vfio_iova, list);
+   last = list_last_entry(iova, struct vfio_iova, list);
+   if ((start > last->end) || (end < first->start))
+   return true;
+
+   /* Check for any existing dma mappings outside the new start */
+   if (start > first->start) {
+   if (vfio_find_dma(iommu, first->start, start - first->start))
+   return true;
+   }
+
+   /* Check for any existing dma mappings outside the new end */
+   if (end < last->end) {
+   if (vfio_find_dma(iommu, end + 1, last->end - end))
+   return true;
+   }
+
+   return false;
+}
+
+/*
+ * Resize iommu iova aperture window. This is called only if the new
+ * aperture has no conflict with existing aperture and dma mappings.
+ */
+static int vfio_iommu_aper_resize(struct list_head *iova,
+ dma_addr_t start,
+ dma_addr_t end)
+{
+   struct vfio_iova *node, *next;
+
+   if (list_empty(iova))
+   return vfio_iommu_iova_insert(iova, start, end);
+
+   /* Adjust iova list start */
+   list_for_each_entry_safe(node, next, iova, list) {
+   if (start < node->start)
+   break;
+   if ((start >= node->start) && (start < node->end)) {
+   node->start = start;
+   break;
+   }
+   /* Delete nodes before new start */
+   list_del(&node->list);
+   kfree(node);
+   }
+
+   /* Adjust iova list end */
+   list_for_each_entry_safe(node, next, iova, list) {
+   if (end > node->end)
+   continue;
+   if ((end > node->start) && (end <= node->end)) {
+   node->end = end;
+   continue;
+   }
+   /* Delete nodes after new end */
+   list_del(&node->list);
+   kfree(node);
+   }
+
+   return 0;
+}
+
+static void vfio_iommu_iova_free(struct list_head *iova)
+{
+   struct vfio_iova *n, *next;
+
+   list_for_each_entry_safe(n, next, iova, list) {
+   list_del(&n->list);
+   kfree(n);
+   }
+}
+
+static int 

Re: [PATCH 07/14] iommu/amd_iommu: use dma_direct_{alloc,free}

2018-03-15 Thread Joerg Roedel
On Wed, Mar 14, 2018 at 06:52:06PM +0100, Christoph Hellwig wrote:
> This cleans up the code a lot by removing duplicate logic.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/Kconfig |  1 +
>  drivers/iommu/amd_iommu.c | 68 +++
>  2 files changed, 22 insertions(+), 47 deletions(-)

Tested-by: Joerg Roedel 
Acked-by: Joerg Roedel 



Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-15 Thread Joerg Roedel
On Thu, Mar 15, 2018 at 02:42:00PM +, Dmitry Safonov wrote:
> But even with a loop-limit we will need to ratelimit each printk() *also*.
> Otherwise the loop-limit will be based on time spent printing, not on
> anything else..
> The patch makes sense even with a loop-limit in my opinion.

Looks like I mis-read your patch, somehow it looked to me as if you
replace all 'ratelimited' usages with a call to __ratelimit(), but you
just move 'ratelimited' into the loop, which actually makes sense.

But still, this alone is no proper fix for the soft-lockups you are
seeing.


Joerg



Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()

2018-03-15 Thread Sebastian Andrzej Siewior
On 2018-03-15 16:19:17 [+0100], Joerg Roedel wrote:
> Okay, if the irq-layer does the needed locking, then we don't need
> another lock here. There is the modify_irte_ga() path for the
> virtualized irq routing into KVM guests, but there should be no KVM
guests running when setting up the ioapic routing entries.

okay then. Let me rebase the queue on top of whatever you have now, and then
if the box still boots I will post them again.

>   Joerg

Sebastian


Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()

2018-03-15 Thread Joerg Roedel
On Thu, Mar 15, 2018 at 03:15:41PM +0100, Sebastian Andrzej Siewior wrote:
> ->set_allocated() operates only on 0…31 and the others could be used at the
> same time. However 0…31 should not be accessed by other users before they
> are ready.
> 
> irq_remapping_alloc() is that ->alloc() callback invoked via
> irq_domain_alloc_irqs_hierarchy() and each caller has to hold the
> _domain_mutex mutex. So we should not have those in parallel.
> 
> Is it possible to have those entries accessed before the setup is
> complete? My understanding is that this setup is performed once during
> boot (especially that ioapic part) and not again.
> 
> From looking at that hunk, it should not hurt to add that lock, just
> wanted to check it is really needed.

Okay, if the irq-layer does the needed locking, then we don't need
another lock here. There is the modify_irte_ga() path for the
virtualized irq routing into KVM guests, but there should be no KVM
guests running when setting up the ioapic routing entries.



Joerg


Re: [PATCH] iommu/omap: Increase group ref in .device_group()

2018-03-15 Thread Joerg Roedel
On Thu, Mar 01, 2018 at 07:22:08PM +0800, Jeffy Chen wrote:
> Increase group refcounting in omap_iommu_device_group().
> 
> Signed-off-by: Jeffy Chen 

Applied, thanks.


Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-15 Thread Dmitry Safonov via iommu
On Thu, 2018-03-15 at 14:34 +, Dmitry Safonov wrote:
> On Thu, 2018-03-15 at 15:22 +0100, Joerg Roedel wrote:
> > On Thu, Mar 15, 2018 at 02:13:03PM +, Dmitry Safonov wrote:
> > > So, you suggest to remove ratelimit at all?
> > > Do we really need printk flood for each happened fault?
> > > Imagine, you've hundreds of mappings and then PCI link flapped..
> > > Wouldn't it be better to keep ratelimiting?
> > > I don't mind, just it looks a bit strange to me.
> > 
> > I never said you should remove the ratelimiting, after all you are
> > trying to fix a soft-lockup, no?
> > 
> > And that should not be fixed by changes to the ratelimiting, but
> > with
> > proper irq handling.
> 
> Uh, I'm a bit confused then.
> - Isn't it better to ratelimit each printk() instead of bunch of
> printks inside irq handler?
> - I can limit the number of loops, but most of the time is spent in
> the loop on printk() (on my machine ~170msec per loop), while
> everything else takes much less time (on my machine < 1 usec per
> loop). So, if I limit the number of loops per-irq, that cycle-limit
> will be based on limiting time spent on printk (e.g., how many printks
> to do in atomic context so that the node will not lockup). It smells like
> ratelimiting, no?
> 
> I must be misunderstanding something, but why introduce another limit
> for the number of printk() calls when there is a ratelimit which may be
> tuned..
> 

So I agree, it may be better to have another limit on the cycle
*also*, because if we clear faults at the same speed as they're
generated by hw, we may get stuck in the loop..
But by my measurements clearing a fault is so fast (< 1 usec) that I'm not
sure it can happen with hw. For that reason I didn't introduce a
loop-limit.

But even with a loop-limit we will need to ratelimit each printk() *also*.
Otherwise the loop-limit will be based on time spent printing, not on
anything else..
The patch makes sense even with a loop-limit in my opinion.

-- 
Thanks,
 Dmitry
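For what it's worth, the combined approach being discussed might look roughly
like this (an illustrative sketch only; the real dmar_fault() register walk
is elided, and fault_pending()/clear_one_fault() are hypothetical helpers):

	/* Sketch: bound the work per interrupt AND ratelimit each print. */
	#define DMAR_MAX_FAULTS_PER_IRQ	16

	static irqreturn_t dmar_fault_sketch(int irq, void *dev_id)
	{
		struct intel_iommu *iommu = dev_id;
		static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
					      DEFAULT_RATELIMIT_BURST);
		int handled = 0;

		while (fault_pending(iommu) &&
		       handled++ < DMAR_MAX_FAULTS_PER_IRQ) {
			clear_one_fault(iommu);		/* always clear the fault */
			if (__ratelimit(&rs))		/* but print at most a few */
				pr_err("DMAR: handled fault %d\n", handled);
		}

		/* anything still pending is handled by the next interrupt */
		return IRQ_HANDLED;
	}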


Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-15 Thread Dmitry Safonov via iommu
On Thu, 2018-03-15 at 15:22 +0100, Joerg Roedel wrote:
> On Thu, Mar 15, 2018 at 02:13:03PM +, Dmitry Safonov wrote:
> > So, you suggest to remove ratelimit at all?
> > Do we really need printk flood for each happened fault?
> > Imagine, you've hundreds of mappings and then PCI link flapped..
> > Wouldn't it be better to keep ratelimiting?
> > I don't mind, just it looks a bit strange to me.
> 
> I never said you should remove the ratelimiting, after all you are
> trying to fix a soft-lockup, no?
> 
> And that should not be fixed by changes to the ratelimiting, but with
> proper irq handling.

Uh, I'm a bit confused then.
- Isn't it better to ratelimit each printk() instead of bunch of
printks inside irq handler?
- I can limit the number of loops, but most of the time is spent in
the loop on printk() (on my machine ~170msec per loop), while
everything else takes much less time (on my machine < 1 usec per
loop). So, if I limit the number of loops per-irq, that cycle-limit
will be based on limiting time spent on printk (e.g., how many printks
to do in atomic context so that the node will not lockup). It smells like
ratelimiting, no?

I must be misunderstanding something, but why introduce another limit
for the number of printk() calls when there is a ratelimit which may be
tuned..

-- 
Thanks,
 Dmitry


Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-15 Thread Joerg Roedel
On Thu, Mar 15, 2018 at 02:13:03PM +, Dmitry Safonov wrote:
> So, you suggest to remove ratelimit at all?
> Do we really need printk flood for each happened fault?
> Imagine, you've hundreds of mappings and then PCI link flapped..
> Wouldn't it be better to keep ratelimiting?
> I don't mind, just it looks a bit strange to me.

I never said you should remove the ratelimiting, after all you are
trying to fix a soft-lockup, no?

And that should not be fixed by changes to the ratelimiting, but with
proper irq handling.


Joerg



Re: [PATCH 1/1] iommu/vt-d: Fix a potential memory leak

2018-03-15 Thread Joerg Roedel
On Sat, Feb 24, 2018 at 01:42:27PM +0800, Lu Baolu wrote:
> A memory block was allocated in intel_svm_bind_mm() but never freed
> in a failure path. This patch fixes this by free it to avoid memory
> leakage.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc:  # v4.4+
> Signed-off-by: Lu Baolu 

Applied, thanks.



Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-15 Thread Dmitry Safonov via iommu
On Thu, 2018-03-15 at 14:46 +0100, Joerg Roedel wrote:
> On Thu, Feb 15, 2018 at 07:17:29PM +, Dmitry Safonov wrote:
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index accf58388bdb..6c4ea32ee6a9 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
> > int reg, fault_index;
> > u32 fault_status;
> > unsigned long flag;
> > -   bool ratelimited;
> > static DEFINE_RATELIMIT_STATE(rs,
> >   DEFAULT_RATELIMIT_INTERVAL,
> >   DEFAULT_RATELIMIT_BURST);
> >  
> > -   /* Disable printing, simply clear the fault when ratelimited */
> > -   ratelimited = !__ratelimit(&rs);
> > -
> > raw_spin_lock_irqsave(&iommu->register_lock, flag);
> > fault_status = readl(iommu->reg + DMAR_FSTS_REG);
> > -   if (fault_status && !ratelimited)
> > +   if (fault_status && __ratelimit(&rs))
> > pr_err("DRHD: handling fault status reg %x\n", fault_status);
> 
> This looks awful. Have you tried to limit the number of loops in this
> function and returning? You can handle the next faults by the next
> interrupt. This ensures that the cpu visits a scheduling point from time
> to time so that you don't see soft-lockups.

So, you suggest to remove ratelimit at all?
Do we really need printk flood for each happened fault?
Imagine, you've hundreds of mappings and then PCI link flapped..
Wouldn't it be better to keep ratelimiting?
I don't mind, just it looks a bit strange to me.

-- 
Thanks,
 Dmitry


Re: [PATCH] iommu/rockchip: Perform a reset on shutdown

2018-03-15 Thread Joerg Roedel
On Tue, Feb 20, 2018 at 08:25:04PM +, Marc Zyngier wrote:
> Trying to do a kexec whilst the iommus are still on is proving to be
> a challenging exercise. It is terribly unsafe, as we're reusing the
> memory allocated for the page tables, leading to a likely crash.
> 
> Let's implement a shutdown method that will at least try to stop
> DMA from going crazy behind our back. Note that we need to be
> extra cautious when doing so, as the IOMMU may not be clocked
> if controlled by a another master, as typical on Rockchip system.
> 
> Suggested-by: Robin Murphy 
> Signed-off-by: Marc Zyngier 

Applied, thanks.



Re: [PATCH v3 0/5] Add debugfs info for the AMD IOMMU

2018-03-15 Thread Joerg Roedel
On Wed, Mar 14, 2018 at 06:04:44PM -0500, Gary R Hook wrote:
> Gary R Hook (5):
>   iommu/amd - Add debugfs support
>   iommu/amd - Add a 'verbose' switch for IOMMU debugfs
>   iommu/amd - Add a README variable for the IOMMU debugfs
>   iommu/amd - Expose the active IOMMU device table entries
>   iommu/amd - Add a debugfs entry to specify a IOMMU device table entry

Same problem here as with the Intel patches, I don't think it's a good
idea to reveal internal iommu data structures to user-space in this way.

I've debugged iommu issues for around 10 years now and never had the
need for an interface that reveals those internals. How exactly are you
planning to use this information?



Joerg


Re: [PATCH 04/14] x86: use generic swiotlb_ops

2018-03-15 Thread Thomas Gleixner
On Thu, 15 Mar 2018, Christoph Hellwig wrote:

> On Thu, Mar 15, 2018 at 01:52:14PM +0100, Thomas Gleixner wrote:
> > Yeah, I know that the standard defines it, but that doesn't mean it makes
> > sense. At least not to me.
> 
> It makes sense in that it logically is a boolean but we only want
> to allocate 1 bit for it, unlike the normal ABI allocations of at
> least a byte.
>
> Either way, tell me what you want it changed to, and I'll do it.

I'd prefer either bool or a regular bitfield, but I can live with the
boolean bitfield as well.



Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing

2018-03-15 Thread Joerg Roedel
On Thu, Feb 15, 2018 at 07:17:29PM +, Dmitry Safonov wrote:
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index accf58388bdb..6c4ea32ee6a9 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
>   int reg, fault_index;
>   u32 fault_status;
>   unsigned long flag;
> - bool ratelimited;
>   static DEFINE_RATELIMIT_STATE(rs,
> DEFAULT_RATELIMIT_INTERVAL,
> DEFAULT_RATELIMIT_BURST);
>  
> - /* Disable printing, simply clear the fault when ratelimited */
> - ratelimited = !__ratelimit(&rs);
> -
>   raw_spin_lock_irqsave(&iommu->register_lock, flag);
>   fault_status = readl(iommu->reg + DMAR_FSTS_REG);
> - if (fault_status && !ratelimited)
> + if (fault_status && __ratelimit(&rs))
>   pr_err("DRHD: handling fault status reg %x\n", fault_status);

This looks awful. Have you tried to limit the number of loops in this
function and returning? You can handle the next faults by the next
interrupt. This ensures that the cpu visits a scheduling point from time
to time so that you don't see soft-lockups.



Joerg


Re: [PATCH 03/14] x86: use dma-direct

2018-03-15 Thread Thomas Gleixner
On Thu, 15 Mar 2018, Christoph Hellwig wrote:
> On Thu, Mar 15, 2018 at 01:53:52PM +0100, Thomas Gleixner wrote:
> > > > > The generic dma-direct implementation is now functionally equivalent to
> > > > > the x86 nommu dma_map implementation, so switch over to using it.
> > > > 
> > > > Can you please convert the various drivers first and then remove the
> > > > unused code?
> > > 
> > > Which various drivers?
> > > 
> > > > > Note that the various iommu drivers are switched from x86_dma_supported
> > 
> > The various iommu drivers 
> 
> It doesn't really make any sense to switch them separately.  They've
> inherited the ops from x86 which used to override it on an arch level,
> and we've now generalized the exactly same implementation to common code.

Fair enough.



Re: [PATCH 03/14] x86: use dma-direct

2018-03-15 Thread Christoph Hellwig
On Thu, Mar 15, 2018 at 01:53:52PM +0100, Thomas Gleixner wrote:
> > > > The generic dma-direct implementation is now functionally equivalent to
> > > > the x86 nommu dma_map implementation, so switch over to using it.
> > > 
> > > Can you please convert the various drivers first and then remove the
> > > unused code?
> > 
> > Which various drivers?
> > 
> > > > Note that the various iommu drivers are switched from x86_dma_supported
> 
> The various iommu drivers 

It doesn't really make any sense to switch them separately.  They've
inherited the ops from x86 which used to override it on an arch level,
and we've now generalized the exactly same implementation to common code.


Re: [PATCH 04/14] x86: use generic swiotlb_ops

2018-03-15 Thread Christoph Hellwig
On Thu, Mar 15, 2018 at 01:52:14PM +0100, Thomas Gleixner wrote:
> Yeah, I know that the standard defines it, but that doesn't mean it makes
> sense. At least not to me.

It makes sense in that it logically is a boolean but we only want
to allocate 1 bit for it, unlike the normal ABI allocations of at
least a byte.

Either way, tell me what you want it changed to, and I'll do it.
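For reference, the three spellings under discussion (illustrative):

	struct a { bool     is_ram;     };	/* plain bool: at least one byte */
	struct b { unsigned is_ram : 1; };	/* regular bitfield: one bit */
	struct c { bool     is_ram : 1; };	/* boolean bitfield: one bit, legal since C99 */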


Re: [PATCH v7 0/5] Add Intel IOMMU debugfs support

2018-03-15 Thread Joerg Roedel
On Thu, Feb 15, 2018 at 08:38:11AM -0800, Jacob Pan wrote:
> Just wondering if your concern is on the implementation or the debugfs
> idea in general. Perhaps have some common IOMMU debugfs?

My concern mainly is that we add interfaces which reveal
potentially security relevant information to user-space, and that tools
come up using it so that this also becomes kABI and we can't easily
change it anymore, and this whole stuff turns into a maintenance nightmare.

So that is definitely not something I'd like to see enabled in the
distros, and it's better to avoid it at all and search for better ways to
debug upcoming issues.

BPF tracers and tracing in general comes to mind here...


Joerg



Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-15 Thread Robin Murphy

On 15/03/18 07:58, Christoph Hellwig wrote:

On Wed, Mar 14, 2018 at 05:43:46PM +, Robin Murphy wrote:

Looking back I don't really understand why we even indirect the "classic"
per-device dma_declare_coherent_memory use case through the DMA API.


It certainly makes sense for devices which can exist in both shared-memory
and device-local-memory configurations, so the driver doesn't have to care
about the details (particularly on mobile SoCs where the 'local' memory
might just be a chunk of system RAM reserved by the bootloader, and it's
just a matter of different use-cases on identical hardware).


Well, the "classic" case for me is memory buffers in the device.  Setting
some memory aside, either in a global pool as now done for arm-nommu
or even per-device as on some ARM SOCs is different indeed.

As far as I can tell the few devices that use 'local' memory always
use that.
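
For context, the "classic" per-device flow both paragraphs describe is
roughly the following from a driver probe path (a sketch assuming the
4.16-era signature and DMA_MEMORY_EXCLUSIVE; phys_base, device_base and
region_size are illustrative placeholders):

	int ret;

	/*
	 * Carve out the device-local (or bootloader-reserved) region.
	 * Subsequent dma_alloc_coherent() calls on this device are then
	 * served from that region via the DMA API, so the driver itself
	 * does not need to care which configuration it is running on.
	 */
	ret = dma_declare_coherent_memory(&pdev->dev, phys_base, device_base,
					  region_size, DMA_MEMORY_EXCLUSIVE);
	if (ret)
		dev_warn(&pdev->dev, "no dedicated DMA region available\n");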


I was thinking mostly of GPUs, where the same IP core might be available 
in a discrete PCIe chip with its own GDDR controller and on-board local 
RAM, and also in an APU/SoC-type configuration reliant on the CPU memory 
bus. But I guess in practice those drivers are already well beyond the 
generic DMA API and into complicated explicitly-managed stuff like TTM.



It seems like a pretty different use case to me.  In the USB case we
also have the following additional twist in that it doesn't even need
the mapping to be coherent.


I'm pretty sure it does (in the sense that it needs to ensure the arch code
makes the mapping non-cacheable), otherwise I can't see how the bouncing
could work properly. I think the last bit of the comment above
hcd_alloc_coherent() is a bit misleading.


Well, if it isn't marked non-cacheable we'd have to do dma_cache_sync
operations for it.  Which would probably still be faster than non-cacheable
mappings.


Ah, I see, we crossed wires there - the *current* implementation 
definitely requires a coherent mapping, but in general you're right that 
there are other ways to solve this particular problem which wouldn't. 
This case really wants to be able to overload the streaming map/unmap 
calls to automatically bounce through device-local memory, but I'm sure 
that's not worth the complexity or effort in general :)



So maybe for now the quick fix is to move the sleep check as suggested
earlier in this thread, but in the long run we probably need to do some
major rework of how dma_declare_coherent_memory and friends work.


Maybe; I do think the specific hcd_alloc_coherent() case could still be
fixed within the scope of the existing code, but it's not quite as clean
and straightforward as I first thought, and the practical impact of
tweaking the WARN should be effectively zero despite the theoretical edge
cases it opens up. Do you want me to write it up as a proper patch?


Yes.  Including a proper comment on why the might_sleep is placed there.


OK, will do.


My mid-term plan was to actually remove the gfp flags argument from
the dma alloc routines, as it creates more confusion than it helps.
I guess this means we'll at least need to introduce a DMA_ATTR_NON_BLOCK
or similar flag instead, unfortunately.


At least we already have DMA_ATTR_NO_WARN, which could be implemented 
consistently to clean up the existing __GFP_NOWARN users.


Robin.


Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()

2018-03-15 Thread Joerg Roedel
On Thu, Mar 15, 2018 at 02:01:53PM +0100, Sebastian Andrzej Siewior wrote:
> On 2018-03-15 13:53:42 [+0100], Joerg Roedel wrote:
> > On Fri, Feb 23, 2018 at 11:27:31PM +0100, Sebastian Andrzej Siewior wrote:
> > > @@ -4103,10 +4093,26 @@ static int irq_remapping_alloc(struct irq_domain 
> > > *domain, unsigned int virq,
> > >   return ret;
> > >  
> > >   if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
> > > - if (get_irq_table(devid, true))
> > > + struct irq_remap_table *table;
> > > + struct amd_iommu *iommu;
> > > +
> > > + table = get_irq_table(devid);
> > > + if (table) {
> > > + if (!table->min_index) {
> > > + /*
> > > +  * Keep the first 32 indexes free for IOAPIC
> > > +  * interrupts.
> > > +  */
> > > + table->min_index = 32;
> > > + iommu = amd_iommu_rlookup_table[devid];
> > > + for (i = 0; i < 32; ++i)
> > > + iommu->irte_ops->set_allocated(table, i);
> > > + }
> > 
> > I think this needs to be protected by the table->lock.
> 
> Which part and why? The !->min_index part plus the extending (setting it to 32)?

In particular the set_allocated part: once get_irq_table() returns, the
table is visible to other users as well. I have not checked the
irq-layer locking to be sure whether that can happen, though.
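
Concretely, the locking being asked for would look roughly like this (a
sketch against the code quoted above, assuming table->lock is the
per-table lock this series introduces; not the final patch):

	struct irq_remap_table *table = get_irq_table(devid);
	struct amd_iommu *iommu;
	unsigned long flags;
	int i;

	raw_spin_lock_irqsave(&table->lock, flags);
	if (!table->min_index) {
		/* Keep the first 32 indexes free for IOAPIC interrupts. */
		table->min_index = 32;
		iommu = amd_iommu_rlookup_table[devid];
		for (i = 0; i < 32; ++i)
			iommu->irte_ops->set_allocated(table, i);
	}
	raw_spin_unlock_irqrestore(&table->lock, flags);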


Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()

2018-03-15 Thread Sebastian Andrzej Siewior
On 2018-03-15 13:53:42 [+0100], Joerg Roedel wrote:
> On Fri, Feb 23, 2018 at 11:27:31PM +0100, Sebastian Andrzej Siewior wrote:
> > @@ -4103,10 +4093,26 @@ static int irq_remapping_alloc(struct irq_domain 
> > *domain, unsigned int virq,
> > return ret;
> >  
> > if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
> > -   if (get_irq_table(devid, true))
> > +   struct irq_remap_table *table;
> > +   struct amd_iommu *iommu;
> > +
> > +   table = get_irq_table(devid);
> > +   if (table) {
> > +   if (!table->min_index) {
> > +   /*
> > +* Keep the first 32 indexes free for IOAPIC
> > +* interrupts.
> > +*/
> > +   table->min_index = 32;
> > +   iommu = amd_iommu_rlookup_table[devid];
> > +   for (i = 0; i < 32; ++i)
> > +   iommu->irte_ops->set_allocated(table, i);
> > +   }
> 
> I think this needs to be protected by the table->lock.

Which part and why? The !->min_index part plus the extending (setting it to 32)?

Sebastian


Re: iommu/amd: lock splitting & GFP_KERNEL allocation

2018-03-15 Thread Joerg Roedel
Hi Sebastian,

On Fri, Feb 23, 2018 at 11:27:26PM +0100, Sebastian Andrzej Siewior wrote:
> I have no idea why, but suddenly my A10 box complained loudly about
> locking and memory allocations within the iommu code under RT. Looking
> at the code, it has been like this for a long time, so the iommu must
> have appeared only recently (well, there was a BIOS upgrade due to other
> issues, so that might have enabled it).
> 
> The goal here is to avoid doing the memory allocation in get_irq_table()
> with interrupts disabled, and to take as few raw_spin_locks as possible
> while still using them when the caller is already holding one (like
> desc->lock during IRQ-affinity changes).
> 
> The patches were boot tested on an AMD EPYC 7601.

Thanks for these patches, I really like the cleanups and the improved
locking. Please rebase against x86/amd in the iommu branch and address
the other comment I have, then I'll put them into my tree.


Thanks,

Joerg



Re: [PATCH 08/10] iommu/amd: drop the lock while allocating new irq remap table

2018-03-15 Thread Joerg Roedel
On Fri, Feb 23, 2018 at 11:27:34PM +0100, Sebastian Andrzej Siewior wrote:
> The irq_remap_table is allocated while the iommu_table_lock is held with
> interrupts disabled. While this works, it makes RT scream very loudly.
> From looking at the call sites, all callers are in the early device
> initialisation path (apic_bsp_setup(), pci_enable_device(),
> pci_enable_msi()), so it makes sense to drop the lock (which also
> enables interrupts) and try to allocate that memory with GFP_KERNEL
> instead of GFP_ATOMIC.
> 
> Since during the allocation the iommu_table_lock is dropped, we need to
> recheck if table exists after the lock has been reacquired. I *think*
> that it is impossible that the "devid" entry appears in irq_lookup_table
> while the lock is dropped since the same device can only be probed once.
> It is more likely that another device added an `alias' entry. However I
> check for both cases, just to be sure.
> 
> Signed-off-by: Sebastian Andrzej Siewior 

This looks like it conflicts with the x86/amd branch as well.
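
The pattern the changelog describes reduces to roughly the following (a
sketch of the approach with approximated names; the actual patch differs
in details such as alias handling and table initialisation):

	/* Drop the lock so the allocation may sleep. */
	raw_spin_unlock_irqrestore(&iommu_table_lock, flags);

	new_table = kzalloc(sizeof(*new_table), GFP_KERNEL);

	raw_spin_lock_irqsave(&iommu_table_lock, flags);
	/* Recheck: someone may have installed a table meanwhile. */
	table = irq_lookup_table[devid];
	if (!table)
		table = irq_lookup_table[alias];
	if (table) {
		kfree(new_table);	/* lost the race, use the winner's table */
	} else {
		table = new_table;
		irq_lookup_table[devid] = table;
	}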



Re: [PATCH 04/10] iommu/amd: split irq_lookup_table out of the amd_iommu_devtable_lock

2018-03-15 Thread Joerg Roedel
On Fri, Feb 23, 2018 at 11:27:30PM +0100, Sebastian Andrzej Siewior wrote:
> The function get_irq_table() reads/writes irq_lookup_table while holding
> the amd_iommu_devtable_lock. It also modifies
> amd_iommu_dev_table[].data[2].
> set_dte_entry() is using amd_iommu_dev_table[].data[0|1] (under the
> domain->lock) so it should be okay. The access to the iommu is
> serialized with its own (iommu's) lock.
> 
> So split get_irq_table() out from under amd_iommu_devtable_lock. The
> new lock is a raw_spin_lock because modify_irte_ga() is called while
> desc->lock is held (which is raw).
> 
> Signed-off-by: Sebastian Andrzej Siewior 

This patch likely conflicts with changes already in the iommu tree, as
that tree already contains a patch splitting the get_irq_table() function.
Can you rebase your series against the x86/amd branch?


Thanks,

Joerg



Re: [PATCH 03/14] x86: use dma-direct

2018-03-15 Thread Thomas Gleixner
On Thu, 15 Mar 2018, Christoph Hellwig wrote:

> On Thu, Mar 15, 2018 at 09:56:13AM +0100, Thomas Gleixner wrote:
> > On Wed, 14 Mar 2018, Christoph Hellwig wrote:
> > 
> > > The generic dma-direct implementation is now functionally equivalent to
> > > the x86 nommu dma_map implementation, so switch over to using it.
> > 
> > Can you please convert the various drivers first and then remove the
> > unused code?
> 
> Which various drivers?
> 
> > > Note that the various iommu drivers are switched from x86_dma_supported

The various iommu drivers 

> > > to dma_direct_supported to provide identical functionality, although the
> > > checks look fairly questionable for at least some of them.
> > 
> > Can you please elaborate? From the above it's not clear which checks you
> > are referring to. If you convert these drivers seperately then explicit
> > information about your concerns wants to be in the changelogs.
> 
> This bit:
> 
> 	/* Copied from i386. Doesn't make much sense, because it will
> 	   only work for pci_alloc_coherent.
> 	   The caller just has to use GFP_DMA in this case. */
> 	if (mask < DMA_BIT_MASK(24))
> 		return 0;
> 
> in x86_dma_supported, or the equivalent bit in dma_direct_supported.
> Kept for bug-for-bug compatibility, but I guess I should reword or
> just drop that bit of the changelog if it causes confusion.

Reword please.

Thanks,

tglx

Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()

2018-03-15 Thread Joerg Roedel
On Fri, Feb 23, 2018 at 11:27:31PM +0100, Sebastian Andrzej Siewior wrote:
> @@ -4103,10 +4093,26 @@ static int irq_remapping_alloc(struct irq_domain 
> *domain, unsigned int virq,
>   return ret;
>  
>   if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
> - if (get_irq_table(devid, true))
> + struct irq_remap_table *table;
> + struct amd_iommu *iommu;
> +
> + table = get_irq_table(devid);
> + if (table) {
> + if (!table->min_index) {
> + /*
> +  * Keep the first 32 indexes free for IOAPIC
> +  * interrupts.
> +  */
> + table->min_index = 32;
> + iommu = amd_iommu_rlookup_table[devid];
> + for (i = 0; i < 32; ++i)
> + iommu->irte_ops->set_allocated(table, i);
> + }

I think this needs to be protected by the table->lock.



Re: [PATCH 04/14] x86: use generic swiotlb_ops

2018-03-15 Thread Thomas Gleixner
On Thu, 15 Mar 2018, Christoph Hellwig wrote:

> On Thu, Mar 15, 2018 at 10:00:57AM +0100, Thomas Gleixner wrote:
> > On Wed, 14 Mar 2018, Christoph Hellwig wrote:
> > >  #if defined(CONFIG_INTEL_IOMMU) || defined(CONFIG_AMD_IOMMU)
> > >   void *iommu; /* hook for IOMMU specific extension */
> > >  #endif
> > > +#ifdef CONFIG_STA2X11
> > > + bool is_sta2x11 : 1;
> > 
> > Huch? Please use either bool or an unsigned int based bitfield. A boolean
> > bitfield doesn't make sense.
> 
> bool bitfields are perfectly valid in C99 and used in various places in
> the kernel. But if you want either bool or an unsigned bitfield let me
> know and I'll switch it over.

Yeah, I know that the standard defines it, but that doesn't mean it makes
sense. At least not to me.

Thanks,

tglx


Re: [PATCH v4] iommu/amd: Add support for fast IOTLB flushing

2018-03-15 Thread Joerg Roedel
Hi Suravee,

On Mon, Mar 05, 2018 at 08:08:21AM +0700, Suravee Suthikulpanit wrote:
> Ping..
> 
> Joerg, when you get a chance, would you please let me know if you have any 
> other concerns for this v4.

Sorry for the long delay. The patch is fine, I applied it to the
iommu-tree.

Thanks,

Joerg



Re: [RFC 2/5] dt-bindings: brcm: Add reserved-dma-region for iPROC

2018-03-15 Thread Jitendra Bhivare via iommu
On Thu, Mar 15, 2018 at 12:15 AM, Robin Murphy  wrote:
> On 12/03/18 07:03, Jitendra Bhivare wrote:
>>
>> On Tue, Mar 6, 2018 at 5:12 PM, Robin Murphy  wrote:
>>>
>>> On 06/03/18 04:59, Jitendra Bhivare wrote:


 With a SoC-wide DMA mask of 40 bits, mappings for the entire IOVA space
 can't be specified in the PAXBv2 PCIe RC of the SoC. The holes in the
 IOVA space need to be reserved to prevent any IOVA allocations in those
 spaces.
>>>
>>>
>>>
>>> Can you clarify why? If this is the PCI inbound window thing again, let
>>> me
>>> say once again that "dma-ranges" is the appropriate way for DT to
>>> describe
>>> the hardware.
>>>
>>> Robin.
>>
>> dma-ranges = < \
>> 0x4300 0x00 0x 0x00 0x8000 0x00 0x8000 \
>> 0x4300 0x08 0x 0x08 0x 0x08 0x \
>> 0x4300 0x80 0x 0x80 0x 0x80 0x>
>>
>> Yes, it's for PCI inbound windows. In our HW, they are limited to the
>> sizes specified in the ARM memory map, which was done for non-IOMMU
>> cases to catch any transfer outside the ranges.
>> dma-ranges is already being used to program these inbound windows, and
>> the SoC-wide DMA mask is already specified, but the IOMMU code can still
>> allocate IOVAs in the gaps, for which translation will fail in the PCIe RC.
>
>
> Right, so make iommu-dma reserve the gaps. No need to clutter up the DT with
> redundant information which gets handled pretty much identically anyway.
>
> Robin.
There was a mistake in the dma-ranges pasted earlier. This is what we use in the
IOMMU case.
#define PCIE_DMA_RANGES dma-ranges = < \
0x4300 0x00 0x 0x00 0x 0x04 0x \
0x4300 0x08 0x 0x08 0x 0x08 0x \
0x4300 0x80 0x 0x80 0x 0x80 0x>

#define PCIE_RESERVED_DMA_REGION reserved-dma-region = < \
0x0 0x0 0x04 0x0 0x04 0x0 \
0x0 0x0 0x10 0x0 0x70 0x0>

This is the non-IOMMU case, which follows the ARM memory map.
#define PCIE_DMA_RANGES dma-ranges = < \
0x4300 0x00 0x8000 0x00 0x8000 0x00 0x8000 \
0x4300 0x08 0x 0x08 0x 0x08 0x \
0x4300 0x80 0x 0x80 0x 0x40 0x>

In the IOMMU case, we had to increase the first inbound window (i.e. the
first dma-ranges entry) from 2GB-4GB (as in the non-IOMMU case) to 0-16GB,
because the outbound window specified at 2GB-4GB implicitly gets reserved
by the PCI IOMMU code. This allowed IOVA allocations from 0-2GB and
4-16GB, mapped through the inbound windows.

Basically, my point is that the specified dma-ranges is used to program
our inbound windows, and it defines the PCI-space to CPU-address-space
mappings the HW supports. In the same way, it would be better to
explicitly specify reserved-dma-region, to make clear these are HW holes,
than to implicitly reserve the regions.

We could at least have a single property 'reserved-dma-window' for the
MSI region and the DMA regions, to describe the holes in the HW rather
than letting SW compute them.
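
For reference, the implicit reservation Robin refers to comes down to
something like this inside the iommu-dma layer (a sketch modeled on the
existing PCI window handling; gap_start and gap_end stand for a hole
between two inbound windows):

	struct iova_domain *iovad = &cookie->iovad;
	unsigned long lo, hi;

	/* Carve the hole out of the allocator so no IOVA can land in it. */
	lo = iova_pfn(iovad, gap_start);
	hi = iova_pfn(iovad, gap_end);
	reserve_iova(iovad, lo, hi);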

>>>
>>>
 The reserved-dma-region property is added to specify the ranges which
 should never be mapped or handed out to the devices sitting behind.

 Reviewed-by: Ray Jui 
 Reviewed-by: Vikram Prakash 
 Reviewed-by: Scott Branden 
 Signed-off-by: Jitendra Bhivare 
 ---
Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt | 3 +++
1 file changed, 3 insertions(+)

 diff --git a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
 b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
 index b8e48b4..3be0fe3 100644
 --- a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
 +++ b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
 @@ -30,6 +30,9 @@ Optional properties:
 - dma-ranges: Some PAXB-based root complexes do not have inbound mapping done
   by the ASIC after power on reset.  In this case, SW is required to configure
   the mapping, based on inbound memory regions specified by this property.
 +- reserved-dma-region: PAXBv2 with IOMMU enabled cannot provide mappings for
 +  entire IOVA space specified by DMA mask. Hence this is used to reserve the
 +  gaps in dma-ranges.
  - brcm,pcie-ob: Some iProc SoCs do not have the outbound address mapping done
    by the ASIC after power on reset. In this case, SW needs to configure it

>>>
>


Re: [PATCH 04/14] x86: use generic swiotlb_ops

2018-03-15 Thread Christoph Hellwig
On Thu, Mar 15, 2018 at 10:00:57AM +0100, Thomas Gleixner wrote:
> On Wed, 14 Mar 2018, Christoph Hellwig wrote:
> >  #if defined(CONFIG_INTEL_IOMMU) || defined(CONFIG_AMD_IOMMU)
> > void *iommu; /* hook for IOMMU specific extension */
> >  #endif
> > +#ifdef CONFIG_STA2X11
> > +   bool is_sta2x11 : 1;
> 
> Huch? Please use either bool or an unsigned int based bitfield. A boolean
> bitfield doesn't make sense.

bool bitfields are perfectly valid in C99 and used in various places in
the kernel. But if you want either bool or an unsigned bitfield let me
know and I'll switch it over.


Re: [PATCH 03/14] x86: use dma-direct

2018-03-15 Thread Christoph Hellwig
On Thu, Mar 15, 2018 at 09:56:13AM +0100, Thomas Gleixner wrote:
> On Wed, 14 Mar 2018, Christoph Hellwig wrote:
> 
> > The generic dma-direct implementation is now functionally equivalent to
> > the x86 nommu dma_map implementation, so switch over to using it.
> 
> Can you please convert the various drivers first and then remove the
> unused code?

Which various drivers?

> > Note that the various iommu drivers are switched from x86_dma_supported
> > to dma_direct_supported to provide identical functionality, although the
> > checks look fairly questionable for at least some of them.
> 
> Can you please elaborate? From the above it's not clear which checks you
> are referring to. If you convert these drivers seperately then explicit
> information about your concerns wants to be in the changelogs.

This bit:

	/* Copied from i386. Doesn't make much sense, because it will
	   only work for pci_alloc_coherent.
	   The caller just has to use GFP_DMA in this case. */
	if (mask < DMA_BIT_MASK(24))
		return 0;

in x86_dma_supported, or the equivalent bit in dma_direct_supported.
Kept for bug-for-bug compatibility, but I guess I should reword or
just drop that bit of the changelog if it causes confusion.
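
The equivalent check in dma_direct_supported looks roughly like this (a
sketch, not the verbatim kernel code):

int dma_direct_supported(struct device *dev, u64 mask)
{
	/*
	 * The questionable part: reject masks below 24 bits, carried
	 * over from x86_dma_supported for bug-for-bug compatibility.
	 */
	if (mask < DMA_BIT_MASK(24))
		return 0;

	return 1;
}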

Re: [PATCH v9 4/5] iommu/arm-smmu: Add the device_link between masters and smmu

2018-03-15 Thread Robin Murphy

On 15/03/18 08:57, Vivek Gautam wrote:

Hi Robin,


On Wed, Mar 14, 2018 at 11:20 PM, Robin Murphy  wrote:

On 13/03/18 08:55, Vivek Gautam wrote:


From: Sricharan R 

Finally add the device link between the master device and
smmu, so that the smmu gets runtime enabled/disabled only when the
master needs it. This is done from add_device callback which gets
called once when the master is added to the smmu.

Signed-off-by: Sricharan R 
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
---
   drivers/iommu/arm-smmu.c | 29 +
   1 file changed, 29 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 56a04ae80bf3..64953ff2281f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1460,10 +1460,31 @@ static int arm_smmu_add_device(struct device *dev)
 iommu_device_link(&smmu->iommu, dev);
   + if (pm_runtime_enabled(smmu->dev)) {
+   struct device_link *link;
+
+   /*
+* Establish the link between smmu and master, so that the
+* smmu gets runtime enabled/disabled as per the master's
+* needs.
+*/
+		link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
+   if (!link) {



FWIW, given that we don't really care about link itself, I'd be quite happy
to simplify that lot down to:

 if (pm_runtime_enabled(smmu->dev) &&
 !device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME)) {


Sure, will update this.




+			dev_warn(smmu->dev,
+				 "Unable to add link to the consumer %s\n",
+				 dev_name(dev));



(side note: since device_link_add() already prints a message on success,
maybe it could print its own failure message too?)


Should we make device_link that verbose, printing failure messages at
each step (there are at least a couple where we return link as NULL),
or should we let the users handle printing the message?


I didn't mean to imply anything more than the idea below (although the 
whole function could of course be refactored further to report explicit 
error values). It just seems a bit unbalanced for the core code to be 
noisy about success yet silent about failure, but ultimately that's an 
entirely separate issue which doesn't have to have any bearing on this 
series.


Robin.

->8-
diff --git a/drivers/base/core.c b/drivers/base/core.c
index b2261f92f2f1..895da95f5cb9 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -275,6 +275,9 @@ struct device_link *device_link_add(struct device 
*consumer,

  out:
device_pm_unlock();
device_links_write_unlock();
+	if (!link)
+		dev_warn(consumer, "Failed to link to supplier %s\n",
+			 dev_name(supplier));
+
 	return link;
 }
 EXPORT_SYMBOL_GPL(device_link_add);


Re: [PATCH v9 4/5] iommu/arm-smmu: Add the device_link between masters and smmu

2018-03-15 Thread Robin Murphy

On 15/03/18 06:18, Tomasz Figa wrote:

On Thu, Mar 15, 2018 at 2:50 AM, Robin Murphy  wrote:

On 13/03/18 08:55, Vivek Gautam wrote:


From: Sricharan R 

Finally add the device link between the master device and
smmu, so that the smmu gets runtime enabled/disabled only when the
master needs it. This is done from add_device callback which gets
called once when the master is added to the smmu.

Signed-off-by: Sricharan R 
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
---
   drivers/iommu/arm-smmu.c | 29 +
   1 file changed, 29 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 56a04ae80bf3..64953ff2281f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1460,10 +1460,31 @@ static int arm_smmu_add_device(struct device *dev)
 iommu_device_link(&smmu->iommu, dev);
   + if (pm_runtime_enabled(smmu->dev)) {
+   struct device_link *link;
+
+   /*
+* Establish the link between smmu and master, so that the
+* smmu gets runtime enabled/disabled as per the master's
+* needs.
+*/
+		link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
+   if (!link) {



FWIW, given that we don't really care about link itself, I'd be quite happy
to simplify that lot down to:

 if (pm_runtime_enabled(smmu->dev) &&
 !device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME)) {


+			dev_warn(smmu->dev,
+				 "Unable to add link to the consumer %s\n",
+				 dev_name(dev));



(side note: since device_link_add() already prints a message on success,
maybe it could print its own failure message too?)


I think we care whether adding the link succeeded. If it fails to be
added, we might end up with a complete system lockup on a system with
power domains.


Well, yeah, that was implicit - the point is that we *only* care about 
whether it succeeded or not. Thus we may as well just check for NULL 
directly instead of assigning the value as if we were actually going to 
do anything with it.


Robin.


Re: [PATCH 04/14] x86: use generic swiotlb_ops

2018-03-15 Thread Thomas Gleixner
On Wed, 14 Mar 2018, Christoph Hellwig wrote:
>  #if defined(CONFIG_INTEL_IOMMU) || defined(CONFIG_AMD_IOMMU)
>   void *iommu; /* hook for IOMMU specific extension */
>  #endif
> +#ifdef CONFIG_STA2X11
> + bool is_sta2x11 : 1;

Huch? Please use either bool or an unsigned int based bitfield. A boolean
bitfield doesn't make sense.

Thanks,

tglx


Re: [PATCH v9 4/5] iommu/arm-smmu: Add the device_link between masters and smmu

2018-03-15 Thread Vivek Gautam
Hi Robin,


On Wed, Mar 14, 2018 at 11:20 PM, Robin Murphy  wrote:
> On 13/03/18 08:55, Vivek Gautam wrote:
>>
>> From: Sricharan R 
>>
>> Finally add the device link between the master device and
>> smmu, so that the smmu gets runtime enabled/disabled only when the
>> master needs it. This is done from add_device callback which gets
>> called once when the master is added to the smmu.
>>
>> Signed-off-by: Sricharan R 
>> Signed-off-by: Vivek Gautam 
>> Reviewed-by: Tomasz Figa 
>> ---
>>   drivers/iommu/arm-smmu.c | 29 +
>>   1 file changed, 29 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 56a04ae80bf3..64953ff2281f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1460,10 +1460,31 @@ static int arm_smmu_add_device(struct device *dev)
>> iommu_device_link(&smmu->iommu, dev);
>>   + if (pm_runtime_enabled(smmu->dev)) {
>> +   struct device_link *link;
>> +
>> +   /*
>> +* Establish the link between smmu and master, so that the
>> +* smmu gets runtime enabled/disabled as per the master's
>> +* needs.
>> +*/
>> +		link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
>> +   if (!link) {
>
>
> FWIW, given that we don't really care about link itself, I'd be quite happy
> to simplify that lot down to:
>
> if (pm_runtime_enabled(smmu->dev) &&
> !device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME)) {

Sure, will update this.

>
>> +			dev_warn(smmu->dev,
>> +				 "Unable to add link to the consumer %s\n",
>> +				 dev_name(dev));
>
>
> (side note: since device_link_add() already prints a message on success,
> maybe it could print its own failure message too?)

Should we make device_link that verbose, printing failure messages at
each step (there are at least a couple where we return link as NULL),
or should we let the users handle printing the message?

regards
Vivek

>
> Robin.
>
>
>> +   ret = -ENODEV;
>> +   goto out_unlink;
>> +   }
>> +   }
>> +
>> arm_smmu_rpm_put(smmu);
>> return 0;
>>   +out_unlink:
>> +   iommu_device_unlink(&smmu->iommu, dev);
>> +   arm_smmu_master_free_smes(fwspec);
>>   out_rpm_put:
>> arm_smmu_rpm_put(smmu);
>>   out_cfg_free:
>> @@ -1486,6 +1507,14 @@ static void arm_smmu_remove_device(struct device
>> *dev)
>> cfg  = fwspec->iommu_priv;
>> smmu = cfg->smmu;
>>   + if (pm_runtime_enabled(smmu->dev)) {
>> +   struct device_link *link;
>> +
>> +   link = device_link_find(dev, smmu->dev);
>> +   if (link)
>> +   device_link_del(link);
>> +   }
>> +
>> ret = arm_smmu_rpm_get(smmu);
>> if (ret < 0)
>> return;
>>
>



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH 03/14] x86: use dma-direct

2018-03-15 Thread Thomas Gleixner
On Wed, 14 Mar 2018, Christoph Hellwig wrote:

> The generic dma-direct implementation is now functionally equivalent to
> the x86 nommu dma_map implementation, so switch over to using it.

Can you please convert the various drivers first and then remove the
unused code?

> Note that the various iommu drivers are switched from x86_dma_supported
> to dma_direct_supported to provide identical functionality, although the
> checks look fairly questionable for at least some of them.

Can you please elaborate? From the above it's not clear which checks you
are referring to. If you convert these drivers seperately then explicit
information about your concerns wants to be in the changelogs.

Thanks,

tglx


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-15 Thread Christoph Hellwig
On Wed, Mar 14, 2018 at 05:43:46PM +, Robin Murphy wrote:
>> Looking back I don't really understand why we even indirect the "classic"
>> per-device dma_declare_coherent_memory use case through the DMA API.
>
> It certainly makes sense for devices which can exist in both shared-memory 
> and device-local-memory configurations, so the driver doesn't have to care 
> about the details (particularly on mobile SoCs where the 'local' memory 
> might just be a chunk of system RAM reserved by the bootloader, and it's 
> just a matter of different use-cases on identical hardware).

Well, the "classic" case for me is memory buffers in the device.  Setting
some memory aside, either in a global pool as now done for arm-nommu
or even per-device as on some ARM SOCs is different indeed.

As far as I can tell the few devices that use 'local' memory always
use that.

>> It seems like a pretty different use case to me.  In the USB case we
>> also have the following additional twist in that it doesn't even need
>> the mapping to be coherent.
>
> I'm pretty sure it does (in the sense that it needs to ensure the arch code 
> makes the mapping non-cacheable), otherwise I can't see how the bouncing 
> could work properly. I think the last bit of the comment above 
> hcd_alloc_coherent() is a bit misleading.

Well, if it isn't marked non-cacheable we'd have to do dma_cache_sync
operations for it.  Which would probably still be faster than non-cacheable
mappings.

>> So maybe for now the quick fix is to move the sleep check as suggested
>> earlier in this thread, but in the long run we probably need to do some
>> major rework of how dma_declare_coherent_memory and friends work.
>
> Maybe; I do think the specific hcd_alloc_coherent() case could still be 
> fixed within the scope of the existing code, but it's not quite as clean 
> and straightforward as I first thought, and the practical impact of 
> tweaking the WARN should be effectively zero despite the theoretical edge 
> cases it opens up. Do you want me to write it up as a proper patch?

Yes.  Including a proper comment on why the might_sleep is placed there.

My mid-term plan was to actually remove the gfp flags argument from
the dma alloc routines, as it creates more confusion than it helps.
I guess this means we'll at least need to introduce a DMA_ATTR_NON_BLOCK
or similar flag instead, unfortunately.
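
To make that concrete: DMA_ATTR_NON_BLOCK does not exist; the sketch
below only illustrates the kind of flag being floated, with an
arbitrarily chosen bit value.

#define DMA_ATTR_NON_BLOCK	(1UL << 10)	/* hypothetical, not in the kernel */

	/*
	 * An atomic-context caller would express "don't sleep" through
	 * attrs rather than through the (eventually removed) gfp argument.
	 */
	buf = dma_alloc_attrs(dev, size, &dma_handle, GFP_ATOMIC,
			      DMA_ATTR_NON_BLOCK);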


Re: [PATCH v9 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-03-15 Thread Tomasz Figa
On Thu, Mar 15, 2018 at 2:46 AM, Robin Murphy  wrote:
> On 13/03/18 08:55, Vivek Gautam wrote:
>>
>> From: Sricharan R 
>>
>> The smmu device probe/remove and add/remove master device callbacks
>> get called when the smmu is not linked to its master, that is, without
>> the context of the master device. So call the runtime APIs in those
>> places separately.
>>
>> Signed-off-by: Sricharan R 
>> [vivek: Cleanup pm runtime calls]
>> Signed-off-by: Vivek Gautam 
>> Reviewed-by: Tomasz Figa 
>> ---
>>   drivers/iommu/arm-smmu.c | 95
>> 
>>   1 file changed, 87 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index d5873d545024..56a04ae80bf3 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -268,6 +268,20 @@ static struct arm_smmu_option_prop arm_smmu_options[]
>> = {
>> { 0, NULL},
>>   };
>>   +static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
>> +{
>> +   if (pm_runtime_enabled(smmu->dev))
>> +   return pm_runtime_get_sync(smmu->dev);
>> +
>> +   return 0;
>> +}
>> +
>> +static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
>> +{
>> +   if (pm_runtime_enabled(smmu->dev))
>> +   pm_runtime_put(smmu->dev);
>> +}
>> +
>>   static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
>>   {
>> return container_of(dom, struct arm_smmu_domain, domain);
>> @@ -913,11 +927,15 @@ static void arm_smmu_destroy_domain_context(struct
>> iommu_domain *domain)
>> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> struct arm_smmu_device *smmu = smmu_domain->smmu;
>> struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>> -   int irq;
>> +   int ret, irq;
>> if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)
>> return;
>>   + ret = arm_smmu_rpm_get(smmu);
>> +   if (ret < 0)
>> +   return;
>> +
>> /*
>>  * Disable the context bank and free the page tables before
>> freeing
>>  * it.
>> @@ -932,6 +950,8 @@ static void arm_smmu_destroy_domain_context(struct
>> iommu_domain *domain)
>> free_io_pgtable_ops(smmu_domain->pgtbl_ops);
>> __arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
>> +
>> +   arm_smmu_rpm_put(smmu);
>>   }
>> static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>> @@ -1213,10 +1233,15 @@ static int arm_smmu_attach_dev(struct iommu_domain
>> *domain, struct device *dev)
>> return -ENODEV;
>> smmu = fwspec_smmu(fwspec);
>> +
>> +   ret = arm_smmu_rpm_get(smmu);
>> +   if (ret < 0)
>> +   return ret;
>> +
>> /* Ensure that the domain is finalised */
>> ret = arm_smmu_init_domain_context(domain, smmu);
>> if (ret < 0)
>> -   return ret;
>> +   goto rpm_put;
>> /*
>>  * Sanity check the domain. We don't support domains across
>> @@ -1230,29 +1255,47 @@ static int arm_smmu_attach_dev(struct iommu_domain
>> *domain, struct device *dev)
>> }
>> /* Looks ok, so add the device to the domain */
>> -   return arm_smmu_domain_add_master(smmu_domain, fwspec);
>> +   ret = arm_smmu_domain_add_master(smmu_domain, fwspec);
>> +
>> +rpm_put:
>> +   arm_smmu_rpm_put(smmu);
>> +   return ret;
>>   }
>> static int arm_smmu_map(struct iommu_domain *domain, unsigned long
>> iova,
>> phys_addr_t paddr, size_t size, int prot)
>>   {
>> struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> +   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +   struct arm_smmu_device *smmu = smmu_domain->smmu;
>
>
> Nit: please use arm_smmu_domain for ops as well (as it was before
> 523d7423e21b), or consistently elide it for smmu - the mixture of both
> methods is just a horrible mess (here and in unmap).
>
>
>> +   int ret;
>> if (!ops)
>> return -ENODEV;
>>   - return ops->map(ops, iova, paddr, size, prot);
>> +   arm_smmu_rpm_get(smmu);
>> +   ret = ops->map(ops, iova, paddr, size, prot);
>> +   arm_smmu_rpm_put(smmu);
>> +
>> +   return ret;
>>   }
>> static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned
>> long iova,
>>  size_t size)
>>   {
>> struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> +   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +   struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +   size_t ret;
>> if (!ops)
>> return 0;
>>   - return ops->unmap(ops, iova, size);
>> +   arm_smmu_rpm_get(smmu);
>> +   ret = ops->unmap(ops, iova, size);
>> +   

Re: [PATCH v9 4/5] iommu/arm-smmu: Add the device_link between masters and smmu

2018-03-15 Thread Tomasz Figa
On Thu, Mar 15, 2018 at 2:50 AM, Robin Murphy  wrote:
> On 13/03/18 08:55, Vivek Gautam wrote:
>>
>> From: Sricharan R 
>>
>> Finally add the device link between the master device and
>> smmu, so that the smmu gets runtime enabled/disabled only when the
>> master needs it. This is done from add_device callback which gets
>> called once when the master is added to the smmu.
>>
>> Signed-off-by: Sricharan R 
>> Signed-off-by: Vivek Gautam 
>> Reviewed-by: Tomasz Figa 
>> ---
>>   drivers/iommu/arm-smmu.c | 29 +
>>   1 file changed, 29 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 56a04ae80bf3..64953ff2281f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1460,10 +1460,31 @@ static int arm_smmu_add_device(struct device *dev)
>> iommu_device_link(&smmu->iommu, dev);
>>   + if (pm_runtime_enabled(smmu->dev)) {
>> +   struct device_link *link;
>> +
>> +   /*
>> +* Establish the link between smmu and master, so that the
>> +* smmu gets runtime enabled/disabled as per the master's
>> +* needs.
>> +*/
>> +		link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
>> +   if (!link) {
>
>
> FWIW, given that we don't really care about link itself, I'd be quite happy
> to simplify that lot down to:
>
> if (pm_runtime_enabled(smmu->dev) &&
> !device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME)) {
>
>> +			dev_warn(smmu->dev,
>> +				 "Unable to add link to the consumer %s\n",
>> +				 dev_name(dev));
>
>
> (side note: since device_link_add() already prints a message on success,
> maybe it could print its own failure message too?)

I think we care whether adding the link succeeded. If it fails to be
added, we might end up with a complete system lockup on a system with
power domains.

Best regards,
Tomasz