Re: [PATCH 0/3] mmu_notifier: Allow to manage CPU external TLBs

2014-07-24 Thread Andrew Morton
On Thu, 24 Jul 2014 16:35:38 +0200 Joerg Roedel  wrote:

> here is a patch-set to extend the mmu_notifiers in the Linux
> kernel to allow managing CPU external TLBs. Those TLBs may
> be implemented in IOMMUs or any other external device, e.g.
> ATS/PRI capable PCI devices.
> 
> The problem with managing these TLBs is the semantics of
> the invalidate_range_start/end call-backs currently
> available. Currently the subsystem using mmu_notifiers has
> to guarantee that no new TLB entries are established between
> invalidate_range_start/end. Furthermore the
> invalidate_range_start() function is called when all pages
> are still mapped and invalidate_range_end() when the pages
> are unmapped and already freed.
> 
> So both call-backs can't be used to safely flush any non-CPU
> TLB because _start() is called too early and _end() too
> late.
> 
> In the AMD IOMMUv2 driver this is currently implemented by
> assigning an empty page-table to the external device between
> _start() and _end(). But as tests have shown, this doesn't
> work because external devices don't re-fault infinitely but
> enter a failure state after some time.
> 
> The next problem with this solution is that it causes an
> interrupt storm of IO page faults to be handled when an
> empty page-table is assigned.
> 
> To solve this situation I wrote a patch-set to introduce a
> new notifier call-back: mmu_notifier_invalidate_range(). This
> notifier lifts the strict requirement that no new
> references are taken in the range between _start() and
> _end(). When the subsystem can't guarantee that no new
> references are taken, it has to provide the
> invalidate_range() call-back to clear any new references
> taken in the meantime.
> 
> It is called between invalidate_range_start() and _end()
> every time the VMM has to wipe out references to a
> range of pages. These are usually the places where the CPU
> TLBs are flushed too, and where it is important that this
> happens before invalidate_range_end() is called.
> 
> Any comments and review appreciated!

It looks pretty simple and harmless.

I assume the AMD IOMMUv2 driver actually uses this and it's all
tested and good?  What is the status of that driver?



[PATCH 3/3] iommu/ipmmu-vmsa: Invalidate TLB after unmapping

2014-07-24 Thread Laurent Pinchart
The TLB must be invalidated after unmapping memory to remove stale TLB
entries. This was supposed to be performed already, but a bug in the
driver prevented the TLB invalidate function from being called. Fix it.

Signed-off-by: Laurent Pinchart 
---
 drivers/iommu/ipmmu-vmsa.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index eb605e5..a010e47 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -784,7 +784,6 @@ static int ipmmu_clear_mapping(struct ipmmu_vmsa_domain *domain,
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-   int ret = 0;
 
if (!pgd)
return -EINVAL;
@@ -846,8 +845,7 @@ static int ipmmu_clear_mapping(struct ipmmu_vmsa_domain *domain,
 done:
spin_unlock_irqrestore(&domain->lock, flags);
 
-   if (ret)
-   ipmmu_tlb_invalidate(domain);
+   ipmmu_tlb_invalidate(domain);
 
return 0;
 }
-- 
1.8.5.5



[PATCH 1/3] iommu/ipmmu-vmsa: Cleanup failures of ARM mapping creation or attachment

2014-07-24 Thread Laurent Pinchart
The ARM IOMMU mapping needs to be released when attaching the device
fails. Add arm_iommu_release_mapping() to the error code path. This is
safe to call with a NULL mapping, so no specific check is needed.

Cleanup is also missing when failing to create a mapping. Jump to the
error code path in that case instead of returning immediately.

Signed-off-by: Laurent Pinchart 
---
 drivers/iommu/ipmmu-vmsa.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 53cde08..7dc77a6 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -1090,7 +1090,8 @@ static int ipmmu_add_device(struct device *dev)
	mapping = arm_iommu_create_mapping(&platform_bus_type,
					   SZ_1G, SZ_2G);
if (IS_ERR(mapping)) {
		dev_err(mmu->dev, "failed to create ARM IOMMU mapping\n");
-   return PTR_ERR(mapping);
+   ret = PTR_ERR(mapping);
+   goto error;
}
 
mmu->mapping = mapping;
@@ -1106,6 +1107,7 @@ static int ipmmu_add_device(struct device *dev)
return 0;
 
 error:
+   arm_iommu_release_mapping(mmu->mapping);
kfree(dev->archdata.iommu);
dev->archdata.iommu = NULL;
iommu_group_remove_device(dev);
-- 
1.8.5.5



[PATCH 2/3] iommu/ipmmu-vmsa: Flush P[UM]D entry before freeing the child page table

2014-07-24 Thread Laurent Pinchart
When clearing PUD or PMD entries the child page table (if any) is freed
before the PUD or PMD entry is cleared. This results in a small race
window during which a freed page table could still be accessed by the
IPMMU.

Fix it by clearing and flushing the PUD or PMD entry before freeing the
child page table.

Signed-off-by: Laurent Pinchart 
---
 drivers/iommu/ipmmu-vmsa.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 7dc77a6..eb605e5 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -678,30 +678,32 @@ done:
 
 static void ipmmu_clear_pud(struct ipmmu_vmsa_device *mmu, pud_t *pud)
 {
-   /* Free the page table. */
pgtable_t table = pud_pgtable(*pud);
-   __free_page(table);
 
/* Clear the PUD. */
*pud = __pud(0);
ipmmu_flush_pgtable(mmu, pud, sizeof(*pud));
+
+   /* Free the page table. */
+   __free_page(table);
 }
 
 static void ipmmu_clear_pmd(struct ipmmu_vmsa_device *mmu, pud_t *pud,
pmd_t *pmd)
 {
+   pmd_t pmdval = *pmd;
unsigned int i;
 
-   /* Free the page table. */
-   if (pmd_table(*pmd)) {
-   pgtable_t table = pmd_pgtable(*pmd);
-   __free_page(table);
-   }
-
/* Clear the PMD. */
*pmd = __pmd(0);
ipmmu_flush_pgtable(mmu, pmd, sizeof(*pmd));
 
+   /* Free the page table. */
+   if (pmd_table(pmdval)) {
+   pgtable_t table = pmd_pgtable(pmdval);
+   __free_page(table);
+   }
+
/* Check whether the PUD is still needed. */
pmd = pmd_offset(pud, 0);
for (i = 0; i < IPMMU_PTRS_PER_PMD; ++i) {
-- 
1.8.5.5



[PATCH 0/3] Renesas IPMMU-VMSA fixes

2014-07-24 Thread Laurent Pinchart
Hello,

This series fixes small issues with the ipmmu-vmsa driver. Please see
individual patches for details.

If not too late, I'd like these fixes to be merged in v3.17 (after proper
review of course, or, in the worst case, after lack of review).

Laurent Pinchart (3):
  iommu/ipmmu-vmsa: Cleanup failures of ARM mapping creation or
attachment
  iommu/ipmmu-vmsa: Flush P[UM]D entry before freeing the child page
table
  iommu/ipmmu-vmsa: Invalidate TLB after unmapping

 drivers/iommu/ipmmu-vmsa.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

-- 
Regards,

Laurent Pinchart



Re: [PATCH 0/3] mmu_notifier: Allow to manage CPU external TLBs

2014-07-24 Thread Andrea Arcangeli
On Thu, Jul 24, 2014 at 04:35:38PM +0200, Joerg Roedel wrote:
> To solve this situation I wrote a patch-set to introduce a
> new notifier call-back: mmu_notifier_invalidate_range(). This
> notifier lifts the strict requirement that no new
> references are taken in the range between _start() and
> _end(). When the subsystem can't guarantee that no new
> references are taken, it has to provide the
> invalidate_range() call-back to clear any new references
> taken in the meantime.
> 
> It is called between invalidate_range_start() and _end()
> every time the VMM has to wipe out references to a
> range of pages. These are usually the places where the CPU
> TLBs are flushed too, and where it is important that this
> happens before invalidate_range_end() is called.
> 
> Any comments and review appreciated!

Reviewed-by: Andrea Arcangeli 


[PATCH 0/3] mmu_notifier: Allow to manage CPU external TLBs

2014-07-24 Thread Joerg Roedel
Hi,

here is a patch-set to extend the mmu_notifiers in the Linux
kernel to allow managing CPU external TLBs. Those TLBs may
be implemented in IOMMUs or any other external device, e.g.
ATS/PRI capable PCI devices.

The problem with managing these TLBs is the semantics of
the invalidate_range_start/end call-backs currently
available. Currently the subsystem using mmu_notifiers has
to guarantee that no new TLB entries are established between
invalidate_range_start/end. Furthermore the
invalidate_range_start() function is called when all pages
are still mapped and invalidate_range_end() when the pages
are unmapped and already freed.

So both call-backs can't be used to safely flush any non-CPU
TLB because _start() is called too early and _end() too
late.

In the AMD IOMMUv2 driver this is currently implemented by
assigning an empty page-table to the external device between
_start() and _end(). But as tests have shown, this doesn't
work because external devices don't re-fault infinitely but
enter a failure state after some time.

The next problem with this solution is that it causes an
interrupt storm of IO page faults to be handled when an
empty page-table is assigned.

To solve this situation I wrote a patch-set to introduce a
new notifier call-back: mmu_notifier_invalidate_range(). This
notifier lifts the strict requirement that no new
references are taken in the range between _start() and
_end(). When the subsystem can't guarantee that no new
references are taken, it has to provide the
invalidate_range() call-back to clear any new references
taken in the meantime.

It is called between invalidate_range_start() and _end()
every time the VMM has to wipe out references to a
range of pages. These are usually the places where the CPU
TLBs are flushed too, and where it is important that this
happens before invalidate_range_end() is called.
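
To illustrate the intended usage, here is a rough sketch of a consumer
that keeps an external TLB coherent via the new call-back. The
struct ext_tlb_state container and the ext_flush_dev_tlb() helper are
made up for illustration only and are not part of this series:

#include <linux/kernel.h>
#include <linux/mmu_notifier.h>

/*
 * Sketch only: struct ext_tlb_state and ext_flush_dev_tlb() are
 * hypothetical stand-ins for whatever state and flush primitive an
 * external-TLB driver actually has.
 */
struct ext_tlb_state {
	struct mmu_notifier mn;
	int pasid;
};

extern void ext_flush_dev_tlb(int pasid, unsigned long start,
			      unsigned long end);

static void ext_tlb_invalidate_range(struct mmu_notifier *mn,
				     struct mm_struct *mm,
				     unsigned long start, unsigned long end)
{
	struct ext_tlb_state *state = container_of(mn, struct ext_tlb_state, mn);

	/*
	 * Pages in [start, end) are unmapped but not yet freed, so the
	 * device TLB can (and must) be flushed here, before
	 * invalidate_range_end() runs and the pages are handed back to
	 * the allocator.
	 */
	ext_flush_dev_tlb(state->pasid, start, end);
}

static const struct mmu_notifier_ops ext_tlb_notifier_ops = {
	.invalidate_range = ext_tlb_invalidate_range,
};

static int ext_tlb_bind(struct ext_tlb_state *state, struct mm_struct *mm)
{
	state->mn.ops = &ext_tlb_notifier_ops;
	return mmu_notifier_register(&state->mn, mm);
}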

Any comments and review appreciated!

Thanks,

Joerg

Joerg Roedel (3):
  mmu_notifier: Add mmu_notifier_invalidate_range()
  mmu_notifier: Call mmu_notifier_invalidate_range() from VMM
  mmu_notifier: Add the call-back for mmu_notifier_invalidate_range()

 include/linux/mmu_notifier.h | 66 
 kernel/events/uprobes.c  |  2 +-
 mm/fremap.c  |  2 +-
 mm/huge_memory.c |  9 +++---
 mm/hugetlb.c |  7 -
 mm/ksm.c |  4 +--
 mm/memory.c  |  3 +-
 mm/migrate.c |  3 +-
 mm/mmu_notifier.c| 15 ++
 mm/rmap.c|  2 +-
 10 files changed, 95 insertions(+), 18 deletions(-)

-- 
1.9.1



[PATCH 2/3] mmu_notifier: Call mmu_notifier_invalidate_range() from VMM

2014-07-24 Thread Joerg Roedel
From: Joerg Roedel 

Add calls to the new mmu_notifier_invalidate_range()
function to all places in the VMM that need it.
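
For orientation, the ordering this produces for a single-page teardown
looks roughly like the sketch below. The function itself is illustrative
only and is not part of the patch; it just strings together the existing
VMM calls around the new notification point:

#include <linux/mm.h>
#include <linux/rmap.h>
#include <linux/mmu_notifier.h>

/* Sketch of the teardown ordering this series establishes. */
static void example_unmap_one_page(struct vm_area_struct *vma,
				   unsigned long addr, pte_t *ptep,
				   struct page *page)
{
	struct mm_struct *mm = vma->vm_mm;

	mmu_notifier_invalidate_range_start(mm, addr, addr + PAGE_SIZE);

	/*
	 * Clears the PTE and flushes the CPU TLB; with this patch it also
	 * calls mmu_notifier_invalidate_range(), i.e. the external TLB is
	 * flushed while the page is unmapped but not yet freed.
	 */
	ptep_clear_flush_notify(vma, addr, ptep);

	page_remove_rmap(page);
	put_page(page);		/* the page may be freed from here on */

	mmu_notifier_invalidate_range_end(mm, addr, addr + PAGE_SIZE);
}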

Signed-off-by: Joerg Roedel 
---
 include/linux/mmu_notifier.h | 28 
 kernel/events/uprobes.c  |  2 +-
 mm/fremap.c  |  2 +-
 mm/huge_memory.c |  9 +
 mm/hugetlb.c |  7 ++-
 mm/ksm.c |  4 ++--
 mm/memory.c  |  3 ++-
 mm/migrate.c |  3 ++-
 mm/rmap.c|  2 +-
 9 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index f333668..6959dc8 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -273,6 +273,32 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
__young;\
 })
 
+#define ptep_clear_flush_notify(__vma, __address, __ptep)		\
+({ \
+   unsigned long ___addr = __address & PAGE_MASK;  \
+   struct mm_struct *___mm = (__vma)->vm_mm;   \
+   pte_t ___pte;   \
+   \
+   ___pte = ptep_clear_flush(__vma, __address, __ptep);\
+   mmu_notifier_invalidate_range(___mm, ___addr,   \
+   ___addr + PAGE_SIZE);   \
+   \
+   ___pte; \
+})
+
+#define pmdp_clear_flush_notify(__vma, __haddr, __pmd) \
+({ \
+   unsigned long ___haddr = __haddr & HPAGE_PMD_MASK;  \
+   struct mm_struct *___mm = (__vma)->vm_mm;   \
+   pmd_t ___pmd;   \
+   \
+   ___pmd = pmdp_clear_flush(__vma, __haddr, __pmd);   \
+   mmu_notifier_invalidate_range(___mm, ___haddr,  \
+ ___haddr + HPAGE_PMD_SIZE);   \
+   \
+   ___pmd; \
+})
+
 /*
  * set_pte_at_notify() sets the pte _after_ running the notifier.
 * This is safe to start by updating the secondary MMUs, because the primary MMU
@@ -346,6 +372,8 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 
 #define ptep_clear_flush_young_notify ptep_clear_flush_young
 #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
+#define ptep_clear_flush_notify ptep_clear_flush
+#define pmdp_clear_flush_notify pmdp_clear_flush
 #define set_pte_at_notify set_pte_at
 
 #endif /* CONFIG_MMU_NOTIFIER */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 6f3254e..642262d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -186,7 +186,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
}
 
flush_cache_page(vma, addr, pte_pfn(*ptep));
-   ptep_clear_flush(vma, addr, ptep);
+   ptep_clear_flush_notify(vma, addr, ptep);
set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
 
page_remove_rmap(page);
diff --git a/mm/fremap.c b/mm/fremap.c
index 72b8fa3..9129013 100644
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -37,7 +37,7 @@ static void zap_pte(struct mm_struct *mm, struct vm_area_struct *vma,
 
if (pte_present(pte)) {
flush_cache_page(vma, addr, pte_pfn(pte));
-   pte = ptep_clear_flush(vma, addr, ptep);
+   pte = ptep_clear_flush_notify(vma, addr, ptep);
page = vm_normal_page(vma, addr, pte);
if (page) {
if (pte_dirty(pte))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 33514d8..b322c97 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1031,7 +1031,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
goto out_free_pages;
VM_BUG_ON_PAGE(!PageHead(page), page);
 
-   pmdp_clear_flush(vma, haddr, pmd);
+   pmdp_clear_flush_notify(vma, haddr, pmd);
/* leave pmd empty until pte is filled */
 
pgtable = pgtable_trans_huge_withdraw(mm, pmd);
@@ -1168,7 +1168,7 @@ alloc:
pmd_t entry;
entry = mk_huge_pmd(new_page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
-   pmdp_clear_flush(vma, haddr, pmd);
+   pmdp_clear_flush_notify(vma, haddr, pmd);
page

[PATCH 1/3] mmu_notifier: Add mmu_notifier_invalidate_range()

2014-07-24 Thread Joerg Roedel
From: Joerg Roedel 

This notifier closes an important gap with the current
invalidate_range_start()/end() notifiers. The _start() part
is called when all pages are still mapped while the _end()
notifier is called when all pages are potentially unmapped
and already freed.

This does not allow managing external (non-CPU) hardware
TLBs with MMU-notifiers because there is no way to prevent
the hardware from establishing new TLB entries between the
calls of these two functions. But this guarantee is required
from the subsystem that implements these existing notifiers.

To allow managing external TLBs the MMU-notifiers need to
catch the moment when pages are unmapped but not yet freed.
This new notifier catches that moment and notifies the
interested subsystem when pages that were unmapped are about
to be freed. The new notifier will only be called between
invalidate_range_start()/end().

Signed-off-by: Joerg Roedel 
---
 include/linux/mmu_notifier.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index deca874..f333668 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -228,6 +228,11 @@ static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
__mmu_notifier_invalidate_range_start(mm, start, end);
 }
 
+static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
+ unsigned long start, unsigned long end)
+{
+}
+
 static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
  unsigned long start, unsigned long end)
 {
@@ -321,6 +326,11 @@ static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 {
 }
 
+static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
+ unsigned long start, unsigned long end)
+{
+}
+
 static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
  unsigned long start, unsigned long end)
 {
-- 
1.9.1



[PATCH 3/3] mmu_notifier: Add the call-back for mmu_notifier_invalidate_range()

2014-07-24 Thread Joerg Roedel
From: Joerg Roedel 

Now that the mmu_notifier_invalidate_range() calls are in
place, add the call-back to allow subsystems to register
against it.

Signed-off-by: Joerg Roedel 
---
 include/linux/mmu_notifier.h | 28 ++--
 mm/mmu_notifier.c| 15 +++
 2 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 6959dc8..50dc679 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -95,11 +95,11 @@ struct mmu_notifier_ops {
/*
 * invalidate_range_start() and invalidate_range_end() must be
 * paired and are called only when the mmap_sem and/or the
-* locks protecting the reverse maps are held. The subsystem
-* must guarantee that no additional references are taken to
-* the pages in the range established between the call to
-* invalidate_range_start() and the matching call to
-* invalidate_range_end().
+* locks protecting the reverse maps are held. If the subsystem
+* can't guarantee that no additional references are taken to
+* the pages in the range, it has to implement the
+* invalidate_range() notifier to remove any references taken
+* after invalidate_range_start().
 *
 * Invalidation of multiple concurrent ranges may be
 * optionally permitted by the driver. Either way the
@@ -110,9 +110,19 @@ struct mmu_notifier_ops {
 * invalidate_range_start() is called when all pages in the
 * range are still mapped and have at least a refcount of one.
 *
+* invalidate_range() is called between invalidate_range_start()
+* and invalidate_range_end() when the memory management code
+* removed mappings to pages in the range and is about to free
+* them.  This captures the point when pages are unmapped but
+* not yet freed.
+* Note that invalidate_range() might be called only on a
+* sub-range of the range passed to the corresponding
+* invalidate_range_start() call.
+*
 * invalidate_range_end() is called when all pages in the
 * range have been unmapped and the pages have been freed by
-* the VM.
+* the VM. It might be called under the ptl spin-lock, so this
+* notifier is not allowed to preempt.
 *
 * The VM will remove the page table entries and potentially
 * the page between invalidate_range_start() and
@@ -138,6 +148,8 @@ struct mmu_notifier_ops {
void (*invalidate_range_start)(struct mmu_notifier *mn,
   struct mm_struct *mm,
   unsigned long start, unsigned long end);
+   void (*invalidate_range)(struct mmu_notifier *mn, struct mm_struct *mm,
+unsigned long start, unsigned long end);
void (*invalidate_range_end)(struct mmu_notifier *mn,
 struct mm_struct *mm,
 unsigned long start, unsigned long end);
@@ -182,6 +194,8 @@ extern void __mmu_notifier_invalidate_page(struct mm_struct *mm,
  unsigned long address);
 extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
  unsigned long start, unsigned long end);
+extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
+ unsigned long start, unsigned long end);
 extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
  unsigned long start, unsigned long end);
 
@@ -231,6 +245,8 @@ static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
  unsigned long start, unsigned long end)
 {
+   if (mm_has_notifiers(mm))
+   __mmu_notifier_invalidate_range(mm, start, end);
 }
 
 static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 41cefdf..d1bdea0 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -165,6 +165,21 @@ void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 }
 EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start);
 
+void __mmu_notifier_invalidate_range(struct mm_struct *mm,
+ unsigned long start, unsigned long end)
+{
+   struct mmu_notifier *mn;
+   int id;
+
+   id = srcu_read_lock(&srcu);
+   hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
+   if (mn->ops->invalidate_range)
+   mn->ops->invalidate_range(mn, mm, start, end);
+   }
+   srcu_read_unlock(&srcu, id);
+}
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range);
+
 void __mmu_not

Re: [PATCH v2 1/1] iommu-api: Add map_range/unmap_range functions

2014-07-24 Thread Thierry Reding
On Wed, Jul 23, 2014 at 10:49:55AM -0700, Olav Haugan wrote:
> On 7/22/2014 12:45 AM, Thierry Reding wrote:
> > On Mon, Jul 21, 2014 at 05:59:22PM -0700, Olav Haugan wrote:
> >> On 7/17/2014 1:21 AM, Thierry Reding wrote:
> >>> On Wed, Jul 16, 2014 at 06:01:57PM -0700, Olav Haugan wrote:
> > [...]
>  Additionally, the mapping operation would be faster in general since
>  clients do not have to keep calling the map API over and over again for
>  each physically contiguous chunk of memory that needs to be mapped to a
>  virtually contiguous region.
> 
>  Signed-off-by: Olav Haugan 
>  ---
>   drivers/iommu/iommu.c | 48 
>  
>   include/linux/iommu.h | 25 +
>   2 files changed, 73 insertions(+)
> 
>  diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>  index 1698360..a0eebb7 100644
>  --- a/drivers/iommu/iommu.c
>  +++ b/drivers/iommu/iommu.c
>  @@ -1089,6 +1089,54 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>   EXPORT_SYMBOL_GPL(iommu_unmap);
>   
>   
>  +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
> >>>
> >>> Maybe iova should be dma_addr_t? Or at least unsigned long? And perhaps
> >>> iommu_map_sg() would be more consistent with the equivalent function in
> >>> struct dma_ops?
> >>>
>  +struct scatterlist *sg, unsigned int len, int opt)
> >>>
> >>> The length argument seems to be the size of the mapping. Again, the
> >>> struct dma_ops function uses this argument to denote the number of
> >>> entries in the scatterlist.
> >>>
> >>> opt is somewhat opaque. Perhaps this should be turned into unsigned long
> >>> flags? Although given that there aren't any users yet it's difficult to
> >>> say what's best here. Perhaps the addition of this argument should be
> >>> postponed until there are actual users?
> >>
> >> I am thinking something like this:
> >>
> >> int iommu_map_sg(struct iommu_domain *domain, struct scatterlist *sg,
> >> unsigned int nents, int prot, unsigned long flags);
> >> int iommu_unmap_sg(struct iommu_domain *domain, struct scatterlist *sg,
> >> unsigned int nents, unsigned long flags);
> > 
> > Looks good.
> > 
> >> The iova is contained within sg so we don't need that argument really
> > 
> > I'm not sure. I think a common use-case for this function is for some
> > driver to map an imported DMA-BUF. In that case you'll get a struct
> > sg_table, but I think it won't have sg.dma_address set to anything
> > useful. So if we don't have iova as a parameter to this function, the
> > driver would have to make it a two-step process, like this:
> > 
> > sg_dma_address(sg) = iova;
> > 
> > err = iommu_map_sg(...);
> > 
> > And drivers that use the IOMMU API directly need to manage IOVA space
> > themselves anyway, so I think passing around the IOVA within the SG
> > won't be a very common case. It will almost always be the driver that
> > calls this function which allocates the IOVA range.
> 
> Yes, I see your point. Rob is not a fan either...
> So what about this:
> 
> int iommu_map_sg(struct iommu_domain *domain, unsigned long iova, struct
> scatterlist *sg, unsigned int nents, int prot, unsigned long flags);
> int iommu_unmap_sg(struct iommu_domain *domain, unsigned long iova,
> size_t size, unsigned long flags);
> 
> No need for sg in the unmap call then. Keeping iova an unsigned long to
> match the existing iommu_map/iommu_unmap calls.

Looks good to me. I think we may want to eventually convert iova to
dma_addr_t since that's a more appropriate type for these addresses but
we can do that in a separate patch later on.
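
For reference, a generic fallback with the signature proposed above could
look roughly like the sketch below. The function name, the (unused) flags
argument and the error handling are assumptions on my side, not an agreed
API:

#include <linux/iommu.h>
#include <linux/scatterlist.h>

/*
 * Sketch of a generic fallback with the proposed signature: map each
 * physically contiguous scatterlist entry with iommu_map() and undo the
 * partial mapping on failure.  A real implementation would also have to
 * validate segment alignment against the domain's pgsize_bitmap.
 */
static int example_iommu_map_sg(struct iommu_domain *domain,
				unsigned long iova, struct scatterlist *sg,
				unsigned int nents, int prot,
				unsigned long flags)
{
	struct scatterlist *s;
	size_t mapped = 0;
	unsigned int i;
	int ret;

	for_each_sg(sg, s, nents, i) {
		ret = iommu_map(domain, iova + mapped, sg_phys(s),
				s->length, prot);
		if (ret)
			goto undo;

		mapped += s->length;
	}

	return 0;

undo:
	/* Roll back everything mapped so far. */
	iommu_unmap(domain, iova, mapped);
	return ret;
}

A driver that manages its own IOVA space would then simply pass its
allocated base address as iova, just like with iommu_map() today.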

[...]
> > For .map_sg() I think pretty much every driver will want to implement
> > it, so requiring developers to explicitly set it for their driver seems
> > like a good idea. If there's no advantage in implementing it then they
> > can always get the same functionality by explicitly using the fallback.
> 
> I feel that requiring drivers to set the default callback defeats the
> purpose of having a fallback in the first place. The reason to provide
> the default fallback is to catch any driver that does not implement this
> themselves.

Certainly, but the exact same can be achieved by making all drivers
point to the generic implementation. In my opinion that makes it much
more explicit (and therefore obvious) what's really going on. Hiding
fallbacks in the core API obfuscates.

And this is all in-tree code, so we have full control over what we
change, so the patch that introduces this new API could simply make the
necessary adjustments to existing drivers.
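
Concretely, the per-driver adjustment would be on the order of this
sketch (both the .map_sg member and the default_iommu_map_sg name are
hypothetical, taken from the proposal rather than from the current API):

#include <linux/iommu.h>
#include <linux/scatterlist.h>

/* Hypothetical generic helper from the proposal under discussion. */
extern int default_iommu_map_sg(struct iommu_domain *domain,
				unsigned long iova, struct scatterlist *sg,
				unsigned int nents, int prot,
				unsigned long flags);

static struct iommu_ops example_iommu_ops = {
	/* ... the driver's existing .attach_dev/.map/.unmap callbacks ... */

	/*
	 * Point the proposed operation explicitly at the generic
	 * implementation instead of relying on a hidden fallback in the
	 * IOMMU core.
	 */
	.map_sg = default_iommu_map_sg,
};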

Thierry



Re: [PATCH v2 1/1] iommu-api: Add map_range/unmap_range functions

2014-07-24 Thread Thierry Reding
On Thu, Jul 24, 2014 at 11:34:27AM +0200, Joerg Roedel wrote:
> On Wed, Jul 23, 2014 at 10:49:55AM -0700, Olav Haugan wrote:
> > Joerg, can you comment on what you envisioned when you suggested that we
> > add the fallback?
> > 
> 
> The problem is that we already have tons of IOMMU drivers in the tree
> which don't provide these call-backs. So adding this API extension
> without a fall-back that works for these drivers too would fragment the
> functionality between different IOMMU drivers in an unacceptable way and
> undermine the purpose of a generic API.

But we only care about in-tree drivers anyway, so we can equally well
just point all drivers to the generic implementation in the same patch
that adds this new function. The end result will be the same, but it
will keep the core function simpler (and more consistent with the other
core functions).

Thierry



Re: [PATCH v2 1/1] iommu-api: Add map_range/unmap_range functions

2014-07-24 Thread Joerg Roedel
On Wed, Jul 23, 2014 at 10:49:55AM -0700, Olav Haugan wrote:
> Joerg, can you comment on what you envisioned when you suggested that we
> add the fallback?
> 

The problem is that we already have tons of IOMMU drivers in the tree
which don't provide these call-backs. So adding this API extension
without a fall-back that works for these drivers too would fragment the
functionality between different IOMMU drivers in an unacceptable way and
undermine the purpose of a generic API.


Joerg
