[PATCH] dma-debug: Only skip one stackframe entry

2019-04-10 Thread Scott Wood
With skip set to 1, I get a traceback like this:

[  106.867637] DMA-API: Mapped at:
[  106.870784]  afu_dma_map_region+0x2cd/0x4f0 [dfl_afu]
[  106.875839]  afu_ioctl+0x258/0x380 [dfl_afu]
[  106.880108]  do_vfs_ioctl+0xa9/0x720
[  106.883688]  ksys_ioctl+0x60/0x90
[  106.887007]  __x64_sys_ioctl+0x16/0x20

With the previous value of 2, afu_dma_map_region was being omitted.  I
suspect that the code paths have simply changed since the value of 2 was
chosen a decade ago, but it's also possible that it varies based on which
mapping function was used, compiler inlining choices, etc.  In any case,
it's best to err on the side of skipping less.

Signed-off-by: Scott Wood 
---
 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 45d51e8e26f6..a218e43cc382 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -706,7 +706,7 @@ static struct dma_debug_entry *dma_entry_alloc(void)
 #ifdef CONFIG_STACKTRACE
entry->stacktrace.max_entries = DMA_DEBUG_STACKTRACE_ENTRIES;
entry->stacktrace.entries = entry->st_entries;
-   entry->stacktrace.skip = 2;
+   entry->stacktrace.skip = 1;
save_stack_trace(&entry->stacktrace);
 #endif
 
-- 
1.8.3.1
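
The effect of the skip count can be illustrated with a minimal userspace sketch. It is not the kernel's stack walker: record_trace() and the "dma_debug_internal" frame are hypothetical names, and the call chain is simply the traceback from the commit message with one assumed internal frame in front of it.

```c
#include <stddef.h>

/*
 * Hypothetical call chain, innermost frame first. "dma_debug_internal"
 * stands in for whatever dma-debug frame the unwinder reports ahead of
 * the driver; its real name is not given in the patch.
 */
static const char *const chain[] = {
	"dma_debug_internal",
	"afu_dma_map_region",
	"afu_ioctl",
	"do_vfs_ioctl",
	"ksys_ioctl",
	"__x64_sys_ioctl",
};

#define CHAIN_LEN (sizeof(chain) / sizeof(chain[0]))

/* Copy up to max frames into out[], dropping the first skip frames. */
static size_t record_trace(size_t skip, const char **out, size_t max)
{
	size_t n = 0;

	for (size_t i = skip; i < CHAIN_LEN && n < max; i++)
		out[n++] = chain[i];

	return n;
}
```

With skip set to 1 the first recorded frame is afu_dma_map_region; with skip set to 2 it is afu_ioctl, so the driver function that actually created the mapping is lost.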

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 16/20] powerpc/dma: use dma_direct_{alloc,free}

2018-08-27 Thread Scott Wood
On Thu, 2018-08-09 at 10:52 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-07-30 at 18:38 +0200, Christoph Hellwig wrote:
> > These do the same functionality as the existing helpers, but do it
> > simpler, and also allow the (optional) use of CMA.
> > 
> > Note that the swiotlb code now calls into the dma_direct code directly,
> > given that it doesn't work with noncoherent caches at all, and isn't
> > called
> > when we have an iommu either, so the iommu special case in
> > dma_nommu_alloc_coherent isn't required for swiotlb.
> 
> I am not convinced that this will produce the same results due to
> the way the zone picking works.
> 
> As for the interaction with swiotlb, we'll need the FSL guys to have
> a look. Scott, do you remember what this is about ?

dma_direct_alloc() has similar (though not identical[1]) zone picking, so I
think it will work.  Needs testing though, and I no longer have a book3e
machine with a PCIe card in it.

The odd thing about this platform (fsl book3e) is the 31-bit[2] limitation on
PCI.  We currently use ZONE_DMA32 for this, rather than ZONE_DMA, at Ben's
request[3].  dma_direct_alloc() regards ZONE_DMA32 as being fixed at 32-bits,
but it doesn't really matter as long as limit_zone_pfn() still works, and the
allocation is made below 2 GiB.  If we were to switch to ZONE_DMA, and have
both 31-bit and 32-bit zones, then dma_direct_alloc() would have a problem
knowing when to use the 31-bit zone since it's based on a non-power-of-2 limit
that isn't reflected in the dma mask.

-Scott

[1] The logic in dma_direct_alloc() seems wrong -- the zone should need to fit
in the mask, not the other way around.  If ARCH_ZONE_DMA_BITS is 24, then
0x007fffff should be a failure rather than GFP_DMA, 0x7fffffff should be
GFP_DMA rather than GFP_DMA32, and 0x3ffffffff should be GFP_DMA32 rather than
an unrestricted allocation (in each case assuming that the end of RAM is
beyond the mask).

[2] The actual limit is closer to 4 GiB, but not quite due to special windows.
 swiotlb still uses the real limit when deciding whether to bounce, so the dma
mask is still 32 bits.

[3] https://lists.ozlabs.org/pipermail/linuxppc-dev/2012-July/099593.html
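
The rule proposed in footnote [1] (the zone has to fit in the mask) can be written down as a small userspace sketch. This is not the dma_direct_alloc() code; pick_zone(), the enum and BIT_MASK_N() are hypothetical names, ZONE_DMA_BITS stands in for ARCH_ZONE_DMA_BITS == 24, and the three example masks are taken as 0x007fffff, 0x7fffffff and 0x3ffffffff on the assumption that the hex constants lost digits in the archive.

```c
#include <stdint.h>

#define ZONE_DMA_BITS 24 /* stands in for ARCH_ZONE_DMA_BITS == 24 */
#define BIT_MASK_N(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

enum zone_choice {
	ZONE_FAIL,       /* no zone fits below the mask */
	ZONE_PICK_DMA,   /* would map to GFP_DMA */
	ZONE_PICK_DMA32, /* would map to GFP_DMA32 */
	ZONE_PICK_ANY,   /* unrestricted allocation */
};

/*
 * Pick the largest zone that lies entirely below the device's mask.
 * last_ram_addr is the highest physical address of RAM.
 */
static enum zone_choice pick_zone(uint64_t mask, uint64_t last_ram_addr)
{
	if (mask >= last_ram_addr)
		return ZONE_PICK_ANY;
	if (mask >= BIT_MASK_N(32))
		return ZONE_PICK_DMA32;
	if (mask >= BIT_MASK_N(ZONE_DMA_BITS))
		return ZONE_PICK_DMA;
	return ZONE_FAIL;
}
```

With RAM ending above all three masks, this returns ZONE_FAIL, ZONE_PICK_DMA and ZONE_PICK_DMA32 respectively, which are the outcomes the footnote argues for.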


Re: [PATCH 00/10 v2] iommu/amd: lock splitting & GFP_KERNEL allocation

2018-03-19 Thread Scott Wood
On Mon, 2018-03-19 at 13:15 +0100, Sebastian Andrzej Siewior wrote:
> On 2018-03-17 16:43:39 [-0500], Scott Wood wrote:
> > If that's worth the lock dropping then fine (though why does only
> > one
> > of the two allocations use GFP_KERNEL?), but it doesn't need to be
> > a
> 
> That was a mistake, I planned to keep both as GFP_KERNEL.
> 
> > raw lock if the non-allocating users are separated.  Keeping them
> > separate will also preserve the WARNs if we somehow end up in an
> > atomic
> > context with no table (versus relying on atomic sleep debugging
> > that
> > may or may not be enabled), and make the code easier to understand
> > by
> > being explicit about which functions can be used from RT-atomic
> > context.
> 
> That separated part is okay. We could keep it. However, I am not sure
> if
> looking at the table irq_lookup_table[devid] without the lock is
> okay.
> The pointer is assigned without DTE entry/iommu-flush to be
> completed. 
> This does not look "okay".

Those callers are getting the devid from an irq_2_irte struct, which
was set up in irq_remapping_alloc() after get/alloc_irq_table() is
completed.

-Scott


Re: [PATCH 00/10 v2] iommu/amd: lock splitting & GFP_KERNEL allocation

2018-03-17 Thread Scott Wood
On Sat, 2018-03-17 at 22:10 +0100, Sebastian Andrzej Siewior wrote:
> On 2018-03-17 14:49:54 [-0500], Scott Wood wrote:
> > On Fri, 2018-03-16 at 21:18 +0100, Sebastian Andrzej Siewior wrote:
> > > The goal here is to make the memory allocation in get_irq_table()
> > > not
> > > with disabled interrupts and having as little raw_spin_lock as
> > > possible
> > > while having them if the caller is also holding one (like desc-
> > > >lock
> > > during IRQ-affinity changes).
> > > I reverted one patch one patch in the iommu while rebasing since
> > > it
> > > make job easier.
> > 
> > If the goal is to have "as little raw_spin_lock as possible" -- and
> > presumably also to avoid unnecessary complexity -- wouldn't it be
> > better to leave my patch in, and drop patches 4 and 9?
> 
> 9 gives me GFP_KERNEL instead atomic so no.
> 4 is needed I think but I could double check on Monday. 

If that's worth the lock dropping then fine (though why does only one
of the two allocations use GFP_KERNEL?), but it doesn't need to be a
raw lock if the non-allocating users are separated.  Keeping them
separate will also preserve the WARNs if we somehow end up in an atomic
context with no table (versus relying on atomic sleep debugging that
may or may not be enabled), and make the code easier to understand by
being explicit about which functions can be used from RT-atomic
context.

-Scott


Re: [PATCH 00/10 v2] iommu/amd: lock splitting & GFP_KERNEL allocation

2018-03-17 Thread Scott Wood
On Fri, 2018-03-16 at 21:18 +0100, Sebastian Andrzej Siewior wrote:
> The goal here is to make the memory allocation in get_irq_table() not
> with disabled interrupts and having as little raw_spin_lock as
> possible
> while having them if the caller is also holding one (like desc->lock
> during IRQ-affinity changes).
> I reverted one patch one patch in the iommu while rebasing since it
> make job easier.

If the goal is to have "as little raw_spin_lock as possible" -- and
presumably also to avoid unnecessary complexity -- wouldn't it be
better to leave my patch in, and drop patches 4 and 9?

-Scott


[PATCH v2] iommu/amd: Avoid locking get_irq_table() from atomic context

2018-02-14 Thread Scott Wood
get_irq_table() previously acquired amd_iommu_devtable_lock which is not
a raw lock, and thus cannot be acquired from atomic context on
PREEMPT_RT.  Many calls to modify_irte*() come from atomic context due to
the IRQ desc->lock, as does amd_iommu_update_ga() due to the preemption
disabling in vcpu_load/put().

The only difference between calling get_irq_table() and reading from
irq_lookup_table[] directly, other than the lock acquisition and
amd_iommu_rlookup_table[] check, is if the table entry is unpopulated,
which should never happen when looking up a devid that came from an
irq_2_irte struct, as get_irq_table() would have already been called on
that devid during irq_remapping_alloc().

The lock acquisition is not needed in these cases because entries in
irq_lookup_table[] never change once non-NULL -- nor would the
amd_iommu_devtable_lock usage in get_irq_table() provide meaningful
protection if they did, since it's released before using the looked up
table in the get_irq_table() caller.

Rename the old get_irq_table() to alloc_irq_table(), and create a new
lockless get_irq_table() to be used in non-allocating contexts that WARNs
if it doesn't find what it's looking for.

Signed-off-by: Scott Wood <sw...@redhat.com>
---
v2: Added new get_irq_table() with WARNs rather than accessing
irq_lookup_table[] directly.
---
 drivers/iommu/amd_iommu.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e4026133aa1d..5d41e0733cb3 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3594,7 +3594,22 @@ static void set_dte_irq_entry(u16 devid, struct 
irq_remap_table *table)
amd_iommu_dev_table[devid].data[2] = dte;
 }
 
-static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic)
+static struct irq_remap_table *get_irq_table(u16 devid)
+{
+   struct irq_remap_table *table;
+
+   if (WARN_ONCE(!amd_iommu_rlookup_table[devid],
+ "%s: no iommu for devid %x\n", __func__, devid))
+   return NULL;
+
+   table = irq_lookup_table[devid];
+   if (WARN_ONCE(!table, "%s: no table for devid %x\n", __func__, devid))
+   return NULL;
+
+   return table;
+}
+
+static struct irq_remap_table *alloc_irq_table(u16 devid, bool ioapic)
 {
struct irq_remap_table *table = NULL;
struct amd_iommu *iommu;
@@ -3681,7 +3696,7 @@ static int alloc_irq_index(u16 devid, int count, bool 
align)
if (!iommu)
return -ENODEV;
 
-   table = get_irq_table(devid, false);
+   table = alloc_irq_table(devid, false);
if (!table)
return -ENODEV;
 
@@ -3732,7 +3747,7 @@ static int modify_irte_ga(u16 devid, int index, struct 
irte_ga *irte,
if (iommu == NULL)
return -EINVAL;
 
-   table = get_irq_table(devid, false);
+   table = get_irq_table(devid);
if (!table)
return -ENOMEM;
 
@@ -3765,7 +3780,7 @@ static int modify_irte(u16 devid, int index, union irte 
*irte)
if (iommu == NULL)
return -EINVAL;
 
-   table = get_irq_table(devid, false);
+   table = get_irq_table(devid);
if (!table)
return -ENOMEM;
 
@@ -3789,7 +3804,7 @@ static void free_irte(u16 devid, int index)
if (iommu == NULL)
return;
 
-   table = get_irq_table(devid, false);
+   table = get_irq_table(devid);
if (!table)
return;
 
@@ -4107,7 +4122,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
return ret;
 
if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
-   if (get_irq_table(devid, true))
+   if (alloc_irq_table(devid, true))
index = info->ioapic_pin;
else
ret = -ENOMEM;
@@ -4390,7 +4405,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
if (!iommu)
return -ENODEV;
 
-   irt = get_irq_table(devid, false);
+   irt = get_irq_table(devid);
if (!irt)
return -ENODEV;
 
-- 
2.14.3
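
The split between an allocating path and a lockless lookup path can be sketched in userspace as follows. This is only an analogy under stated assumptions: struct remap_table, alloc_table(), get_table() and MAX_DEVID are hypothetical names, fprintf() stands in for WARN_ONCE(), and a C11 compare-and-swap replaces amd_iommu_devtable_lock purely to keep the sketch free of threading libraries. The property it relies on is the one stated in the commit message: an entry never changes once it is non-NULL.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_DEVID 16 /* hypothetical table size for the sketch */

struct remap_table {
	int min_index;
};

/* Entries go from NULL to non-NULL exactly once and then never change. */
static _Atomic(struct remap_table *) lookup[MAX_DEVID];

/* Allocating path: may sleep in the real driver, so not for atomic context. */
static struct remap_table *alloc_table(unsigned int devid)
{
	struct remap_table *expected = NULL;
	struct remap_table *table;

	if (devid >= MAX_DEVID)
		return NULL;

	table = atomic_load_explicit(&lookup[devid], memory_order_acquire);
	if (table)
		return table;

	table = calloc(1, sizeof(*table));
	if (!table)
		return NULL;

	/* Publish once; if another caller won the race, use its table. */
	if (!atomic_compare_exchange_strong(&lookup[devid], &expected, table)) {
		free(table);
		return expected;
	}
	return table;
}

/* Lookup path: no lock, no allocation; a miss is reported as a bug. */
static struct remap_table *get_table(unsigned int devid)
{
	struct remap_table *table;

	if (devid >= MAX_DEVID)
		return NULL;

	table = atomic_load_explicit(&lookup[devid], memory_order_acquire);
	if (!table)
		fprintf(stderr, "%s: no table for devid %x\n", __func__, devid);
	return table;
}
```

Callers that may run in atomic context use only get_table(); the miss case is reported loudly instead of being papered over by an allocation.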


[PATCH] iommu/amd: Don't use dev_data in irte_ga_set_affinity()

2018-01-28 Thread Scott Wood
search_dev_data() acquires a non-raw lock, which can't be done
from atomic context on PREEMPT_RT.  There is no need to look at
dev_data because guest_mode should never be set if use_vapic is
not set.

Signed-off-by: Scott Wood <sw...@redhat.com>
---
This is a followup to the patches below:

https://patchwork.codeaurora.org/patch/433611/
https://patchwork.codeaurora.org/patch/433613/

 drivers/iommu/amd_iommu.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 84e99097dfe3..a933c26df652 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3861,10 +3861,8 @@ static void irte_ga_set_affinity(void *entry, u16 devid, 
u16 index,
 u8 vector, u32 dest_apicid)
 {
struct irte_ga *irte = (struct irte_ga *) entry;
-   struct iommu_dev_data *dev_data = search_dev_data(devid);
 
-   if (!dev_data || !dev_data->use_vapic ||
-   !irte->lo.fields_remap.guest_mode) {
+   if (!irte->lo.fields_remap.guest_mode) {
irte->hi.fields.vector = vector;
irte->lo.fields_remap.destination = dest_apicid;
modify_irte_ga(devid, index, irte, NULL);
-- 
1.8.3.1


[PATCH 2/2] amd/iommu: Use raw locks on atomic context paths

2018-01-21 Thread Scott Wood
Several functions in this driver are called from atomic context,
and thus raw locks must be used in order to be safe on PREEMPT_RT.

This includes paths that must wait for command completion, which is
a potential PREEMPT_RT latency concern but not easily avoidable.

Signed-off-by: Scott Wood <sw...@redhat.com>
---
 drivers/iommu/amd_iommu.c       | 30 +++++++++++++++---------------
 drivers/iommu/amd_iommu_init.c  |  2 +-
 drivers/iommu/amd_iommu_types.h |  4 ++--
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8ead1b296d09..213f5a796ae5 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1055,9 +1055,9 @@ static int iommu_queue_command_sync(struct amd_iommu 
*iommu,
unsigned long flags;
int ret;
 
-   spin_lock_irqsave(&iommu->lock, flags);
+   raw_spin_lock_irqsave(&iommu->lock, flags);
ret = __iommu_queue_command_sync(iommu, cmd, sync);
-   spin_unlock_irqrestore(&iommu->lock, flags);
+   raw_spin_unlock_irqrestore(&iommu->lock, flags);
 
return ret;
 }
@@ -1083,7 +1083,7 @@ static int iommu_completion_wait(struct amd_iommu *iommu)
 
build_completion_wait(&cmd, (u64)&iommu->cmd_sem);
 
-   spin_lock_irqsave(&iommu->lock, flags);
+   raw_spin_lock_irqsave(&iommu->lock, flags);
 
iommu->cmd_sem = 0;
 
@@ -1094,7 +1094,7 @@ static int iommu_completion_wait(struct amd_iommu *iommu)
ret = wait_on_sem(&iommu->cmd_sem);
 
 out_unlock:
-   spin_unlock_irqrestore(&iommu->lock, flags);
+   raw_spin_unlock_irqrestore(&iommu->lock, flags);
 
return ret;
 }
@@ -3626,7 +3626,7 @@ static struct irq_remap_table *get_irq_table(u16 devid, 
bool ioapic)
goto out_unlock;
 
/* Initialize table spin-lock */
-   spin_lock_init(&table->lock);
+   raw_spin_lock_init(&table->lock);
 
if (ioapic)
/* Keep the first 32 indexes free for IOAPIC interrupts */
@@ -3688,7 +3688,7 @@ static int alloc_irq_index(u16 devid, int count, bool 
align)
if (align)
alignment = roundup_pow_of_two(count);
 
-   spin_lock_irqsave(&table->lock, flags);
+   raw_spin_lock_irqsave(&table->lock, flags);
 
/* Scan table for free entries */
for (index = ALIGN(table->min_index, alignment), c = 0;
@@ -3715,7 +3715,7 @@ static int alloc_irq_index(u16 devid, int count, bool 
align)
index = -ENOSPC;
 
 out:
-   spin_unlock_irqrestore(&table->lock, flags);
+   raw_spin_unlock_irqrestore(&table->lock, flags);
 
return index;
 }
@@ -3736,7 +3736,7 @@ static int modify_irte_ga(u16 devid, int index, struct 
irte_ga *irte,
if (!table)
return -ENOMEM;
 
-   spin_lock_irqsave(&table->lock, flags);
+   raw_spin_lock_irqsave(&table->lock, flags);
 
entry = (struct irte_ga *)table->table;
entry = &entry[index];
@@ -3747,7 +3747,7 @@ static int modify_irte_ga(u16 devid, int index, struct 
irte_ga *irte,
if (data)
data->ref = entry;
 
-   spin_unlock_irqrestore(&table->lock, flags);
+   raw_spin_unlock_irqrestore(&table->lock, flags);
 
iommu_flush_irt(iommu, devid);
iommu_completion_wait(iommu);
@@ -3769,9 +3769,9 @@ static int modify_irte(u16 devid, int index, union irte 
*irte)
if (!table)
return -ENOMEM;
 
-   spin_lock_irqsave(&table->lock, flags);
+   raw_spin_lock_irqsave(&table->lock, flags);
table->table[index] = irte->val;
-   spin_unlock_irqrestore(&table->lock, flags);
+   raw_spin_unlock_irqrestore(&table->lock, flags);
 
iommu_flush_irt(iommu, devid);
iommu_completion_wait(iommu);
@@ -3793,9 +3793,9 @@ static void free_irte(u16 devid, int index)
if (!table)
return;
 
-   spin_lock_irqsave(&table->lock, flags);
+   raw_spin_lock_irqsave(&table->lock, flags);
iommu->irte_ops->clear_allocated(table, index);
-   spin_unlock_irqrestore(&table->lock, flags);
+   raw_spin_unlock_irqrestore(&table->lock, flags);
 
iommu_flush_irt(iommu, devid);
iommu_completion_wait(iommu);
@@ -4396,7 +4396,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
if (!irt)
return -ENODEV;
 
-   spin_lock_irqsave(&irt->lock, flags);
+   raw_spin_lock_irqsave(&irt->lock, flags);
 
if (ref->lo.fields_vapic.guest_mode) {
if (cpu >= 0)
@@ -4405,7 +4405,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
barrier();
}
 
-   spin_unlock_irqrestore(&irt->lock, flags);
+   raw_spin_unlock_irqrestore(&irt->lock, flags);
 
iommu_flush_irt(iommu, devid);
iommu_completion_wait(iommu);
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6fe2d0346073..e3cd81b32a33 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1474,7 +1474,7 @@ static int __init init_iommu_one(st

[PATCH 1/2] iommu/amd: Avoid get_irq_table() from atomic context

2018-01-21 Thread Scott Wood
get_irq_table() acquires amd_iommu_devtable_lock which is not a raw lock,
and thus cannot be acquired from atomic context on PREEMPT_RT.  Many
calls to modify_irte*() come from atomic context due to the IRQ
desc->lock, as does amd_iommu_update_ga() due to the preemption disabling
in vcpu_load/put().

The only difference between calling get_irq_table() and reading from
irq_lookup_table[] directly, other than the lock acquisition and
amd_iommu_rlookup_table[] check, is if the table entry is unpopulated,
which should never happen when looking up a devid that came from an
irq_2_irte struct, as get_irq_table() would have already been called on
that devid during irq_remapping_alloc().

The lock acquisition is not needed in these cases because entries in
irq_lookup_table[] never change once non-NULL -- nor would the
amd_iommu_devtable_lock usage in get_irq_table() provide meaningful
protection if they did, since it's released before using the looked up
table in the get_irq_table() caller.

The amd_iommu_rlookup_table[] check is not needed because
irq_lookup_table[devid] should never be non-NULL if
amd_iommu_rlookup_table[devid] is NULL.

Signed-off-by: Scott Wood <sw...@redhat.com>
---
 drivers/iommu/amd_iommu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index dc4b73833419..8ead1b296d09 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3732,7 +3732,7 @@ static int modify_irte_ga(u16 devid, int index, struct 
irte_ga *irte,
if (iommu == NULL)
return -EINVAL;
 
-   table = get_irq_table(devid, false);
+   table = irq_lookup_table[devid];
if (!table)
return -ENOMEM;
 
@@ -3765,7 +3765,7 @@ static int modify_irte(u16 devid, int index, union irte 
*irte)
if (iommu == NULL)
return -EINVAL;
 
-   table = get_irq_table(devid, false);
+   table = irq_lookup_table[devid];
if (!table)
return -ENOMEM;
 
@@ -3789,7 +3789,7 @@ static void free_irte(u16 devid, int index)
if (iommu == NULL)
return;
 
-   table = get_irq_table(devid, false);
+   table = irq_lookup_table[devid];
if (!table)
return;
 
@@ -4392,7 +4392,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
if (!iommu)
return -ENODEV;
 
-   irt = get_irq_table(devid, false);
+   irt = irq_lookup_table[devid];
if (!irt)
return -ENODEV;
 
-- 
1.8.3.1


Re: [v16, 0/7] Fix eSDHC host version register bug

2016-11-09 Thread Scott Wood
On Thu, 2016-11-10 at 04:11 +0000, Y.B. Lu wrote:
> > 
> > -----Original Message-----
> > From: Y.B. Lu
> > Sent: Thursday, November 10, 2016 12:06 PM
> > To: 'Scott Wood'; Ulf Hansson
> > Cc: linux-mmc; Arnd Bergmann; linuxppc-...@lists.ozlabs.org;
> > devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-
> > ker...@vger.kernel.org; linux-clk; iommu@lists.linux-foundation.org;
> > net...@vger.kernel.org; Greg Kroah-Hartman; Mark Rutland; Rob Herring;
> > Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh
> > Sharma; Qiang Zhao; Kumar Gala; Leo Li; X.B. Xie; M.H. Lian
> > Subject: RE: [v16, 0/7] Fix eSDHC host version register bug
> > 
> > > 
> > > -----Original Message-----
> > > From: linux-mmc-ow...@vger.kernel.org [mailto:linux-mmc-
> > > ow...@vger.kernel.org] On Behalf Of Scott Wood
> > > Sent: Thursday, November 10, 2016 11:55 AM
> > > To: Ulf Hansson; Y.B. Lu
> > > Cc: linux-mmc; Arnd Bergmann; linuxppc-...@lists.ozlabs.org;
> > > devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> > > linux- ker...@vger.kernel.org; linux-clk;
> > > iommu@lists.linux-foundation.org; net...@vger.kernel.org; Greg
> > > Kroah-Hartman; Mark Rutland; Rob Herring; Russell King; Jochen
> > > Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh Sharma; Qiang Zhao;
> > > Kumar Gala; Leo Li; X.B. Xie; M.H. Lian
> > > Subject: Re: [v16, 0/7] Fix eSDHC host version register bug
> > > 
> > > On Wed, 2016-11-09 at 19:27 +0100, Ulf Hansson wrote:
> > > > 
> > > > - i2c-list
> > > > 
> > > > On 9 November 2016 at 04:14, Yangbo Lu <yangbo...@nxp.com> wrote:
> > > > > 
> > > > > 
> > > > > This patchset is used to fix a host version register bug in the
> > > > > T4240-
> > > > > R1.0-R2.0
> > > > > eSDHC controller. To match the SoC version and revision, 15
> > > > > previous version patchsets had tried many methods but all of them
> > > > > were rejected by reviewers.
> > > > > Such as
> > > > > - dts compatible method
> > > > > - syscon method
> > > > > - ifdef PPC method
> > > > > - GUTS driver getting SVR method Anrd suggested a
> > > > > soc_device_match method in v10, and this is the only available
> > > > > method left now. This v11 patchset introduces the soc_device_match
> > > > > interface in soc driver.
> > > > > 
> > > > > The first four patches of Yangbo are to add the GUTS driver. This
> > > > > is used to register a soc device which contain soc version and
> > > > > revision information.
> > > > > The other three patches introduce the soc_device_match method in
> > > > > soc driver and apply it on esdhc driver to fix this bug.
> > > > > 
> > > > > ---
> > > > > Changes for v15:
> > > > > - Dropped patch 'dt: bindings: update Freescale DCFG
> > > compatible'
> > > > 
> > > > > 
> > > > >   since the work had been done by below patch on
> > > > > ShawnGuo's linux tree.
> > > > >   'dt-bindings: fsl: add LS1043A/LS1046A/LS2080A
> > > > > compatible for SCFG
> > > > >    and DCFG'
> > > > > - Fixed error code issue in guts driver Changes for v16:
> > > > > - Dropped patch 'powerpc/fsl: move mpc85xx.h to
> > > include/linux/fsl'
> > > > 
> > > > > 
> > > > > - Added a bug-fix patch from Geert
> > > > > ---
> > > > > 
> > > > > Arnd Bergmann (1):
> > > > >   base: soc: introduce soc_device_match() interface
> > > > > 
> > > > > Geert Uytterhoeven (1):
> > > > >   base: soc: Check for NULL SoC device attributes
> > > > > 
> > > > > Yangbo Lu (5):
> > > > >   ARM64: dts: ls2080a: add device configuration node
> > > > >   dt: bindings: move guts devicetree doc out of powerpc directory
> > > > >   soc: fsl: add GUTS driver for QorIQ platforms
> > > > >   MAINTAINERS: add entry for Freescale SoC drivers
> > > > >   mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0
> > > > > 
> > > > >  .../bindings/{p

Re: [v16, 0/7] Fix eSDHC host version register bug

2016-11-09 Thread Scott Wood
On Wed, 2016-11-09 at 19:27 +0100, Ulf Hansson wrote:
> - i2c-list
> 
> On 9 November 2016 at 04:14, Yangbo Lu  wrote:
> > 
> > This patchset is used to fix a host version register bug in the T4240-
> > R1.0-R2.0
> > eSDHC controller. To match the SoC version and revision, 15 previous
> > version
> > patchsets had tried many methods but all of them were rejected by
> > reviewers.
> > Such as
> > - dts compatible method
> > - syscon method
> > - ifdef PPC method
> > - GUTS driver getting SVR method
> > Anrd suggested a soc_device_match method in v10, and this is the only
> > available
> > method left now. This v11 patchset introduces the soc_device_match
> > interface in
> > soc driver.
> > 
> > The first four patches of Yangbo are to add the GUTS driver. This is used
> > to
> > register a soc device which contain soc version and revision information.
> > The other three patches introduce the soc_device_match method in soc
> > driver
> > and apply it on esdhc driver to fix this bug.
> > 
> > ---
> > Changes for v15:
> > - Dropped patch 'dt: bindings: update Freescale DCFG compatible'
> >   since the work had been done by below patch on ShawnGuo's linux
> > tree.
> >   'dt-bindings: fsl: add LS1043A/LS1046A/LS2080A compatible for
> > SCFG
> >    and DCFG'
> > - Fixed error code issue in guts driver
> > Changes for v16:
> > - Dropped patch 'powerpc/fsl: move mpc85xx.h to include/linux/fsl'
> > - Added a bug-fix patch from Geert
> > ---
> > 
> > Arnd Bergmann (1):
> >   base: soc: introduce soc_device_match() interface
> > 
> > Geert Uytterhoeven (1):
> >   base: soc: Check for NULL SoC device attributes
> > 
> > Yangbo Lu (5):
> >   ARM64: dts: ls2080a: add device configuration node
> >   dt: bindings: move guts devicetree doc out of powerpc directory
> >   soc: fsl: add GUTS driver for QorIQ platforms
> >   MAINTAINERS: add entry for Freescale SoC drivers
> >   mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0
> > 
> >  .../bindings/{powerpc => soc}/fsl/guts.txt |   3 +
> >  MAINTAINERS|  11 +-
> >  arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi |   6 +
> >  drivers/base/Kconfig   |   1 +
> >  drivers/base/soc.c |  70 ++
> >  drivers/mmc/host/Kconfig   |   1 +
> >  drivers/mmc/host/sdhci-of-esdhc.c  |  20 ++
> >  drivers/soc/Kconfig|   3 +-
> >  drivers/soc/fsl/Kconfig|  18 ++
> >  drivers/soc/fsl/Makefile   |   1 +
> >  drivers/soc/fsl/guts.c | 236
> > +
> >  include/linux/fsl/guts.h   | 125 ++-
> >  include/linux/sys_soc.h|   3 +
> >  13 files changed, 447 insertions(+), 51 deletions(-)
> >  rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt
> > (91%)
> >  create mode 100644 drivers/soc/fsl/Kconfig
> >  create mode 100644 drivers/soc/fsl/guts.c
> > 
> > --
> > 2.1.0.27.g96db324
> > 
> Thanks, applied on my mmc tree for next!
> 
> I noticed that some DT compatibles weren't documented, according to
> checkpatch. Please fix that asap!

They are documented, in fsl/guts.txt (the file moved in patch 2/7):
>  - compatible : Should define the compatible device type for
>    global-utilities.
>    Possible compatibles:
> "fsl,qoriq-device-config-1.0"
> "fsl,qoriq-device-config-2.0"
> "fsl,<chip>-device-config"
> "fsl,<chip>-guts"

Checkpatch doesn't understand compatibles defined in such a way.

-Scott


Re: [v13, 5/8] soc: fsl: add GUTS driver for QorIQ platforms

2016-10-27 Thread Scott Wood
On Fri, 2016-10-28 at 11:32 +0800, Yangbo Lu wrote:
> + guts->regs = of_iomap(np, 0);
> + if (!guts->regs)
> + return -ENOMEM;
> +
> + /* Register soc device */
> + machine = of_flat_dt_get_machine_name();
> + if (machine)
> + soc_dev_attr.machine = devm_kstrdup(dev, machine,
> GFP_KERNEL);
> +
> + svr = fsl_guts_get_svr();
> + soc_die = fsl_soc_die_match(svr, fsl_soc_die);
> + if (soc_die) {
> + soc_dev_attr.family = devm_kasprintf(dev, GFP_KERNEL,
> +  "QorIQ %s", soc_die-
> >die);
> + } else {
> + soc_dev_attr.family = devm_kasprintf(dev, GFP_KERNEL,
> "QorIQ");
> + }
> + soc_dev_attr.soc_id = devm_kasprintf(dev, GFP_KERNEL,
> +  "svr:0x%08x", svr);
> + soc_dev_attr.revision = devm_kasprintf(dev, GFP_KERNEL, "%d.%d",
> +    SVR_MAJ(svr), SVR_MIN(svr));
> +
> + soc_dev = soc_device_register(&soc_dev_attr);
> + if (IS_ERR(soc_dev))
> + return PTR_ERR(soc_dev);

ioremap leaks on this error path.  Use devm_ioremap_resource().

-Scott
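
Why a managed (devm) mapping closes this leak can be shown with a userspace analogy. Nothing here is the kernel devres API: struct managed_dev, managed_add(), managed_release_all(), managed_map() and probe() are hypothetical names, and malloc() stands in for the register mapping. The idea is that each acquisition registers its own release action, so an early return needs no explicit cleanup.

```c
#include <stdlib.h>

#define MAX_ACTIONS 8

struct managed_dev {
	void (*release[MAX_ACTIONS])(void *);
	void *data[MAX_ACTIONS];
	int count;
};

/* Register a cleanup action; returns 0 on success, -1 if the list is full. */
static int managed_add(struct managed_dev *dev, void (*fn)(void *), void *data)
{
	if (dev->count >= MAX_ACTIONS)
		return -1;
	dev->release[dev->count] = fn;
	dev->data[dev->count] = data;
	dev->count++;
	return 0;
}

/* Run all actions in reverse order, as would happen after a failed probe. */
static void managed_release_all(struct managed_dev *dev)
{
	while (dev->count > 0) {
		dev->count--;
		dev->release[dev->count](dev->data[dev->count]);
	}
}

static int live_mappings; /* counts outstanding fake mappings */

static void fake_unmap(void *p)
{
	free(p);
	live_mappings--;
}

/* Stand-in for a managed mapping call: map, then register the unmap. */
static void *managed_map(struct managed_dev *dev)
{
	void *regs = malloc(64);

	if (!regs)
		return NULL;
	live_mappings++;
	if (managed_add(dev, fake_unmap, regs)) {
		fake_unmap(regs);
		return NULL;
	}
	return regs;
}

/* Probe that can fail after mapping; the error path has no explicit unmap. */
static int probe(struct managed_dev *dev, int fail_register)
{
	if (!managed_map(dev))
		return -1;
	if (fail_register)
		return -2; /* the caller's managed_release_all() unmaps */
	return 0;
}
```

In the quoted probe function the same shape would let the PTR_ERR(soc_dev) return stay as it is, since the mapping would be released by the driver core rather than by hand.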


Re: [v12, 5/8] soc: fsl: add GUTS driver for QorIQ platforms

2016-10-26 Thread Scott Wood
On Wed, 2016-09-21 at 14:57 +0800, Yangbo Lu wrote:
> diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig
> new file mode 100644
> index 0000000..b99764c
> --- /dev/null
> +++ b/drivers/soc/fsl/Kconfig
> @@ -0,0 +1,19 @@
> +#
> +# Freescale SOC drivers
> +#
> +
> +source "drivers/soc/fsl/qe/Kconfig"
> +
> +config FSL_GUTS
> + bool "Freescale QorIQ GUTS driver"
> + select SOC_BUS
> + help
> +   The global utilities block controls power management, I/O device
> +   enabling, power-onreset(POR) configuration monitoring, alternate
> +   function selection for multiplexed signals,and clock control.
> +   This driver is to manage and access global utilities block.
> +   Initially only reading SVR and registering soc device are
> supported.
> +   Other guts accesses, such as reading RCW, should eventually be
> moved
> +   into this driver as well.
> +
> +   If you want GUTS driver support, you should say Y here.

This is user-enablable without dependencies, which means it will break some
randconfigs.  If this is to be enabled via select then remove the text after
"bool".

> +/* SoC die attribute definition for QorIQ platform */
> +static const struct fsl_soc_die_attr fsl_soc_die[] = {
> +#ifdef CONFIG_PPC
> + /*
> +  * Power Architecture-based SoCs T Series
> +  */
> +
> + /* Die: T4240, SoC: T4240/T4160/T4080 */
> + { .die  = "T4240",
> +   .svr  = 0x82400000,
> +   .mask = 0xfff00000,
> + },
> + /* Die: T1040, SoC: T1040/T1020/T1042/T1022 */
> + { .die  = "T1040",
> +   .svr  = 0x85200000,
> +   .mask = 0xfff00000,
> + },
> + /* Die: T2080, SoC: T2080/T2081 */
> + { .die  = "T2080",
> +   .svr  = 0x85300000,
> +   .mask = 0xfff00000,
> + },
> + /* Die: T1024, SoC: T1024/T1014/T1023/T1013 */
> + { .die  = "T1024",
> +   .svr  = 0x85400000,
> +   .mask = 0xfff00000,
> + },
> +#endif /* CONFIG_PPC */
> +#if defined(CONFIG_ARCH_MXC) || defined(CONFIG_ARCH_LAYERSCAPE)

Will this driver ever be probed on MXC?  Why do we need these ifdefs at all?


> + /*
> +  * ARM-based SoCs LS Series
> +  */
> +
> + /* Die: LS1043A, SoC: LS1043A/LS1023A */
> + { .die  = "LS1043A",
> +   .svr  = 0x87920000,
> +   .mask = 0xffff0000,
> + },
> + /* Die: LS2080A, SoC: LS2080A/LS2040A/LS2085A */
> + { .die  = "LS2080A",
> +   .svr  = 0x87010000,
> +   .mask = 0xff3f0000,
> + },
> + /* Die: LS1088A, SoC: LS1088A/LS1048A/LS1084A/LS1044A */
> + { .die  = "LS1088A",
> +   .svr  = 0x87030000,
> +   .mask = 0xff3f0000,
> + },
> + /* Die: LS1012A, SoC: LS1012A */
> + { .die  = "LS1012A",
> +   .svr  = 0x87040000,
> +   .mask = 0xffff0000,
> + },
> + /* Die: LS1046A, SoC: LS1046A/LS1026A */
> + { .die  = "LS1046A",
> +   .svr  = 0x87070000,
> +   .mask = 0xffff0000,
> + },
> + /* Die: LS2088A, SoC: LS2088A/LS2048A/LS2084A/LS2044A */
> + { .die  = "LS2088A",
> +   .svr  = 0x87090000,
> +   .mask = 0xff3f0000,
> + },
> + /* Die: LS1021A, SoC: LS1021A/LS1020A/LS1022A
> +  * Note: Put this die at the end in cause of incorrect
> identification
> +  */
> + { .die  = "LS1021A",
> +   .svr  = 0x87000000,
> +   .mask = 0xfff00000,
> + },
> +#endif /* CONFIG_ARCH_MXC || CONFIG_ARCH_LAYERSCAPE */

Instead of relying on ordering, add more bits to the mask so that there's no
overlap.  I think 0xfff70000 would work.
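
The overlap can be checked mechanically with a small userspace sketch. It assumes the usual match rule (svr & mask) == entry svr, and it assumes the constants in the quoted table are 32-bit values whose trailing digits were lost in the archive (for example 0x87010000 with mask 0xff3f0000). struct die_attr, die_match() and die_match_count() are hypothetical names, not the driver's.

```c
#include <stddef.h>
#include <stdint.h>

struct die_attr {
	const char *die;
	uint32_t svr;
	uint32_t mask;
};

/*
 * ARM entries from the quoted table. The 32-bit constants are an
 * assumption (the archived values look truncated). LS1021A is listed
 * first, with the wider mask, to show that ordering no longer matters.
 */
static const struct die_attr dies[] = {
	{ "LS1021A", 0x87000000, 0xfff70000 },
	{ "LS1043A", 0x87920000, 0xffff0000 },
	{ "LS2080A", 0x87010000, 0xff3f0000 },
	{ "LS1088A", 0x87030000, 0xff3f0000 },
	{ "LS1012A", 0x87040000, 0xffff0000 },
	{ "LS1046A", 0x87070000, 0xffff0000 },
	{ "LS2088A", 0x87090000, 0xff3f0000 },
};

#define NUM_DIES (sizeof(dies) / sizeof(dies[0]))

/* Return the first entry whose masked SVR matches, or NULL. */
static const char *die_match(uint32_t svr)
{
	for (size_t i = 0; i < NUM_DIES; i++)
		if ((svr & dies[i].mask) == dies[i].svr)
			return dies[i].die;
	return NULL;
}

/* Count matching entries; more than one means two masks overlap. */
static unsigned int die_match_count(uint32_t svr)
{
	unsigned int n = 0;

	for (size_t i = 0; i < NUM_DIES; i++)
		if ((svr & dies[i].mask) == dies[i].svr)
			n++;
	return n;
}
```

Under these assumptions every listed SVR matches exactly one entry, whereas with the 0xfff00000 mask the LS1021A entry would also claim 0x87010000 (LS2080A).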

> +out:
> + kfree(soc_dev_attr.machine);
> + kfree(soc_dev_attr.family);
> + kfree(soc_dev_attr.soc_id);
> + kfree(soc_dev_attr.revision);
> + iounmap(guts->regs);
> +out_free:
> + kfree(guts);
> + return ret;
> +}

Please use devm.

-Scott


Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms

2016-09-13 Thread Scott Wood
On Tue, 2016-09-13 at 07:23 +0000, Y.B. Lu wrote:
> >
> > -----Original Message-----
> > From: linux-mmc-ow...@vger.kernel.org [mailto:linux-mmc-
> > ow...@vger.kernel.org] On Behalf Of Scott Wood
> > Sent: Tuesday, September 13, 2016 7:25 AM
> > To: Y.B. Lu; linux-...@vger.kernel.org; ulf.hans...@linaro.org; Arnd
> > Bergmann
> > Cc: linuxppc-...@lists.ozlabs.org; devicet...@vger.kernel.org; linux-arm-
> > ker...@lists.infradead.org; linux-ker...@vger.kernel.org; linux-
> > c...@vger.kernel.org; linux-...@vger.kernel.org; iommu@lists.linux-
> > foundation.org; net...@vger.kernel.org; Mark Rutland; Rob Herring;
> > Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh
> > Sharma; Qiang Zhao; Kumar Gala; Santosh Shilimkar; Leo Li; X.B. Xie
> > Subject: Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
> > 
> > BTW, aren't ls2080a and ls2085a the same die?  And is there no non-E
> > version of LS2080A/LS2040A?
> [Lu Yangbo-B47093] I checked all the svr values in chip errata doc "Revision
> level to part marking cross-reference" table.
> I found ls2080a and ls2085a were in two separate doc. And I didn’t find non-
> E version of LS2080A/LS2040A in chip errata doc.
> Do you know is there any other doc we can confirm this?

No.  Traditionally we've always had E and non-E versions of each chip, but I
have no knowledge of whether that has changed (I do note that the way that
E-status is indicated in SVR has changed).

But please label LS2080A and LS2085A as the same die (or provide strong
evidence that they are not).

> > > > > + do {
> > > > > + if (!matches->soc_id)
> > > > > + return NULL;
> > > > > + if (glob_match(svr_match, matches->soc_id))
> > > > > + break;
> > > > > + } while (matches++);
> > > > Are you expecting "matches++" to ever evaluate as false?
> > > [Lu Yangbo-B47093] Yes, this is used to match the soc we use in
> > > qoriq_soc array until getting true.
> > > We need to get the name and die information defined in array.
> > I'm not asking whether the glob_match will ever return true.  I'm saying
> > that "matches++" will never become NULL.
> [Lu Yangbo-B47093] The matches++ will never become NULL while it will return
> NULL after matching for all the members in array.

"matches++" will never "return NULL".  It's just an incrementing address.  It
won't be null until you wrap around the address space, and even if the other
loop terminators never kicked in you'd crash long before that happens.

Please rewrite the loop as something like:

	while (matches->soc_id) {
		if (glob_match(...))
			return matches;

		matches++;
	}

	return NULL;
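
For reference, a self-contained userspace sketch of that loop shape follows.
It is not the driver code: struct soc_attr and find_soc() are hypothetical
stand-ins, and strstr() is swapped in for the kernel's glob_match(), which is
roughly equivalent for a "*%08x*" style pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct soc_device_attribute. */
struct soc_attr {
	const char *soc_id;	/* NULL in the last entry terminates the table */
};

/* Example table using soc_id strings quoted in this thread. */
static const struct soc_attr example_table[] = {
	{ "svr:0x85400010,name:T1024,die:T1024" },
	{ "svr:0x85480010,name:T1024E,die:T1024" },
	{ NULL },		/* sentinel */
};

/*
 * Walk the table until the sentinel.  The loop condition tests the
 * sentinel field, not the incremented pointer, so it terminates.
 */
static const struct soc_attr *find_soc(const struct soc_attr *matches,
				       const char *needle)
{
	assert(matches != NULL);

	while (matches->soc_id) {
		if (strstr(matches->soc_id, needle))
			return matches;

		matches++;
	}

	return NULL;
}
```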


> > > > > + /* Register soc device */
> > > > > + soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
> > > > > + if (!soc_dev_attr) {
> > > > > + ret = -ENOMEM;
> > > > > + goto out_unmap;
> > > > > + }
> > > > Couldn't this be statically allocated?
> > > [Lu Yangbo-B47093] Do you mean we define this struct statically ?
> > > 
> > > static struct soc_device_attribute soc_dev_attr;
> > Yes.
> > 
> [Lu Yangbo-B47093] It's ok to define it statically. Is there any need to do
> that?

It's simpler.

-Scott


Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms

2016-09-12 Thread Scott Wood
On Mon, 2016-09-12 at 06:39 +, Y.B. Lu wrote:
> Hi Scott,
> 
> Thanks for your review :)
> See my comment inline.
> 
> > 
> > -Original Message-
> > From: Scott Wood [mailto:o...@buserror.net]
> > Sent: Friday, September 09, 2016 11:47 AM
> > To: Y.B. Lu; linux-...@vger.kernel.org; ulf.hans...@linaro.org; Arnd
> > Bergmann
> > Cc: linuxppc-...@lists.ozlabs.org; devicet...@vger.kernel.org; linux-arm-
> > ker...@lists.infradead.org; linux-ker...@vger.kernel.org; linux-
> > c...@vger.kernel.org; linux-...@vger.kernel.org; iommu@lists.linux-
> > foundation.org; net...@vger.kernel.org; Mark Rutland; Rob Herring;
> > Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh
> > Sharma; Qiang Zhao; Kumar Gala; Santosh Shilimkar; Leo Li; X.B. Xie
> > Subject: Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
> > 
> > On Tue, 2016-09-06 at 16:28 +0800, Yangbo Lu wrote:
> > > 
> > > The global utilities block controls power management, I/O device
> > > enabling, power-onreset(POR) configuration monitoring, alternate
> > > function selection for multiplexed signals,and clock control.
> > > 
> > > This patch adds a driver to manage and access global utilities block.
> > > Initially only reading SVR and registering soc device are supported.
> > > Other guts accesses, such as reading RCW, should eventually be moved
> > > into this driver as well.
> > > 
> > > Signed-off-by: Yangbo Lu <yangbo...@nxp.com>
> > > Signed-off-by: Scott Wood <o...@buserror.net>
> > Don't put my signoff on patches that I didn't put it on
> > myself.  Definitely don't put mine *after* yours on patches that were
> > last modified by you.
> > 
> > If you want to mention that the soc_id encoding was my suggestion, then
> > do so explicitly.
> > 
> [Lu Yangbo-B47093] I found your 'signoff' on this patch at below link.
> http://patchwork.ozlabs.org/patch/649211/
> 
> So, let me just change the order in next version ?
> Signed-off-by: Scott Wood <o...@buserror.net>
> Signed-off-by: Yangbo Lu <yangbo...@nxp.com>

No.  This isn't my patch so my signoff shouldn't be on it.

> [Lu Yangbo-B47093] It's a good idea to move die into .family I think.
> In my opinion, it's better to keep svr and name in soc_id just like your
> suggestion above.
> > 
> > {
> > .soc_id = "svr:0x85490010,name:T1023E,",
> > .family = "QorIQ T1024",
> > }
> The user probably don’t like to learn the svr value. What they want is just
> to match the soc they use.
> It's convenient to use name+rev for them to match a soc.

What the user should want 99% of the time is to match the die (plus revision),
not the soc.

> Regarding shrinking the table, I think it's hard to use svr+mask. Because I
> find many platforms use different masks.
> We couldn’t know the mask according svr value.

The mask would be part of the table:

	{
		{
			.die = "T1024",
			.svr = 0x8540,
			.mask = 0xfff0,
		},
		{
			.die = "T1040",
			.svr = 0x8520,
			.mask = 0xfff0,
		},
		{
			.die = "LS1088A",
			.svr = 0x8703,
			.mask = 0x,
		},
		...
	}

There's a small risk that we get the mask wrong and a different die is created
that matches an existing table, but it doesn't seem too likely, and can easily
be fixed with a kernel update if it happens.
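
A lookup over such a table could look roughly like the sketch below.  The
names die_entry, die_table and die_lookup() are hypothetical, and the 32-bit
constants are assumptions chosen only so that the full SVR values quoted in
this thread (0x85400010, 0x85480010) resolve to a die; they are not taken
from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: an SVR matches when (svr & mask) == entry svr. */
struct die_entry {
	const char *die;	/* NULL terminates the table */
	uint32_t svr;
	uint32_t mask;
};

/* ASSUMED values: die ID in the top 12 bits of a 32-bit SVR. */
static const struct die_entry die_table[] = {
	{ "T1024", 0x85400000u, 0xfff00000u },
	{ "T1040", 0x85200000u, 0xfff00000u },
	{ NULL, 0, 0 },		/* sentinel */
};

/* Return the first entry whose masked SVR matches, or NULL if none does. */
static const struct die_entry *die_lookup(const struct die_entry *table,
					  uint32_t svr)
{
	assert(table != NULL);

	for (; table->die; table++) {
		if ((svr & table->mask) == table->svr)
			return table;
	}

	return NULL;
}
```

Under these assumed masks the E and non-E variants of a part land on the
same die entry, which is the point of matching on die rather than on SoC
name.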

BTW, aren't ls2080a and ls2085a the same die?  And is there no non-E version
of LS2080A/LS2040A?

> > > + do {
> > > + if (!matches->soc_id)
> > > + return NULL;
> > > + if (glob_match(svr_match, matches->soc_id))
> > > + break;
> > > + } while (matches++);
> > Are you expecting "matches++" to ever evaluate as false?
> [Lu Yangbo-B47093] Yes, this is used to match the soc we use in qoriq_soc
> array until getting true. 
> We need to get the name and die information defined in array.

I'm not asking whether the glob_match will ever return true.  I'm saying that
"matches++" will never become NULL.

> > > + /* Register soc device */
> > > + soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
> > > + if (!soc_dev_attr) {
> > > + ret = -ENOMEM;
> > > + goto out_unmap;
> > > + }
> > Couldn't this be statically allocated?
> [Lu Yangbo-B47093] Do you mean we define this struct statically ?
> 
>

Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms

2016-09-08 Thread Scott Wood
On Tue, 2016-09-06 at 16:28 +0800, Yangbo Lu wrote:
> The global utilities block controls power management, I/O device
> enabling, power-onreset(POR) configuration monitoring, alternate
> function selection for multiplexed signals,and clock control.
> 
> This patch adds a driver to manage and access global utilities block.
> Initially only reading SVR and registering soc device are supported.
> Other guts accesses, such as reading RCW, should eventually be moved
> into this driver as well.
> 
> Signed-off-by: Yangbo Lu <yangbo...@nxp.com>
> Signed-off-by: Scott Wood <o...@buserror.net>

Don't put my signoff on patches that I didn't put it on myself.  Definitely
don't put mine *after* yours on patches that were last modified by you.

If you want to mention that the soc_id encoding was my suggestion, then do so
explicitly.

> +/* SoC attribute definition for QorIQ platform */
> +static const struct soc_device_attribute qoriq_soc[] = {
> +#ifdef CONFIG_PPC
> + /*
> +  * Power Architecture-based SoCs T Series
> +  */
> +
> + /* SoC: T1024/T1014/T1023/T1013 Rev: 1.0 */
> + { .soc_id   = "svr:0x85400010,name:T1024,die:T1024",
> +   .revision = "1.0",
> + },
> + { .soc_id   = "svr:0x85480010,name:T1024E,die:T1024",
> +   .revision = "1.0",
> + },

Revision could be computed from the low 8 bits of SVR (just as you do for 
unknown SVRs).
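
As a sketch of what that computation might look like (svr_format_revision()
is a hypothetical helper, and the split into a major nibble in bits 7:4 and a
minor nibble in bits 3:0 is an assumption that happens to agree with the
0x...10 / "1.0" pairs in the quoted table):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the low 8 bits of an SVR as "major.minor".
 * Returns the number of characters snprintf() would have written.
 */
static int svr_format_revision(uint32_t svr, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "%u.%u",
			(unsigned int)((svr >> 4) & 0xf),	/* major */
			(unsigned int)(svr & 0xf));		/* minor */
}
```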

We could move the die name into .family:

	{
		.soc_id = "svr:0x85490010,name:T1023E,",
		.family = "QorIQ T1024",
	}

I see you dropped svre (and the trailing comma), though I guess the vast
majority of potential users will be looking at .family.  In which case do we
even need name?  If we just make the soc_id be "svr:0x" then we could
shrink the table to an svr+mask that identifies each die.  I'd still want to
keep the "svr:" even if we're giving up on the general tagging system, to make
it clear what the number refers to, and to provide some defense against users
who match only against soc_id rather than soc_id+family.  Or we could go
further and format soc_id as "QorIQ SVR 0x" so that soc_id-only
matches are fully acceptable rather than just less dangerous.

> +static const struct soc_device_attribute *fsl_soc_device_match(
> + unsigned int svr, const struct soc_device_attribute *matches)
> +{
> + char svr_match[50];
> + int n;
> +
> + n = sprintf(svr_match, "*%08x*", svr);

n = sprintf(svr_match, "svr:0x%08x,*", svr);

(according to the current encoding)

> +
> + do {
> + if (!matches->soc_id)
> + return NULL;
> + if (glob_match(svr_match, matches->soc_id))
> + break;
> + } while (matches++);

Are you expecting "matches++" to ever evaluate as false?

> + /* Register soc device */
> + soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
> + if (!soc_dev_attr) {
> + ret = -ENOMEM;
> + goto out_unmap;
> + }

Couldn't this be statically allocated?

> +
> + machine = of_flat_dt_get_machine_name();
> + if (machine)
> + soc_dev_attr->machine = kasprintf(GFP_KERNEL, "%s",
> machine);
> +
> + soc_dev_attr->family = kasprintf(GFP_KERNEL, "QorIQ");
> +
> + svr = fsl_guts_get_svr();
> + fsl_soc = fsl_soc_device_match(svr, qoriq_soc);
> + if (fsl_soc) {
> + soc_dev_attr->soc_id = kasprintf(GFP_KERNEL, "%s",
> +  fsl_soc->soc_id);

You can use kstrdup() if you're just copying the string as is.

> + soc_dev_attr->revision = kasprintf(GFP_KERNEL, "%s",
> +    fsl_soc->revision);
> + } else {
> + soc_dev_attr->soc_id = kasprintf(GFP_KERNEL, "0x%08x",
> svr);

kasprintf(GFP_KERNEL, "svr:0x%08x,", svr);


> +
> + soc_dev = soc_device_register(soc_dev_attr);
> + if (IS_ERR(soc_dev)) {
> + ret = -ENODEV;

Why are you changing the error code?
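
For illustration, here is a userspace sketch of passing the original error
through instead of replacing it.  The ex_* helpers are hypothetical
stand-ins modelled on the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() convention;
none of this is the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-ins for the kernel's error-pointer helpers (assumed semantics). */
#define EX_MAX_ERRNO 4095

static void *ex_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long ex_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int ex_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-EX_MAX_ERRNO;
}

static int ex_dummy_device;

/* Hypothetical registration that fails with the given errno (0 = success). */
static void *ex_register(int fail_errno)
{
	assert(fail_errno >= 0 && fail_errno <= EX_MAX_ERRNO);

	return fail_errno ? ex_err_ptr(-fail_errno) : (void *)&ex_dummy_device;
}

/* Propagate whatever error the registration reported. */
static int ex_probe(int fail_errno)
{
	void *dev = ex_register(fail_errno);

	if (ex_is_err(dev))
		return (int)ex_ptr_err(dev);

	return 0;
}
```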

> + goto out;
> + } else {

Unnecessary "else".

> + pr_info("Detected: %s\n", soc_dev_attr->machine);

Machine: %s

> + pr_info("Detected SoC family: %s\n", soc_dev_attr->family);
> + pr_info("Detected SoC ID: %s, revision: %s\n",
> + soc_dev_attr->soc_id, soc_dev_attr->revision);

s/Detected //g


> + }
> + return 0;
> +out:
> +

Re: [v10, 3/7] soc: fsl: add GUTS driver for QorIQ platforms

2016-07-15 Thread Scott Wood
On Fri, 2016-07-15 at 12:43 -0400, Paul Gortmaker wrote:
> On Wed, May 4, 2016 at 11:12 PM, Yangbo Lu <yangbo...@nxp.com> wrote:
> > 
> > The global utilities block controls power management, I/O device
> > enabling, power-onreset(POR) configuration monitoring, alternate
> > function selection for multiplexed signals,and clock control.
> > 
> > This patch adds GUTS driver to manage and access global utilities
> > block.
> > 
> > Signed-off-by: Yangbo Lu <yangbo...@nxp.com>
> > Acked-by: Scott Wood <o...@buserror.net>
> > ---
> > Changes for v4:
> > - Added this patch
> > Changes for v5:
> > - Modified copyright info
> > - Changed MODULE_LICENSE to GPL
> > - Changed EXPORT_SYMBOL_GPL to EXPORT_SYMBOL
> > - Made FSL_GUTS user-invisible
> > - Added a complete compatible list for GUTS
> > - Stored guts info in file-scope variable
> > - Added mfspr() getting SVR
> > - Redefined GUTS APIs
> > - Called fsl_guts_init rather than using platform driver
> > - Removed useless parentheses
> > - Removed useless 'extern' key words
> > Changes for v6:
> > - Made guts thread safe in fsl_guts_init
> > Changes for v7:
> > - Removed 'ifdef' for function declaration in guts.h
> > Changes for v8:
> > - Fixes lines longer than 80 characters checkpatch issue
> > - Added 'Acked-by: Scott Wood'
> > Changes for v9:
> > - None
> > Changes for v10:
> > - None
> > ---
> >  drivers/soc/Kconfig  |   2 +-
> >  drivers/soc/fsl/Kconfig  |   8 +++
> >  drivers/soc/fsl/Makefile |   1 +
> >  drivers/soc/fsl/guts.c   | 119
> > 
> >  include/linux/fsl/guts.h | 126 +-
> > -
> >  5 files changed, 207 insertions(+), 49 deletions(-)
> >  create mode 100644 drivers/soc/fsl/Kconfig
> >  create mode 100644 drivers/soc/fsl/guts.c
> > 
> > diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
> > index cb58ef0..7106463 100644
> > --- a/drivers/soc/Kconfig
> > +++ b/drivers/soc/Kconfig
> > @@ -2,7 +2,7 @@ menu "SOC (System On Chip) specific Drivers"
> > 
> >  source "drivers/soc/bcm/Kconfig"
> >  source "drivers/soc/brcmstb/Kconfig"
> > -source "drivers/soc/fsl/qe/Kconfig"
> > +source "drivers/soc/fsl/Kconfig"
> >  source "drivers/soc/mediatek/Kconfig"
> >  source "drivers/soc/qcom/Kconfig"
> >  source "drivers/soc/rockchip/Kconfig"
> > diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig
> > new file mode 100644
> > index 000..b313759
> > --- /dev/null
> > +++ b/drivers/soc/fsl/Kconfig
> > @@ -0,0 +1,8 @@
> > +#
> > +# Freescale SOC drivers
> > +#
> > +
> > +source "drivers/soc/fsl/qe/Kconfig"
> > +
> > +config FSL_GUTS
> > +   bool
> > diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
> > index 203307f..02afb7f 100644
> > --- a/drivers/soc/fsl/Makefile
> > +++ b/drivers/soc/fsl/Makefile
> > @@ -4,3 +4,4 @@
> > 
> >  obj-$(CONFIG_QUICC_ENGINE) += qe/
> >  obj-$(CONFIG_CPM)  += qe/
> > +obj-$(CONFIG_FSL_GUTS) += guts.o
> > diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
> > new file mode 100644
> > index 000..fa155e6
> > --- /dev/null
> > +++ b/drivers/soc/fsl/guts.c
> > @@ -0,0 +1,119 @@
> > +/*
> > + * Freescale QorIQ Platforms GUTS Driver
> > + *
> > + * Copyright (C) 2016 Freescale Semiconductor, Inc.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + */
> > +
> > +#include 
> > +#include 
> Seems there was lots of discussion on this.  If it does end up being
> resent, it would be nice to get the module.h and other modular stuff
> gone since it is a bool Kconfig.

I plan to resend just the GUTS driver portion and send it through the PPC
tree.

I don't see any modular stuff in there besides the linux/module.h include.

-Scott



Re: [PATCH 2/4] soc: fsl: add GUTS driver for QorIQ platforms

2016-07-07 Thread Scott Wood
On Thu, 2016-07-07 at 10:30 +0200, Arnd Bergmann wrote:
> On Thursday, July 7, 2016 2:35:33 AM CEST Yangbo Lu wrote:
> > 
> > Hi Arnd,
> > 
> > Could you reply when you see the email?
> > If your method doesn’t resolve the problem, we still want to use our old
> > patchset.
> > 
> > This guts driver had been discussed about one year and blocked many
> > workaround upstream.
> > So please help to review and comment soon.
> > 
> I don't really see how more discussion is going to help us here. I think
> I've made it pretty clear that I don't want to see another platform
> specific way to read an SoC revision and I've even sent a proof-of-concept
> patch to show how the interface can work, now it's up to you to fit the
> guts hardware into that and send a new patch series.

In which relevant maintainership capacity are you NACKing it?

-Scott


Re: [PATCH 4/4] Revert "powerpc/fsl: Move fsl_guts.h out of arch/powerpc"

2016-06-10 Thread Scott Wood
On Thu, 2016-06-02 at 11:01 +0200, Arnd Bergmann wrote:
> On Wednesday, June 1, 2016 8:24:20 PM CEST Scott Wood wrote:
> > On Mon, 2016-05-30 at 15:18 +0200, Arnd Bergmann wrote:
> > > All users of this driver are PowerPC specific and the header file
> > > has no business in the global include/linux/ hierarchy, so move
> > > it back before anyone starts using it on ARM.
> > > 
> > > This reverts commit 948486544713492f00ac8a9572909101ea892cb0.
> > > 
> > > Signed-off-by: Arnd Bergmann <a...@arndb.de>
> > > ---
> > > This part of the series is not required for the eSDHC quirk,
> > > but it restores the asm/fsl_guts.h header so it doesn't accidentally
> > > get abused for this in the future. I found two drivers outside of
> > > arch/powerpc that already accessed the registers directly, but the
> > > functions look fairly contained, and can be easily hidden in an
> > > #ifdef CONFIG_PPC
> > 
> > NACK
> > 
> > Besides adding ifdef pollution for no good reason, this register block is
> > used
> > on some ARM chips as well.  Why is it a problem if "anyone starts using it
> > on
> > ARM"?
> 
> It's just not a good interface when it's defined as "this is the layout of
> a register area that any driver can ioremap() if they can figure out the
> device node".

That's why I want to move accesses into one guts driver.

>  It's not uncommon to have register areas like that, but
> normally you have at the minimum a 'syscon' device to handle locking
> between drivers accessing the same registers and to avoid having to map
> the same area multiple times.

syscon requires device tree changes.

I don't see read-modify-write operations in regmap -- how does locking around
an individual, inherently-atomic load or store help?

> If we need to use 'guts' registers on ARM, we can find a way to abstract
> them properly for the given use cases, using a syscon or a driver with
> exported functions, but just making a PowerPC platform specific header
> global to all Linux drivers by putting it into include/linux doesn't seem
> right.

Again, it's not PowerPC-specific!  It started that way but then the same
register block got put onto some ARM chips.

It's not global to "all Linux drivers", just the ones that choose to include
an fsl-specific header.

If and when all uses of guts are moved into the guts driver, the header can be
moved into drivers/soc/fsl.

> Note that the header file uses a structure definition rather than the more
> common macros with register offsets, which is fine for a driver that has
> its own registers and abstracts them, but it doesn't really work with
> the regmap interface, so if we want to use it with syscon, it also needs to
> be rewritten.

We don't want to use it with syscon.  If we did, the solution wouldn't be to
move the header back to arch/powerpc, but to convert the struct into offsets.

-Scott



Re: [PATCH 3/4] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

2016-06-10 Thread Scott Wood
On Thu, 2016-06-02 at 10:52 +0200, Arnd Bergmann wrote:
> On Wednesday, June 1, 2016 8:11:14 PM CEST Scott Wood wrote:
> > > +#define T4240_HOST_VER ((VENDOR_V_23 << SDHCI_VENDOR_VER_SHIFT) |
> > > SDHCI_SPEC_200)
> > > +static const struct soc_device_attribute esdhc_t4240_quirk = {
> > > + /* T4240 revision < 0x20 uses vendor version 23, SDHCI version 200
> > > */
> > > + { .soc_id = "T4*(0x824000)", .revision = "0x[01]?",
> > > +   .data = (void *)(uintptr_t)(T4240_HOST_VER) },
> > 
> > Why should this code need to care that the string begins with "T4"?  This
> > creates dual maintenance if that were to change.  It's also broken because
> > T4240 has compatible = "fsl,t4240-device-config", "fsl,qoriq-device-config
> > -2.0" and thus with these patches it would incorrectly show up as "P
> > series
> > (0x824000)".  The compatible string of this node was never meant to be a
> > key
> > for choosing a string to describe the system to userspace.
> 
> This is an artifact of not knowing the specific SoC name, and we can change
> that by looking up the name from the SVR value in the soc_device driver.

...or we could keep it simple and just match the number.

> > 0x824000 is a magic number which should be represented symbolically.
> 
> Sure, feel free to change the format of the soc_device string in any
> name,

That's not what I was asking for...  The match should be numeric but the
knowledge of what the number is should come from a symbolic #define.

> > If T4240 is affected, then so are the reduced-core variants T4160 and
> > T4080,
> > but 0x824000 doesn't match them (Yangbo's patch had the same problem). 
> >  And
> > please don't respond with "0x824*"
> > 
> > You also didn't strip out the E bit of SVR which indicates encryption
> > capability and nothing else (Yangbo's patch did not have this problem
> > because
> > it used SVR_SOC_VER).
> 
> Ok, that should be easy enough to fix in the soc_device driver.

No, because the soc_device driver doesn't know whether the consumer of the ID
cares about the E bit.
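
To illustrate why this has to be the consumer's decision, here is a small
sketch in which the caller chooses whether the E bit takes part in the
comparison.  The helper names and EXAMPLE_SVR_E_BIT are hypothetical; the bit
position 0x00080000 is an assumption inferred from the T1024 (0x85400010) and
T1024E (0x85480010) values quoted elsewhere in this thread, and it may not
hold for chips that encode E status differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED position of the "E" (encryption capable) bit in the SVR. */
#define EXAMPLE_SVR_E_BIT 0x00080000u

/* Sanity check of the assumption against the values quoted in the thread. */
static_assert((0x85480010u & ~EXAMPLE_SVR_E_BIT) == 0x85400010u,
	      "E bit assumption does not match the quoted T1024/T1024E SVRs");

/* Clear the E bit so E and non-E variants compare equal. */
static uint32_t svr_without_e(uint32_t svr)
{
	return svr & ~EXAMPLE_SVR_E_BIT;
}

/* Compare two SVRs; only the caller knows whether E status matters. */
static bool svr_same_part(uint32_t a, uint32_t b, bool care_about_e)
{
	if (!care_about_e) {
		a = svr_without_e(a);
		b = svr_without_e(b);
	}

	return a == b;
}
```

A quirk for a peripheral bug would typically pass care_about_e = false,
while code that depends on the security engine would pass true.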

> > What happens if the revision condition is more complicated, such as <=
> > 0x20
> > with 0x21 being fine?  Multiple quirk entries where before we had as
> > simple
> > comparison?
> 
> I guess yes. I would really hope that there is no need to use this interface
> pervasively, it's really just to work around the cases where there is no
> way to pass the information in DT otherwise.

How does putting it in the DT work when you have multiple versions of the same
SoC, some of which have the bug and some which don't?

> > I fail to see how this approach is an improvement (much less one that
> > needs to
> > hold up a patchset that is fixing a problem and is not touching any
> > generic
> > code).  Why does this need to be a string?
> 
> A string is what user space gets in /sys/devices/soc/*,

It is rare that the kernel accesses information in the exact same way that
userspace does.  And once we expose this to userspace we're stuck with it, so
exporting anything other than a simple number is even less desirable.

>  and we already have
> code that does the same things there to work around quirks, here we just
> use the same interface in a completely generic way. Note that not every
> SoC family uses numbers in the same way, some have multiple subrevisions,
> some have names etc.

Where is the need for a "completely generic way" for one piece of
vendor-specific code to get information that is inherently specific to that
vendor, that is supplied by code specific to that vendor?

-Scott



Re: [PATCH 2/4] soc: fsl: add GUTS driver for QorIQ platforms

2016-06-10 Thread Scott Wood
On Thu, 2016-06-02 at 10:43 +0200, Arnd Bergmann wrote:
> On Wednesday, June 1, 2016 8:47:22 PM CEST Scott Wood wrote:
> > On Mon, 2016-05-30 at 15:15 +0200, Arnd Bergmann wrote:
> > > diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
> > > new file mode 100644
> > > index ..2f30698f5bcf
> > > --- /dev/null
> > > +++ b/drivers/soc/fsl/guts.c
> > > @@ -0,0 +1,130 @@
> > > +/*
> > > + * Freescale QorIQ Platforms GUTS Driver
> > > + *
> > > + * Copyright (C) 2016 Freescale Semiconductor, Inc.
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License as published by
> > > + * the Free Software Foundation; either version 2 of the License, or
> > > + * (at your option) any later version.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#define GUTS_PVR 0x0a0
> > > +#define GUTS_SVR 0x0a4
> > > +
> > > +struct guts {
> > > + void __iomem *regs;
> > 
> > We already have a struct to define guts.  Why are you not using it?  Why
> > do
> > you consider using it to be "abuse"?  What if we want to move more guts
> > functionality into this driver?
> 
> This structure was in the original patch, I left it in there, only
> removed the inclusion of the powerpc header file, which seemed to
> be misplaced.

I'm not referring to "struct guts".  I'm referring to changing
"struct ccsr_guts __iomem *regs" into "void __iomem *regs".

And it's not a powerpc header file.

> > > +/*
> > > + * Table for matching compatible strings, for device tree
> > > + * guts node, for Freescale QorIQ SOCs.
> > > + */
> > > +static const struct of_device_id fsl_guts_of_match[] = {
> > > + /* For T4 & B4 Series SOCs */
> > > + { .compatible = "fsl,qoriq-device-config-1.0", .data = "T4/B4
> > > series" },
> > [snip]
> > > + { .compatible = "fsl,qoriq-device-config-2.0", .data = "P
> > > series"
> > 
> > As noted in my comment on patch 3/4, these descriptions are reversed.
> > 
> > They're also incomplete.  t2080 has device config 2.0.  t1040 is described
> > as
> > 2.0 though it should probably be 2.1 (or better, drop the generic
> > compatible
> > altogether).
> 
> Ok. Ideally I think we'd even look up the specific SoC names from the
> SVC rather than the compatible string. I just didn't have a good list
> for those to put in the driver.

The list is in arch/powerpc/include/asm/mpc85xx.h but I don't know why we need
to convert it to a string in the first place.

> 
> > > + /*
> > > +  * syscon devices default to little-endian, but on powerpc we
> > > have
> > > +  * existing device trees with big-endian maps and an absent
> > > endianess
> > > +  * "big-property"
> > > +  */
> > > + if (!IS_ENABLED(CONFIG_POWERPC) &&
> > > + !of_property_read_bool(dev->of_node, "big-endian"))
> > > + guts->little_endian = true;
> > 
> > This is not a syscon device (Yangbo's patch to add a guts node on ls2080
> > is
> > the only guts node that says "syscon", and that was a leftover from
> > earlier
> > revisions and should probably be removed).  Even if it were, where is it
> > documented that syscon defaults to little-endian?
> 
> Documentation/devicetree/bindings/regmap/regmap.txt
> 
> We had a little screwup here, basically regmap (and by consequence, syscon)
> always defaulted to little-endian way before that was documented, so it's
> too late to change it, 

What causes a device node to fall under the jurisdiction of regmap.txt?
Again, these nodes do not claim "syscon" compatibility.

> although I agree it would have made sense to document
> regmap to default to big-endian on powerpc.

Please don't.  It's enough of a mess as is; no need to start throwing in
architecture ifdefs.

> > Documentation/devicetree/bindings/common-properties.txt says that the
> > individual binding specifies the default.  The default for this node
> > should be
> > big-endian because that's what existed before there was a need to describe
> > the
> > endianness.  And we need an update to the guts binding to specify that.
> 
> Good point. This proably mea

Re: [PATCH 2/4] soc: fsl: add GUTS driver for QorIQ platforms

2016-06-01 Thread Scott Wood
On Mon, 2016-05-30 at 15:15 +0200, Arnd Bergmann wrote:
> diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
> new file mode 100644
> index ..2f30698f5bcf
> --- /dev/null
> +++ b/drivers/soc/fsl/guts.c
> @@ -0,0 +1,130 @@
> +/*
> + * Freescale QorIQ Platforms GUTS Driver
> + *
> + * Copyright (C) 2016 Freescale Semiconductor, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define GUTS_PVR 0x0a0
> +#define GUTS_SVR 0x0a4
> +
> +struct guts {
> + void __iomem *regs;

We already have a struct to define guts.  Why are you not using it?  Why do
you consider using it to be "abuse"?  What if we want to move more guts
functionality into this driver?

> + bool little_endian;
> + struct soc_device_attribute soc;
> +};
> +
> +static u32 fsl_guts_get_svr(struct guts *guts)
> +{
> + if (guts->little_endian)
> + return ioread32(guts->regs + GUTS_SVR);
> + else
> + return ioread32be(guts->regs + GUTS_SVR);
> +}
> +
> +static u32 fsl_guts_get_pvr(struct guts *guts)
> +{
> + if (guts->little_endian)
> + return ioread32(guts->regs + GUTS_PVR);
> + else
> + return ioread32be(guts->regs + GUTS_PVR);
> +}

You've removed the fallback to mfspr() on PPC, which would be helpful in some
virtualized environments where we don't have the guts node (but do have other
directly assigned devices).  Of course, this is a consequence of the
conversion into a platform device.

> +
> +/*
> + * Table for matching compatible strings, for device tree
> + * guts node, for Freescale QorIQ SOCs.
> + */
> +static const struct of_device_id fsl_guts_of_match[] = {
> + /* For T4 & B4 Series SOCs */
> + { .compatible = "fsl,qoriq-device-config-1.0", .data = "T4/B4
> series" },
[snip]
> + { .compatible = "fsl,qoriq-device-config-2.0", .data = "P series"

As noted in my comment on patch 3/4, these descriptions are reversed.

They're also incomplete.  t2080 has device config 2.0.  t1040 is described as
2.0 though it should probably be 2.1 (or better, drop the generic compatible
altogether).

> + /*
> +  * syscon devices default to little-endian, but on powerpc we have
> +  * existing device trees with big-endian maps and an absent
> endianess
> +  * "big-property"
> +  */
> + if (!IS_ENABLED(CONFIG_POWERPC) &&
> + !of_property_read_bool(dev->of_node, "big-endian"))
> + guts->little_endian = true;

This is not a syscon device (Yangbo's patch to add a guts node on ls2080 is
the only guts node that says "syscon", and that was a leftover from earlier
revisions and should probably be removed).  Even if it were, where is it
documented that syscon defaults to little-endian?  

Documentation/devicetree/bindings/common-properties.txt says that the
individual binding specifies the default.  The default for this node should be
big-endian because that's what existed before there was a need to describe the
endianness.  And we need an update to the guts binding to specify that.

> +
> + guts->regs = devm_ioremap_resource(dev, 0);
> + if (!guts->regs) {
> + ret = -ENOMEM;
> + kfree(guts);
> + goto out;
> + }
> +
> + fsl_guts_init(dev, guts);
> + ret = 0;
> +out:
> + return ret;
> +}
> +
> +static struct platform_driver fsl_soc_guts = {
> + .probe = fsl_guts_probe,
> + .driver.of_match_table = fsl_guts_of_match,
> +};
> +
> +module_platform_driver(fsl_soc_guts);

Again, this means that the information is not available during early boot,
such as in the clock driver.  Thus we would not be able to convert clk-qoriq's
direct mfspr(SPRN_SVR) into an soc_device_match() (or anything else that makes
use of this file), nor would we be able to move its access of the guts RCW
registers into this driver.

-Scott



Re: [PATCH 4/4] Revert "powerpc/fsl: Move fsl_guts.h out of arch/powerpc"

2016-06-01 Thread Scott Wood
On Mon, 2016-05-30 at 15:18 +0200, Arnd Bergmann wrote:
> All users of this driver are PowerPC specific and the header file
> has no business in the global include/linux/ hierarchy, so move
> it back before anyone starts using it on ARM.
> 
> This reverts commit 948486544713492f00ac8a9572909101ea892cb0.
> 
> Signed-off-by: Arnd Bergmann 
> ---
> This part of the series is not required for the eSDHC quirk,
> but it restores the asm/fsl_guts.h header so it doesn't accidentally
> get abused for this in the future. I found two drivers outside of
> arch/powerpc that already accessed the registers directly, but the
> functions look fairly contained, and can be easily hidden in an
> #ifdef CONFIG_PPC

NACK

Besides adding ifdef pollution for no good reason, this register block is used
on some ARM chips as well.  Why is it a problem if "anyone starts using it on
ARM"?

BTW, of all the mailing lists you included on this CC, you seem to have left
off the PPC list (I've added it).

-Scott



Re: [PATCH 3/4] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

2016-06-01 Thread Scott Wood
On Mon, 2016-05-30 at 15:16 +0200, Arnd Bergmann wrote:
> This is a rewrite of an earlier patch from Yangbo Lu, adding a quirk
> for the NXP QorIQ T4240 in the detection of the host device version.
> 
> Unfortunately, this device cannot be detected using the compatible
> string, as we have to support existing DTS files that use the generic
> "fsl,t4240-esdhc" identifier but that have other host versions that
> are correctly detected.
> 
> Signed-off-by: Arnd Bergmann 
> 
> diff --git a/drivers/mmc/host/sdhci-of-esdhc.c b/drivers/mmc/host/sdhci-of-esdhc.c
> index 3f34d354f1fc..1d4814fe4cb2 100644
> --- a/drivers/mmc/host/sdhci-of-esdhc.c
> +++ b/drivers/mmc/host/sdhci-of-esdhc.c
> @@ -73,14 +73,16 @@ static u32 esdhc_readl_fixup(struct sdhci_host *host,
>  static u16 esdhc_readw_fixup(struct sdhci_host *host,
>int spec_reg, u32 value)
>  {
> + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> + struct sdhci_esdhc *esdhc = sdhci_pltfm_priv(pltfm_host);
>   u16 ret;
>   int shift = (spec_reg & 0x2) * 8;
>  
>   if (spec_reg == SDHCI_HOST_VERSION)
> - ret = value & 0x0000ffff;
> - else
> - ret = (value >> shift) & 0xffff;
> - return ret;
> + return esdhc->vendor_ver << SDHCI_VENDOR_VER_SHIFT |
> +esdhc->spec_ver;
> +
> + return (value >> shift) & 0xffff;
>  }
>  
>  static u8 esdhc_readb_fixup(struct sdhci_host *host,
> @@ -562,16 +564,32 @@ static const struct sdhci_pltfm_data sdhci_esdhc_le_pdata = {
>   .ops = &sdhci_esdhc_le_ops,
>  };
>  
> +#define T4240_HOST_VER ((VENDOR_V_23 << SDHCI_VENDOR_VER_SHIFT) | SDHCI_SPEC_200)
> +static const struct soc_device_attribute esdhc_t4240_quirk = {
> + /* T4240 revision < 0x20 uses vendor version 23, SDHCI version 200 */
> + { .soc_id = "T4*(0x824000)", .revision = "0x[01]?",
> +   .data = (void *)(uintptr_t)(T4240_HOST_VER) },

Why should this code need to care that the string begins with "T4"?  This
creates dual maintenance if that were to change.  It's also broken because
T4240 has compatible = "fsl,t4240-device-config",
"fsl,qoriq-device-config-2.0" and thus with these patches it would
incorrectly show up as "P series (0x824000)".  The compatible string of this
node was never meant to be a key for choosing a string to describe the system
to userspace.

0x824000 is a magic number which should be represented symbolically.

If T4240 is affected, then so are the reduced-core variants T4160 and T4080,
but 0x824000 doesn't match them (Yangbo's patch had the same problem).  And
please don't respond with "0x824*".

You also didn't strip out the E bit of SVR which indicates encryption
capability and nothing else (Yangbo's patch did not have this problem because
it used SVR_SOC_VER).
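
To make the masking point concrete, here is a minimal sketch.  The EX_*
names are hypothetical, and the field layout (SoC version in the upper 24
bits, revision in the low 8, E flag at bit 19) is an assumption modelled on
the SVR_SOC_VER()/SVR_REV() helpers rather than copied from a header:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout; see the note above.  Not copied from a kernel header. */
#define EX_SVR_E_BIT		0x00080000u
#define EX_SVR_SOC_VER(svr)	((((svr) & ~EX_SVR_E_BIT) >> 8) & 0xffffffu)
#define EX_SVR_REV(svr)		((svr) & 0xffu)

/* Symbolic name instead of a bare 0x824000 in a quirk table. */
#define EX_SVR_T4240		0x824000u

/* True for T4240 with or without the encryption (E) flag set. */
static int ex_is_t4240(uint32_t svr)
{
	return EX_SVR_SOC_VER(svr) == EX_SVR_T4240;
}

/* Usage: the E and non-E parts compare equal once the flag is masked. */
static void ex_svr_example(void)
{
	assert(ex_is_t4240(0x82400010u));
	assert(ex_is_t4240(0x82480020u));
	assert(EX_SVR_REV(0x82480020u) == 0x20u);
}
```

A glob on the raw hex string would have to spell out both encodings, which
is the kind of duplication the numeric mask avoids.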

What happens if the revision condition is more complicated, such as <= 0x20
with 0x21 being fine?  Multiple quirk entries where before we had a simple
comparison?
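
For comparison, a numeric table handles that hypothetical case with one
entry per part.  This is only a sketch: the struct, the function and the ID
values are placeholders for illustration, not the real list of affected
parts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk entry: affected when rev <= max_rev. */
struct ex_rev_quirk {
	uint32_t soc_ver;	/* SoC version with the E bit already cleared */
	uint8_t max_rev;	/* highest affected revision, inclusive */
};

/* Placeholder IDs; real code would use symbolic SVR_* constants. */
static const struct ex_rev_quirk ex_quirks[] = {
	{ 0x824000u, 0x20 },
	{ 0x824100u, 0x20 },
};

/* Returns nonzero when the given part and revision need the fixup. */
static int ex_needs_fixup(uint32_t soc_ver, uint8_t rev)
{
	size_t i;

	for (i = 0; i < sizeof(ex_quirks) / sizeof(ex_quirks[0]); i++) {
		if (ex_quirks[i].soc_ver == soc_ver)
			return rev <= ex_quirks[i].max_rev;
	}
	return 0;
}

/* Usage: 0x20 is affected, 0x21 is not, unknown parts are left alone. */
static void ex_quirk_example(void)
{
	assert(ex_needs_fixup(0x824000u, 0x20));
	assert(!ex_needs_fixup(0x824000u, 0x21));
	assert(!ex_needs_fixup(0x850000u, 0x10));
}
```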

I fail to see how this approach is an improvement (much less one that needs to
hold up a patchset that is fixing a problem and is not touching any generic
code).  Why does this need to be a string?

-Scott



Re: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

2016-05-10 Thread Scott Wood
On Thu, 2016-05-05 at 13:10 +0200, Arnd Bergmann wrote:
> On Thursday 05 May 2016 09:41:32 Yangbo Lu wrote:
> > > -Original Message-
> > > From: Arnd Bergmann [mailto:a...@arndb.de]
> > > Sent: Thursday, May 05, 2016 4:32 PM
> > > To: linuxppc-...@lists.ozlabs.org
> > > Cc: Yangbo Lu; linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> > > linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org;
> > > linux-...@vger.kernel.org; linux-...@vger.kernel.org; iommu@lists.linux-
> > > foundation.org; net...@vger.kernel.org; Mark Rutland;
> > > ulf.hans...@linaro.org; Russell King; Bhupesh Sharma; Joerg Roedel;
> > > Santosh Shilimkar; Yang-Leo Li; Scott Wood; Rob Herring; Claudiu Manoil;
> > > Kumar Gala; Xiaobo Xie; Qiang Zhao
> > > Subject: Re: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240-
> > > R1.0-R2.0
> > > 
> > > On Thursday 05 May 2016 11:12:30 Yangbo Lu wrote:
> > > > IIRC, it is the same IP block as i.MX and Arnd's point is this won't
> > > > even compile on !PPC. It is things like this that prevent sharing the
> > > > driver.
> > 
> > The whole point of using the MMIO SVR instead of the PPC SPR is so that
> > it will work on ARM...  The guts driver should build on any platform as
> > long as OF is enabled, and if it doesn't find a node to bind to it will
> > return 0 for SVR, and the eSDHC driver will continue (after printing an
> > error that should be removed) without the ability to test for errata
> > based on SVR.
> 
> It feels like a bad design to have to come up with a different
> method for each SoC type here when they all do the same thing
> and want to identify some variant of the chip to do device
> specific quirks.
> 
> As far as I'm concerned, every driver in drivers/soc that needs to
> export a symbol to be used by a device driver is an indication that
> we don't have the right set of abstractions yet. There are cases
> that are not worth abstracting because the functionality is rather
> obscure and only a couple of drivers for one particular chip
> ever need it.
> 
> Finding out the version of the SoC does not look like this case.

I'm open to new ways of abstracting this, but can that please be discussed
after these patches are merged?  This patchset is fixing a problem, the
existing abstraction is unappealing and not widely adopted, a new abstraction
is not ready, and we're only touching code for our hardware.

Oh, and the existing abstraction isn't even "existing".  I don't see any
examples where soc_device is being used like this -- or even any way for a
driver (the one consuming the information, not the soc "driver") to get a
reference to the soc_device that's been registered short of searching for the
device object by name -- and you're asking for new functionality in
drivers/base/soc.c.

> > > I think the first four patches take care of building for ARM,
> > > but the problem remains if you want to enable COMPILE_TEST as
> > > we need for certain automated checking.
> > 
> > What specific problem is there with COMPILE_TEST?
> 
> COMPILE_TEST is solvable here and the way it is implemented in this
> case (selecting FSL_GUTS from the driver) indeed looks like it works
> correctly, but it's still awkward that this means building the
> SoC specific ID stuff into the vmlinux binary for any driver that
> uses something like that for a particular SoC.

Please keep in mind that this is a Freescale-specific driver... it's not as if
we're attaching this dependency to common SDHCI code.

> 
> > > > Dealing with Si revs is a common problem. We should have a
> > > > common solution. There is soc_device for this purpose.
> > > 
> > > Exactly. The last time this came up, I think we agreed to implement a
> > > helper using glob_match() on the soc_device strings. Unfortunately
> > > this hasn't happened then, but I'd still prefer that over yet another
> > > vendor-specific way of dealing with the generic issue.
> > 
> > soc_device would require encoding the SVR as a string and then decoding
> > the string, which is more complicated and error prone than having
> > platform-specific code test a platform-specific number. 
> 
> You already need to encode it as a string to register the soc_device,

No we don't, because we don't already register a soc_device on arm64 or ppc
(and it looks like whatever does get registered on at least some relevant
arm32 chips is not particularly useful).

> and the driver just needs to pass a glob string, so the only part that
> is missing is the generic function that takes the string from the
> 

Re: [v7, 0/5] Fix eSDHC host version register bug

2016-04-01 Thread Scott Wood
On Fri, 2016-04-01 at 11:07 +0800, Yangbo Lu wrote:
> This patchset is used to fix a host version register bug in the
> T4240-R1.0-R2.0 eSDHC controller. To get the SoC version and revision,
> it's needed to add the GUTS driver to access the global utilities registers.
> 
> So, the first three patches are to add the GUTS driver.
> The following two patches are to enable GUTS driver support to get SVR in
> eSDHC driver and fix host version for T4240.
> 
> Yangbo Lu (5):
>   ARM64: dts: ls2080a: add device configuration node
>   soc: fsl: add GUTS driver for QorIQ platforms
>   dt: move guts devicetree doc out of powerpc directory
>   powerpc/fsl: move mpc85xx.h to include/linux/fsl
>   mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

Acked-by: Scott Wood <o...@buserror.net>

-Scott



Re: [v6, 3/5] dt: move guts devicetree doc out of powerpc directory

2016-03-19 Thread Scott Wood
On 03/17/2016 12:06 PM, Rob Herring wrote:
> On Wed, Mar 09, 2016 at 06:08:49PM +0800, Yangbo Lu wrote:
>> Move guts devicetree doc to Documentation/devicetree/bindings/soc/fsl/
>> since it's used by not only PowerPC but also ARM. And add a specification
>> for 'little-endian' property.
>>
>> Signed-off-by: Yangbo Lu 
>> ---
>> Changes for v2:
>>  - None
>> Changes for v3:
>>  - None
>> Changes for v4:
>>  - Added this patch
>> Changes for v5:
>>  - Modified the description for little-endian property
>> Changes for v6:
>>  - None
>> ---
>>  Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt | 3 +++
>>  1 file changed, 3 insertions(+)
>>  rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt (91%)
>>
>> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt 
>> b/Documentation/devicetree/bindings/soc/fsl/guts.txt
>> similarity index 91%
>> rename from Documentation/devicetree/bindings/powerpc/fsl/guts.txt
>> rename to Documentation/devicetree/bindings/soc/fsl/guts.txt
>> index b71b203..07adca9 100644
>> --- a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt
>> +++ b/Documentation/devicetree/bindings/soc/fsl/guts.txt
>> @@ -25,6 +25,9 @@ Recommended properties:
>>   - fsl,liodn-bits : Indicates the number of defined bits in the LIODN
>> registers, for those SOCs that have a PAMU device.
>>  
>> + - little-endian : Indicates that the global utilities block is little
>> +   endian. The default is big endian.
> 
> The default is "the native endianness of the system". So absence on an 
> ARM system would be LE.

No.  For this binding, the default is big-endian, because that's what
existed for this device before an endian property was added.

"endianness of the system" is not a well-defined concept.

> This property is valid for any simple-bus device, 

Since when does simple-bus mean anything more than that the nodes
underneath it can be used without bus-specific knowledge?

> so it isn't really required to document per device. You can, but 
> your description had better match the documented behaviour.

Documented where?

In fact, Documentation/devicetree/bindings/common-properties.txt
explicitly says of the endian properties, "If a binding supports these
properties, then the binding should also specify the default behavior if
none of these properties are present."

-Scott



Re: [v6, 2/5] soc: fsl: add GUTS driver for QorIQ platforms

2016-03-19 Thread Scott Wood
On 03/09/2016 04:18 AM, Yangbo Lu wrote:
> +#ifdef CONFIG_FSL_GUTS
> +u32 fsl_guts_get_svr(void);
> +int fsl_guts_init(void);
> +#endif

Don't ifdef prototypes (when not providing a stub alternative).
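
For reference, the case where an ifdef is justified is the one that supplies
a stub.  A minimal sketch follows; the EX_/ex_ names are hypothetical
stand-ins for the config symbol and the accessor:

```c
#include <assert.h>
#include <stdint.h>

/* Leave EX_CONFIG_FSL_GUTS undefined to model the option being off. */
#ifdef EX_CONFIG_FSL_GUTS
uint32_t ex_guts_get_svr(void);	/* real implementation lives elsewhere */
#else
/* Stub: callers build unchanged and simply see "no SVR available". */
static inline uint32_t ex_guts_get_svr(void)
{
	return 0;
}
#endif

/* Callers need no ifdef of their own. */
static int ex_have_svr(void)
{
	return ex_guts_get_svr() != 0;
}

/* Usage: with the option off, the stub reports that no SVR is known. */
static void ex_stub_example(void)
{
	assert(!ex_have_svr());
}
```

Without a stub, the ifdef around a bare prototype buys nothing: an unused
declaration is harmless, and hiding it only turns a link error into a
compile error.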

-Scott



Re: [v6, 5/5] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

2016-03-19 Thread Scott Wood
On 03/14/2016 02:29 AM, Yangbo Lu wrote:
>> -Original Message-
>> From: Arnd Bergmann [mailto:a...@arndb.de]
>> Sent: Monday, March 14, 2016 6:26 AM
>> To: linuxppc-...@lists.ozlabs.org
>> Cc: Yangbo Lu; devicet...@vger.kernel.org; linux-arm-
>> ker...@lists.infradead.org; linux-ker...@vger.kernel.org; linux-
>> c...@vger.kernel.org; linux-...@vger.kernel.org; iommu@lists.linux-
>> foundation.org; net...@vger.kernel.org; linux-...@vger.kernel.org;
>> ulf.hans...@linaro.org; Zhao Qiang; Russell King; Bhupesh Sharma; Joerg
>> Roedel; Santosh Shilimkar; Scott Wood; Rob Herring; Claudiu Manoil; Kumar
>> Gala; Yang-Leo Li; Xiaobo Xie
>> Subject: Re: [v6, 5/5] mmc: sdhci-of-esdhc: fix host version for T4240-
>> R1.0-R2.0
>>
>> On Wednesday 09 March 2016 18:08:51 Yangbo Lu wrote:
>>> @@ -567,10 +580,20 @@ static void esdhc_init(struct platform_device
>> *pdev, struct sdhci_host *host)
>>> struct sdhci_pltfm_host *pltfm_host;
>>> struct sdhci_esdhc *esdhc;
>>> u16 host_ver;
>>> +   u32 svr;
>>>
>>> pltfm_host = sdhci_priv(host);
>>> esdhc = sdhci_pltfm_priv(pltfm_host);
>>>
>>> +   fsl_guts_init();
>>> +   svr = fsl_guts_get_svr();
>>> +   if (svr) {
>>> +   esdhc->soc_ver = SVR_SOC_VER(svr);
>>> +   esdhc->soc_rev = SVR_REV(svr);
>>> +   } else {
>>> +   dev_err(&pdev->dev, "Failed to get SVR value!\n");
>>> +   }
>>> +
>>
>> This makes the driver non-portable. Better identify the specific
>> workarounds based on the compatible string for this device, or add a
>> boolean DT property for the quirk.
>>
>>  Arnd
> 
> [Lu Yangbo-B47093] Hi Arnd, we did have a discussion about using DTS in v1 
> before.
> https://patchwork.kernel.org/patch/6834221/
> 
> We don’t have a separate DTS file for each revision of an SOC and if we did, 
> we'd constantly have people using the wrong one.
> In addition, the device tree is stable ABI and errata are often discovered 
> after device tree are deployed.
> See the link for details.
> 
> So we decide to read SVR from the device-config/guts MMIO block other than 
> using DTS.
> Thanks.

Also note that this driver is already only for fsl-specific hardware,
and it will still work even if fsl_guts doesn't find anything to bind to
-- it just wouldn't be able to detect errata based on SVR in that case.

-Scott



Re: [v6, 5/5] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

2016-03-18 Thread Scott Wood
On 03/17/2016 12:06 PM, Arnd Bergmann wrote:
> On Thursday 17 March 2016 12:01:01 Rob Herring wrote:
>> On Mon, Mar 14, 2016 at 05:45:43PM +0000, Scott Wood wrote:
> 
>>>>> This makes the driver non-portable. Better identify the specific
>>>>> workarounds based on the compatible string for this device, or add a
>>>>> boolean DT property for the quirk.
>>>>>
>>>>>Arnd
>>>>
>>>> [Lu Yangbo-B47093] Hi Arnd, we did have a discussion about using DTS in v1 
>>>> before.
>>>> https://patchwork.kernel.org/patch/6834221/
>>>>
>>>> We don’t have a separate DTS file for each revision of an SOC and if we 
>>>> did, we'd constantly have people using the wrong one.
>>>> In addition, the device tree is stable ABI and errata are often discovered 
>>>> after device tree are deployed.
>>>> See the link for details.
>>>>
>>>> So we decide to read SVR from the device-config/guts MMIO block other than 
>>>> using DTS.
>>>> Thanks.
>>>
>>> Also note that this driver is already only for fsl-specific hardware,
>>> and it will still work even if fsl_guts doesn't find anything to bind to
>>> -- it just wouldn't be able to detect errata based on SVR in that case.
>>
>> IIRC, it is the same IP block as i.MX and Arnd's point is this won't 
>> even compile on !PPC. It is things like this that prevent sharing the 
>> driver.

The whole point of using the MMIO SVR instead of the PPC SPR is so that
it will work on ARM...  The guts driver should build on any platform as
long as OF is enabled, and if it doesn't find a node to bind to it will
return 0 for SVR, and the eSDHC driver will continue (after printing an
error that should be removed) without the ability to test for errata
based on SVR.

> I think the first four patches take care of building for ARM,
> but the problem remains if you want to enable COMPILE_TEST as
> we need for certain automated checking.

What specific problem is there with COMPILE_TEST?

>> Dealing with Si revs is a common problem. We should have a 
>> common solution. There is soc_device for this purpose.
> 
> Exactly. The last time this came up, I think we agreed to implement a
> helper using glob_match() on the soc_device strings. Unfortunately
> this hasn't happened then, but I'd still prefer that over yet another
> vendor-specific way of dealing with the generic issue.

soc_device would require encoding the SVR as a string and then decoding
the string, which is more complicated and error prone than having
platform-specific code test a platform-specific number.  And when would
it get registered on arm64, which doesn't have platform code?
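
A small sketch of the difference, under stated assumptions: the "0x%02x"
revision format is a guess at how a soc_device-style attribute might carry
the value, and both function names are made up for the example:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Numeric path: one mask and one comparison. */
static int ex_rev_below(uint32_t svr, uint8_t limit)
{
	return (svr & 0xffu) < limit;
}

/* String path: format the revision, then parse it back before comparing. */
static int ex_rev_below_str(uint32_t svr, uint8_t limit)
{
	char buf[8];
	char *end;
	unsigned long rev;

	snprintf(buf, sizeof(buf), "0x%02x", (unsigned int)(svr & 0xffu));
	rev = strtoul(buf, &end, 16);
	if (end == buf || *end != '\0')
		return 0;	/* malformed string: a failure mode the numeric path lacks */
	return rev < limit;
}

/* Usage: both paths agree, but one of them needed a formatter and a parser. */
static void ex_rev_example(void)
{
	assert(ex_rev_below(0x82400010u, 0x20) == ex_rev_below_str(0x82400010u, 0x20));
	assert(ex_rev_below(0x82400020u, 0x20) == ex_rev_below_str(0x82400020u, 0x20));
}
```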

-Scott



Re: [PATCH] iommu/fsl: Fix the dependency check for PAMU driver.

2015-05-14 Thread Scott Wood
On Thu, 2015-05-14 at 23:11 +0530, Varun Sethi wrote:
> Fix the build dependency for the PAMU driver. PPC32 build dependency is
> incorrect.
> Add the CORENET_GENERIC build dependency for PAMU driver.
> 
> Signed-off-by: Varun Sethi <varun.se...@freescale.com>
> ---
>  drivers/iommu/Kconfig |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 1ae4e54..4ace8db 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -50,7 +50,7 @@ config OF_IOMMU
>  
>  config FSL_PAMU
>   bool "Freescale IOMMU support"
> - depends on PPC32
> + depends on CORENET_GENERIC
>   depends on PPC_E500MC || COMPILE_TEST
>   select IOMMU_API
>   select GENERIC_ALLOCATOR

CORENET_GENERIC is for board support.  There is no guarantee that all
corenet boards will use it.  You already depend on PPC_E500MC; why do
you need anything else (besides probably getting rid of ||
COMPILE_TEST which is useless if you do add CORENET_GENERIC, because
CORENET_GENERIC implies PPC_E500MC)?

-Scott




Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device()

2014-02-17 Thread Scott Wood
On Sat, 2014-02-15 at 12:19 -0600, Yoder Stuart-B08248 wrote:
 
> > -----Original Message-----
> > From: Greg KH [mailto:gre...@linuxfoundation.org]
> > Sent: Saturday, February 15, 2014 11:34 AM
> > To: Yoder Stuart-B08248
> > Cc: Antonios Motakis; alex.william...@redhat.com;
> > kvm...@lists.cs.columbia.edu; iommu@lists.linux-foundation.org; linux-
> > ker...@vger.kernel.org; t...@virtualopensystems.com;
> > a.r...@virtualopensystems.com; kim.phill...@linaro.org;
> > jan.kis...@siemens.com; k...@vger.kernel.org; Bhushan Bharat-R65777; Wood
> > Scott-B07421; christoffer.d...@linaro.org; ag...@suse.de; Sethi Varun-
> > B16395; will.dea...@arm.com; Tejun Heo; Rafael J. Wysocki; Guenter Roeck;
> > Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas
> > Subject: Re: [RFC PATCH v4 01/10] driver core: export
> > driver_probe_device()
> > 
> > On Sat, Feb 15, 2014 at 04:33:44PM +, Stuart Yoder wrote:
> > > Are you in principle opposed to any mechanism that would allow 2 drivers
> > > to be resident/active and allow a sysadmin to explicitly bind a
> > > particular device instance to the driver of their choice?
> > 
> > No, that works today with the bind/unbind/new_id files, it's just that
> > you don't like it :)
> 
> We don't like it because of the ambiguities/race-conditions with
> the current situation.

Plus, it's semantically weird (a.k.a. a hack).  The user isn't trying to
bind an entire type of device to the vfio driver, but rather a specific
device.  Races and similar ugliness is often what you get when you try
to pile things on top of the wrong abstraction.  That you can hack
around the races with a userspace loop (and hope that no damage was done
by the wrong driver in the meantime -- packets sent, filesystems
automounted, other inappropriate I/O performed, driver unbind
bugs/unwillingness encountered, etc) is not a particularly satisfying
answer.  At best the race fixup will end up being a poorly tested code
path (if the person scripting userspace thinks of doing it at all).

It also doesn't work today because there is no new_id for platform
devices, and the matching situation for platform devices is more
complicated than on PCI, so it would be more awkward to implement and
more awkward to use.

We can apply enough grease and pound the square peg through the round
hole if we must, but we'd like to first exhaust our options for doing it
in a simple, straightforward, robust, and semantically sensible manner
-- especially since once we start supporting the new_id approach for
vfio binding on platform devices it'll be ABI that we're stuck with.

-Scott




Re: [RFC PATCH v4 06/10] VFIO_PLATFORM: Read and write support for the device fd

2014-02-10 Thread Scott Wood
On Mon, 2014-02-10 at 15:45 -0700, Alex Williamson wrote:
> On Sat, 2014-02-08 at 18:29 +0100, Antonios Motakis wrote:
> > VFIO returns a file descriptor which we can use to manipulate the memory
> > regions of the device. Since some memory regions we cannot mmap due to
> > security concerns, we also allow to read and write to this file descriptor
> > directly.
> > 
> > Signed-off-by: Antonios Motakis <a.mota...@virtualopensystems.com>
> > Tested-by: Alvise Rigo <a.r...@virtualopensystems.com>
> > ---
> >  drivers/vfio/platform/vfio_platform.c | 128 +-
> >  1 file changed, 125 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
> > index f7db5c0..ee96078 100644
> > --- a/drivers/vfio/platform/vfio_platform.c
> > +++ b/drivers/vfio/platform/vfio_platform.c
> > @@ -55,7 +55,8 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
> >  
> >   region.addr = res->start;
> >   region.size = resource_size(res);
> > -   region.flags = 0;
> > +   region.flags = VFIO_REGION_INFO_FLAG_READ
> > +   | VFIO_REGION_INFO_FLAG_WRITE;
> >  
> >   vdev->region[i] = region;
> >   }
> > @@ -150,13 +151,134 @@ static long vfio_platform_ioctl(void *device_data,
> >  static ssize_t vfio_platform_read(void *device_data, char __user *buf,
> >  size_t count, loff_t *ppos)
> >  {
> > -   return 0;
> > +   struct vfio_platform_device *vdev = device_data;
> > +   unsigned int *io;
> > +   int i;
> > +
> > +   for (i = 0; i < vdev->num_regions; i++) {
> > +   struct vfio_platform_region region = vdev->region[i];
> > +   unsigned int done = 0;
> > +   loff_t off;
> > +
> > +   if ((*ppos < region.addr)
> > +|| (*ppos + count - 1) >= (region.addr + region.size))
> > +   continue;
> 
> Perhaps there's something to be said for vfio-pci's use of fixed offsets
> to have a direct offset to index lookup.
> 
> > +
> > +   io = ioremap_nocache(region.addr, region.size);
> 
> This must incur some overhead per access.

There's mmap() if you want fast...  Given the limited ioremap space on
32-bit, I can see not wanting to map everything that the user has open
all the time -- but in that case, wouldn't it be better to just map one
page here rather than the whole region?
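
The arithmetic for the one-page variant is simple.  A sketch, with a
hypothetical struct and helper and an assumed 4 KiB page size; a real
implementation would pass map_base and the page size to the mapping call
instead of the whole region:

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SIZE 4096u	/* assumed page size for the example */

struct ex_page_window {
	uint64_t map_base;	/* page-aligned physical address to map */
	uint32_t offset;	/* offset of the access inside that page */
	uint32_t len;		/* bytes that can be served from this page */
};

/* Compute the single page needed for an access starting at phys. */
static struct ex_page_window ex_page_for_access(uint64_t phys, uint32_t count)
{
	struct ex_page_window w;
	uint32_t room;

	assert(count > 0);
	w.map_base = phys & ~(uint64_t)(EX_PAGE_SIZE - 1);
	w.offset = (uint32_t)(phys - w.map_base);
	room = EX_PAGE_SIZE - w.offset;
	w.len = count < room ? count : room;	/* caller loops for the rest */
	return w;
}
```

An access that straddles a page boundary is served in two steps, each with
its own short-lived mapping, so the ioremap footprint stays at one page no
matter how large the region is.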

> > +
> > +   off = *ppos - region.addr;
> > +
> > +   while (count) {
> > +   size_t filled;
> > +
> > +   if (count >= 4 && !(off % 4)) {
> > +   u32 val;
> > +
> > +   val = ioread32(io + off);
> > +   if (copy_to_user(buf, &val, 4))
> > +   goto err;
> 
> For vfio-pci we've decided that these interfaces are always little
> endian, have you considered whether it makes sense to do something
> similar here?  Thanks,

ioread32() is little endian -- but since read() puts its result in the
caller's memory buffer (rather than a register return), I think it makes
more sense to preserve byte-invariance -- similar to the conclusion of
the recent KVM MMIO API clarification discussion.  Then the VFIO user
would use the same type of access (byte swapped or not) to access the
read() buffer that they would have used to access the register directly.

Forcing little endian is a better fit for PCI (which is inherently
little endian) than for platform devices which can be either endianness.
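
To illustrate what byte-invariance means for the consumer, here is a small
userspace-style sketch.  All names are hypothetical, and the 4-byte array
stands in for a device register:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-invariant copy: register byte N lands in buffer byte N. */
static void ex_read_invariant(const uint8_t reg[4], uint8_t out[4])
{
	memcpy(out, reg, 4);
}

/* Accessors the consumer would equally use on a direct mapping. */
static uint32_t ex_get_be32(const uint8_t p[4])
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint32_t ex_get_le32(const uint8_t p[4])
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Usage: a big-endian device is decoded the same way through either path. */
static void ex_invariant_example(void)
{
	const uint8_t reg[4] = { 0x12, 0x34, 0x56, 0x78 };
	uint8_t buf[4];

	ex_read_invariant(reg, buf);
	assert(ex_get_be32(buf) == ex_get_be32(reg));
	assert(ex_get_be32(buf) == 0x12345678u);
}
```

The interface itself takes no position on endianness; the user applies the
accessor that matches the device, whether the bytes came from read() or
from an mmap()ed register.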

-Scott




Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)

2013-12-10 Thread Scott Wood
My e-mail address is scottw...@freescale.com, not
IMCEAEX-_O=MMS_OU=EXTERNAL+20+28FYDIBOHF25SPDLT
+29_CN=RECIPIENTS_CN=f0faac8d7e74473a9ee1c45b068d8...@namprd03.prod.outlook.com

On Tue, 2013-12-10 at 05:37 +, bharat.bhus...@freescale.com wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Saturday, December 07, 2013 12:55 AM
  To: Bhushan Bharat-R65777
  Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart-
  B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc-
  d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
  Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
  
  If the administrator does not opt into this partial loss of isolation, then 
  once
  you run out of MSI groups, new users should not be able to set up MSIs.
 
 So mean vfio should use Legacy when out of MSI banks?

Yes, if the administrator hasn't granted permission to share.

-Scott





Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)

2013-12-06 Thread Scott Wood
On Thu, 2013-12-05 at 22:11 -0600, Bharat Bhushan wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Friday, December 06, 2013 5:52 AM
  To: Bhushan Bharat-R65777
  Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart-
  B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc-
  d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
  Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
 
  On Thu, 2013-11-28 at 03:19 -0600, Bharat Bhushan wrote:
  
-Original Message-
From: Bhushan Bharat-R65777
Sent: Wednesday, November 27, 2013 9:39 PM
To: 'Alex Williamson'
Cc: Wood Scott-B07421; linux-...@vger.kernel.org; ag...@suse.de;
Yoder Stuart- B08248; iommu@lists.linux-foundation.org;
bhelg...@google.com; linuxppc- d...@lists.ozlabs.org;
linux-ker...@vger.kernel.org
Subject: RE: [PATCH 0/9 v2] vfio-pci: add support for Freescale
IOMMU (PAMU)
   
If we just provide the size of MSI bank to userspace then userspace
cannot do anything wrong.
  
   So userspace does not know address, so it cannot mmap and cause any
  interference by directly reading/writing.
 
  That's security through obscurity...  Couldn't the malicious user find out 
  the
  address via other means, such as experimentation on another system over 
  which
  they have full control?  What would happen if the user reads from their 
  device's
  PCI config space?  Or gets the information via some back door in the PCI 
  device
  they own?  Or pokes throughout the address space looking for something that
  generates an interrupt to its own device?
 
 So how to solve this problem, Any suggestion ?
 
 We have to map one window in PAMU for MSIs and a malicious user can ask
 its device to do DMA to MSI window region with any pair of address and
 data, which can lead to unexpected MSIs in system?

I don't think there are any solutions other than to limit each bank to
one user, unless the admin turns some knob that says they're OK with the
partial loss of isolation.

-Scott





Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)

2013-12-06 Thread Scott Wood
On Thu, 2013-12-05 at 22:17 -0600, Bharat Bhushan wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Friday, December 06, 2013 5:31 AM
  To: Bhushan Bharat-R65777
  Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart-
  B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc-
  d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
  Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
 
  On Sun, 2013-11-24 at 23:33 -0600, Bharat Bhushan wrote:
  
-Original Message-
From: Alex Williamson [mailto:alex.william...@redhat.com]
Sent: Friday, November 22, 2013 2:31 AM
To: Wood Scott-B07421
Cc: Bhushan Bharat-R65777; linux-...@vger.kernel.org; ag...@suse.de;
Yoder Stuart-B08248; iommu@lists.linux-foundation.org;
bhelg...@google.com; linuxppc- d...@lists.ozlabs.org;
linux-ker...@vger.kernel.org
Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale
IOMMU (PAMU)
   
On Thu, 2013-11-21 at 14:47 -0600, Scott Wood wrote:
 They can interfere.
  
   Want to be sure of how they can interfere?
 
  If more than one VFIO user shares the same MSI group, one of the users can 
  send
  MSIs to another user, by using the wrong interrupt within the bank.  
  Unexpected
  MSIs could cause misbehavior or denial of service.
 
 With this hardware, the only way to prevent that
 is to make sure that a bank is not shared by multiple protection 
 contexts.
 For some of our users, though, I believe preventing this is less
 important than the performance benefit.
  
   So should we let this patch series in without protection?
 
  No, there should be some sort of opt-in mechanism similar to IOMMU-less 
  VFIO --
  but not the same exact one, since one is a much more serious loss of 
  isolation
  than the other.
 
 Can you please elaborate opt-in mechanism?

The system should be secure by default.  If the administrator wants to
relax protection in order to accomplish some functionality, that should
require an explicit request such as a write to a sysfs file.

I think we need some sort of ownership model around the msi banks then.
Otherwise there's nothing preventing another userspace from
attempting an MSI based attack on other users, or perhaps even on
the host.  VFIO can't allow that.  Thanks,
  
   We have very few (3 MSI bank on most of chips), so we can not assign
   one to each userspace.
 
  That depends on how many users there are.
 
 What I think we can do is:
  - Reserve one MSI region for host. Host will not share MSI region with Guest.
  - For upto 2 Guest (MAX msi with host - 1) give then separate MSI sub regions
  - Additional Guest will share MSI region with other guest.
 
 Any better suggestion are most welcome.

If the administrator does not opt into this partial loss of isolation,
then once you run out of MSI groups, new users should not be able to set
up MSIs.

-Scott





Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)

2013-12-06 Thread Scott Wood
On Fri, 2013-12-06 at 12:30 -0700, Alex Williamson wrote:
 On Fri, 2013-12-06 at 12:59 -0600, Scott Wood wrote:
  On Thu, 2013-12-05 at 22:11 -0600, Bharat Bhushan wrote:
   
-Original Message-
From: Wood Scott-B07421
Sent: Friday, December 06, 2013 5:52 AM
To: Bhushan Bharat-R65777
Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de; Yoder 
Stuart-
B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc-
d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU 
(PAMU)
   
On Thu, 2013-11-28 at 03:19 -0600, Bharat Bhushan wrote:

  -Original Message-
  From: Bhushan Bharat-R65777
  Sent: Wednesday, November 27, 2013 9:39 PM
  To: 'Alex Williamson'
  Cc: Wood Scott-B07421; linux-...@vger.kernel.org; ag...@suse.de;
  Yoder Stuart- B08248; iommu@lists.linux-foundation.org;
  bhelg...@google.com; linuxppc- d...@lists.ozlabs.org;
  linux-ker...@vger.kernel.org
  Subject: RE: [PATCH 0/9 v2] vfio-pci: add support for Freescale
  IOMMU (PAMU)
 
  If we just provide the size of MSI bank to userspace then userspace
  cannot do anything wrong.

 Userspace does not know the address, so it cannot mmap it and cause any
interference by directly reading/writing.
   
That's security through obscurity...  Couldn't the malicious user find out
the address via other means, such as experimentation on another system over
which they have full control?  What would happen if the user reads from
their device's PCI config space?  Or gets the information via some back
door in the PCI device they own?  Or pokes throughout the address space
looking for something that generates an interrupt to its own device?
   
   So how do we solve this problem?  Any suggestions?
   
   We have to map one window in PAMU for MSIs, and a malicious user can ask
   its device to do DMA to the MSI window region with any pair of address and
   data, which can lead to unexpected MSIs in the system.
  
  I don't think there are any solutions other than to limit each bank to
  one user, unless the admin turns some knob that says they're OK with the
  partial loss of isolation.
 
 Even if the admin does opt in to an allow_unsafe_interrupts option, it
 should still be reasonably difficult for one guest to interfere with the
 other.  I don't think we want to rely on the blind luck of making the
 full MSI bank accessible to multiple guests and hoping they don't step
 on each other.  That probably means that vfio needs to manage the space
 rather than the guest.  Thanks,

Yes, the MSIs within a given bank would be allocated by the host kernel
in any case (presumably by the MSI driver, not VFIO itself).  This is
just about what happens if the MSI page is written to outside of the
normal mechanism.

-Scott





Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)

2013-12-05 Thread Scott Wood
On Sun, 2013-11-24 at 23:33 -0600, Bharat Bhushan wrote:
 
  -Original Message-
  From: Alex Williamson [mailto:alex.william...@redhat.com]
  Sent: Friday, November 22, 2013 2:31 AM
  To: Wood Scott-B07421
  Cc: Bhushan Bharat-R65777; linux-...@vger.kernel.org; ag...@suse.de; Yoder
  Stuart-B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; 
  linuxppc-
  d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
  Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
 
  On Thu, 2013-11-21 at 14:47 -0600, Scott Wood wrote:
   On Thu, 2013-11-21 at 13:43 -0700, Alex Williamson wrote:
On Thu, 2013-11-21 at 11:20 +, Bharat Bhushan wrote:

  -Original Message-
  From: Alex Williamson [mailto:alex.william...@redhat.com]
  Sent: Thursday, November 21, 2013 12:17 AM
  To: Bhushan Bharat-R65777
  Cc: j...@8bytes.org; bhelg...@google.com; ag...@suse.de; Wood
  Scott-B07421; Yoder Stuart-B08248;
  iommu@lists.linux-foundation.org; linux- p...@vger.kernel.org;
  linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org;
  Bhushan Bharat-R65777
  Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale
  IOMMU (PAMU)
 
  Is VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT per aperture (ie. each
  vfio user has $COUNT regions at their disposal exclusively)?

 The number of MSI banks is system-wide, not per aperture, but we will be
 setting windows for the banks in the device aperture.
 So say we are direct-assigning 2 PCI devices (each in a different IOMMU
 group, so 2 apertures in the IOMMU) to a VM.
 QEMU then needs only one call to learn how many MSI banks there are, but
 it must set sub-windows for all banks for both PCI devices in their
 respective apertures.
   
I'm still confused.  What I want to make sure of is that the banks
are independent per aperture.  For instance, if we have two separate
userspace processes operating independently and they both chose to
use msi bank zero for their device, that's bank zero within each
aperture and doesn't interfere.  Or another way to ask is can a
malicious user interfere with other users by using the wrong bank.
Thanks,
  
   They can interfere.
 
 Want to be sure of how they can interfere?

If more than one VFIO user shares the same MSI group, one of the users
can send MSIs to another user, by using the wrong interrupt within the
bank.  Unexpected MSIs could cause misbehavior or denial of service.

   With this hardware, the only way to prevent that
   is to make sure that a bank is not shared by multiple protection contexts.
   For some of our users, though, I believe preventing this is less
   important than the performance benefit.
 
 So should we let this patch series in without protection?

No, there should be some sort of opt-in mechanism similar to IOMMU-less
VFIO -- but not the same exact one, since one is a much more serious
loss of isolation than the other.

  I think we need some sort of ownership model around the msi banks then.
  Otherwise there's nothing preventing another userspace from attempting an
  MSI based attack on other users, or perhaps even on the host.  VFIO can't
  allow that.  Thanks,
 
 We have very few (3 MSI banks on most chips), so we cannot assign
 one to each userspace.

That depends on how many users there are.

-Scott





Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)

2013-12-05 Thread Scott Wood
On Thu, 2013-11-28 at 03:19 -0600, Bharat Bhushan wrote:
 
  -Original Message-
  From: Bhushan Bharat-R65777
  Sent: Wednesday, November 27, 2013 9:39 PM
  To: 'Alex Williamson'
  Cc: Wood Scott-B07421; linux-...@vger.kernel.org; ag...@suse.de; Yoder 
  Stuart-
  B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc-
  d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
  Subject: RE: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
 
 
 
   -Original Message-
   From: Alex Williamson [mailto:alex.william...@redhat.com]
   Sent: Monday, November 25, 2013 10:08 PM
   To: Bhushan Bharat-R65777
   Cc: Wood Scott-B07421; linux-...@vger.kernel.org; ag...@suse.de; Yoder
   Stuart- B08248; iommu@lists.linux-foundation.org; bhelg...@google.com;
   linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
   Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU
   (PAMU)
  
   On Mon, 2013-11-25 at 05:33 +, Bharat Bhushan wrote:
   
 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Friday, November 22, 2013 2:31 AM
 To: Wood Scott-B07421
 Cc: Bhushan Bharat-R65777; linux-...@vger.kernel.org;
 ag...@suse.de; Yoder Stuart-B08248;
 iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc-
 d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
 Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale
 IOMMU (PAMU)

 On Thu, 2013-11-21 at 14:47 -0600, Scott Wood wrote:
  On Thu, 2013-11-21 at 13:43 -0700, Alex Williamson wrote:
   On Thu, 2013-11-21 at 11:20 +, Bharat Bhushan wrote:
   
 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Thursday, November 21, 2013 12:17 AM
 To: Bhushan Bharat-R65777
 Cc: j...@8bytes.org; bhelg...@google.com; ag...@suse.de;
 Wood Scott-B07421; Yoder Stuart-B08248;
 iommu@lists.linux-foundation.org; linux-
 p...@vger.kernel.org; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for
 Freescale IOMMU (PAMU)

 Is VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT per aperture (ie.
 each vfio user has $COUNT regions at their disposal 
 exclusively)?
   
The number of MSI banks is system-wide, not per aperture, but we will be
setting windows for the banks in the device aperture.
So say we are direct-assigning 2 PCI devices (each in a different IOMMU
group, so 2 apertures in the IOMMU) to a VM.
QEMU then needs only one call to learn how many MSI banks there are, but
it must set sub-windows for all banks for both PCI devices in their
respective apertures.
  
   I'm still confused.  What I want to make sure of is that the
   banks are independent per aperture.  For instance, if we have
   two separate userspace processes operating independently and
   they both chose to use msi bank zero for their device, that's
   bank zero within each aperture and doesn't interfere.  Or
   another way to ask is can a malicious user interfere with
   other users by
   using the wrong bank.
   Thanks,
 
  They can interfere.
   
Want to be sure of how they can interfere?
  
   What happens if more than one user selects the same MSI bank?
   Minimally, wouldn't that result in the IOMMU blocking transactions
   from the previous user once the new user activates their mapping?
 
  Yes and no: with the current implementation yes, but with a minor change
  no.  I will explain how later in this response.
 
  
  With this hardware, the only way to prevent that
  is to make sure that a bank is not shared by multiple protection
  contexts.
  For some of our users, though, I believe preventing this is less
  important than the performance benefit.
   
So should we let this patch series in without protection?
  
   No.
  

 I think we need some sort of ownership model around the msi banks then.
 Otherwise there's nothing preventing another userspace from
 attempting an MSI based attack on other users, or perhaps even on
 the host.  VFIO can't allow that.  Thanks,
   
We have very few (3 MSI banks on most chips), so we cannot assign
one to each userspace. What we can do is ensure that the host and userspace
do not share an MSI bank, while userspace instances will share an MSI bank.
  
   Then you probably need VFIO to own the MSI bank and program devices
   into it rather than exposing the MSI banks to userspace to let them have
  direct access.
 
  The overall idea of exposing the details of MSI regions to userspace is:
   1) User space can define the aperture size to fit the MSI mapping in the IOMMU.
   2) Set up the iova for the MSI banks, which is just after guest memory

Re: [PATCH] iommu/fsl_pamu: use physical cpu index to find the matched cpu nodes

2013-11-18 Thread Scott Wood
On Thu, 2013-11-14 at 21:16 -0600, Sethi Varun-B16395 wrote:
 Haiying/Scott,
 Forgot to mention this, the PAMU driver has to handle stash destination
 settings both for power and dsp cores (on B4 platform). For the dsp
 cores we would expect the physical core id (not controlled by Linux).
 To make the interface consistent, I would expect the caller (for
 iommu_set_attr) to pass the physical core id.

That sounds like you need two different interfaces.

-Scott





Re: [PATCH] iommu/fsl_pamu: use physical cpu index to find the matched cpu nodes

2013-11-18 Thread Scott Wood
On Mon, 2013-11-18 at 20:42 -0600, Varun Sethi wrote:
 For the DSP case again we have to set up the stash attribute. Are you saying 
 that this should be a separate attribute?

Not necessarily a separate attribute, but there should be some way to
distinguish whether you're providing a Linux cpu number or some external
stash destination.

-Scott
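
One way to picture "some way to distinguish" is a tag carried alongside the id. The sketch below is user-space C under stated assumptions: the enum, the struct stash_dest, and the lookup table phys_of_cpu are all hypothetical stand-ins (the table plays the role that a logical-to-physical CPU lookup would play in the kernel), and none of this is taken from the PAMU driver.

```c
#include <stdint.h>

/* Hypothetical tag: says how to interpret 'id'. */
enum stash_dest_kind {
	STASH_DEST_LINUX_CPU,  /* logical CPU number as Linux numbers it */
	STASH_DEST_EXTERNAL,   /* physical id of a core Linux does not control */
};

struct stash_dest {
	enum stash_dest_kind kind;
	uint32_t id;
};

/*
 * Resolve a stash destination to a physical core id.
 * phys_of_cpu[] maps logical CPU number -> physical id (assumed table).
 * Returns 0 on success, -1 if a logical CPU number is out of range.
 */
int stash_dest_to_phys(const struct stash_dest *d,
		       const uint32_t *phys_of_cpu, uint32_t ncpus,
		       uint32_t *phys)
{
	switch (d->kind) {
	case STASH_DEST_LINUX_CPU:
		if (d->id >= ncpus)
			return -1;
		*phys = phys_of_cpu[d->id];  /* translate logical -> physical */
		return 0;
	case STASH_DEST_EXTERNAL:
		*phys = d->id;               /* already a physical id */
		return 0;
	}
	return -1;
}
```

The point of the tag is that the caller never has to know physical ids for cores that Linux manages, while DSP-style destinations can still be named directly.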





Re: [PATCH] iommu/fsl_pamu: use physical cpu index to find the matched cpu nodes

2013-11-14 Thread Scott Wood
On Thu, 2013-11-14 at 14:30 -0500, Haiying Wang wrote:
 If some CPUs fail to come up, we need to make sure we can still
 find the correct cpu nodes in the device tree based on the given logical
 cpu index from the caller.
 
 Signed-off-by: Haiying Wang haiying.w...@freescale.com
 ---
  drivers/iommu/fsl_pamu.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
 index cba0498..a9ab57b 100644
 --- a/drivers/iommu/fsl_pamu.c
 +++ b/drivers/iommu/fsl_pamu.c
 @@ -539,6 +539,7 @@ u32 get_stash_id(u32 stash_dest_hint, u32 vcpu)

Should probably also s/vcpu/cpu/g as vcpu makes no sense outside of
virtualization code.

   u32 cache_level;
   int len, found = 0;
   int i;
 + u32 cpuid = get_hard_smp_processor_id(vcpu);

s/cpuid/phys_cpu/ or similar

-Scott





Re: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-10-08 Thread Scott Wood
On Tue, 2013-10-08 at 10:47 -0600, Bjorn Helgaas wrote:
 On Thu, Oct 3, 2013 at 11:19 PM, Bhushan Bharat-R65777
 r65...@freescale.com wrote:
 
  I don't know enough about VFIO to understand why these new interfaces are
  needed.  Is this the first VFIO IOMMU driver?  I see vfio_iommu_spapr_tce.c
  and vfio_iommu_type1.c but I don't know if they're comparable to the
  Freescale PAMU.  Do other VFIO IOMMU implementations support MSI?  If so,
  do they handle the problem of mapping the MSI regions in a different way?
 
  PAMU is an aperture-type IOMMU while the others are paging-type, so they
  are completely different from PAMU and handle this differently.
 
 This is not an explanation or a justification for adding new
 interfaces.  I still have no idea what an aperture type IOMMU is,
 other than that it is different.  But I see that Alex is working on
 this issue with you in a different thread, so I'm sure you guys will
 sort it out.

PAMU is a very constrained IOMMU that cannot do arbitrary page mappings.
Due to these constraints, we cannot map the MSI I/O page at its normal
address while also mapping RAM at the address we want.  The address we
can map it at depends on the addresses of other mappings, so it can't be
hidden in the IOMMU driver -- the user needs to be in control.

Another difference is that (if I understand correctly) PCs handle MSIs
specially, via interrupt remapping, rather than being translated as a
normal memory access through the IOMMU.

-Scott





Re: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-10-08 Thread Scott Wood
On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
 @@ -376,6 +405,7 @@ static int fsl_of_msi_probe(struct platform_device *dev)
   int len;
   u32 offset;
   static const u32 all_avail[] = { 0, NR_MSI_IRQS };
 + static int bank_index;
  
   match = of_match_device(fsl_of_msi_ids, &dev->dev);
   if (!match)
 @@ -419,8 +449,8 @@ static int fsl_of_msi_probe(struct platform_device *dev)
   dev->dev.of_node->full_name);
   goto error_out;
   }
 - msi->msiir_offset =
 - features->msiir_offset + (res.start & 0xf);
 + msi->msiir = res.start + features->msiir_offset;
 + printk("msi->msiir = %llx\n", msi->msiir);

dev_dbg or remove

   }
  
   msi->feature = features->fsl_pic_ip;
 @@ -470,6 +500,7 @@ static int fsl_of_msi_probe(struct platform_device *dev)
   }
   }
  
 + msi->bank_index = bank_index++;

What if multiple MSIs are being probed in parallel?  bank_index is not
atomic.

 diff --git a/arch/powerpc/sysdev/fsl_msi.h b/arch/powerpc/sysdev/fsl_msi.h
 index 8225f86..6bd5cfc 100644
 --- a/arch/powerpc/sysdev/fsl_msi.h
 +++ b/arch/powerpc/sysdev/fsl_msi.h
 @@ -29,12 +29,19 @@ struct fsl_msi {
   struct irq_domain *irqhost;
  
   unsigned long cascade_irq;
 -
 - u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
 + dma_addr_t msiir; /* MSIIR Address in CCSR */

Are you sure dma_addr_t is right here, versus phys_addr_t?  It implies
that it's the output of the DMA API, but I don't think the DMA API is
used in the MSI driver.  Perhaps it should be, but we still want the raw
physical address to pass on to VFIO.

   void __iomem *msi_regs;
   u32 feature;
   int msi_virqs[NR_MSI_REG];
  
 + /*
 +  * During probe each bank is assigned a index number.
 +  * index number ranges from 0 to 2^32.
 +  * Example  MSI bank 1 = 0
 +  * MSI bank 2 = 1, and so on.
 +  */
 + int bank_index;

2^32 doesn't fit in int (nor does 2^32 - 1).

Just say that indices start at 0.

-Scott
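
To illustrate the concern about a plain `static int` counter, here is a small user-space sketch using C11 atomics. It is not kernel code (the kernel has its own atomic_t primitives) and the function name alloc_bank_index is made up for the example. The only claim is that an atomic fetch-and-add gives each caller a distinct index even if two probes run at once, which `bank_index++` on a plain int does not guarantee.

```c
#include <stdatomic.h>

/* Shared counter; static storage, so it starts at zero. */
static atomic_int next_bank_index;

/*
 * Hand out the next unused bank index.  atomic_fetch_add returns the
 * value held before the increment, so indices start at 0 and two
 * concurrent callers can never receive the same one.
 */
int alloc_bank_index(void)
{
	return atomic_fetch_add(&next_bank_index, 1);
}
```

A probe routine would then store the returned value once, for example `msi->bank_index = alloc_bank_index();`, instead of reading and incrementing a shared variable in two steps.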





Re: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-10-08 Thread Scott Wood
On Tue, 2013-10-08 at 17:25 -0600, Bjorn Helgaas wrote:
  - u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
  + dma_addr_t msiir; /* MSIIR Address in CCSR */
 
  Are you sure dma_addr_t is right here, versus phys_addr_t?  It implies
  that it's the output of the DMA API, but I don't think the DMA API is
  used in the MSI driver.  Perhaps it should be, but we still want the raw
  physical address to pass on to VFIO.
 
 I don't know what msiir is used for, but if it's an address you
 program into a PCI device, then it's a dma_addr_t even if you didn't
 get it from the DMA API.  Maybe bus_addr_t would have been a more
 suggestive name than dma_addr_t.  That said, I have no idea how this
 relates to VFIO.

It's a bit awkward because it gets used both as something to program
into a PCI device (and it's probably a bug that the DMA API doesn't get
used), and also (if I understand the current plans correctly) as a
physical address to give to VFIO to be a destination address in an IOMMU
mapping.  So I think the value we keep here should be a phys_addr_t (it
comes straight from the MMIO address in the device tree), which gets
trivially turned into a dma_addr_t by the non-VFIO code path because
there's currently no translation there.

-Scott
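
The arrangement described above can be sketched as follows. This is user-space C with typedefs standing in for the kernel types of the same names; struct msi_bank and both helpers are hypothetical, and the identity conversion only reflects the statement that there is currently no translation on the non-VFIO path.

```c
#include <stdint.h>

/* Stand-ins for the kernel typedefs, for illustration only. */
typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

struct msi_bank {
	phys_addr_t msiir;  /* raw physical address of MSIIR (from the device tree) */
};

/*
 * Non-VFIO path: the value programmed into the PCI device.  With no
 * translation in place, the bus address is simply the physical address.
 */
dma_addr_t msi_bank_bus_addr(const struct msi_bank *b)
{
	return (dma_addr_t)b->msiir;
}

/*
 * VFIO path: the physical address handed over as the destination of an
 * IOMMU mapping; the iova that targets it is chosen elsewhere.
 */
phys_addr_t msi_bank_phys_addr(const struct msi_bank *b)
{
	return b->msiir;
}
```

Keeping the stored field as a physical address means only the first helper would need to change if a real DMA API translation were introduced later.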





Re: [PATCH 2/3 v13] iommu/fsl: Add additional iommu attributes required by the PAMU driver.

2013-04-22 Thread Scott Wood

On 04/22/2013 12:31:55 AM, Varun Sethi wrote:

Added the following domain attributes for the FSL PAMU driver:
1. Added new iommu stash attribute, which allows setting of the
   LIODN specific stash id parameter through IOMMU API.
2. Added an attribute for enabling/disabling DMA to a particular
   memory window.
3. Added domain attribute to check for PAMUV1 specific constraints.

Signed-off-by: Varun Sethi varun.se...@freescale.com
---
v13 changes:
- created a new file include/linux/fsl_pamu_stash.h for stash
attributes.
v12 changes:
- Moved PAMU specifc stash ids and structures to PAMU header file.
- no change in v11.
- no change in v10.
 include/linux/fsl_pamu_stash.h |   39 +++
 include/linux/iommu.h          |   16
 2 files changed, 55 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/fsl_pamu_stash.h

diff --git a/include/linux/fsl_pamu_stash.h b/include/linux/fsl_pamu_stash.h
new file mode 100644
index 000..caa1b21
--- /dev/null
+++ b/include/linux/fsl_pamu_stash.h
@@ -0,0 +1,39 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * Copyright (C) 2013 Freescale Semiconductor, Inc.
+ *
+ */
+
+#ifndef __FSL_PAMU_STASH_H
+#define __FSL_PAMU_STASH_H
+
+/* cache stash targets */
+enum pamu_stash_target {
+   PAMU_ATTR_CACHE_L1 = 1,
+   PAMU_ATTR_CACHE_L2,
+   PAMU_ATTR_CACHE_L3,
+};
+
+/*
+ * This attribute allows configuring stashig specific parameters
+ * in the PAMU hardware.
+ */
+
+struct pamu_stash_attribute {
+   u32 cpu;/* cpu number */
+   u32 cache;  /* cache to stash to: L1,L2,L3 */
+};
+
+#endif  /* __FSL_PAMU_STASH_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2727810..c5dc2b9 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -57,10 +57,26 @@ struct iommu_domain {
 #define IOMMU_CAP_CACHE_COHERENCY  0x1
 #define IOMMU_CAP_INTR_REMAP		0x2	/* isolates device intrs */


+/*
+ * Following constraints are specifc to PAMUV1:


FSL_PAMUV1


+ *  -aperture must be power of 2, and naturally aligned
+ *  -number of windows must be power of 2, and address space size
+ *   of each window is determined by aperture size / # of windows
+ *  -the actual size of the mapped region of a window must be power
+ *   of 2 starting with 4KB and physical address must be naturally
+ *   aligned.
+ * DOMAIN_ATTR_FSL_PAMUV1 corresponds to the above mentioned contraints.
+ * The caller can invoke iommu_domain_get_attr to check if the underlying
+ * iommu implementation supports these constraints.
+ */
+
 enum iommu_attr {
DOMAIN_ATTR_GEOMETRY,
DOMAIN_ATTR_PAGING,
DOMAIN_ATTR_WINDOWS,
+   DOMAIN_ATTR_PAMU_STASH,
+   DOMAIN_ATTR_PAMU_ENABLE,
+   DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_MAX,


Please be consistent on whether PAMU gets an FSL_ namespace prefix  
(I'd prefer that it does).


-Scott
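
The PAMUV1 constraints listed in the quoted comment can be expressed as a few checks. The following is a user-space sketch, not driver code; the function names are invented, and the rule that a mapped region may not exceed its window is an assumption added here rather than something the comment states.

```c
#include <stdbool.h>
#include <stdint.h>

bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Aperture: size is a power of 2 and the base is a multiple of the size. */
bool aperture_ok(uint64_t base, uint64_t size)
{
	return is_pow2(size) && (base & (size - 1)) == 0;
}

/*
 * Window count must be a power of 2; each window then spans
 * aperture size / number of windows.  Returns 0 for invalid input.
 */
uint64_t window_size(uint64_t aperture_size, uint32_t nr_windows)
{
	if (!is_pow2(aperture_size) || !is_pow2(nr_windows) ||
	    nr_windows > aperture_size)
		return 0;
	return aperture_size / nr_windows;
}

/*
 * Mapped region of a window: power-of-2 size of at least 4KB, physical
 * address naturally aligned.  The upper bound (map_size <= win_size) is
 * an assumption of this sketch.
 */
bool window_mapping_ok(uint64_t paddr, uint64_t map_size, uint64_t win_size)
{
	return is_pow2(map_size) && map_size >= 4096 &&
	       map_size <= win_size && (paddr & (map_size - 1)) == 0;
}
```

For example, a 1 GiB aperture at base 0x40000000 split into 4 windows gives 256 MiB per window, and a 4 KiB mapping at physical address 0x2000 passes while one at 0x1800 does not.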


Re: RFC: vfio API changes needed for powerpc (v3)

2013-04-11 Thread Scott Wood

On 04/11/2013 07:56:59 AM, Joerg Roedel wrote:

On Tue, Apr 09, 2013 at 01:22:15AM +, Yoder Stuart-B08248 wrote:
  What happens if a normal unmap call is done on the MSI iova?  Do we
  need a separate unmap?

 I was thinking a normal unmap on an MSI windows would be an error...but
 I'm not set on that.   I put the msi unmap there to make things symmetric,
 a normal unmap would work as well...and then we could drop the msi unmap.


Hmm, this API semantic isn't very clean. When you explicitly map the MSI
banks a clean API would also allow to unmap them. But that is not
possible in your design because the kernel is responsible for mapping
MSIs and you can't unmap a MSI bank that is in use by the kernel.


Why is it not possible to unmap them?  Once they've been mapped,  
they're just like any other IOMMU mapping.  If the user breaks MSI for  
their own devices by unmapping the MSI page, that's their problem.


So since the kernel owns the MSI setup anyways it should also take care
of mapping the MSI banks. What is the reason to not let the kernel
allocate the MSI banks top-down from the end of the DMA window space?


It's less flexible, and possibly more complicated.

-Scott


Re: RFC: vfio API changes needed for powerpc (v3)

2013-04-05 Thread Scott Wood

On 04/04/2013 05:10:27 PM, Yoder Stuart-B08248 wrote:

/*
 * VFIO_IOMMU_PAMU_UNMAP_MSI_BANK
 *
 * Unmaps the MSI bank at the specified iova.
 * Caller provides struct vfio_pamu_msi_bank_unmap with all fields set.
 * Operates on VFIO file descriptor (/dev/vfio/vfio).
 * Return: 0 on success, -errno on failure
 */

struct vfio_pamu_msi_bank_unmap {
__u32   argsz;
__u32   flags; /* no flags currently */
__u64   iova;  /* the iova to be unmapped to */
};
#define VFIO_IOMMU_PAMU_UNMAP_MSI_BANK  _IO(VFIO_TYPE, VFIO_BASE + x,
struct vfio_pamu_msi_bank_unmap )


What happens if a normal unmap call is done on the MSI iova?  Do we  
need a separate unmap?


-Scott


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Scott Wood

On 04/03/2013 01:32:26 PM, Stuart Yoder wrote:
On Tue, Apr 2, 2013 at 5:50 PM, Scott Wood scottw...@freescale.com  
wrote:

 On 04/02/2013 04:38:45 PM, Alex Williamson wrote:

 On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
  VFIO_IOMMU_MAP_MSI(iova, size)


 Not sure how you mean size to be used -- for MPIC it would be 4K per bank,
 and you can only map one bank at a time (which bank you're mapping should be
 a parameter, if only so that the kernel doesn't have to keep iteration state
 for you).

The intent was for user space to tell the kernel which windows to use
for MSI.   So I envisioned a total size of window-size * msi-bank-count.


Size doesn't tell the kernel *which* banks to use, only how many.  If  
it already knows which banks are used by the group, then it also knows  
how many are used.  And size is misleading because the mapping is not  
generally going to be contiguous.


-Scott


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Scott Wood

On 04/03/2013 02:09:45 PM, Stuart Yoder wrote:

 Would it be possible for userspace to simply leave room for MSI bank
 mapping (how much room could be determined by something like
 VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
 DMA_MAP starting at the 0x0 address of the aperture, growing up, and
 VFIO will map banks on demand at the top of the aperture, growing down?
 Wouldn't that avoid a lot of issues with userspace needing to know
 anything about MSI banks (other than count) and coordinating irq numbers
 and enabling handlers?

This is basically option #A in the original proposals sent.   I like
this approach, in that it
is simpler and keeps user space mostly out of this...which is
consistent with how things are done
on x86.  User space just needs to know how many windows to leave at
the top of the aperture.
The kernel then has the flexibility to use those windows how it wants.

But one question, is when should the kernel actually map (and unmap)
the MSI banks.


I think userspace should explicitly request it.  Userspace still  
wouldn't need to know anything but the count:


count = VFIO_IOMMU_GET_MSI_BANK_COUNT
VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
// do other DMA maps now, or later, or not at all, doesn't matter
for (i = 0; i < count; i++)
    VFIO_IOMMU_MAP_MSI_BANK(iova, i);
// The kernel now knows where each bank has been mapped, and can update
// PCI config space appropriately.



One thing we need to do is enable the aperture...and current
thinking is that is done on the first DMA_MAP.


What if there are no other mappings required?

-Scott
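
The per-bank mapping loop proposed above can be mocked up in user space to show what the kernel ends up knowing. Everything here is hypothetical: struct msi_maps, map_msi_bank and map_all_banks are illustrative names, not VFIO interfaces, and placing bank i in window first_win + i is just one choice userspace could make.

```c
#include <stdint.h>

#define MAX_BANKS 8  /* arbitrary limit for the sketch */

/* Hypothetical per-domain record of where each MSI bank was mapped. */
struct msi_maps {
	uint32_t count;              /* number of banks used by the group */
	uint64_t iova[MAX_BANKS];    /* iova chosen by userspace per bank */
	uint8_t  mapped[MAX_BANKS];  /* nonzero once the bank is mapped */
};

/* One "map this bank at this iova" request; the bank is named explicitly. */
int map_msi_bank(struct msi_maps *m, uint64_t iova, uint32_t bank)
{
	if (bank >= m->count || bank >= MAX_BANKS || m->mapped[bank])
		return -1;
	m->iova[bank] = iova;
	m->mapped[bank] = 1;
	return 0;
}

/* Userspace side: iterate over the count, one window per bank. */
int map_all_banks(struct msi_maps *m, uint64_t aperture_base,
		  uint64_t win_size, uint32_t first_win)
{
	for (uint32_t i = 0; i < m->count; i++) {
		uint64_t iova = aperture_base + (uint64_t)(first_win + i) * win_size;

		if (map_msi_bank(m, iova, i) != 0)
			return -1;
	}
	return 0;
}
```

Because each request carries a bank index, the kernel side keeps no iteration state, and nothing requires the chosen windows to be contiguous.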


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Scott Wood

On 04/03/2013 02:43:06 PM, Stuart Yoder wrote:
On Wed, Apr 3, 2013 at 2:18 PM, Scott Wood scottw...@freescale.com  
wrote:

 On 04/03/2013 02:09:45 PM, Stuart Yoder wrote:

  Would it be possible for userspace to simply leave room for MSI bank
  mapping (how much room could be determined by something like
  VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
  DMA_MAP starting at the 0x0 address of the aperture, growing up, and
  VFIO will map banks on demand at the top of the aperture, growing down?
  Wouldn't that avoid a lot of issues with userspace needing to know
  anything about MSI banks (other than count) and coordinating irq numbers
  and enabling handlers?

 This is basically option #A in the original proposals sent.   I like
 this approach, in that it
 is simpler and keeps user space mostly out of this...which is
 consistent with how things are done
 on x86.  User space just needs to know how many windows to leave at
 the top of the aperture.
 The kernel then has the flexibility to use those windows how it wants.

 But one question, is when should the kernel actually map (and unmap)
 the MSI banks.

 I think userspace should explicitly request it.  Userspace still wouldn't
 need to know anything but the count:

 count = VFIO_IOMMU_GET_MSI_BANK_COUNT
 VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
 VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
 // do other DMA maps now, or later, or not at all, doesn't matter
 for (i = 0; i < count; i++)
     VFIO_IOMMU_MAP_MSI_BANK(iova, i);
 // The kernel now knows where each bank has been mapped, and can update PCI
 // config space appropriately.

And the overall aperture enable/disable would occur on the first
dma/msi map() and last dma/msi unmap()?


Yes.  We may want the optional ability to do an overall enable/disable  
for reasons we discussed a while ago, but in the absence of an explicit  
disable the domain would be enabled on first map.


-Scott
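
A small sketch of the enable rule stated above: the domain counts as enabled once at least one DMA or MSI mapping exists, unless an explicit disable has been requested. This is user-space C with invented names (struct domain_state and its helpers); how an explicit disable would actually be requested is left open in the thread and is only modelled here as a flag.

```c
#include <stdbool.h>

/* Hypothetical per-domain state. */
struct domain_state {
	unsigned int nr_mappings;  /* DMA and MSI mappings currently present */
	bool explicit_disable;     /* optional overall disable, off by default */
};

/* Enabled on first map, disabled again after the last unmap. */
bool domain_enabled(const struct domain_state *d)
{
	return d->nr_mappings > 0 && !d->explicit_disable;
}

void domain_map(struct domain_state *d)
{
	d->nr_mappings++;
}

void domain_unmap(struct domain_state *d)
{
	if (d->nr_mappings > 0)
		d->nr_mappings--;
}
```

Deriving the enabled state from the mapping count also covers the earlier question of a user who maps only MSI banks: the first MSI map is enough to enable the aperture.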


Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Scott Wood

On 04/02/2013 10:37:20 PM, Alex Williamson wrote:

On Tue, 2013-04-02 at 17:50 -0500, Scott Wood wrote:
 On 04/02/2013 04:38:45 PM, Alex Williamson wrote:
  On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
   On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood
  scottw...@freescale.com wrote:
C.  Explicit mapping using normal DMA map.  The last idea is that
we would introduce a new ioctl to give user-space an fd to
the MSI bank, which could be mmapped.  The flow would be
something like this:
   -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD
   -user space mmaps the fd, getting a vaddr
   -user space does a normal DMA map for desired iova
This approach makes everything explicit, but adds a new ioctl
applicable most likely only to the PAMU (type2 iommu).
   
And the DMA_MAP of that mmap then allows userspace to select the window
used?  This one seems like a lot of overhead, adding a new ioctl, new
fd, mmap, special mapping path, etc.
   
   
There's going to be special stuff no matter what.  This would keep it
separated from the IOMMU map code.

I'm not sure what you mean by overhead here... the runtime overhead of
setting things up is not particularly relevant as long as it's reasonable.
If you mean development and maintenance effort, keeping things well
separated should help.
  
   We don't need to change DMA_MAP.  If we can simply add a new type 2
   ioctl that allows user space to set which windows are MSIs, it seems
   vastly less complex than an ioctl to supply a new fd, mmap of it, etc.
  
   So maybe 2 ioctls:
   VFIO_IOMMU_GET_MSI_COUNT

 Do you mean a count of actual MSIs or a count of MSI banks used by the
 whole VFIO group?

I hope the latter, which would clarify how this is distinct from
DEVICE_GET_IRQ_INFO.  Is hotplug even on the table?  Presumably
dynamically adding a device could bring along additional MSI banks?


I'm not sure -- maybe we could say that hotplug can add banks, but not  
remove them or change the order, so userspace would just need to check  
if the number of banks changed, and map the extras.


The current VFIO MSI support has the host handling everything about MSI.
The user never programs an MSI vector to the physical device, they set
up everything through ioctl.  On interrupt, we simply trigger an eventfd
and leave it to things like KVM irqfd or QEMU to do the right thing in a
virtual machine.

Here the MSI vector has to go through a PAMU window to hit the correct
MSI bank.  So that means it has some component of the iova involved,
which we're proposing here is controlled by userspace (whether that
vector uses an offset from 0x1000 or 0x depending on which
window slot is used to make the MSI bank).  I assume we're still working
in a model where the physical interrupt fires into the host and a
host-based interrupt handler triggers an eventfd, right?


Yes (subject to possible future optimizations).

So that means the vector also has host components so we trigger the
correct ISR.  How is that coordinated?


Everything but the iova component needs to come from the host MSI
allocator.



Would it be possible for userspace to simply leave room for MSI bank
mapping (how much room could be determined by something like
VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
DMA_MAP starting at the 0x0 address of the aperture, growing up, and
VFIO will map banks on demand at the top of the aperture, growing down?
Wouldn't that avoid a lot of issues with userspace needing to know
anything about MSI banks (other than count) and coordinating irq numbers
and enabling handlers?


This would restrict a (possibly unlikely) use case where the user wants  
to map something near the top of the aperture but has another place  
MSIs can go (or is willing to live without MSIs).  Otherwise it could  
be workable, as long as we can require an explicit MSI enabling on a  
device to happen after the aperture and subwindow count are set up.   
I'm not sure it would really buy anything over having userspace iterate  
over the MSI bank count, though -- it would probably be a bit more  
complicated.
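
To make the "grow up from 0, grow down from the top" proposal concrete, here is a minimal sketch of where each bank would land.  It is an illustration only: the function names are hypothetical, and it assumes equal-sized subwindows (aperture size divided by subwindow count) with bank 0 placed in the topmost subwindow.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: iova of MSI bank 'bank' when banks are mapped on
 * demand from the top of the aperture, growing down.  Bank 0 takes the
 * topmost subwindow, bank 1 the one below it, and so on.
 */
static uint64_t msi_bank_iova_top_down(uint64_t aperture_base,
                                       uint64_t aperture_size,
                                       uint32_t window_count,
                                       uint32_t bank)
{
    uint64_t span;

    assert(window_count != 0 && bank < window_count);
    span = aperture_size / window_count;   /* size of one subwindow */
    return aperture_base + aperture_size - ((uint64_t)bank + 1) * span;
}

/*
 * Hypothetical helper: index of the first subwindow that userspace would
 * have to leave unused so that 'bank_count' banks fit at the top.
 */
static uint32_t first_reserved_window(uint32_t window_count,
                                      uint32_t bank_count)
{
    assert(bank_count <= window_count);
    return window_count - bank_count;
}
```

With a 512MB aperture split into 8 subwindows and 3 banks, userspace could DMA_MAP freely in subwindows 0 through 4, and the banks would occupy subwindows 7, 6 and 5.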



  On x86 MSI count is very
  device specific, which means it would be a VFIO_DEVICE_* ioctl (actually
  VFIO_DEVICE_GET_IRQ_INFO does this for us on x86).  The trouble with it
  being a device ioctl is that you need to get the device FD, but the
  IOMMU protection needs to be established before you can get that... so
  there's an ordering problem if you need it from the device before
  configuring the IOMMU.  Thanks,

 What do you mean by IOMMU protection needs to be established?
 Wouldn't we just start with no mappings in place?

If no mappings blocks all DMA, sure, that's fine.  Once the VFIO device
FD is accessible

Re: RFC: vfio API changes needed for powerpc

2013-04-03 Thread Scott Wood

On 04/02/2013 10:12:31 PM, Alex Williamson wrote:

On Tue, 2013-04-02 at 17:44 -0500, Scott Wood wrote:
 On 04/02/2013 04:32:04 PM, Alex Williamson wrote:
  On Tue, 2013-04-02 at 15:57 -0500, Scott Wood wrote:
   On 04/02/2013 03:32:17 PM, Alex Williamson wrote:
    On x86 the interrupt remapper handles this transparently when MSI
    is enabled and userspace never gets direct access to the device MSI
    address/data registers.

   x86 has a totally different mechanism here, as far as I understand --
   even before you get into restrictions on mappings.

  So what control will userspace have over programming the actual MSI
  vectors on PAMU?

 Not sure what you mean -- PAMU doesn't get explicitly involved in
 MSIs.  It's just another 4K page mapping (per relevant MSI bank).  If
 you want isolation, you need to make sure that an MSI group is only
 used by one VFIO group, and that you're on a chip that has alias pages
 with just one MSI bank register each (newer chips do, but the first
 chip to have a PAMU didn't).

How does a user figure this out?


The user's involvement could be limited to setting a policy knob of  
whether that degree of isolation is required (if required and  
unavailable, all devices using an MSI bank would be forced into the  
same group).  We'd need to do something with MSI allocation so that we  
avoid using an MSI bank with more than one IOMMU group where possible.   
I'm not sure about the details yet, or how practical this is.  There  
might need to be some MSI bank assignment done as part of the VFIO  
device binding process, if there are going to be more VFIO groups than  
there are MSI banks (reserving one bank for host use).


-Scott


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Scott Wood

On 04/02/2013 03:32:17 PM, Alex Williamson wrote:

On Tue, 2013-04-02 at 17:32 +, Yoder Stuart-B08248 wrote:
 2.   MSI window mappings

The more problematic question is how to deal with MSIs.  We need to
create mappings for up to 3 MSI banks that a device may need to target
to generate interrupts.  The Linux MSI driver can allocate MSIs from
the 3 banks any way it wants, and currently user space has no way of
knowing which bank may be used for a given device.

There are 3 options we have discussed and would like your direction:

A.  Implicit mappings -- with this approach user space would not
explicitly map MSIs.  User space would be required to set the
geometry so that there are 3 unused windows (the last 3 windows)
for MSIs, and it would be up to the kernel to create the mappings.
This approach requires some specific semantics (leaving 3 windows)
and it potentially gets a little weird-- when should the kernel
actually create the MSI mappings?  When should they be unmapped?
Some convention would need to be established.

VFIO would have control of SET/GET_ATTR, right?  So we could reduce the
number exposed to userspace on GET and transparently add MSI entries on
SET.


What do you mean by reduce the number exposed?  Userspace decides how
many entries there are, but it must be a power of two between 1 and 256.



On x86 the interrupt remapper handles this transparently when MSI
is enabled and userspace never gets direct access to the device MSI
address/data registers.


x86 has a totally different mechanism here, as far as I understand --  
even before you get into restrictions on mappings.



What kind of restrictions do you have around
adding and removing windows while the aperture is enabled?


Subwindows can be modified while the aperture is enabled, but the  
aperture size and number of subwindows cannot be changed.



B.  Explicit mapping using DMA map flags.  The idea is that a new
flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
a mapping is to be created for the supplied iova.  No vaddr
is given though.  So in the above example there would be
a dma map at 0x10000000 for 24KB (and no vaddr).   It's
up to the kernel to determine which bank gets mapped where.
So, this option puts user space in control of which windows
are used for MSIs and when MSIs are mapped/unmapped.   There
would need to be some semantics as to how this is used-- it
only makes sense

This could also be done as another type2 ioctl extension.


Again, what is type2, specifically?  If someone else is adding their  
own IOMMU that is kind of, sort of like PAMU, how would they know if  
it's close enough?  What assumptions can a user make when they see that  
they're dealing with type2?


What's the value to userspace in determining which windows are used  
by which banks?


That depends on who programs the MSI config space address.  What is  
important is userspace controlling which iovas will be dedicated to  
this, in case it wants to put something else there.


It sounds like the case that there are X banks and if userspace wants to
use MSI it needs to leave X windows available for that.  Is this just
buying userspace a few more windows to allow them the choice between MSI
or RAM?


Well, there could be that.  But also, userspace will generally have a  
much better idea of the type of mappings it's creating, so it's easier  
to keep everything explicit at the kernel/user interface than require  
more complicated code in the kernel to figure things out automatically  
(not just for MSIs but in general).


If the kernel automatically creates the MSI mappings, when does it  
assume that userspace is done creating its own?  What if userspace  
doesn't need any DMA other than the MSIs?  What if userspace wants to  
continue dynamically modifying its other mappings?



C.  Explicit mapping using normal DMA map.  The last idea is that
we would introduce a new ioctl to give user-space an fd to
the MSI bank, which could be mmapped.  The flow would be
something like this:
   -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD
   -user space mmaps the fd, getting a vaddr
   -user space does a normal DMA map for desired iova
This approach makes everything explicit, but adds a new ioctl
applicable most likely only to the PAMU (type2 iommu).

And the DMA_MAP of that mmap then allows userspace to select the window
used?  This one seems like a lot of overhead, adding a new ioctl, new
fd, mmap, special mapping path, etc.


There's going to be special stuff no matter what.  This would keep it  
separated from the IOMMU map code.


I'm not sure what you mean by overhead here... the runtime overhead  
of setting things up is not particularly relevant as long 

Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Scott Wood

On 04/02/2013 03:38:42 PM, Stuart Yoder wrote:
On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood scottw...@freescale.com wrote:

 On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote:

 Alex,

 We are in the process of implementing vfio-pci support for the Freescale
 IOMMU (PAMU).  It is an aperture/window-based IOMMU and is quite different
 than x86, and will involve creating a 'type 2' vfio implementation.

 For each device's DMA mappings, PAMU has an overall aperture and a number
 of windows.  All sizes and window counts must be power of 2.  To illustrate,
 below is a mapping for a 256MB guest, including guest memory (backed by
 64MB huge pages) and some windows for MSIs:

 Total aperture: 512MB
 # of windows: 8

 win  gphys/
 #    iova        phys          size
 ---  ----        ----          ----
 0    0x00000000  0xX_XX000000  64MB
 1    0x04000000  0xX_XX000000  64MB
 2    0x08000000  0xX_XX000000  64MB
 3    0x0C000000  0xX_XX000000  64MB
 4    0x10000000  0xf_fe044000  4KB    // msi bank 1
 5    0x14000000  0xf_fe045000  4KB    // msi bank 2
 6    0x18000000  0xf_fe046000  4KB    // msi bank 3
 7    -           -             disabled
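
The geometry in the table above can be checked with a little arithmetic.  The sketch below is an illustration only, not kernel code: the struct and function names are hypothetical, and it assumes the aperture is divided into equal slices, one per window, which is what the 512MB / 8 windows example implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of an aperture; not a kernel structure. */
struct pamu_geometry {
    uint64_t aperture_base;  /* iova where the aperture starts */
    uint64_t aperture_size;  /* must be a power of two */
    uint32_t window_count;   /* must be a power of two, 1..256 */
};

static bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Check the power-of-two constraints stated in the thread. */
static bool geometry_valid(const struct pamu_geometry *g)
{
    return is_pow2(g->aperture_size) &&
           is_pow2(g->window_count) &&
           g->window_count <= 256 &&
           g->window_count <= g->aperture_size;
}

/* Each window covers an equal slice of the aperture. */
static uint64_t window_span(const struct pamu_geometry *g)
{
    assert(geometry_valid(g));
    return g->aperture_size / g->window_count;
}

/* iova at which window number 'win' starts. */
static uint64_t window_iova(const struct pamu_geometry *g, uint32_t win)
{
    assert(win < g->window_count);
    return g->aperture_base + (uint64_t)win * window_span(g);
}
```

For the example above this gives a 64MB slice per window, with window 4 starting at iova 0x10000000.  An MSI bank mapped there only uses the first 4KB of its slice.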

 There are a couple of updates needed to the vfio user-kernel interface
 that we would like your feedback on.

 1.  IOMMU geometry

 The kernel IOMMU driver now has an interface (see domain_set_attr,
 domain_get_attr) that lets us set the domain geometry using
 attributes.

 We want to expose that to user space, so envision needing a couple
 of new ioctls to do this:
  VFIO_IOMMU_SET_ATTR
  VFIO_IOMMU_GET_ATTR


 Note that this means attributes need to be updated for user-API
 appropriateness, such as using fixed-size types.


 2.   MSI window mappings

The more problematic question is how to deal with MSIs.  We need to
create mappings for up to 3 MSI banks that a device may need to target
to generate interrupts.  The Linux MSI driver can allocate MSIs from
the 3 banks any way it wants, and currently user space has no way of
knowing which bank may be used for a given device.

There are 3 options we have discussed and would like your direction:

A.  Implicit mappings -- with this approach user space would not
explicitly map MSIs.  User space would be required to set the
geometry so that there are 3 unused windows (the last 3 windows)



 Where does userspace get the number 3 from?  E.g. on newer chips there are
 4 MSI banks.  Maybe future chips have even more.

Ok, then make the number 4.   The chance of more MSI banks in future chips
is nil,


What makes you so sure?  Especially since you seem to be presenting  
this as not specifically an MPIC API.



and if it ever happened user space could adjust.


What bit of API is going to tell it that it needs to adjust?

Also, practically speaking, since memory is typically allocated in powers of
2, you need to approximately double the window geometry anyway.


Only if your existing mapping needs fit exactly in a power of two.

B.  Explicit mapping using DMA map flags.  The idea is that a new
flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
a mapping is to be created for the supplied iova.  No vaddr
is given though.  So in the above example there would be
a dma map at 0x10000000 for 24KB (and no vaddr).


 A single 24 KiB mapping wouldn't work (and why 24KB? What if only one MSI
 group is involved in this VFIO group?  What if four MSI groups are
 involved?).  You'd need to either have a naturally aligned, power-of-two
 sized mapping that covers exactly the pages you want to map and no more, or
 you'd need to create a separate mapping for each MSI bank, and due to PAMU
 subwindow alignment restrictions these mappings could not be contiguous in
 iova-space.
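
The objection can be illustrated with a small check.  This is a sketch under assumptions, not PAMU driver code: the helper names are hypothetical, and it assumes one mapping per MSI bank, each starting at the beginning of its own subwindow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* True if 'size' is a power of two and 'iova' is aligned to it. */
static bool naturally_aligned_pow2(uint64_t iova, uint64_t size)
{
    return is_pow2(size) && (iova & (size - 1)) == 0;
}

/*
 * Hypothetical helper: iova of the mapping for MSI bank 'bank' when each
 * bank gets its own subwindow, starting with the subwindow at 'first_iova'.
 */
static uint64_t per_bank_iova(uint64_t first_iova, uint64_t subwindow_span,
                              uint32_t bank)
{
    assert(is_pow2(subwindow_span));
    return first_iova + (uint64_t)bank * subwindow_span;
}
```

A 24KB request fails the check because 24KB is not a power of two, while three separate 4KB mappings pass it.  With 64MB subwindows those three mappings sit 64MB apart, so they cannot be contiguous in iova-space.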

You're right, a single 24KB mapping wouldn't work--  in the case of 3 MSI banks
perhaps we could just do one 64MB*3 mapping to identify which windows
are used for MSIs.


Where did the assumption of a 64MiB subwindow size come from?

If only one MSI bank was involved the kernel could get clever and only enable
the banks actually needed.


I'd rather see cleverness kept in userspace.

-Scott


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Scott Wood

On 04/02/2013 04:08:27 PM, Stuart Yoder wrote:
On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood scottw...@freescale.com wrote:

 This could also be done as another type2 ioctl extension.


 Again, what is type2, specifically?  If someone else is adding their own
 IOMMU that is kind of, sort of like PAMU, how would they know if it's close
 enough?  What assumptions can a user make when they see that they're dealing
 with type2?

We will define that as part of the type2 implementation.   Highly unlikely
anything but a PAMU will comply.


So then why not just call it pamu instead of being obfuscatory?

 There's going to be special stuff no matter what.  This would keep it
 separated from the IOMMU map code.

 I'm not sure what you mean by overhead here... the runtime overhead of
 setting things up is not particularly relevant as long as it's reasonable.
 If you mean development and maintenance effort, keeping things well
 separated should help.

We don't need to change DMA_MAP.  If we can simply add a new type 2
ioctl that allows user space to set which windows are MSIs,


And what specifically does that ioctl do?  It causes new mappings to be  
created, right?  So you're changing (or at least adding to) the DMA map  
mechanism.


it seems vastly less complex than an ioctl to supply a new fd, mmap  
of it, etc.


I don't see enough complexity in the mmap approach for anything to be
vastly less complex in comparison.  I think you're building the mmap
approach up in your head to be a lot worse than it would actually be.


-Scott


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Scott Wood

On 04/02/2013 04:16:11 PM, Alex Williamson wrote:

On Tue, 2013-04-02 at 15:54 -0500, Stuart Yoder wrote:
 The number of windows is always power of 2 (and max is 256).  And to reduce
 PAMU cache pressure you want to use the fewest number of windows
 you can.  So, I don't see practically how we could transparently
 steal entries to add the MSIs.  Either user space knows to leave empty
 windows for MSIs and by convention the kernel knows which windows those
 are (as in option #A) or explicitly tell the kernel which windows (as in
 option #B).


Ok, apparently I don't understand the API.  Is it something like
userspace calls GET_ATTR and finds out that there are 256 available
windows, userspace determines that it needs 8 for RAM and then it has an
MSI device, so it needs to call SET_ATTR and ask for 16?  That seems
prone to exploitation by the first userspace to allocate its aperture,


What exploitation?

It's not as if there is a pool of 256 global windows that users  
allocate from.  The subwindow count is just how finely divided the  
aperture is.  The only way one user will affect another is through  
cache contention (which is why we want the minimum number of subwindows  
that we can get away with).



but I'm also not sure why userspace couldn't specify the (non-power of 2)
number of windows it needs for RAM, then VFIO would see that the devices
attached have MSI and add those windows and align to a power of 2.


If you double the subwindow count without userspace knowing, you have  
to double the aperture as well (and you may need to grow up or down  
depending on alignment).  This means you also need to halve the maximum  
aperture that userspace can request.  And you need to expose a  
different number of maximum subwindows in the IOMMU API based on  
whether we might have MSIs of this type.  It's ugly and awkward, and  
removes the possibility for userspace to place the MSIs in some unused  
slot in the middle, or not use MSIs at all.
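
The cost of rounding up behind userspace's back can be shown numerically.  The following is only a sketch with hypothetical helper names; it assumes the subwindow size stays fixed, so that the aperture has to grow in step with the subwindow count.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two that is >= n, for 1 <= n <= 256. */
static uint32_t roundup_pow2_u32(uint32_t n)
{
    uint32_t p = 1;

    assert(n >= 1 && n <= 256);
    while (p < n)
        p <<= 1;
    return p;
}

/* Subwindow count needed once MSI banks are added to the RAM windows. */
static uint32_t windows_needed(uint32_t ram_windows, uint32_t msi_banks)
{
    return roundup_pow2_u32(ram_windows + msi_banks);
}

/* Aperture size implied by a subwindow count at a fixed subwindow size. */
static uint64_t aperture_for(uint32_t window_count, uint64_t subwindow_span)
{
    return (uint64_t)window_count * subwindow_span;
}
```

With 8 RAM windows of 64MB and 3 MSI banks, the count goes from 8 to 16 and the aperture from 512MB to 1GB, which is why the maximum aperture offered to userspace would have to be halved.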


-Scott


Re: RFC: vfio API changes needed for powerpc

2013-04-02 Thread Scott Wood

On 04/02/2013 04:38:45 PM, Alex Williamson wrote:

On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
 On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood scottw...@freescale.com wrote:
  C.  Explicit mapping using normal DMA map.  The last idea is that
  we would introduce a new ioctl to give user-space an fd to
  the MSI bank, which could be mmapped.  The flow would be
  something like this:
     -for each group user space calls new ioctl
       VFIO_GROUP_GET_MSI_FD
     -user space mmaps the fd, getting a vaddr
     -user space does a normal DMA map for desired iova
  This approach makes everything explicit, but adds a new ioctl
  applicable most likely only to the PAMU (type2 iommu).

  And the DMA_MAP of that mmap then allows userspace to select the window
  used?  This one seems like a lot of overhead, adding a new ioctl, new
  fd, mmap, special mapping path, etc.

  There's going to be special stuff no matter what.  This would keep it
  separated from the IOMMU map code.

  I'm not sure what you mean by overhead here... the runtime overhead of
  setting things up is not particularly relevant as long as it's reasonable.
  If you mean development and maintenance effort, keeping things well
  separated should help.

 We don't need to change DMA_MAP.  If we can simply add a new type 2
 ioctl that allows user space to set which windows are MSIs, it seems vastly
 less complex than an ioctl to supply a new fd, mmap of it, etc.

 So maybe 2 ioctls:
 VFIO_IOMMU_GET_MSI_COUNT


Do you mean a count of actual MSIs or a count of MSI banks used by the  
whole VFIO group?



 VFIO_IOMMU_MAP_MSI(iova, size)


Not sure how you mean size to be used -- for MPIC it would be 4K per  
bank, and you can only map one bank at a time (which bank you're  
mapping should be a parameter, if only so that the kernel doesn't have  
to keep iteration state for you).
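
One possible shape for such a request, with the bank passed explicitly, is sketched below.  Everything in it is hypothetical: no such structure exists in the VFIO headers, the field layout merely imitates the argsz/flags convention of existing VFIO ioctls, and the 4K bank size is the MPIC figure mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSI_BANK_SIZE 4096u  /* 4K per bank for MPIC, per the text above */

/* Hypothetical argument for a "map one MSI bank" ioctl; fixed-size types. */
struct msi_map_request {
    uint32_t argsz;  /* size of this structure, filled in by the caller */
    uint32_t flags;  /* no flags defined in this sketch; must be zero */
    uint32_t bank;   /* which bank to map, 0 .. bank_count - 1 */
    uint32_t pad;    /* keeps iova 64-bit aligned; must be zero */
    uint64_t iova;   /* where the caller wants the bank to appear */
};

/* Fill in a request for one bank at the given iova. */
static void msi_map_request_init(struct msi_map_request *req,
                                 uint32_t bank, uint64_t iova)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->argsz = sizeof(*req);
    req->bank = bank;
    req->iova = iova;
}

/* Checks a kernel-side handler might plausibly make before mapping. */
static bool msi_map_request_valid(const struct msi_map_request *req,
                                  uint32_t bank_count)
{
    return req->argsz >= sizeof(*req) &&
           req->flags == 0 &&
           req->pad == 0 &&
           req->bank < bank_count &&
           (req->iova & (MSI_BANK_SIZE - 1)) == 0;
}
```

Because the bank index travels with each request, userspace can simply loop from 0 to the bank count and the kernel keeps no iteration state.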



How are MSIs related to devices on PAMU?


PAMU doesn't care about MSIs.  The relation of individual MSIs to a  
device is standard PCI stuff.  Each MSI bank (which is part of the  
MPIC, not PAMU) can hold numerous MSIs.  The VFIO user would want to  
map all MSI banks that are in use by any of the devices in the group.   
Ideally we'd let the VFIO grouping influence the allocation of MSIs.



On x86 MSI count is very
device specific, which means it would be a VFIO_DEVICE_* ioctl (actually
VFIO_DEVICE_GET_IRQ_INFO does this for us on x86).  The trouble with it
being a device ioctl is that you need to get the device FD, but the
IOMMU protection needs to be established before you can get that... so
there's an ordering problem if you need it from the device before
configuring the IOMMU.  Thanks,


What do you mean by IOMMU protection needs to be established?   
Wouldn't we just start with no mappings in place?


-Scott