[PATCH] dma-debug: Only skip one stackframe entry
With skip set to 1, I get a traceback like this:

[ 106.867637] DMA-API: Mapped at:
[ 106.870784]  afu_dma_map_region+0x2cd/0x4f0 [dfl_afu]
[ 106.875839]  afu_ioctl+0x258/0x380 [dfl_afu]
[ 106.880108]  do_vfs_ioctl+0xa9/0x720
[ 106.883688]  ksys_ioctl+0x60/0x90
[ 106.887007]  __x64_sys_ioctl+0x16/0x20

With the previous value of 2, afu_dma_map_region was being omitted. I
suspect that the code paths have simply changed since the value of 2 was
chosen a decade ago, but it's also possible that it varies based on which
mapping function was used, compiler inlining choices, etc. In any case,
it's best to err on the side of skipping less.

Signed-off-by: Scott Wood
---
 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 45d51e8e26f6..a218e43cc382 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -706,7 +706,7 @@ static struct dma_debug_entry *dma_entry_alloc(void)
 #ifdef CONFIG_STACKTRACE
 	entry->stacktrace.max_entries = DMA_DEBUG_STACKTRACE_ENTRIES;
 	entry->stacktrace.entries = entry->st_entries;
-	entry->stacktrace.skip = 2;
+	entry->stacktrace.skip = 1;
 	save_stack_trace(&entry->stacktrace);
 #endif
 
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
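Editorial note, not part of the patch: the effect of the skip count can be illustrated with a small userspace model. This is only a sketch under stated assumptions. record_trace() and the "debug_internal_frame" entry are hypothetical names invented for illustration, and the code does not reproduce how the kernel's save_stack_trace() walks the stack. It only shows that dropping one frame too many hides the first caller of interest.

```c
#include <stddef.h>

/*
 * Hypothetical userspace model of a stack-trace "skip" count. This is
 * NOT the kernel implementation.
 *
 * frames[] lists a call chain innermost-first. record_trace() drops the
 * first `skip` frames and copies at most `max` of the remaining ones
 * into out[]. It returns the number of entries stored.
 */
static size_t record_trace(const char *const *frames, size_t nframes,
			   size_t skip, const char **out, size_t max)
{
	size_t n = 0;

	for (size_t i = skip; i < nframes && n < max; i++)
		out[n++] = frames[i];

	return n;
}

/*
 * Example chain: one assumed internal frame, followed by callers taken
 * from the traceback in the patch description.
 */
static const char *const example_frames[] = {
	"debug_internal_frame",	/* hypothetical frame we want to hide */
	"afu_dma_map_region",
	"afu_ioctl",
	"do_vfs_ioctl",
};
```

With skip set to 1 the first recorded entry in this model is afu_dma_map_region. With skip set to 2 it is afu_ioctl, which mirrors the omission described above.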
Re: [PATCH 16/20] powerpc/dma: use dma_direct_{alloc,free}
On Thu, 2018-08-09 at 10:52 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-07-30 at 18:38 +0200, Christoph Hellwig wrote:
> > These do the same functionality as the existing helpers, but do it
> > simpler, and also allow the (optional) use of CMA.
> > 
> > Note that the swiotlb code now calls into the dma_direct code directly,
> > given that it doesn't work with noncoherent caches at all, and isn't
> > called when we have an iommu either, so the iommu special case in
> > dma_nommu_alloc_coherent isn't required for swiotlb.
> 
> I am not convinced that this will produce the same results due to
> the way the zone picking works.
> 
> As for the interaction with swiotlb, we'll need the FSL guys to have
> a look. Scott, do you remember what this is about ?

dma_direct_alloc() has similar (though not identical[1]) zone picking, so I
think it will work. Needs testing though, and I no longer have a book3e
machine with a PCIe card in it.

The odd thing about this platform (fsl book3e) is the 31-bit[2] limitation
on PCI. We currently use ZONE_DMA32 for this, rather than ZONE_DMA, at Ben's
request[3]. dma_direct_alloc() regards ZONE_DMA32 as being fixed at 32-bits,
but it doesn't really matter as long as limit_zone_pfn() still works, and
the allocation is made below 2 GiB. If we were to switch to ZONE_DMA, and
have both 31-bit and 32-bit zones, then dma_direct_alloc() would have a
problem knowing when to use the 31-bit zone since it's based on a
non-power-of-2 limit that isn't reflected in the dma mask.

-Scott

[1] The logic in dma_direct_alloc() seems wrong -- the zone should need to
fit in the mask, not the other way around. If ARCH_ZONE_DMA_BITS is 24, then
0x007f should be a failure rather than GFP_DMA, 0x7fff should be GFP_DMA
rather than GFP_DMA32, and 0x3 should be GFP_DMA32 rather than an
unrestricted allocation (in each case assuming that the end of RAM is beyond
the mask).

[2] The actual limit is closer to 4 GiB, but not quite due to special
windows. swiotlb still uses the real limit when deciding whether to bounce,
so the dma mask is still 32 bits.

[3] https://lists.ozlabs.org/pipermail/linuxppc-dev/2012-July/099593.html
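Editorial note, not part of the original mail: the rule proposed in footnote [1] ("the zone should need to fit in the mask") can be sketched as a small userspace function. Everything here is an assumption made for illustration. pick_zone(), bit_mask() and the enum are hypothetical names, the mask widths used in the usage note (23, 31 and 34 bits) are illustrative values chosen by the editor, and this is not the actual dma_direct_alloc() logic.

```c
#include <stdint.h>

/* Hypothetical result type; the kernel uses GFP flags instead. */
enum zone_choice {
	ZONE_CHOICE_FAIL,	/* no zone fits entirely below the mask */
	ZONE_CHOICE_DMA,	/* the low zone (zone_dma_bits wide) */
	ZONE_CHOICE_DMA32,	/* the 32-bit zone */
	ZONE_CHOICE_NORMAL,	/* unrestricted allocation */
};

/* All-ones mask that is `bits` wide, safe for bits >= 64. */
static uint64_t bit_mask(unsigned int bits)
{
	return bits >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/*
 * Pick the largest zone that lies entirely within the device mask.
 * ram_top is the highest RAM address. An unrestricted allocation is
 * only safe when the mask covers all of RAM.
 */
static enum zone_choice pick_zone(uint64_t dev_mask,
				  unsigned int zone_dma_bits,
				  uint64_t ram_top)
{
	if (dev_mask >= ram_top)
		return ZONE_CHOICE_NORMAL;
	if (dev_mask >= bit_mask(32))
		return ZONE_CHOICE_DMA32;
	if (dev_mask >= bit_mask(zone_dma_bits))
		return ZONE_CHOICE_DMA;
	return ZONE_CHOICE_FAIL;
}
```

With a 24-bit low zone and RAM extending past the mask, this sketch fails a 23-bit mask, uses the low zone for a 31-bit mask, and uses the 32-bit zone for a 34-bit mask. That is the ordering the footnote argues for.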
Re: [PATCH 00/10 v2] iommu/amd: lock splitting & GFP_KERNEL allocation
On Mon, 2018-03-19 at 13:15 +0100, Sebastian Andrzej Siewior wrote:
> On 2018-03-17 16:43:39 [-0500], Scott Wood wrote:
> > If that's worth the lock dropping then fine (though why does only one
> > of the two allocations use GFP_KERNEL?), but it doesn't need to be a
> 
> That was a mistake, I planned to keep both as GFP_KERNEL.
> 
> > raw lock if the non-allocating users are separated. Keeping them
> > separate will also preserve the WARNs if we somehow end up in an atomic
> > context with no table (versus relying on atomic sleep debugging that
> > may or may not be enabled), and make the code easier to understand by
> > being explicit about which functions can be used from RT-atomic
> > context.
> 
> That separated part is okay. We could keep it. However, I am not sure if
> looking at the table irq_lookup_table[devid] without the lock is okay.
> The pointer is assigned without DTE entry/iommu-flush to be completed.
> This does not look "okay".

Those callers are getting the devid from an irq_2_irte struct, which was set
up in irq_remapping_alloc() after get/alloc_irq_table() is completed.

-Scott
Re: [PATCH 00/10 v2] iommu/amd: lock splitting & GFP_KERNEL allocation
On Sat, 2018-03-17 at 22:10 +0100, Sebastian Andrzej Siewior wrote:
> On 2018-03-17 14:49:54 [-0500], Scott Wood wrote:
> > On Fri, 2018-03-16 at 21:18 +0100, Sebastian Andrzej Siewior wrote:
> > > The goal here is to make the memory allocation in get_irq_table() not
> > > with disabled interrupts and having as little raw_spin_lock as possible
> > > while having them if the caller is also holding one (like desc->lock
> > > during IRQ-affinity changes).
> > > I reverted one patch one patch in the iommu while rebasing since it
> > > make job easier.
> > 
> > If the goal is to have "as little raw_spin_lock as possible" -- and
> > presumably also to avoid unnecessary complexity -- wouldn't it be
> > better to leave my patch in, and drop patches 4 and 9?
> 
> 9 gives me GFP_KERNEL instead atomic so no.
> 4 is needed I think but I could double check on Monday.

If that's worth the lock dropping then fine (though why does only one of the
two allocations use GFP_KERNEL?), but it doesn't need to be a raw lock if
the non-allocating users are separated. Keeping them separate will also
preserve the WARNs if we somehow end up in an atomic context with no table
(versus relying on atomic sleep debugging that may or may not be enabled),
and make the code easier to understand by being explicit about which
functions can be used from RT-atomic context.

-Scott
Re: [PATCH 00/10 v2] iommu/amd: lock splitting & GFP_KERNEL allocation
On Fri, 2018-03-16 at 21:18 +0100, Sebastian Andrzej Siewior wrote:
> The goal here is to make the memory allocation in get_irq_table() not
> with disabled interrupts and having as little raw_spin_lock as possible
> while having them if the caller is also holding one (like desc->lock
> during IRQ-affinity changes).
> I reverted one patch one patch in the iommu while rebasing since it
> make job easier.

If the goal is to have "as little raw_spin_lock as possible" -- and
presumably also to avoid unnecessary complexity -- wouldn't it be better to
leave my patch in, and drop patches 4 and 9?

-Scott
[PATCH v2] iommu/amd: Avoid locking get_irq_table() from atomic context
get_irq_table() previously acquired amd_iommu_devtable_lock which is not
a raw lock, and thus cannot be acquired from atomic context on
PREEMPT_RT. Many calls to modify_irte*() come from atomic context due to
the IRQ desc->lock, as does amd_iommu_update_ga() due to the preemption
disabling in vcpu_load/put().

The only difference between calling get_irq_table() and reading from
irq_lookup_table[] directly, other than the lock acquisition and
amd_iommu_rlookup_table[] check, is if the table entry is unpopulated,
which should never happen when looking up a devid that came from an
irq_2_irte struct, as get_irq_table() would have already been called on
that devid during irq_remapping_alloc().

The lock acquisition is not needed in these cases because entries in
irq_lookup_table[] never change once non-NULL -- nor would the
amd_iommu_devtable_lock usage in get_irq_table() provide meaningful
protection if they did, since it's released before using the looked up
table in the get_irq_table() caller.

Rename the old get_irq_table() to alloc_irq_table(), and create a new
lockless get_irq_table() to be used in non-allocating contexts that WARNs
if it doesn't find what it's looking for.

Signed-off-by: Scott Wood <sw...@redhat.com>
---
v2: Added new get_irq_table() with WARNs rather than accessing
irq_lookup_table[] directly.
---
 drivers/iommu/amd_iommu.c | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e4026133aa1d..5d41e0733cb3 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3594,7 +3594,22 @@ static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
 	amd_iommu_dev_table[devid].data[2] = dte;
 }
 
-static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic)
+static struct irq_remap_table *get_irq_table(u16 devid)
+{
+	struct irq_remap_table *table;
+
+	if (WARN_ONCE(!amd_iommu_rlookup_table[devid],
+		      "%s: no iommu for devid %x\n", __func__, devid))
+		return NULL;
+
+	table = irq_lookup_table[devid];
+	if (WARN_ONCE(!table, "%s: no table for devid %x\n", __func__, devid))
+		return NULL;
+
+	return table;
+}
+
+static struct irq_remap_table *alloc_irq_table(u16 devid, bool ioapic)
 {
 	struct irq_remap_table *table = NULL;
 	struct amd_iommu *iommu;
@@ -3681,7 +3696,7 @@ static int alloc_irq_index(u16 devid, int count, bool align)
 	if (!iommu)
 		return -ENODEV;
 
-	table = get_irq_table(devid, false);
+	table = alloc_irq_table(devid, false);
 	if (!table)
 		return -ENODEV;
 
@@ -3732,7 +3747,7 @@ static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
 	if (iommu == NULL)
 		return -EINVAL;
 
-	table = get_irq_table(devid, false);
+	table = get_irq_table(devid);
 	if (!table)
 		return -ENOMEM;
 
@@ -3765,7 +3780,7 @@ static int modify_irte(u16 devid, int index, union irte *irte)
 	if (iommu == NULL)
 		return -EINVAL;
 
-	table = get_irq_table(devid, false);
+	table = get_irq_table(devid);
 	if (!table)
 		return -ENOMEM;
 
@@ -3789,7 +3804,7 @@ static void free_irte(u16 devid, int index)
 	if (iommu == NULL)
 		return;
 
-	table = get_irq_table(devid, false);
+	table = get_irq_table(devid);
 	if (!table)
 		return;
 
@@ -4107,7 +4122,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
 		return ret;
 
 	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
-		if (get_irq_table(devid, true))
+		if (alloc_irq_table(devid, true))
 			index = info->ioapic_pin;
 		else
 			ret = -ENOMEM;
@@ -4390,7 +4405,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
 	if (!iommu)
 		return -ENODEV;
 
-	irt = get_irq_table(devid, false);
+	irt = get_irq_table(devid);
 	if (!irt)
 		return -ENODEV;
 
-- 
2.14.3
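Editorial note, not part of the patch: the split between an allocating lookup that takes a lock and a lockless lookup that only warns can be modelled in userspace. This is a single-threaded sketch under stated assumptions. The names alloc_table(), get_table(), model_lock() and the counters are hypothetical, the "lock" is only an acquisition counter, and none of this is the driver's code. It illustrates only the property the commit message relies on: an entry never changes once it is non-NULL, so a later lookup does not need the lock.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_DEVID 16

struct remap_table {
	int min_index;
};

/* Entries are written once by alloc_table() and never changed again. */
static struct remap_table *lookup[MAX_DEVID];

/* Stand-in for a sleeping lock: it only counts acquisitions. */
static unsigned int lock_acquisitions;
static unsigned int missing_table_warnings;

static void model_lock(void)   { lock_acquisitions++; }
static void model_unlock(void) { }

/* Allocating path: takes the lock and populates the entry on first use. */
static struct remap_table *alloc_table(unsigned int devid)
{
	struct remap_table *table;

	if (devid >= MAX_DEVID)
		return NULL;

	model_lock();
	table = lookup[devid];
	if (!table) {
		table = calloc(1, sizeof(*table));
		lookup[devid] = table;	/* stays NULL if calloc() failed */
	}
	model_unlock();

	return table;
}

/* Lockless path: never allocates, warns if the entry is missing. */
static struct remap_table *get_table(unsigned int devid)
{
	struct remap_table *table;

	if (devid >= MAX_DEVID)
		return NULL;

	table = lookup[devid];
	if (!table) {
		missing_table_warnings++;
		fprintf(stderr, "%s: no table for devid %x\n", __func__, devid);
	}

	return table;
}
```

In this model, get_table() on an unpopulated devid returns NULL and bumps the warning counter without touching the lock counter. After alloc_table() has run once for that devid, get_table() returns the same pointer, still without taking the lock.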
[PATCH] iommu/amd: Don't use dev_data in irte_ga_set_affinity()
search_dev_data() acquires a non-raw lock, which can't be done
from atomic context on PREEMPT_RT. There is no need to look at
dev_data because guest_mode should never be set if use_vapic is
not set.

Signed-off-by: Scott Wood <sw...@redhat.com>
---
This is a followup to the patches below:
https://patchwork.codeaurora.org/patch/433611/
https://patchwork.codeaurora.org/patch/433613/

 drivers/iommu/amd_iommu.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 84e99097dfe3..a933c26df652 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3861,10 +3861,8 @@ static void irte_ga_set_affinity(void *entry, u16 devid, u16 index,
 				 u8 vector, u32 dest_apicid)
 {
 	struct irte_ga *irte = (struct irte_ga *) entry;
-	struct iommu_dev_data *dev_data = search_dev_data(devid);
 
-	if (!dev_data || !dev_data->use_vapic ||
-	    !irte->lo.fields_remap.guest_mode) {
+	if (!irte->lo.fields_remap.guest_mode) {
 		irte->hi.fields.vector = vector;
 		irte->lo.fields_remap.destination = dest_apicid;
 		modify_irte_ga(devid, index, irte, NULL);
-- 
1.8.3.1
[PATCH 2/2] amd/iommu: Use raw locks on atomic context paths
Several functions in this driver are called from atomic context, and thus raw locks must be used in order to be safe on PREEMPT_RT. This includes paths that must wait for command completion, which is a potential PREEMPT_RT latency concern but not easily avoidable. Signed-off-by: Scott Wood <sw...@redhat.com> --- drivers/iommu/amd_iommu.c | 30 +++--- drivers/iommu/amd_iommu_init.c | 2 +- drivers/iommu/amd_iommu_types.h | 4 ++-- 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 8ead1b296d09..213f5a796ae5 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -1055,9 +1055,9 @@ static int iommu_queue_command_sync(struct amd_iommu *iommu, unsigned long flags; int ret; - spin_lock_irqsave(>lock, flags); + raw_spin_lock_irqsave(>lock, flags); ret = __iommu_queue_command_sync(iommu, cmd, sync); - spin_unlock_irqrestore(>lock, flags); + raw_spin_unlock_irqrestore(>lock, flags); return ret; } @@ -1083,7 +1083,7 @@ static int iommu_completion_wait(struct amd_iommu *iommu) build_completion_wait(, (u64)>cmd_sem); - spin_lock_irqsave(>lock, flags); + raw_spin_lock_irqsave(>lock, flags); iommu->cmd_sem = 0; @@ -1094,7 +1094,7 @@ static int iommu_completion_wait(struct amd_iommu *iommu) ret = wait_on_sem(>cmd_sem); out_unlock: - spin_unlock_irqrestore(>lock, flags); + raw_spin_unlock_irqrestore(>lock, flags); return ret; } @@ -3626,7 +3626,7 @@ static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic) goto out_unlock; /* Initialize table spin-lock */ - spin_lock_init(>lock); + raw_spin_lock_init(>lock); if (ioapic) /* Keep the first 32 indexes free for IOAPIC interrupts */ @@ -3688,7 +3688,7 @@ static int alloc_irq_index(u16 devid, int count, bool align) if (align) alignment = roundup_pow_of_two(count); - spin_lock_irqsave(>lock, flags); + raw_spin_lock_irqsave(>lock, flags); /* Scan table for free entries */ for (index = ALIGN(table->min_index, alignment), c = 0; @@ -3715,7 
+3715,7 @@ static int alloc_irq_index(u16 devid, int count, bool align) index = -ENOSPC; out: - spin_unlock_irqrestore(>lock, flags); + raw_spin_unlock_irqrestore(>lock, flags); return index; } @@ -3736,7 +3736,7 @@ static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte, if (!table) return -ENOMEM; - spin_lock_irqsave(>lock, flags); + raw_spin_lock_irqsave(>lock, flags); entry = (struct irte_ga *)table->table; entry = [index]; @@ -3747,7 +3747,7 @@ static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte, if (data) data->ref = entry; - spin_unlock_irqrestore(>lock, flags); + raw_spin_unlock_irqrestore(>lock, flags); iommu_flush_irt(iommu, devid); iommu_completion_wait(iommu); @@ -3769,9 +3769,9 @@ static int modify_irte(u16 devid, int index, union irte *irte) if (!table) return -ENOMEM; - spin_lock_irqsave(>lock, flags); + raw_spin_lock_irqsave(>lock, flags); table->table[index] = irte->val; - spin_unlock_irqrestore(>lock, flags); + raw_spin_unlock_irqrestore(>lock, flags); iommu_flush_irt(iommu, devid); iommu_completion_wait(iommu); @@ -3793,9 +3793,9 @@ static void free_irte(u16 devid, int index) if (!table) return; - spin_lock_irqsave(>lock, flags); + raw_spin_lock_irqsave(>lock, flags); iommu->irte_ops->clear_allocated(table, index); - spin_unlock_irqrestore(>lock, flags); + raw_spin_unlock_irqrestore(>lock, flags); iommu_flush_irt(iommu, devid); iommu_completion_wait(iommu); @@ -4396,7 +4396,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data) if (!irt) return -ENODEV; - spin_lock_irqsave(>lock, flags); + raw_spin_lock_irqsave(>lock, flags); if (ref->lo.fields_vapic.guest_mode) { if (cpu >= 0) @@ -4405,7 +4405,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data) barrier(); } - spin_unlock_irqrestore(>lock, flags); + raw_spin_unlock_irqrestore(>lock, flags); iommu_flush_irt(iommu, devid); iommu_completion_wait(iommu); diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 
6fe2d0346073..e3cd81b32a33 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1474,7 +1474,7 @@ static int __init init_iommu_one(st
[PATCH 1/2] iommu/amd: Avoid get_irq_table() from atomic context
get_irq_table() acquires amd_iommu_devtable_lock which is not a raw lock,
and thus cannot be acquired from atomic context on PREEMPT_RT. Many
calls to modify_irte*() come from atomic context due to the IRQ
desc->lock, as does amd_iommu_update_ga() due to the preemption disabling
in vcpu_load/put().

The only difference between calling get_irq_table() and reading from
irq_lookup_table[] directly, other than the lock acquisition and
amd_iommu_rlookup_table[] check, is if the table entry is unpopulated,
which should never happen when looking up a devid that came from an
irq_2_irte struct, as get_irq_table() would have already been called on
that devid during irq_remapping_alloc().

The lock acquisition is not needed in these cases because entries in
irq_lookup_table[] never change once non-NULL -- nor would the
amd_iommu_devtable_lock usage in get_irq_table() provide meaningful
protection if they did, since it's released before using the looked up
table in the get_irq_table() caller.

The amd_iommu_rlookup_table[] check is not needed because
irq_lookup_table[devid] should never be non-NULL if
amd_iommu_rlookup_table[devid] is NULL.

Signed-off-by: Scott Wood <sw...@redhat.com>
---
 drivers/iommu/amd_iommu.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index dc4b73833419..8ead1b296d09 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3732,7 +3732,7 @@ static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
 	if (iommu == NULL)
 		return -EINVAL;
 
-	table = get_irq_table(devid, false);
+	table = irq_lookup_table[devid];
 	if (!table)
 		return -ENOMEM;
 
@@ -3765,7 +3765,7 @@ static int modify_irte(u16 devid, int index, union irte *irte)
 	if (iommu == NULL)
 		return -EINVAL;
 
-	table = get_irq_table(devid, false);
+	table = irq_lookup_table[devid];
 	if (!table)
 		return -ENOMEM;
 
@@ -3789,7 +3789,7 @@ static void free_irte(u16 devid, int index)
 	if (iommu == NULL)
 		return;
 
-	table = get_irq_table(devid, false);
+	table = irq_lookup_table[devid];
 	if (!table)
 		return;
 
@@ -4392,7 +4392,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
 	if (!iommu)
 		return -ENODEV;
 
-	irt = get_irq_table(devid, false);
+	irt = irq_lookup_table[devid];
 	if (!irt)
 		return -ENODEV;
 
-- 
1.8.3.1
Re: [v16, 0/7] Fix eSDHC host version register bug
On Thu, 2016-11-10 at 04:11 +, Y.B. Lu wrote: > > > > -Original Message- > > From: Y.B. Lu > > Sent: Thursday, November 10, 2016 12:06 PM > > To: 'Scott Wood'; Ulf Hansson > > Cc: linux-mmc; Arnd Bergmann; linuxppc-...@lists.ozlabs.org; > > devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux- > > ker...@vger.kernel.org; linux-clk; iommu@lists.linux-foundation.org; > > net...@vger.kernel.org; Greg Kroah-Hartman; Mark Rutland; Rob Herring; > > Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh > > Sharma; Qiang Zhao; Kumar Gala; Leo Li; X.B. Xie; M.H. Lian > > Subject: RE: [v16, 0/7] Fix eSDHC host version register bug > > > > > > > > -Original Message- > > > From: linux-mmc-ow...@vger.kernel.org [mailto:linux-mmc- > > > ow...@vger.kernel.org] On Behalf Of Scott Wood > > > Sent: Thursday, November 10, 2016 11:55 AM > > > To: Ulf Hansson; Y.B. Lu > > > Cc: linux-mmc; Arnd Bergmann; linuxppc-...@lists.ozlabs.org; > > > devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; > > > linux- ker...@vger.kernel.org; linux-clk; > > > iommu@lists.linux-foundation.org; net...@vger.kernel.org; Greg > > > Kroah-Hartman; Mark Rutland; Rob Herring; Russell King; Jochen > > > Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh Sharma; Qiang Zhao; > > > Kumar Gala; Leo Li; X.B. Xie; M.H. Lian > > > Subject: Re: [v16, 0/7] Fix eSDHC host version register bug > > > > > > On Wed, 2016-11-09 at 19:27 +0100, Ulf Hansson wrote: > > > > > > > > - i2c-list > > > > > > > > On 9 November 2016 at 04:14, Yangbo Lu <yangbo...@nxp.com> wrote: > > > > > > > > > > > > > > > This patchset is used to fix a host version register bug in the > > > > > T4240- > > > > > R1.0-R2.0 > > > > > eSDHC controller. To match the SoC version and revision, 15 > > > > > previous version patchsets had tried many methods but all of them > > > > > were rejected by reviewers. 
> > > > > Such as > > > > > - dts compatible method > > > > > - syscon method > > > > > - ifdef PPC method > > > > > - GUTS driver getting SVR method Anrd suggested a > > > > > soc_device_match method in v10, and this is the only available > > > > > method left now. This v11 patchset introduces the soc_device_match > > > > > interface in soc driver. > > > > > > > > > > The first four patches of Yangbo are to add the GUTS driver. This > > > > > is used to register a soc device which contain soc version and > > > > > revision information. > > > > > The other three patches introduce the soc_device_match method in > > > > > soc driver and apply it on esdhc driver to fix this bug. > > > > > > > > > > --- > > > > > Changes for v15: > > > > > - Dropped patch 'dt: bindings: update Freescale DCFG > > > compatible' > > > > > > > > > > > > > > since the work had been done by below patch on > > > > > ShawnGuo's linux tree. > > > > > 'dt-bindings: fsl: add LS1043A/LS1046A/LS2080A > > > > > compatible for SCFG > > > > > and DCFG' > > > > > - Fixed error code issue in guts driver Changes for v16: > > > > > - Dropped patch 'powerpc/fsl: move mpc85xx.h to > > > include/linux/fsl' > > > > > > > > > > > > > > - Added a bug-fix patch from Geert > > > > > --- > > > > > > > > > > Arnd Bergmann (1): > > > > > base: soc: introduce soc_device_match() interface > > > > > > > > > > Geert Uytterhoeven (1): > > > > > base: soc: Check for NULL SoC device attributes > > > > > > > > > > Yangbo Lu (5): > > > > > ARM64: dts: ls2080a: add device configuration node > > > > > dt: bindings: move guts devicetree doc out of powerpc directory > > > > > soc: fsl: add GUTS driver for QorIQ platforms > > > > > MAINTAINERS: add entry for Freescale SoC drivers > > > > > mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0 > > > > > > > > > > .../bindings/{p
Re: [v16, 0/7] Fix eSDHC host version register bug
On Wed, 2016-11-09 at 19:27 +0100, Ulf Hansson wrote: > - i2c-list > > On 9 November 2016 at 04:14, Yangbo Luwrote: > > > > This patchset is used to fix a host version register bug in the T4240- > > R1.0-R2.0 > > eSDHC controller. To match the SoC version and revision, 15 previous > > version > > patchsets had tried many methods but all of them were rejected by > > reviewers. > > Such as > > - dts compatible method > > - syscon method > > - ifdef PPC method > > - GUTS driver getting SVR method > > Anrd suggested a soc_device_match method in v10, and this is the only > > available > > method left now. This v11 patchset introduces the soc_device_match > > interface in > > soc driver. > > > > The first four patches of Yangbo are to add the GUTS driver. This is used > > to > > register a soc device which contain soc version and revision information. > > The other three patches introduce the soc_device_match method in soc > > driver > > and apply it on esdhc driver to fix this bug. > > > > --- > > Changes for v15: > > - Dropped patch 'dt: bindings: update Freescale DCFG compatible' > > since the work had been done by below patch on ShawnGuo's linux > > tree. 
> > 'dt-bindings: fsl: add LS1043A/LS1046A/LS2080A compatible for > > SCFG > > and DCFG' > > - Fixed error code issue in guts driver > > Changes for v16: > > - Dropped patch 'powerpc/fsl: move mpc85xx.h to include/linux/fsl' > > - Added a bug-fix patch from Geert > > --- > > > > Arnd Bergmann (1): > > base: soc: introduce soc_device_match() interface > > > > Geert Uytterhoeven (1): > > base: soc: Check for NULL SoC device attributes > > > > Yangbo Lu (5): > > ARM64: dts: ls2080a: add device configuration node > > dt: bindings: move guts devicetree doc out of powerpc directory > > soc: fsl: add GUTS driver for QorIQ platforms > > MAINTAINERS: add entry for Freescale SoC drivers > > mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0 > > > > .../bindings/{powerpc => soc}/fsl/guts.txt | 3 + > > MAINTAINERS| 11 +- > > arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi | 6 + > > drivers/base/Kconfig | 1 + > > drivers/base/soc.c | 70 ++ > > drivers/mmc/host/Kconfig | 1 + > > drivers/mmc/host/sdhci-of-esdhc.c | 20 ++ > > drivers/soc/Kconfig| 3 +- > > drivers/soc/fsl/Kconfig| 18 ++ > > drivers/soc/fsl/Makefile | 1 + > > drivers/soc/fsl/guts.c | 236 > > + > > include/linux/fsl/guts.h | 125 ++- > > include/linux/sys_soc.h| 3 + > > 13 files changed, 447 insertions(+), 51 deletions(-) > > rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt > > (91%) > > create mode 100644 drivers/soc/fsl/Kconfig > > create mode 100644 drivers/soc/fsl/guts.c > > > > -- > > 2.1.0.27.g96db324 > > > Thanks, applied on my mmc tree for next! > > I noticed that some DT compatibles weren't documented, according to > checkpatch. Please fix that asap! They are documented, in fsl/guts.txt (the file moved in patch 2/7): > - compatible : Should define the compatible device type for > global-utilities. 
> Possible compatibles: > "fsl,qoriq-device-config-1.0" > "fsl,qoriq-device-config-2.0" > "fsl,-device-config" > "fsl,-guts" Checkpatch doesn't understand compatibles defined in such a way. -Scott ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [v13, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
On Fri, 2016-10-28 at 11:32 +0800, Yangbo Lu wrote:
> +	guts->regs = of_iomap(np, 0);
> +	if (!guts->regs)
> +		return -ENOMEM;
> +
> +	/* Register soc device */
> +	machine = of_flat_dt_get_machine_name();
> +	if (machine)
> +		soc_dev_attr.machine = devm_kstrdup(dev, machine, GFP_KERNEL);
> +
> +	svr = fsl_guts_get_svr();
> +	soc_die = fsl_soc_die_match(svr, fsl_soc_die);
> +	if (soc_die) {
> +		soc_dev_attr.family = devm_kasprintf(dev, GFP_KERNEL,
> +						     "QorIQ %s", soc_die->die);
> +	} else {
> +		soc_dev_attr.family = devm_kasprintf(dev, GFP_KERNEL, "QorIQ");
> +	}
> +	soc_dev_attr.soc_id = devm_kasprintf(dev, GFP_KERNEL,
> +					     "svr:0x%08x", svr);
> +	soc_dev_attr.revision = devm_kasprintf(dev, GFP_KERNEL, "%d.%d",
> +					       SVR_MAJ(svr), SVR_MIN(svr));
> +
> +	soc_dev = soc_device_register(&soc_dev_attr);
> +	if (IS_ERR(soc_dev))
> +		return PTR_ERR(soc_dev);

ioremap leaks on this error path. Use devm_ioremap_resource().

-Scott
Re: [v12, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
On Wed, 2016-09-21 at 14:57 +0800, Yangbo Lu wrote: > diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig > new file mode 100644 > index 000..b99764c > --- /dev/null > +++ b/drivers/soc/fsl/Kconfig > @@ -0,0 +1,19 @@ > +# > +# Freescale SOC drivers > +# > + > +source "drivers/soc/fsl/qe/Kconfig" > + > +config FSL_GUTS > + bool "Freescale QorIQ GUTS driver" > + select SOC_BUS > + help > + The global utilities block controls power management, I/O device > + enabling, power-onreset(POR) configuration monitoring, alternate > + function selection for multiplexed signals,and clock control. > + This driver is to manage and access global utilities block. > + Initially only reading SVR and registering soc device are > supported. > + Other guts accesses, such as reading RCW, should eventually be > moved > + into this driver as well. > + > + If you want GUTS driver support, you should say Y here. This is user-enablable without dependencies, which means it will break some randconfigs. If this is to be enabled via select then remove the text after "bool". > +/* SoC die attribute definition for QorIQ platform */ > +static const struct fsl_soc_die_attr fsl_soc_die[] = { > +#ifdef CONFIG_PPC > + /* > + * Power Architecture-based SoCs T Series > + */ > + > + /* Die: T4240, SoC: T4240/T4160/T4080 */ > + { .die = "T4240", > + .svr = 0x8240, > + .mask = 0xfff0, > + }, > + /* Die: T1040, SoC: T1040/T1020/T1042/T1022 */ > + { .die = "T1040", > + .svr = 0x8520, > + .mask = 0xfff0, > + }, > + /* Die: T2080, SoC: T2080/T2081 */ > + { .die = "T2080", > + .svr = 0x8530, > + .mask = 0xfff0, > + }, > + /* Die: T1024, SoC: T1024/T1014/T1023/T1013 */ > + { .die = "T1024", > + .svr = 0x8540, > + .mask = 0xfff0, > + }, > +#endif /* CONFIG_PPC */ > +#if defined(CONFIG_ARCH_MXC) || defined(CONFIG_ARCH_LAYERSCAPE) Will this driver ever be probed on MXC? Why do we need these ifdefs at all? 
> + /* > + * ARM-based SoCs LS Series > + */ > + > + /* Die: LS1043A, SoC: LS1043A/LS1023A */ > + { .die = "LS1043A", > + .svr = 0x8792, > + .mask = 0x, > + }, > + /* Die: LS2080A, SoC: LS2080A/LS2040A/LS2085A */ > + { .die = "LS2080A", > + .svr = 0x8701, > + .mask = 0xff3f, > + }, > + /* Die: LS1088A, SoC: LS1088A/LS1048A/LS1084A/LS1044A */ > + { .die = "LS1088A", > + .svr = 0x8703, > + .mask = 0xff3f, > + }, > + /* Die: LS1012A, SoC: LS1012A */ > + { .die = "LS1012A", > + .svr = 0x8704, > + .mask = 0x, > + }, > + /* Die: LS1046A, SoC: LS1046A/LS1026A */ > + { .die = "LS1046A", > + .svr = 0x8707, > + .mask = 0x, > + }, > + /* Die: LS2088A, SoC: LS2088A/LS2048A/LS2084A/LS2044A */ > + { .die = "LS2088A", > + .svr = 0x8709, > + .mask = 0xff3f, > + }, > + /* Die: LS1021A, SoC: LS1021A/LS1020A/LS1022A > + * Note: Put this die at the end in cause of incorrect > identification > + */ > + { .die = "LS1021A", > + .svr = 0x8700, > + .mask = 0xfff0, > + }, > +#endif /* CONFIG_ARCH_MXC || CONFIG_ARCH_LAYERSCAPE */ Instead of relying on ordering, add more bits to the mask so that there's no overlap. I think 0xfff7 would work. > +out: > + kfree(soc_dev_attr.machine); > + kfree(soc_dev_attr.family); > + kfree(soc_dev_attr.soc_id); > + kfree(soc_dev_attr.revision); > + iounmap(guts->regs); > +out_free: > + kfree(guts); > + return ret; > +} Please use devm. -Scott ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
On Tue, 2016-09-13 at 07:23 +, Y.B. Lu wrote: > > > > > > -Original Message- > > From: linux-mmc-ow...@vger.kernel.org [mailto:linux-mmc- > > ow...@vger.kernel.org] On Behalf Of Scott Wood > > Sent: Tuesday, September 13, 2016 7:25 AM > > To: Y.B. Lu; linux-...@vger.kernel.org; ulf.hans...@linaro.org; Arnd > > Bergmann > > Cc: linuxppc-...@lists.ozlabs.org; devicet...@vger.kernel.org; linux-arm- > > ker...@lists.infradead.org; linux-ker...@vger.kernel.org; linux- > > c...@vger.kernel.org; linux-...@vger.kernel.org; iommu@lists.linux- > > foundation.org; net...@vger.kernel.org; Mark Rutland; Rob Herring; > > Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh > > Sharma; Qiang Zhao; Kumar Gala; Santosh Shilimkar; Leo Li; X.B. Xie > > Subject: Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms > > > > BTW, aren't ls2080a and ls2085a the same die? And is there no non-E > > version of LS2080A/LS2040A? > [Lu Yangbo-B47093] I checked all the svr values in chip errata doc "Revision > level to part marking cross-reference" table. > I found ls2080a and ls2085a were in two separate doc. And I didn’t find non- > E version of LS2080A/LS2040A in chip errata doc. > Do you know is there any other doc we can confirm this? No. Traditionally we've always had E and non-E versions of each chip, but I have no knowledge of whether that has changed (I do note that the way that E- status is indicated in SVR has changed). But please label LS2080A and LS2085A as the same die (or provide strong evidence that they are not). > > > > > > > > > > > > > > > > > > + do { > > > > > + if (!matches->soc_id) > > > > > + return NULL; > > > > > + if (glob_match(svr_match, matches->soc_id)) > > > > > + break; > > > > > + } while (matches++); > > > > Are you expecting "matches++" to ever evaluate as false? > > > [Lu Yangbo-B47093] Yes, this is used to match the soc we use in > > > qoriq_soc array until getting true. 
> > > We need to get the name and die information defined in array. > > I'm not asking whether the glob_match will ever return true. I'm saying > > that "matches++" will never become NULL. > [Lu Yangbo-B47093] The matches++ will never become NULL while it will return > NULL after matching for all the members in array. "matches++" will never "return NULL". It's just an incrementing address. It won't be null until you wrap around the address space, and even if the other loop terminators never kicked in you'd crash long before that happens. Please rewrite the loop as something like: while (matches->soc_id) { if (glob_match(...)) return matches; matches++; } return NULL; > > > > > + /* Register soc device */ > > > > > + soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL); > > > > > + if (!soc_dev_attr) { > > > > > + ret = -ENOMEM; > > > > > + goto out_unmap; > > > > > + } > > > > Couldn't this be statically allocated? > > > [Lu Yangbo-B47093] Do you mean we define this struct statically ? > > > > > > static struct soc_device_attribute soc_dev_attr; > > Yes. > > > [Lu Yangbo-B47093] It's ok to define it statically. Is there any need to do > that? It's simpler. -Scott ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
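For reference, the loop shape suggested above can be tried out in userspace. This is only a sketch: strcmp() stands in for the kernel's glob_match(), and the struct name and table contents are made up for illustration. The point is that termination comes from the sentinel entry (soc_id == NULL), not from the pointer increment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct soc_attr_sketch {
	const char *soc_id;	/* NULL marks the end of the table */
	const char *family;
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
static const struct soc_attr_sketch *
soc_match(const struct soc_attr_sketch *matches, const char *soc_id)
{
	assert(matches != NULL && soc_id != NULL);

	while (matches->soc_id) {
		/* exact compare here; the patch uses glob_match() instead */
		if (strcmp(matches->soc_id, soc_id) == 0)
			return matches;
		matches++;
	}
	return NULL;
}

/* Illustrative entries only. */
static const struct soc_attr_sketch demo_table[] = {
	{ "svr:0x85400010", "QorIQ T1024" },
	{ "svr:0x85480010", "QorIQ T1024" },
	{ NULL, NULL },
};
```

With this shape there is a single exit for "found" and a single exit for "not found", and the increment is never used as a loop condition.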
Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
On Mon, 2016-09-12 at 06:39 +, Y.B. Lu wrote: > Hi Scott, > > Thanks for your review :) > See my comment inline. > > > > > -Original Message- > > From: Scott Wood [mailto:o...@buserror.net] > > Sent: Friday, September 09, 2016 11:47 AM > > To: Y.B. Lu; linux-...@vger.kernel.org; ulf.hans...@linaro.org; Arnd > > Bergmann > > Cc: linuxppc-...@lists.ozlabs.org; devicet...@vger.kernel.org; linux-arm- > > ker...@lists.infradead.org; linux-ker...@vger.kernel.org; linux- > > c...@vger.kernel.org; linux-...@vger.kernel.org; iommu@lists.linux- > > foundation.org; net...@vger.kernel.org; Mark Rutland; Rob Herring; > > Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh > > Sharma; Qiang Zhao; Kumar Gala; Santosh Shilimkar; Leo Li; X.B. Xie > > Subject: Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms > > > > On Tue, 2016-09-06 at 16:28 +0800, Yangbo Lu wrote: > > > > > > The global utilities block controls power management, I/O device > > > enabling, power-onreset(POR) configuration monitoring, alternate > > > function selection for multiplexed signals,and clock control. > > > > > > This patch adds a driver to manage and access global utilities block. > > > Initially only reading SVR and registering soc device are supported. > > > Other guts accesses, such as reading RCW, should eventually be moved > > > into this driver as well. > > > > > > Signed-off-by: Yangbo Lu <yangbo...@nxp.com> > > > Signed-off-by: Scott Wood <o...@buserror.net> > > Don't put my signoff on patches that I didn't put it on > > myself. Definitely don't put mine *after* yours on patches that were > > last modified by you. > > > > If you want to mention that the soc_id encoding was my suggestion, then > > do so explicitly. > > > [Lu Yangbo-B47093] I found your 'signoff' on this patch at below link. > http://patchwork.ozlabs.org/patch/649211/ > > So, let me just change the order in next version ? 
> Signed-off-by: Scott Wood <o...@buserror.net> > Signed-off-by: Yangbo Lu <yangbo...@nxp.com> No. This isn't my patch so my signoff shouldn't be on it. > [Lu Yangbo-B47093] It's a good idea to move die into .family I think. > In my opinion, it's better to keep svr and name in soc_id just like your > suggestion above. > > > > { > > .soc_id = "svr:0x85490010,name:T1023E,", > > .family = "QorIQ T1024", > > } > The user probably don’t like to learn the svr value. What they want is just > to match the soc they use. > It's convenient to use name+rev for them to match a soc. What the user should want 99% of the time is to match the die (plus revision), not the soc. > Regarding shrinking the table, I think it's hard to use svr+mask. Because I > find many platforms use different masks. > We couldn’t know the mask according svr value. The mask would be part of the table: { { .die = "T1024", .svr = 0x8540, .mask = 0xfff0, }, { .die = "T1040", .svr = 0x8520, .mask = 0xfff0, }, { .die = "LS1088A", .svr = 0x8703, .mask = 0x, }, ... } There's a small risk that we get the mask wrong and a different die is created that matches an existing table, but it doesn't seem too likely, and can easily be fixed with a kernel update if it happens. BTW, aren't ls2080a and ls2085a the same die? And is there no non-E version of LS2080A/LS2040A? > > > + do { > > > + if (!matches->soc_id) > > > + return NULL; > > > + if (glob_match(svr_match, matches->soc_id)) > > > + break; > > > + } while (matches++); > > Are you expecting "matches++" to ever evaluate as false? > [Lu Yangbo-B47093] Yes, this is used to match the soc we use in qoriq_soc > array until getting true. > We need to get the name and die information defined in array. I'm not asking whether the glob_match will ever return true. I'm saying that "matches++" will never become NULL. 
> > > + /* Register soc device */ > > > + soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL); > > > + if (!soc_dev_attr) { > > > + ret = -ENOMEM; > > > + goto out_unmap; > > > + } > > Couldn't this be statically allocated? > [Lu Yangbo-B47093] Do you mean we define this struct statically ? > >
Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
On Tue, 2016-09-06 at 16:28 +0800, Yangbo Lu wrote: > The global utilities block controls power management, I/O device > enabling, power-onreset(POR) configuration monitoring, alternate > function selection for multiplexed signals,and clock control. > > This patch adds a driver to manage and access global utilities block. > Initially only reading SVR and registering soc device are supported. > Other guts accesses, such as reading RCW, should eventually be moved > into this driver as well. > > Signed-off-by: Yangbo Lu <yangbo...@nxp.com> > Signed-off-by: Scott Wood <o...@buserror.net> Don't put my signoff on patches that I didn't put it on myself. Definitely don't put mine *after* yours on patches that were last modified by you. If you want to mention that the soc_id encoding was my suggestion, then do so explicitly. > +/* SoC attribute definition for QorIQ platform */ > +static const struct soc_device_attribute qoriq_soc[] = { > +#ifdef CONFIG_PPC > + /* > + * Power Architecture-based SoCs T Series > + */ > + > + /* SoC: T1024/T1014/T1023/T1013 Rev: 1.0 */ > + { .soc_id = "svr:0x85400010,name:T1024,die:T1024", > + .revision = "1.0", > + }, > + { .soc_id = "svr:0x85480010,name:T1024E,die:T1024", > + .revision = "1.0", > + }, Revision could be computed from the low 8 bits of SVR (just as you do for unknown SVRs). We could move the die name into .family: { .soc_id = "svr:0x85490010,name:T1023E,", .family = "QorIQ T1024", } I see you dropped svre (and the trailing comma), though I guess the vast majority of potential users will be looking at .family. In which case do we even need name? If we just make the soc_id be "svr:0x" then we could shrink the table to an svr+mask that identifies each die. I'd still want to keep the "svr:" even if we're giving up on the general tagging system, to make it clear what the number refers to, and to provide some defense against users who match only against soc_id rather than soc_id+family. 
Or we could go further and format soc_id as "QorIQ SVR 0x" so that soc_id-only matches are fully acceptable rather than just less dangerous. > +static const struct soc_device_attribute *fsl_soc_device_match( > + unsigned int svr, const struct soc_device_attribute *matches) > +{ > + char svr_match[50]; > + int n; > + > + n = sprintf(svr_match, "*%08x*", svr); n = sprintf(svr_match, "svr:0x%08x,*", svr); (according to the current encoding) > + > + do { > + if (!matches->soc_id) > + return NULL; > + if (glob_match(svr_match, matches->soc_id)) > + break; > + } while (matches++); Are you expecting "matches++" to ever evaluate as false? > + /* Register soc device */ > + soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL); > + if (!soc_dev_attr) { > + ret = -ENOMEM; > + goto out_unmap; > + } Couldn't this be statically allocated? > + > + machine = of_flat_dt_get_machine_name(); > + if (machine) > + soc_dev_attr->machine = kasprintf(GFP_KERNEL, "%s", > machine); > + > + soc_dev_attr->family = kasprintf(GFP_KERNEL, "QorIQ"); > + > + svr = fsl_guts_get_svr(); > + fsl_soc = fsl_soc_device_match(svr, qoriq_soc); > + if (fsl_soc) { > + soc_dev_attr->soc_id = kasprintf(GFP_KERNEL, "%s", > + fsl_soc->soc_id); You can use kstrdup() if you're just copying the string as is. > + soc_dev_attr->revision = kasprintf(GFP_KERNEL, "%s", > + fsl_soc->revision); > + } else { > + soc_dev_attr->soc_id = kasprintf(GFP_KERNEL, "0x%08x", > svr); kasprintf(GFP_KERNEL, "svr:0x%08x,", svr); > + > + soc_dev = soc_device_register(soc_dev_attr); > + if (IS_ERR(soc_dev)) { > + ret = -ENODEV; Why are you changing the error code? > + goto out; > + } else { Unnecessary "else". > + pr_info("Detected: %s\n", soc_dev_attr->machine); Machine: %s > + pr_info("Detected SoC family: %s\n", soc_dev_attr->family); > + pr_info("Detected SoC ID: %s, revision: %s\n", > + soc_dev_attr->soc_id, soc_dev_attr->revision); s/Detected //g > + } > + return 0; > +out: > +
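To make the "revision from the low 8 bits of SVR" and "svr:0x%08x," suggestions concrete, here is a small userspace sketch. The helper names are hypothetical, and the major/minor split (high nibble and low nibble of the low byte) is an assumption inferred from the table entries quoted above, where SVR 0x85400010 is listed as revision "1.0".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format the revision as "major.minor" from the low 8 bits of SVR.
 * Assumption: high nibble is the major number, low nibble the minor. */
static int svr_format_revision(uint32_t svr, char *buf, size_t len)
{
	unsigned int rev = svr & 0xff;

	assert(buf != NULL);
	return snprintf(buf, len, "%u.%u", rev >> 4, rev & 0xf);
}

/* Format soc_id with the "svr:" tag and trailing comma. */
static int svr_format_soc_id(uint32_t svr, char *buf, size_t len)
{
	assert(buf != NULL);
	return snprintf(buf, len, "svr:0x%08x,", (unsigned int)svr);
}
```

With helpers like these the table would not need a .revision string per entry, and the known and unknown SVR cases would produce soc_id in the same format.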
Re: [v10, 3/7] soc: fsl: add GUTS driver for QorIQ platforms
On Fri, 2016-07-15 at 12:43 -0400, Paul Gortmaker wrote: > On Wed, May 4, 2016 at 11:12 PM, Yangbo Lu <yangbo...@nxp.com> wrote: > > > > The global utilities block controls power management, I/O device > > enabling, power-onreset(POR) configuration monitoring, alternate > > function selection for multiplexed signals,and clock control. > > > > This patch adds GUTS driver to manage and access global utilities > > block. > > > > Signed-off-by: Yangbo Lu <yangbo...@nxp.com> > > Acked-by: Scott Wood <o...@buserror.net> > > --- > > Changes for v4: > > - Added this patch > > Changes for v5: > > - Modified copyright info > > - Changed MODULE_LICENSE to GPL > > - Changed EXPORT_SYMBOL_GPL to EXPORT_SYMBOL > > - Made FSL_GUTS user-invisible > > - Added a complete compatible list for GUTS > > - Stored guts info in file-scope variable > > - Added mfspr() getting SVR > > - Redefined GUTS APIs > > - Called fsl_guts_init rather than using platform driver > > - Removed useless parentheses > > - Removed useless 'extern' key words > > Changes for v6: > > - Made guts thread safe in fsl_guts_init > > Changes for v7: > > - Removed 'ifdef' for function declaration in guts.h > > Changes for v8: > > - Fixes lines longer than 80 characters checkpatch issue > > - Added 'Acked-by: Scott Wood' > > Changes for v9: > > - None > > Changes for v10: > > - None > > --- > > drivers/soc/Kconfig | 2 +- > > drivers/soc/fsl/Kconfig | 8 +++ > > drivers/soc/fsl/Makefile | 1 + > > drivers/soc/fsl/guts.c | 119 > > > > include/linux/fsl/guts.h | 126 +- > > - > > 5 files changed, 207 insertions(+), 49 deletions(-) > > create mode 100644 drivers/soc/fsl/Kconfig > > create mode 100644 drivers/soc/fsl/guts.c > > > > diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig > > index cb58ef0..7106463 100644 > > --- a/drivers/soc/Kconfig > > +++ b/drivers/soc/Kconfig > > @@ -2,7 +2,7 @@ menu "SOC (System On Chip) specific Drivers" > > > > source "drivers/soc/bcm/Kconfig" > > source "drivers/soc/brcmstb/Kconfig" > > 
-source "drivers/soc/fsl/qe/Kconfig" > > +source "drivers/soc/fsl/Kconfig" > > source "drivers/soc/mediatek/Kconfig" > > source "drivers/soc/qcom/Kconfig" > > source "drivers/soc/rockchip/Kconfig" > > diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig > > new file mode 100644 > > index 000..b313759 > > --- /dev/null > > +++ b/drivers/soc/fsl/Kconfig > > @@ -0,0 +1,8 @@ > > +# > > +# Freescale SOC drivers > > +# > > + > > +source "drivers/soc/fsl/qe/Kconfig" > > + > > +config FSL_GUTS > > + bool > > diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile > > index 203307f..02afb7f 100644 > > --- a/drivers/soc/fsl/Makefile > > +++ b/drivers/soc/fsl/Makefile > > @@ -4,3 +4,4 @@ > > > > obj-$(CONFIG_QUICC_ENGINE) += qe/ > > obj-$(CONFIG_CPM) += qe/ > > +obj-$(CONFIG_FSL_GUTS) += guts.o > > diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c > > new file mode 100644 > > index 000..fa155e6 > > --- /dev/null > > +++ b/drivers/soc/fsl/guts.c > > @@ -0,0 +1,119 @@ > > +/* > > + * Freescale QorIQ Platforms GUTS Driver > > + * > > + * Copyright (C) 2016 Freescale Semiconductor, Inc. > > + * > > + * This program is free software; you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License as published by > > + * the Free Software Foundation; either version 2 of the License, or > > + * (at your option) any later version. > > + */ > > + > > +#include > > +#include > Seems there was lots of discussion on this. If it does end up being > resent, it would be nice to get the module.h and other modular stuff > gone since it is a bool Kconfig. I plan to resend just the GUTS driver portion and send it through the PPC tree. I don't see any modular stuff in there besides the linux/module.h include. -Scott ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/4] soc: fsl: add GUTS driver for QorIQ platforms
On Thu, 2016-07-07 at 10:30 +0200, Arnd Bergmann wrote: > On Thursday, July 7, 2016 2:35:33 AM CEST Yangbo Lu wrote: > > > > Hi Arnd, > > > > Could you reply when you see the email? > > If your method doesn’t resolve the problem, we still want to use our old > > patchset. > > > > This guts driver had been discussed about one year and blocked many > > workaround upstream. > > So please help to review and comment soon. > > > I don't really see how more discussion is going to help us here. I think > I've made it pretty clear that I don't want to see another platform > specific way to read an SoC revision and I've even sent a proof-of-concept > patch to show how the interface can work, now it's up to you to fit the > guts hardware into that and send a new patch series. In which relevant maintainership capacity are you NACKing it? -Scott
Re: [PATCH 4/4] Revert "powerpc/fsl: Move fsl_guts.h out of arch/powerpc"
On Thu, 2016-06-02 at 11:01 +0200, Arnd Bergmann wrote: > On Wednesday, June 1, 2016 8:24:20 PM CEST Scott Wood wrote: > > On Mon, 2016-05-30 at 15:18 +0200, Arnd Bergmann wrote: > > > All users of this driver are PowerPC specific and the header file > > > has no business in the global include/linux/ hierarchy, so move > > > it back before anyone starts using it on ARM. > > > > > > This reverts commit 948486544713492f00ac8a9572909101ea892cb0. > > > > > > Signed-off-by: Arnd Bergmann <a...@arndb.de> > > > --- > > > This part of the series is not required for the eSDHC quirk, > > > but it restores the asm/fsl_guts.h header so it doesn't accidentally > > > get abused for this in the future. I found two drivers outside of > > > arch/powerpc that already accessed the registers directly, but the > > > functions look fairly contained, and can be easily hidden in an > > > #ifdef CONFIG_PPC > > > > NACK > > > > Besides adding ifdef pollution for no good reason, this register block is > > used > > on some ARM chips as well. Why is it a problem if "anyone starts using it > > on > > ARM"? > > It's just not a good interface when it's defined as "this is the layout of > a register area that any driver can ioremap() if they can figure out the > device node". That's why I want to move accesses into one guts driver. > It's not uncommon to have register areas like that, but > normally you have at the minimum a 'syscon' device to handle locking > between drivers accessing the same registers and to avoid having to map > the same area multiple times. syscon requires device tree changes. I don't see read-modify-write operations in regmap -- how does locking around an individual, inherently-atomic load or store help? 
> If we need to use 'guts' registers on ARM, we can find a way to abstract > them properly for the given use cases, using a syscon or a driver with > exported functions, but just making a PowerPC platform specific header > global to all Linux drivers by putting it into include/linux doesn't seem > right. Again, it's not PowerPC-specific! It started that way but then the same register block got put onto some ARM chips. It's not global to "all Linux drivers", just the ones that choose to include an fsl-specific header. If and when all uses of guts are moved into the guts driver, the header can be moved into drivers/soc/fsl. > Note that the header file uses a structure definition rather than the more > common macros with register offsets, which is fine for a driver that has > its own registers and abstracts them, but it doesn't really work with > the regmap interface, so if we want to use it with syscon, it also needs to > be rewritten. We don't want to use it with syscon. If we did, the solution wouldn't be to move the header back to arch/powerpc, but to convert the struct into offsets. -Scott ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
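On the struct-versus-offsets point, the two views describe the same layout and can be checked against each other at compile time. The sketch below is userspace C with a hypothetical, heavily abbreviated struct (it is not the real ccsr_guts); only the two offsets, GUTS_PVR and GUTS_SVR, come from the patch quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUTS_PVR	0x0a0
#define GUTS_SVR	0x0a4

/* Hypothetical abbreviated layout: everything before PVR is padding. */
struct guts_regs_sketch {
	uint8_t  reserved[GUTS_PVR];
	uint32_t pvr;
	uint32_t svr;
};

/* The struct view and the offset view must agree. */
static_assert(offsetof(struct guts_regs_sketch, pvr) == GUTS_PVR,
	      "pvr offset mismatch");
static_assert(offsetof(struct guts_regs_sketch, svr) == GUTS_SVR,
	      "svr offset mismatch");

/* Access through the struct definition. */
static uint32_t svr_via_struct(const struct guts_regs_sketch *g)
{
	return g->svr;
}

/* Access through a base pointer plus register offset. */
static uint32_t svr_via_offset(const void *base)
{
	uint32_t v;

	memcpy(&v, (const uint8_t *)base + GUTS_SVR, sizeof(v));
	return v;
}
```

Converting a struct into offsets is therefore mechanical, and the static assertions give a cheap way to keep both in sync during a transition.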
Re: [PATCH 3/4] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0
On Thu, 2016-06-02 at 10:52 +0200, Arnd Bergmann wrote: > On Wednesday, June 1, 2016 8:11:14 PM CEST Scott Wood wrote: > > > +#define T4240_HOST_VER ((VENDOR_V_23 << SDHCI_VENDOR_VER_SHIFT) | > > > SDHCI_SPEC_200) > > > +static const struct soc_device_attribute esdhc_t4240_quirk = { > > > + /* T4240 revision < 0x20 uses vendor version 23, SDHCI version 200 > > > */ > > > + { .soc_id = "T4*(0x824000)", .revision = "0x[01]?", > > > + .data = (void *)(uintptr_t)(T4240_HOST_VER) }, > > > > Why should this code need to care that the string begins with "T4"? This > > creates dual maintenance if that were to change. It's also broken because > > T4240 has compatible = "fsl,t4240-device-config", "fsl,qoriq-device-config > > -2.0" and thus with these patches it would incorrectly show up as "P > > series > > (0x824000)". The compatible string of this node was never meant to be a > > key > > for choosing a string to describe the system to userspace. > > This is an artifact of not knowing the specific SoC name, and we can change > that by looking up the name from the SVR value in the soc_device driver. ...or we could keep it simple and just match the number. > > 0x824000 is a magic number which should be represented symbolically. > > Sure, feel free to change the format of the soc_device string in any > name, That's not what I was asking for... The match should be numeric but the knowledge of what the number is should come from a symbolic #define. > > If T4240 is affected, then so are the reduced-core variants T4160 and > > T4080, > > but 0x824000 doesn't match them (Yangbo's patch had the same problem). > > And > > please don't respond with "0x824*" > > > > You also didn't strip out the E bit of SVR which indicates encryption > > capability and nothing else (Yangbo's patch did not have this problem > > because > > it used SVR_SOC_VER). > > Ok, that should be easy enough to fix in the soc_device driver. 
No, because the soc_device driver doesn't know whether the consumer of the ID cares about the E bit. > > What happens if the revision condition is more complicated, such as <= > > 0x20 > > with 0x21 being fine? Multiple quirk entries where before we had as > > simple > > comparison? > > I guess yes. I would really hope that there is no need to use this interface > pervasively, it's really just to work around the cases where there is no > way to pass the information in DT otherwise. How does putting it in the DT work when you have multiple versions of the same SoC, some of which have the bug and some which don't? > > I fail to see how this approach is an improvement (much less one that > > needs to > > hold up a patchset that is fixing a problem and is not touching any > > generic > > code). Why does this need to be a string? > > A string is what user space gets in /sys/devices/soc/*, It is rare that the kernel accesses information in the exact same way that userspace does. And once we expose this to userspace we're stuck with it, so exporting anything other than a simple number is even less desirable. > and we already have > code that does the same things there to work around quirks, here we just > use the same interface in a completely generic way. Note that not every > SoC family uses numbers in the same way, some have multiple subrevisions, > some have names etc. Where is the need for a "completely generic way" for one piece of vendor -specific code to get information that is inherently specific to that vendor, that is supplied by code specific to that vendor? -Scott ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
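A numeric version of the check discussed above might look roughly like the following userspace sketch. All SKETCH_* names are hypothetical. The E-bit position is an assumption inferred from the T1024 (0x85400010) and T1024E (0x85480010) values quoted elsewhere in this thread, and the T4240 SVR values in the usage note are built from the 0x824000 number quoted above plus an assumed revision byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed SVR layout: SoC version in the upper 24 bits, revision in the
 * low 8 bits, and one bit that only flags the encryption-capable part. */
#define SKETCH_SVR_E_BIT	0x00080000u
#define SKETCH_SVR_SOC_VER(svr)	(((svr) & ~SKETCH_SVR_E_BIT) >> 8)
#define SKETCH_SVR_REV(svr)	((svr) & 0xffu)

/* Symbolic name for the number, so the consumer has no magic constant. */
#define SKETCH_SVR_T4240	0x824000u

/* Sanity check of the E-bit assumption against the T1024/T1024E pair. */
static_assert(SKETCH_SVR_SOC_VER(0x85480010u) ==
	      SKETCH_SVR_SOC_VER(0x85400010u),
	      "E bit must not affect the SoC version");

/* True for T4240 (E or non-E) with a revision below 2.0. */
static bool esdhc_needs_t4240_quirk(uint32_t svr)
{
	return SKETCH_SVR_SOC_VER(svr) == SKETCH_SVR_T4240 &&
	       SKETCH_SVR_REV(svr) < 0x20;
}
```

Under these assumptions 0x82400010 and 0x82480010 both need the quirk and 0x82400020 does not. A condition such as "<= 0x20 but not 0x21" stays a one-line comparison, and reduced-core variants could be added as further symbolic constants.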
Re: [PATCH 2/4] soc: fsl: add GUTS driver for QorIQ platforms
On Thu, 2016-06-02 at 10:43 +0200, Arnd Bergmann wrote: > On Wednesday, June 1, 2016 8:47:22 PM CEST Scott Wood wrote: > > On Mon, 2016-05-30 at 15:15 +0200, Arnd Bergmann wrote: > > > diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c > > > new file mode 100644 > > > index ..2f30698f5bcf > > > --- /dev/null > > > +++ b/drivers/soc/fsl/guts.c > > > @@ -0,0 +1,130 @@ > > > +/* > > > + * Freescale QorIQ Platforms GUTS Driver > > > + * > > > + * Copyright (C) 2016 Freescale Semiconductor, Inc. > > > + * > > > + * This program is free software; you can redistribute it and/or modify > > > + * it under the terms of the GNU General Public License as published by > > > + * the Free Software Foundation; either version 2 of the License, or > > > + * (at your option) any later version. > > > + */ > > > + > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > + > > > +#define GUTS_PVR 0x0a0 > > > +#define GUTS_SVR 0x0a4 > > > + > > > +struct guts { > > > + void __iomem *regs; > > > > We already have a struct to define guts. Why are you not using it? Why > > do > > you consider using it to be "abuse"? What if we want to move more guts > > functionality into this driver? > > This structure was in the original patch, I left it in there, only > removed the inclusion of the powerpc header file, which seemed to > be misplaced. I'm not referring to "struct guts". I'm referring to changing "struct ccsr_guts __iomem *regs" into "void __iomem *regs". And it's not a powerpc header file. > > > +/* > > > + * Table for matching compatible strings, for device tree > > > + * guts node, for Freescale QorIQ SOCs.
> > > + */ > > > +static const struct of_device_id fsl_guts_of_match[] = { > > > + /* For T4 & B4 Series SOCs */ > > > + { .compatible = "fsl,qoriq-device-config-1.0", .data = "T4/B4 > > > series" }, > > [snip] > > > + { .compatible = "fsl,qoriq-device-config-2.0", .data = "P > > > series" > > > > As noted in my comment on patch 3/4, these descriptions are reversed. > > > > They're also incomplete. t2080 has device config 2.0. t1040 is described > > as > > 2.0 though it should probably be 2.1 (or better, drop the generic > > compatible > > altogether). > > Ok. Ideally I think we'd even look up the specific SoC names from the > SVC rather than the compatible string. I just didn't have a good list > for those to put in the driver. The list is in arch/powerpc/include/asm/mpc85xx.h but I don't know why we need to convert it to a string in the first place. > > > > + /* > > > + * syscon devices default to little-endian, but on powerpc we > > > have > > > + * existing device trees with big-endian maps and an absent > > > endianess > > > + * "big-property" > > > + */ > > > + if (!IS_ENABLED(CONFIG_POWERPC) && > > > + !of_property_read_bool(dev->of_node, "big-endian")) > > > + guts->little_endian = true; > > > > This is not a syscon device (Yangbo's patch to add a guts node on ls2080 > > is > > the only guts node that says "syscon", and that was a leftover from > > earlier > > revisions and should probably be removed). Even if it were, where is it > > documented that syscon defaults to little-endian? > > Documentation/devicetree/bindings/regmap/regmap.txt > > We had a little screwup here, basically regmap (and by consequence, syscon) > always defaulted to little-endian way before that was documented, so it's > too late to change it, What causes a device node to fall under the jurisdiction of regmap.txt? Again, these nodes do not claim "syscon" compatibility. > although I agree it would have made sense to document > regmap to default to big-endian on powerpc. Please don't. 
It's enough of a mess as is; no need to start throwing in architecture ifdefs. > > Documentation/devicetree/bindings/common-properties.txt says that the > > individual binding specifies the default. The default for this node > > should be > > big-endian because that's what existed before there was a need to describe > > the > > endianness. And we need an update to the guts binding to specify that. > > Good point. This probably mea
Re: [PATCH 2/4] soc: fsl: add GUTS driver for QorIQ platforms
On Mon, 2016-05-30 at 15:15 +0200, Arnd Bergmann wrote: > diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c > new file mode 100644 > index ..2f30698f5bcf > --- /dev/null > +++ b/drivers/soc/fsl/guts.c > @@ -0,0 +1,130 @@ > +/* > + * Freescale QorIQ Platforms GUTS Driver > + * > + * Copyright (C) 2016 Freescale Semiconductor, Inc. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define GUTS_PVR 0x0a0 > +#define GUTS_SVR 0x0a4 > + > +struct guts { > + void __iomem *regs; We already have a struct to define guts. Why are you not using it? Why do you consider using it to be "abuse"? What if we want to move more guts functionality into this driver? > + bool little_endian; > + struct soc_device_attribute soc; > +}; > + > +static u32 fsl_guts_get_svr(struct guts *guts) > +{ > + if (guts->little_endian) > + return ioread32(guts->regs + GUTS_SVR); > + else > + return ioread32be(guts->regs + GUTS_SVR); > +} > + > +static u32 fsl_guts_get_pvr(struct guts *guts) > +{ > + if (guts->little_endian) > + return ioread32(guts->regs + GUTS_PVR); > + else > + return ioread32be(guts->regs + GUTS_PVR); > +} You've removed the fallback to mfspr() on PPC, which would be helpful in some virtualized environments where we don't have the guts node (but do have other directly assigned devices). Of course, this is a consequence of the conversion into a platform device. > + > +/* > + * Table for matching compatible strings, for device tree > + * guts node, for Freescale QorIQ SOCs. 
> + */ > +static const struct of_device_id fsl_guts_of_match[] = { > + /* For T4 & B4 Series SOCs */ > + { .compatible = "fsl,qoriq-device-config-1.0", .data = "T4/B4 > series" }, [snip] > + { .compatible = "fsl,qoriq-device-config-2.0", .data = "P series" As noted in my comment on patch 3/4, these descriptions are reversed. They're also incomplete. t2080 has device config 2.0. t1040 is described as 2.0 though it should probably be 2.1 (or better, drop the generic compatible altogether). > + /* > + * syscon devices default to little-endian, but on powerpc we have > + * existing device trees with big-endian maps and an absent > endianess > + * "big-property" > + */ > + if (!IS_ENABLED(CONFIG_POWERPC) && > + !of_property_read_bool(dev->of_node, "big-endian")) > + guts->little_endian = true; This is not a syscon device (Yangbo's patch to add a guts node on ls2080 is the only guts node that says "syscon", and that was a leftover from earlier revisions and should probably be removed). Even if it were, where is it documented that syscon defaults to little-endian? Documentation/devicetree/bindings/common-properties.txt says that the individual binding specifies the default. The default for this node should be big-endian because that's what existed before there was a need to describe the endianness. And we need an update to the guts binding to specify that. > + > + guts->regs = devm_ioremap_resource(dev, 0); > + if (!guts->regs) { > + ret = -ENOMEM; > + kfree(guts); > + goto out; > + } > + > + fsl_guts_init(dev, guts); > + ret = 0; > +out: > + return ret; > +} > + > +static struct platform_driver fsl_soc_guts = { > + .probe = fsl_guts_probe, > + .driver.of_match_table = fsl_guts_of_match, > +}; > + > +module_platform_driver(fsl_soc_guts); Again, this means that the information is not available during early boot, such as in the clock driver. 
Thus we would not be able to convert clk-qoriq's direct mfspr(SPRN_SVR) into an soc_device_match() (or anything else that makes use of this file), nor would we be able to move its access of the guts RCW registers into this driver. -Scott
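The endianness handling discussed in this review can be sketched in plain C. This is a userspace stand-in only: a byte buffer replaces the ioremap()ed registers, explicit byte assembly replaces ioread32()/ioread32be(), and the struct and function names are hypothetical. The flag defaults to big-endian, which is the default argued for above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUTS_SVR	0x0a4

struct guts_sketch {
	const uint8_t *regs;	/* stand-in for the mapped register block */
	bool little_endian;	/* false (big-endian) unless the node says so */
};

/* Assemble a 32-bit register value from bytes in the given byte order. */
static uint32_t read32(const uint8_t *regs, size_t off, bool little_endian)
{
	const uint8_t *p;

	assert(regs != NULL);
	p = regs + off;

	if (little_endian)
		return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;

	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Read SVR using the byte order recorded for this block. */
static uint32_t guts_sketch_get_svr(const struct guts_sketch *g)
{
	assert(g != NULL);
	return read32(g->regs, GUTS_SVR, g->little_endian);
}
```

If the bytes at offset 0x0a4 are 85 40 00 10, the default (big-endian) read yields 0x85400010, while setting little_endian yields 0x10004085, which is why getting the default wrong breaks every existing device tree that omits the property.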
Re: [PATCH 4/4] Revert "powerpc/fsl: Move fsl_guts.h out of arch/powerpc"
On Mon, 2016-05-30 at 15:18 +0200, Arnd Bergmann wrote: > All users of this driver are PowerPC specific and the header file > has no business in the global include/linux/ hierarchy, so move > it back before anyone starts using it on ARM. > > This reverts commit 948486544713492f00ac8a9572909101ea892cb0. > > Signed-off-by: Arnd Bergmann <a...@arndb.de> > --- > This part of the series is not required for the eSDHC quirk, > but it restores the asm/fsl_guts.h header so it doesn't accidentally > get abused for this in the future. I found two drivers outside of > arch/powerpc that already accessed the registers directly, but the > functions look fairly contained, and can be easily hidden in an > #ifdef CONFIG_PPC NACK Besides adding ifdef pollution for no good reason, this register block is used on some ARM chips as well. Why is it a problem if "anyone starts using it on ARM"? BTW, of all the mailing lists you included on this CC, you seem to have left off the PPC list (I've added it). -Scott
Re: [PATCH 3/4] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0
On Mon, 2016-05-30 at 15:16 +0200, Arnd Bergmann wrote: > This is a rewrite of an earlier patch from Yangbo Lu, adding a quirk > for the NXP QorIQ T4240 in the detection of the host device version. > > Unfortunately, this device cannot be detected using the compatible > string, as we have to support existing DTS files that use the generic > "fsl,t4240-esdhc" identifier but that have other host versions that > are correctly detected. > > Signed-off-by: Arnd Bergmann> > diff --git a/drivers/mmc/host/sdhci-of-esdhc.c b/drivers/mmc/host/sdhci-of > -esdhc.c > index 3f34d354f1fc..1d4814fe4cb2 100644 > --- a/drivers/mmc/host/sdhci-of-esdhc.c > +++ b/drivers/mmc/host/sdhci-of-esdhc.c > @@ -73,14 +73,16 @@ static u32 esdhc_readl_fixup(struct sdhci_host *host, > static u16 esdhc_readw_fixup(struct sdhci_host *host, >int spec_reg, u32 value) > { > + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); > + struct sdhci_esdhc *esdhc = sdhci_pltfm_priv(pltfm_host); > u16 ret; > int shift = (spec_reg & 0x2) * 8; > > if (spec_reg == SDHCI_HOST_VERSION) > - ret = value & 0x; > - else > - ret = (value >> shift) & 0x; > - return ret; > + return esdhc->vendor_ver << SDHCI_VENDOR_VER_SHIFT | > +esdhc->spec_ver; > + > + return (value >> shift) & 0x; > } > > static u8 esdhc_readb_fixup(struct sdhci_host *host, > @@ -562,16 +564,32 @@ static const struct sdhci_pltfm_data > sdhci_esdhc_le_pdata = { > .ops = _esdhc_le_ops, > }; > > +#define T4240_HOST_VER ((VENDOR_V_23 << SDHCI_VENDOR_VER_SHIFT) | > SDHCI_SPEC_200) > +static const struct soc_device_attribute esdhc_t4240_quirk = { > + /* T4240 revision < 0x20 uses vendor version 23, SDHCI version 200 > */ > + { .soc_id = "T4*(0x824000)", .revision = "0x[01]?", > + .data = (void *)(uintptr_t)(T4240_HOST_VER) }, Why should this code need to care that the string begins with "T4"? This creates dual maintenance if that were to change. 
It's also broken because T4240 has compatible = "fsl,t4240-device-config", "fsl,qoriq-device-config-2.0" and thus with these patches it would incorrectly show up as "P series (0x824000)". The compatible string of this node was never meant to be a key for choosing a string to describe the system to userspace.

0x824000 is a magic number which should be represented symbolically.

If T4240 is affected, then so are the reduced-core variants T4160 and T4080, but 0x824000 doesn't match them (Yangbo's patch had the same problem). And please don't respond with "0x824*".

You also didn't strip out the E bit of SVR which indicates encryption capability and nothing else (Yangbo's patch did not have this problem because it used SVR_SOC_VER).

What happens if the revision condition is more complicated, such as <= 0x20 with 0x21 being fine? Multiple quirk entries where before we had a simple comparison?

I fail to see how this approach is an improvement (much less one that needs to hold up a patchset that is fixing a problem and is not touching any generic code).

Why does this need to be a string?

-Scott
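For readers following the objections above (magic number, E bit, reduced-core variants, revision ranges), here is a minimal sketch of the numeric comparison being argued for. It is not code from either patch. The macro values are assumptions modelled on the SVR helpers in mpc85xx.h and should be checked against the real header; esdhc_svr_needs_quirk() and its last_bad_rev parameter are hypothetical names, and the thread does not settle where the revision boundary actually lies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout, modelled on mpc85xx.h: the low byte of SVR is the
 * revision, the upper 24 bits are the SoC version, and one bit in the
 * SoC version (the "E" bit) only flags encryption capability.
 */
#define SVR_REV(svr)     ((svr) & 0xffu)
#define SVR_SOC_VER(svr) (((svr) >> 8) & 0xfff7ffu)  /* E bit masked out */

/* Symbolic SoC versions (assumed values, for illustration only). */
#define SOC_T4240 0x824000u
#define SOC_T4160 0x824100u
#define SOC_T4080 0x824102u

/*
 * Hypothetical helper: true if this SVR names a T4240-family part whose
 * revision is at or below last_bad_rev. A range such as "<= 0x20 with
 * 0x21 being fine" stays a single comparison.
 */
static bool esdhc_svr_needs_quirk(uint32_t svr, uint32_t last_bad_rev)
{
	uint32_t soc = SVR_SOC_VER(svr);
	bool family = soc == SOC_T4240 || soc == SOC_T4160 ||
		      soc == SOC_T4080;

	return family && SVR_REV(svr) <= last_bad_rev;
}
```

With this shape, the E variant and the reduced-core parts match without extra table entries, and moving the revision boundary is a one-argument change.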
Re: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0
On Thu, 2016-05-05 at 13:10 +0200, Arnd Bergmann wrote: > On Thursday 05 May 2016 09:41:32 Yangbo Lu wrote: > > > -Original Message- > > > From: Arnd Bergmann [mailto:a...@arndb.de] > > > Sent: Thursday, May 05, 2016 4:32 PM > > > To: linuxppc-...@lists.ozlabs.org > > > Cc: Yangbo Lu; linux-...@vger.kernel.org; devicet...@vger.kernel.org; > > > linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org; > > > linux-...@vger.kernel.org; linux-...@vger.kernel.org; iommu@lists.linux- > > > foundation.org; net...@vger.kernel.org; Mark Rutland; > > > ulf.hans...@linaro.org; Russell King; Bhupesh Sharma; Joerg Roedel; > > > Santosh Shilimkar; Yang-Leo Li; Scott Wood; Rob Herring; Claudiu Manoil; > > > Kumar Gala; Xiaobo Xie; Qiang Zhao > > > Subject: Re: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240- > > > R1.0-R2.0 > > > > > > On Thursday 05 May 2016 11:12:30 Yangbo Lu wrote: > > > > IIRC, it is the same IP block as i.MX and Arnd's point is this won't > > > > even compile on !PPC. It is things like this that prevent sharing the > > > > driver. > > > > The whole point of using the MMIO SVR instead of the PPC SPR is so that > > it will work on ARM... The guts driver should build on any platform as > > long as OF is enabled, and if it doesn't find a node to bind to it will > > return 0 for SVR, and the eSDHC driver will continue (after printing an > > error that should be removed) without the ability to test for errata > > based on SVR. > > It feels like a bad design to have to come up with a different > method for each SoC type here when they all do the same thing > and want to identify some variant of the chip to do device > specific quirks. > > As far as I'm concerned, every driver in drivers/soc that needs to > export a symbol to be used by a device driver is an indication that > we don't have the right set of abstractions yet. 
There are cases > that are not worth abstracting because the functionality is rather > obscure and only a couple of drivers for one particular chip > ever need it. > > Finding out the version of the SoC does not look like this case. I'm open to new ways of abstracting this, but can that please be discussed after these patches are merged? This patchset is fixing a problem, the existing abstraction is unappealing and not widely adopted, a new abstraction is not ready, and we're only touching code for our hardware. Oh, and the existing abstraction isn't even "existing". I don't see any examples where soc_device is being used like this -- or even any way for a driver (the one consuming the information, not the soc "driver") to get a reference to the soc_device that's been registered short of searching for the device object by name -- and you're asking for new functionality in drivers/base/soc.c. > > > I think the first four patches take care of building for ARM, > > > but the problem remains if you want to enable COMPILE_TEST as > > > we need for certain automated checking. > > > > What specific problem is there with COMPILE_TEST? > > COMPILE_TEST is solvable here and the way it is implemented in this > case (selecting FSL_GUTS from the driver) indeed looks like it works > correctly, but it's still awkward that this means building the > SoC specific ID stuff into the vmlinux binary for any driver that > uses something like that for a particular SoC. Please keep in mind that this is a Freescale-specific driver... it's not as if we're attaching this dependency to common SDHCI code. > > > > > Dealing with Si revs is a common problem. We should have a > > > > common solution. There is soc_device for this purpose. > > > > > > Exactly. The last time this came up, I think we agreed to implement a > > > helper using glob_match() on the soc_device strings. 
Unfortunately > > > this hasn't happened then, but I'd still prefer that over yet another > > > vendor-specific way of dealing with the generic issue. > > > > soc_device would require encoding the SVR as a string and then decoding > > the string, which is more complicated and error prone than having > > platform-specific code test a platform-specific number. > > You already need to encode it as a string to register the soc_device, No we don't, because we don't already register a soc_device on arm64 or ppc (and it looks like whatever does get registered on at least some relevant arm32 chips is not particularly useful). > and the driver just needs to pass a glob string, so the only part that > is missing is the generic function that takes the string from the >
Re: [v7, 0/5] Fix eSDHC host version register bug
On Fri, 2016-04-01 at 11:07 +0800, Yangbo Lu wrote:
> This patchset is used to fix a host version register bug in the
> T4240-R1.0-R2.0 eSDHC controller. To get the SoC version and revision,
> it's needed to add the GUTS driver to access the global utilities
> registers.
>
> So, the first three patches are to add the GUTS driver.
> The following two patches are to enable GUTS driver support to get SVR
> in eSDHC driver and fix host version for T4240.
>
> Yangbo Lu (5):
>   ARM64: dts: ls2080a: add device configuration node
>   soc: fsl: add GUTS driver for QorIQ platforms
>   dt: move guts devicetree doc out of powerpc directory
>   powerpc/fsl: move mpc85xx.h to include/linux/fsl
>   mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

Acked-by: Scott Wood <o...@buserror.net>

-Scott
Re: [v6, 3/5] dt: move guts devicetree doc out of powerpc directory
On 03/17/2016 12:06 PM, Rob Herring wrote: > On Wed, Mar 09, 2016 at 06:08:49PM +0800, Yangbo Lu wrote: >> Move guts devicetree doc to Documentation/devicetree/bindings/soc/fsl/ >> since it's used by not only PowerPC but also ARM. And add a specification >> for 'little-endian' property. >> >> Signed-off-by: Yangbo Lu>> --- >> Changes for v2: >> - None >> Changes for v3: >> - None >> Changes for v4: >> - Added this patch >> Changes for v5: >> - Modified the description for little-endian property >> Changes for v6: >> - None >> --- >> Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt | 3 +++ >> 1 file changed, 3 insertions(+) >> rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt (91%) >> >> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt >> b/Documentation/devicetree/bindings/soc/fsl/guts.txt >> similarity index 91% >> rename from Documentation/devicetree/bindings/powerpc/fsl/guts.txt >> rename to Documentation/devicetree/bindings/soc/fsl/guts.txt >> index b71b203..07adca9 100644 >> --- a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt >> +++ b/Documentation/devicetree/bindings/soc/fsl/guts.txt >> @@ -25,6 +25,9 @@ Recommended properties: >> - fsl,liodn-bits : Indicates the number of defined bits in the LIODN >> registers, for those SOCs that have a PAMU device. >> >> + - little-endian : Indicates that the global utilities block is little >> + endian. The default is big endian. > > The default is "the native endianness of the system". So absence on an > ARM system would be LE. No. For this binding, the default is big-endian, because that's what existed for this device before an endian property was added. "endianness of the system" is not a well-defined concept. > This property is valid for any simple-bus device, Since when does simple-bus mean anything more than that the nodes underneath it can be used without bus-specific knowledge? > so it isn't really required to document per device. 
You can, but
> your description had better match the documented behaviour.

Documented where? In fact, Documentation/devicetree/bindings/common-properties.txt explicitly says of the endian properties, "If a binding supports these properties, then the binding should also specify the default behavior if none of these properties are present."

-Scott
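The binding rule being defended here (big-endian unless the node carries a little-endian property) can be illustrated with a small standalone sketch. This is not the GUTS driver; guts_read32(), the has_little_endian_prop flag, and sample_reg are hypothetical stand-ins for a register read and a device tree property lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assemble a 32-bit value from bytes stored most significant first. */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assemble a 32-bit value from bytes stored least significant first. */
static uint32_t read_le32(const uint8_t *p)
{
	return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Example register contents as they would sit in memory (made-up data). */
static const uint8_t sample_reg[4] = { 0x82, 0x40, 0x00, 0x20 };

/*
 * Hypothetical accessor: the default is big-endian because that is what
 * the device used before the property existed. It does not depend on
 * the CPU's own byte order.
 */
static uint32_t guts_read32(const uint8_t *reg, bool has_little_endian_prop)
{
	return has_little_endian_prop ? read_le32(reg) : read_be32(reg);
}
```

The point of the sketch is that the default is a property of the binding, so the same device tree yields the same register interpretation on PowerPC and ARM hosts.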
Re: [v6, 2/5] soc: fsl: add GUTS driver for QorIQ platforms
On 03/09/2016 04:18 AM, Yangbo Lu wrote:
> +#ifdef CONFIG_FSL_GUTS
> +u32 fsl_guts_get_svr(void);
> +int fsl_guts_init(void);
> +#endif

Don't ifdef prototypes (when not providing a stub alternative).

-Scott
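As an illustration of the review comment above, this is a sketch of the one case where an #ifdef around prototypes does make sense: when the #else branch supplies stubs. Otherwise the prototypes can simply be declared unconditionally. The stub return values are assumptions chosen to match the behaviour described elsewhere in the thread (an SVR of 0 when nothing is bound); they are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;  /* stand-in for the kernel type */

#ifdef CONFIG_FSL_GUTS
/* Real implementations live in the driver. */
u32 fsl_guts_get_svr(void);
int fsl_guts_init(void);
#else
/*
 * Stub alternative: callers compile and link without the driver and
 * see "no SVR available" (assumed here to be 0).
 */
static inline u32 fsl_guts_get_svr(void)
{
	return 0;
}

static inline int fsl_guts_init(void)
{
	return 0;
}
#endif
```

Without the #else branch, the #ifdef only turns a link-time error into a compile-time one and hides the declarations from tooling, which is why bare prototypes are normally left unguarded.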
Re: [v6, 5/5] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0
On 03/14/2016 02:29 AM, Yangbo Lu wrote: >> -Original Message- >> From: Arnd Bergmann [mailto:a...@arndb.de] >> Sent: Monday, March 14, 2016 6:26 AM >> To: linuxppc-...@lists.ozlabs.org >> Cc: Yangbo Lu; devicet...@vger.kernel.org; linux-arm- >> ker...@lists.infradead.org; linux-ker...@vger.kernel.org; linux- >> c...@vger.kernel.org; linux-...@vger.kernel.org; iommu@lists.linux- >> foundation.org; net...@vger.kernel.org; linux-...@vger.kernel.org; >> ulf.hans...@linaro.org; Zhao Qiang; Russell King; Bhupesh Sharma; Joerg >> Roedel; Santosh Shilimkar; Scott Wood; Rob Herring; Claudiu Manoil; Kumar >> Gala; Yang-Leo Li; Xiaobo Xie >> Subject: Re: [v6, 5/5] mmc: sdhci-of-esdhc: fix host version for T4240- >> R1.0-R2.0 >> >> On Wednesday 09 March 2016 18:08:51 Yangbo Lu wrote: >>> @@ -567,10 +580,20 @@ static void esdhc_init(struct platform_device >> *pdev, struct sdhci_host *host) >>> struct sdhci_pltfm_host *pltfm_host; >>> struct sdhci_esdhc *esdhc; >>> u16 host_ver; >>> + u32 svr; >>> >>> pltfm_host = sdhci_priv(host); >>> esdhc = sdhci_pltfm_priv(pltfm_host); >>> >>> + fsl_guts_init(); >>> + svr = fsl_guts_get_svr(); >>> + if (svr) { >>> + esdhc->soc_ver = SVR_SOC_VER(svr); >>> + esdhc->soc_rev = SVR_REV(svr); >>> + } else { >>> + dev_err(>dev, "Failed to get SVR value!\n"); >>> + } >>> + >> >> This makes the driver non-portable. Better identify the specific >> workarounds based on the compatible string for this device, or add a >> boolean DT property for the quirk. >> >> Arnd > > [Lu Yangbo-B47093] Hi Arnd, we did have a discussion about using DTS in v1 > before. > https://patchwork.kernel.org/patch/6834221/ > > We don’t have a separate DTS file for each revision of an SOC and if we did, > we'd constantly have people using the wrong one. > In addition, the device tree is stable ABI and errata are often discovered > after device tree are deployed. > See the link for details. 
>
> So we decide to read SVR from the device-config/guts MMIO block other than
> using DTS.
> Thanks.

Also note that this driver is already only for fsl-specific hardware, and it will still work even if fsl_guts doesn't find anything to bind to -- it just wouldn't be able to detect errata based on SVR in that case.

-Scott
Re: [v6, 5/5] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0
On 03/17/2016 12:06 PM, Arnd Bergmann wrote: > On Thursday 17 March 2016 12:01:01 Rob Herring wrote: >> On Mon, Mar 14, 2016 at 05:45:43PM +0000, Scott Wood wrote: > >>>>> This makes the driver non-portable. Better identify the specific >>>>> workarounds based on the compatible string for this device, or add a >>>>> boolean DT property for the quirk. >>>>> >>>>>Arnd >>>> >>>> [Lu Yangbo-B47093] Hi Arnd, we did have a discussion about using DTS in v1 >>>> before. >>>> https://patchwork.kernel.org/patch/6834221/ >>>> >>>> We don’t have a separate DTS file for each revision of an SOC and if we >>>> did, we'd constantly have people using the wrong one. >>>> In addition, the device tree is stable ABI and errata are often discovered >>>> after device tree are deployed. >>>> See the link for details. >>>> >>>> So we decide to read SVR from the device-config/guts MMIO block other than >>>> using DTS. >>>> Thanks. >>> >>> Also note that this driver is already only for fsl-specific hardware, >>> and it will still work even if fsl_guts doesn't find anything to bind to >>> -- it just wouldn't be able to detect errata based on SVR in that case. >> >> IIRC, it is the same IP block as i.MX and Arnd's point is this won't >> even compile on !PPC. It is things like this that prevent sharing the >> driver. The whole point of using the MMIO SVR instead of the PPC SPR is so that it will work on ARM... The guts driver should build on any platform as long as OF is enabled, and if it doesn't find a node to bind to it will return 0 for SVR, and the eSDHC driver will continue (after printing an error that should be removed) without the ability to test for errata based on SVR. > I think the first four patches take care of building for ARM, > but the problem remains if you want to enable COMPILE_TEST as > we need for certain automated checking. What specific problem is there with COMPILE_TEST? >> Dealing with Si revs is a common problem. We should have a >> common solution. 
There is soc_device for this purpose.
>
> Exactly. The last time this came up, I think we agreed to implement a
> helper using glob_match() on the soc_device strings. Unfortunately
> this hasn't happened then, but I'd still prefer that over yet another
> vendor-specific way of dealing with the generic issue.

soc_device would require encoding the SVR as a string and then decoding the string, which is more complicated and error prone than having platform-specific code test a platform-specific number. And when would it get registered on arm64, which doesn't have platform code?

-Scott
Re: [PATCH] iommu/fsl: Fix the dependency check for PAMU driver.
On Thu, 2015-05-14 at 23:11 +0530, Varun Sethi wrote:
> Fix the build dependency for the PAMU driver. PPC32 build dependency is
> incorrect. Add the CORENET_GENERIC build dependency for PAMU driver.
>
> Signed-off-by: Varun Sethi varun.se...@freescale.com
> ---
>  drivers/iommu/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 1ae4e54..4ace8db 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -50,7 +50,7 @@ config OF_IOMMU
>  config FSL_PAMU
>  	bool Freescale IOMMU support
> -	depends on PPC32
> +	depends on CORENET_GENERIC
>  	depends on PPC_E500MC || COMPILE_TEST
>  	select IOMMU_API
>  	select GENERIC_ALLOCATOR

CORENET_GENERIC is for board support. There is no guarantee that all corenet boards will use it. You already depend on PPC_E500MC; why do you need anything else (besides probably getting rid of || COMPILE_TEST which is useless if you do add CORENET_GENERIC, because CORENET_GENERIC implies PPC_E500MC)?

-Scott
Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device()
On Sat, 2014-02-15 at 12:19 -0600, Yoder Stuart-B08248 wrote: -Original Message- From: Greg KH [mailto:gre...@linuxfoundation.org] Sent: Saturday, February 15, 2014 11:34 AM To: Yoder Stuart-B08248 Cc: Antonios Motakis; alex.william...@redhat.com; kvm...@lists.cs.columbia.edu; iommu@lists.linux-foundation.org; linux- ker...@vger.kernel.org; t...@virtualopensystems.com; a.r...@virtualopensystems.com; kim.phill...@linaro.org; jan.kis...@siemens.com; k...@vger.kernel.org; Bhushan Bharat-R65777; Wood Scott-B07421; christoffer.d...@linaro.org; ag...@suse.de; Sethi Varun- B16395; will.dea...@arm.com; Tejun Heo; Rafael J. Wysocki; Guenter Roeck; Toshi Kani; Joe Perches; Dmitry Kasatkin; Michal Hocko; Bjorn Helgaas Subject: Re: [RFC PATCH v4 01/10] driver core: export driver_probe_device() On Sat, Feb 15, 2014 at 04:33:44PM +, Stuart Yoder wrote: Are you in principle opposed to any mechanism that would allow 2 drivers to be resident/active and allow a sysadmin to explicitly bind a particular device instance to the driver of their choice? No, that works today with the bind/unbind/new_id files, it's just that you don't like it :) We don't like it because of the ambiguities/race-conditions with the current situation. Plus, it's semantically weird (a.k.a. a hack). The user isn't trying to bind an entire type of device to the vfio driver, but rather a specific device. Races and similar ugliness is often what you get when you try to pile things on top of the wrong abstraction. That you can hack around the races with a userspace loop (and hope that no damage was done by the wrong driver in the meantime -- packets sent, filesystems automounted, other inappropriate I/O performed, driver unbind bugs/unwillingness encountered, etc) is not a particularly satisfying answer. At best the race fixup will end up being a poorly tested code path (if the person scripting userspace thinks of doing it at all). 
It also doesn't work today because there is no new_id for platform devices, and the matching situation for platform devices is more complicated than on PCI, so it would be more awkward to implement and more awkward to use.

We can apply enough grease and pound the square peg through the round hole if we must, but we'd like to first exhaust our options for doing it in a simple, straightforward, robust, and semantically sensible manner -- especially since once we start supporting the new_id approach for vfio binding on platform devices it'll be ABI that we're stuck with.

-Scott
Re: [RFC PATCH v4 06/10] VFIO_PLATFORM: Read and write support for the device fd
On Mon, 2014-02-10 at 15:45 -0700, Alex Williamson wrote: On Sat, 2014-02-08 at 18:29 +0100, Antonios Motakis wrote: VFIO returns a file descriptor which we can use to manipulate the memory regions of the device. Since some memory regions we cannot mmap due to security concerns, we also allow to read and write to this file descriptor directly. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com Tested-by: Alvise Rigo a.r...@virtualopensystems.com --- drivers/vfio/platform/vfio_platform.c | 128 +- 1 file changed, 125 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c index f7db5c0..ee96078 100644 --- a/drivers/vfio/platform/vfio_platform.c +++ b/drivers/vfio/platform/vfio_platform.c @@ -55,7 +55,8 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev) region.addr = res-start; region.size = resource_size(res); - region.flags = 0; + region.flags = VFIO_REGION_INFO_FLAG_READ + | VFIO_REGION_INFO_FLAG_WRITE; vdev-region[i] = region; } @@ -150,13 +151,134 @@ static long vfio_platform_ioctl(void *device_data, static ssize_t vfio_platform_read(void *device_data, char __user *buf, size_t count, loff_t *ppos) { - return 0; + struct vfio_platform_device *vdev = device_data; + unsigned int *io; + int i; + + for (i = 0; i vdev-num_regions; i++) { + struct vfio_platform_region region = vdev-region[i]; + unsigned int done = 0; + loff_t off; + + if ((*ppos region.addr) +|| (*ppos + count - 1) = (region.addr + region.size)) + continue; Perhaps there's something to be said for vfio-pci's use of fixed offsets to have a direct offset to index lookup. + + io = ioremap_nocache(region.addr, region.size); This must incur some overhead per access. There's mmap() if you want fast... 
Given the limited ioremap space on 32-bit, I can see not wanting to map everything that the user has open all the time -- but in that case, wouldn't it be better to just map one page here rather than the whole region?

> > +
> > +		off = *ppos - region.addr;
> > +
> > +		while (count) {
> > +			size_t filled;
> > +
> > +			if (count >= 4 && !(off % 4)) {
> > +				u32 val;
> > +
> > +				val = ioread32(io + off);
> > +				if (copy_to_user(buf, &val, 4))
> > +					goto err;
>
> For vfio-pci we've decided that these interfaces are always little
> endian, have you considered whether it makes sense to do something
> similar here? Thanks,

ioread32() is little endian -- but since read() puts its result in the caller's memory buffer (rather than a register return), I think it makes more sense to preserve byte-invariance -- similar to the conclusion of the recent KVM MMIO API clarification discussion. Then the VFIO user would use the same type of access (byte swapped or not) to access the read() buffer that they would have used to access the register directly.

Forcing little endian is a better fit for PCI (which is inherently little endian) than for platform devices which can be either endianness.

-Scott
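The "map one page rather than the whole region" suggestion above comes down to a little address arithmetic, sketched here outside the kernel. Nothing below is from the patch: struct map_window, one_page_window(), and the fixed 4 KiB SKETCH_PAGE_SIZE are hypothetical, and the caller is assumed to have already checked that off lies inside the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the sketch */

/* What would be mapped, and how much of the request it can serve. */
struct map_window {
	uint64_t phys_base;  /* page-aligned physical address to map */
	size_t offset;       /* offset of the access within that page */
	size_t len;          /* bytes servable from this single mapping */
};

/*
 * Compute the single-page window covering byte `off` of a region.
 * The length is clamped to the end of the page, the end of the region,
 * and the requested byte count, so the caller loops page by page.
 */
static struct map_window one_page_window(uint64_t region_addr,
					 uint64_t region_size,
					 uint64_t off, size_t count)
{
	struct map_window w;
	uint64_t phys = region_addr + off;
	uint64_t remaining = region_size - off;

	w.phys_base = phys & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
	w.offset = (size_t)(phys - w.phys_base);
	w.len = SKETCH_PAGE_SIZE - w.offset;
	if (w.len > count)
		w.len = count;
	if (w.len > remaining)
		w.len = (size_t)remaining;
	return w;
}
```

A read that straddles a page boundary would be served in two passes, each needing only one page of ioremap space at a time.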
Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
My e-mail address is scottw...@freescale.com, not IMCEAEX-_O=MMS_OU=EXTERNAL+20+28FYDIBOHF25SPDLT +29_CN=RECIPIENTS_CN=f0faac8d7e74473a9ee1c45b068d8...@namprd03.prod.outlook.com

On Tue, 2013-12-10 at 05:37 +, bharat.bhus...@freescale.com wrote:
> -Original Message-
> From: Wood Scott-B07421
> Sent: Saturday, December 07, 2013 12:55 AM
> To: Bhushan Bharat-R65777
> Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de; Yoder
> Stuart-B08248; iommu@lists.linux-foundation.org; bhelg...@google.com;
> linuxppc-d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
>
> > If the administrator does not opt into this partial loss of isolation,
> > then once you run out of MSI groups, new users should not be able to
> > set up MSIs.
>
> So mean vfio should use Legacy when out of MSI banks?

Yes, if the administrator hasn't granted permission to share.

-Scott
Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
On Thu, 2013-12-05 at 22:11 -0600, Bharat Bhushan wrote: -Original Message- From: Wood Scott-B07421 Sent: Friday, December 06, 2013 5:52 AM To: Bhushan Bharat-R65777 Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart- B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) On Thu, 2013-11-28 at 03:19 -0600, Bharat Bhushan wrote: -Original Message- From: Bhushan Bharat-R65777 Sent: Wednesday, November 27, 2013 9:39 PM To: 'Alex Williamson' Cc: Wood Scott-B07421; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart- B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: RE: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) If we just provide the size of MSI bank to userspace then userspace cannot do anything wrong. So userspace does not know address, so it cannot mmap and cause any interference by directly reading/writing. That's security through obscurity... Couldn't the malicious user find out the address via other means, such as experimentation on another system over which they have full control? What would happen if the user reads from their device's PCI config space? Or gets the information via some back door in the PCI device they own? Or pokes throughout the address space looking for something that generates an interrupt to its own device? So how to solve this problem, Any suggestion ? We have to map one window in PAMU for MSIs and a malicious user can ask its device to do DMA to MSI window region with any pair of address and data, which can lead to unexpected MSIs in system? I don't think there are any solutions other than to limit each bank to one user, unless the admin turns some knob that says they're OK with the partial loss of isolation. 
-Scott
Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
On Thu, 2013-12-05 at 22:17 -0600, Bharat Bhushan wrote: -Original Message- From: Wood Scott-B07421 Sent: Friday, December 06, 2013 5:31 AM To: Bhushan Bharat-R65777 Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart- B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) On Sun, 2013-11-24 at 23:33 -0600, Bharat Bhushan wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Friday, November 22, 2013 2:31 AM To: Wood Scott-B07421 Cc: Bhushan Bharat-R65777; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart-B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) On Thu, 2013-11-21 at 14:47 -0600, Scott Wood wrote: They can interfere. Want to be sure of how they can interfere? If more than one VFIO user shares the same MSI group, one of the users can send MSIs to another user, by using the wrong interrupt within the bank. Unexpected MSIs could cause misbehavior or denial of service. With this hardware, the only way to prevent that is to make sure that a bank is not shared by multiple protection contexts. For some of our users, though, I believe preventing this is less important than the performance benefit. So should we let this patch series in without protection? No, there should be some sort of opt-in mechanism similar to IOMMU-less VFIO -- but not the same exact one, since one is a much more serious loss of isolation than the other. Can you please elaborate opt-in mechanism? The system should be secure by default. If the administrator wants to relax protection in order to accomplish some functionality, that should require an explicit request such as a write to a sysfs file. 
I think we need some sort of ownership model around the msi banks then. Otherwise there's nothing preventing another userspace from attempting an MSI based attack on other users, or perhaps even on the host. VFIO can't allow that. Thanks,

We have very few (3 MSI bank on most of chips), so we can not assign one to each userspace.

That depends on how many users there are.

What I think we can do is:
- Reserve one MSI region for host. Host will not share MSI region with Guest.
- For upto 2 Guest (MAX msi with host - 1) give then separate MSI sub regions
- Additional Guest will share MSI region with other guest.

Any better suggestion are most welcome.

If the administrator does not opt into this partial loss of isolation, then once you run out of MSI groups, new users should not be able to set up MSIs.

-Scott
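The policy argued for in this exchange (one bank reserved for the host, exclusive banks for users, sharing only after an explicit administrator opt-in) can be sketched as a tiny allocator. This is an illustration, not proposed kernel code. All names are hypothetical, the count of three banks comes from the figure quoted in the thread, and the choice of which bank to share is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_MSI_BANKS 3     /* "3 MSI bank on most of chips" per the thread */
#define BANK_FREE     (-1)
#define OWNER_HOST    0     /* user IDs in this sketch are positive */

struct msi_banks {
	int owner[NUM_MSI_BANKS];
	bool allow_shared;  /* hypothetical admin knob, off by default */
};

static void msi_banks_init(struct msi_banks *b, bool allow_shared)
{
	b->owner[0] = OWNER_HOST;  /* bank 0 is never handed to a user */
	for (int i = 1; i < NUM_MSI_BANKS; i++)
		b->owner[i] = BANK_FREE;
	b->allow_shared = allow_shared;
}

/*
 * Returns a bank index for `user`, or -1 if no exclusive bank is left
 * and sharing has not been permitted (the caller would then fall back
 * to legacy interrupts).
 */
static int msi_bank_assign(struct msi_banks *b, int user)
{
	for (int i = 1; i < NUM_MSI_BANKS; i++) {
		if (b->owner[i] == BANK_FREE) {
			b->owner[i] = user;
			return i;
		}
	}
	/* Out of exclusive banks: share bank 1 only with explicit opt-in. */
	return b->allow_shared ? 1 : -1;
}

/* Walk-through: what does the third user get under each policy? */
static int demo_third_user(bool allow_shared)
{
	struct msi_banks b;

	msi_banks_init(&b, allow_shared);
	(void)msi_bank_assign(&b, 1);
	(void)msi_bank_assign(&b, 2);
	return msi_bank_assign(&b, 3);
}
```

The default path is the secure one: a third user is refused MSIs unless the administrator has accepted the partial loss of isolation.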
Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
On Fri, 2013-12-06 at 12:30 -0700, Alex Williamson wrote: On Fri, 2013-12-06 at 12:59 -0600, Scott Wood wrote: On Thu, 2013-12-05 at 22:11 -0600, Bharat Bhushan wrote: -Original Message- From: Wood Scott-B07421 Sent: Friday, December 06, 2013 5:52 AM To: Bhushan Bharat-R65777 Cc: Alex Williamson; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart- B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) On Thu, 2013-11-28 at 03:19 -0600, Bharat Bhushan wrote: -Original Message- From: Bhushan Bharat-R65777 Sent: Wednesday, November 27, 2013 9:39 PM To: 'Alex Williamson' Cc: Wood Scott-B07421; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart- B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: RE: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) If we just provide the size of MSI bank to userspace then userspace cannot do anything wrong. So userspace does not know address, so it cannot mmap and cause any interference by directly reading/writing. That's security through obscurity... Couldn't the malicious user find out the address via other means, such as experimentation on another system over which they have full control? What would happen if the user reads from their device's PCI config space? Or gets the information via some back door in the PCI device they own? Or pokes throughout the address space looking for something that generates an interrupt to its own device? So how to solve this problem, Any suggestion ? We have to map one window in PAMU for MSIs and a malicious user can ask its device to do DMA to MSI window region with any pair of address and data, which can lead to unexpected MSIs in system? 
> > I don't think there are any solutions other than to limit each bank to
> > one user, unless the admin turns some knob that says they're OK with the
> > partial loss of isolation.
>
> Even if the admin does opt-in to an allow_unsafe_interrupts options, it
> should still be reasonably difficult for one guest to interfere with the
> other. I don't think we want to rely on the blind luck of making the
> full MSI bank accessible to multiple guests and hoping they don't step
> on each other. That probably means that vfio needs to manage the space
> rather than the guest. Thanks,

Yes, the MSIs within a given bank would be allocated by the host kernel in any case (presumably by the MSI driver, not VFIO itself). This is just about what happens if the MSI page is written to outside of the normal mechanism.

-Scott
Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
On Sun, 2013-11-24 at 23:33 -0600, Bharat Bhushan wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Friday, November 22, 2013 2:31 AM To: Wood Scott-B07421 Cc: Bhushan Bharat-R65777; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart-B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) On Thu, 2013-11-21 at 14:47 -0600, Scott Wood wrote: On Thu, 2013-11-21 at 13:43 -0700, Alex Williamson wrote: On Thu, 2013-11-21 at 11:20 +, Bharat Bhushan wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Thursday, November 21, 2013 12:17 AM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; bhelg...@google.com; ag...@suse.de; Wood Scott-B07421; Yoder Stuart-B08248; iommu@lists.linux-foundation.org; linux- p...@vger.kernel.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) Is VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT per aperture (ie. each vfio user has $COUNT regions at their disposal exclusively)? Number of msi-bank count is system wide and not per aperture, But will be setting windows for banks in the device aperture. So say if we are direct assigning 2 pci device (both have different iommu group, so 2 aperture in iommu) to VM. Now qemu can make only one call to know how many msi-banks are there but it must set sub-windows for all banks for both pci device in its respective aperture. I'm still confused. What I want to make sure of is that the banks are independent per aperture. For instance, if we have two separate userspace processes operating independently and they both chose to use msi bank zero for their device, that's bank zero within each aperture and doesn't interfere. 
Or another way to ask is can a malicious user interfere with other users by using the wrong bank. Thanks, They can interfere. Want to be sure of how they can interfere? If more than one VFIO user shares the same MSI group, one of the users can send MSIs to another user, by using the wrong interrupt within the bank. Unexpected MSIs could cause misbehavior or denial of service. With this hardware, the only way to prevent that is to make sure that a bank is not shared by multiple protection contexts. For some of our users, though, I believe preventing this is less important than the performance benefit. So should we let this patch series in without protection? No, there should be some sort of opt-in mechanism similar to IOMMU-less VFIO -- but not the same exact one, since one is a much more serious loss of isolation than the other. I think we need some sort of ownership model around the msi banks then. Otherwise there's nothing preventing another userspace from attempting an MSI based attack on other users, or perhaps even on the host. VFIO can't allow that. Thanks, We have very few (3 MSI bank on most of chips), so we can not assign one to each userspace. That depends on how many users there are. -Scott ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
On Thu, 2013-11-28 at 03:19 -0600, Bharat Bhushan wrote: -Original Message- From: Bhushan Bharat-R65777 Sent: Wednesday, November 27, 2013 9:39 PM To: 'Alex Williamson' Cc: Wood Scott-B07421; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart- B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: RE: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Monday, November 25, 2013 10:08 PM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart- B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) On Mon, 2013-11-25 at 05:33 +, Bharat Bhushan wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Friday, November 22, 2013 2:31 AM To: Wood Scott-B07421 Cc: Bhushan Bharat-R65777; linux-...@vger.kernel.org; ag...@suse.de; Yoder Stuart-B08248; iommu@lists.linux-foundation.org; bhelg...@google.com; linuxppc- d...@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) On Thu, 2013-11-21 at 14:47 -0600, Scott Wood wrote: On Thu, 2013-11-21 at 13:43 -0700, Alex Williamson wrote: On Thu, 2013-11-21 at 11:20 +, Bharat Bhushan wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Thursday, November 21, 2013 12:17 AM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; bhelg...@google.com; ag...@suse.de; Wood Scott-B07421; Yoder Stuart-B08248; iommu@lists.linux-foundation.org; linux- p...@vger.kernel.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU) Is 
VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT per aperture (ie. each vfio user has $COUNT regions at their disposal exclusively)? Number of msi-bank count is system wide and not per aperture, But will be setting windows for banks in the device aperture. So say if we are direct assigning 2 pci device (both have different iommu group, so 2 aperture in iommu) to VM. Now qemu can make only one call to know how many msi-banks are there but it must set sub-windows for all banks for both pci device in its respective aperture. I'm still confused. What I want to make sure of is that the banks are independent per aperture. For instance, if we have two separate userspace processes operating independently and they both chose to use msi bank zero for their device, that's bank zero within each aperture and doesn't interfere. Or another way to ask is can a malicious user interfere with other users by using the wrong bank. Thanks, They can interfere. Want to be sure of how they can interfere? What happens if more than one user selects the same MSI bank? Minimally, wouldn't that result in the IOMMU blocking transactions from the previous user once the new user activates their mapping? Yes and no; With current implementation yes but with a minor change no. Later in this response I will explain how. With this hardware, the only way to prevent that is to make sure that a bank is not shared by multiple protection contexts. For some of our users, though, I believe preventing this is less important than the performance benefit. So should we let this patch series in without protection? No. I think we need some sort of ownership model around the msi banks then. Otherwise there's nothing preventing another userspace from attempting an MSI based attack on other users, or perhaps even on the host. VFIO can't allow that. Thanks, We have very few (3 MSI bank on most of chips), so we can not assign one to each userspace. 
What we can do is host and userspace does not share a MSI bank while userspace will share a MSI bank. Then you probably need VFIO to own the MSI bank and program devices into it rather than exposing the MSI banks to userspace to let them have direct access. Overall idea of exposing the details of msi regions to userspace are 1) User space can define the aperture size to fit MSI mapping in IOMMU. 2) setup iova for a MSI banks; which is just after guest memory
Re: [PATCH] iommu/fsl_pamu: use physical cpu index to find the matched cpu nodes
On Thu, 2013-11-14 at 21:16 -0600, Sethi Varun-B16395 wrote:
> Haiying/Scott,
> Forgot to mention this, the PAMU driver has to handle stash destination settings both for power and dsp cores (on B4 platform). For the dsp cores we would expect the physical core id (not controlled by Linux). To make the interface consistent, I would expect the caller (for iommu_set_attr) to pass the physical core id.

That sounds like you need two different interfaces.

-Scott
Re: [PATCH] iommu/fsl_pamu: use physical cpu index to find the matched cpu nodes
On Mon, 2013-11-18 at 20:42 -0600, Varun Sethi wrote:
> For the DSP case again we have to set up the stash attribute. Are you saying that this should be a separate attribute?

Not necessarily a separate attribute, but there should be some way to distinguish whether you're providing a Linux cpu number or some external stash destination.

-Scott
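One way to make that distinction explicit is to tag the stash destination with the kind of id it carries. The sketch below is only an illustration under stated assumptions: `stash_dest_kind`, `stash_dest`, `resolve_stash_core()` and the `hard_id[]` table are hypothetical names, not part of the PAMU driver or the IOMMU API.

```c
#include <stdint.h>

/* Hypothetical tag: who owns the numbering of the stash destination. */
enum stash_dest_kind {
    STASH_DEST_LINUX_CPU,   /* logical CPU number as seen by Linux */
    STASH_DEST_EXTERNAL,    /* physical core id not managed by Linux (e.g. a DSP core) */
};

struct stash_dest {
    enum stash_dest_kind kind;
    uint32_t id;
};

/*
 * Resolve a stash destination to a physical core id.
 * hard_id[] maps logical CPU numbers to physical ids (assumed lookup table).
 * Returns 0 on success, -1 if a logical CPU number is out of range.
 */
int resolve_stash_core(const struct stash_dest *dest,
                       const uint32_t *hard_id, uint32_t nr_cpus,
                       uint32_t *phys_core)
{
    switch (dest->kind) {
    case STASH_DEST_LINUX_CPU:
        if (dest->id >= nr_cpus)
            return -1;
        *phys_core = hard_id[dest->id];
        return 0;
    case STASH_DEST_EXTERNAL:
        *phys_core = dest->id;  /* already a physical id, pass through */
        return 0;
    }
    return -1;
}
```

With a tag like this the caller never has to guess which numbering space an id belongs to; whether that is done with one attribute or two is the open question in the thread.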
Re: [PATCH] iommu/fsl_pamu: use physical cpu index to find the matched cpu nodes
On Thu, 2013-11-14 at 14:30 -0500, Haiying Wang wrote:
> In the case we miss to bring up some cpus, we need to make sure we can find the correct cpu nodes in the device tree based on the given logical cpu index from the caller.
>
> Signed-off-by: Haiying Wang haiying.w...@freescale.com
> ---
>  drivers/iommu/fsl_pamu.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
> index cba0498..a9ab57b 100644
> --- a/drivers/iommu/fsl_pamu.c
> +++ b/drivers/iommu/fsl_pamu.c
> @@ -539,6 +539,7 @@ u32 get_stash_id(u32 stash_dest_hint, u32 vcpu)

Should probably also s/vcpu/cpu/g as vcpu makes no sense outside of virtualization code.

>       u32 cache_level;
>       int len, found = 0;
>       int i;
> +     u32 cpuid = get_hard_smp_processor_id(vcpu);

s/cpuid/phys_cpu/ or similar

-Scott
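To illustrate why the logical index cannot be compared against the device tree directly, here is a small userspace sketch. It is not the driver code: `struct cpu_node`, `find_cpu_node()` and the `logical_to_phys[]` table (standing in for get_hard_smp_processor_id()) are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a device-tree cpu node; "reg" holds the physical id. */
struct cpu_node {
    uint32_t reg;
    uint32_t l1_stash_id;   /* made-up payload for the example */
};

/*
 * Find the cpu node for a logical CPU. Translating to the physical id first
 * matters: once a CPU has failed to come up, logical and physical numbering
 * no longer line up, and matching node->reg against the logical index would
 * select the wrong node.
 */
const struct cpu_node *find_cpu_node(const struct cpu_node *nodes, size_t nr_nodes,
                                     const uint32_t *logical_to_phys, size_t nr_cpus,
                                     uint32_t cpu)
{
    if (cpu >= nr_cpus)
        return NULL;

    uint32_t phys_cpu = logical_to_phys[cpu];

    for (size_t i = 0; i < nr_nodes; i++) {
        if (nodes[i].reg == phys_cpu)
            return &nodes[i];
    }
    return NULL;
}
```

For example, with four nodes and physical CPU 1 offline, logical CPU 1 maps to physical CPU 2, so the lookup has to return the node whose reg is 2.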
Re: [PATCH 1/7] powerpc: Add interface to get msi region information
On Tue, 2013-10-08 at 10:47 -0600, Bjorn Helgaas wrote:
> On Thu, Oct 3, 2013 at 11:19 PM, Bhushan Bharat-R65777 r65...@freescale.com wrote:
> > > I don't know enough about VFIO to understand why these new interfaces are needed. Is this the first VFIO IOMMU driver? I see vfio_iommu_spapr_tce.c and vfio_iommu_type1.c but I don't know if they're comparable to the Freescale PAMU. Do other VFIO IOMMU implementations support MSI? If so, do they handle the problem of mapping the MSI regions in a different way?
> >
> > PAMU is an aperture type of IOMMU while others are paging type, so they are completely different from what PAMU is and handle that differently.
>
> This is not an explanation or a justification for adding new interfaces. I still have no idea what an aperture type IOMMU is, other than that it is different. But I see that Alex is working on this issue with you in a different thread, so I'm sure you guys will sort it out.

PAMU is a very constrained IOMMU that cannot do arbitrary page mappings. Due to these constraints, we cannot map the MSI I/O page at its normal address while also mapping RAM at the address we want. The address we can map it at depends on the addresses of other mappings, so it can't be hidden in the IOMMU driver -- the user needs to be in control.

Another difference is that (if I understand correctly) PCs handle MSIs specially, via interrupt remapping, rather than being translated as a normal memory access through the IOMMU.

-Scott
Re: [PATCH 1/7] powerpc: Add interface to get msi region information
On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
> @@ -376,6 +405,7 @@ static int fsl_of_msi_probe(struct platform_device *dev)
>       int len;
>       u32 offset;
>       static const u32 all_avail[] = { 0, NR_MSI_IRQS };
> +     static int bank_index;
>
>       match = of_match_device(fsl_of_msi_ids, &dev->dev);
>       if (!match)
> @@ -419,8 +449,8 @@ static int fsl_of_msi_probe(struct platform_device *dev)
>                       dev->dev.of_node->full_name);
>               goto error_out;
>       }
> -     msi->msiir_offset =
> -             features->msiir_offset + (res.start & 0xf);
> +     msi->msiir = res.start + features->msiir_offset;
> +     printk("msi->msiir = %llx\n", msi->msiir);

dev_dbg or remove

>       }
>
>       msi->feature = features->fsl_pic_ip;
> @@ -470,6 +500,7 @@ static int fsl_of_msi_probe(struct platform_device *dev)
>               }
>       }
>
> +     msi->bank_index = bank_index++;

What if multiple MSIs are being probed in parallel? bank_index is not atomic.

> diff --git a/arch/powerpc/sysdev/fsl_msi.h b/arch/powerpc/sysdev/fsl_msi.h
> index 8225f86..6bd5cfc 100644
> --- a/arch/powerpc/sysdev/fsl_msi.h
> +++ b/arch/powerpc/sysdev/fsl_msi.h
> @@ -29,12 +29,19 @@ struct fsl_msi {
>       struct irq_domain *irqhost;
>
>       unsigned long cascade_irq;
> -
> -     u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
> +     dma_addr_t msiir; /* MSIIR Address in CCSR */

Are you sure dma_addr_t is right here, versus phys_addr_t? It implies that it's the output of the DMA API, but I don't think the DMA API is used in the MSI driver. Perhaps it should be, but we still want the raw physical address to pass on to VFIO.

>       void __iomem *msi_regs;
>       u32 feature;
>       int msi_virqs[NR_MSI_REG];
> +     /*
> +      * During probe each bank is assigned a index number.
> +      * index number ranges from 0 to 2^32.
> +      * Example MSI bank 1 = 0
> +      * MSI bank 2 = 1, and so on.
> +      */
> +     int bank_index;

2^32 doesn't fit in int (nor does 2^32 - 1). Just say that indices start at 0.

-Scott
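The race on bank_index can be illustrated outside the kernel with C11 atomics. This is a userspace sketch only: `next_bank_index` and `alloc_bank_index()` are hypothetical names, and an in-kernel fix would presumably use the kernel's own atomic helpers or a lock rather than <stdatomic.h>.

```c
#include <stdatomic.h>

/* Next free bank index, shared by every probe call; starts at 0. */
static atomic_uint next_bank_index;

/*
 * Hand out a unique, zero-based bank index. atomic_fetch_add returns the
 * value held before the increment, so two concurrent callers can never
 * observe the same index, unlike a plain "bank_index++" on a static int.
 */
unsigned int alloc_bank_index(void)
{
    return atomic_fetch_add(&next_bank_index, 1u);
}
```

Using an unsigned counter also sidesteps the comment problem noted above: the indices simply start at 0 and count up.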
Re: [PATCH 1/7] powerpc: Add interface to get msi region information
On Tue, 2013-10-08 at 17:25 -0600, Bjorn Helgaas wrote:
> > > -     u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
> > > +     dma_addr_t msiir; /* MSIIR Address in CCSR */
> >
> > Are you sure dma_addr_t is right here, versus phys_addr_t? It implies that it's the output of the DMA API, but I don't think the DMA API is used in the MSI driver. Perhaps it should be, but we still want the raw physical address to pass on to VFIO.
>
> I don't know what msiir is used for, but if it's an address you program into a PCI device, then it's a dma_addr_t even if you didn't get it from the DMA API. Maybe bus_addr_t would have been a more suggestive name than dma_addr_t. That said, I have no idea how this relates to VFIO.

It's a bit awkward because it gets used both as something to program into a PCI device (and it's probably a bug that the DMA API doesn't get used), and also (if I understand the current plans correctly) as a physical address to give to VFIO to be a destination address in an IOMMU mapping. So I think the value we keep here should be a phys_addr_t (it comes straight from the MMIO address in the device tree), which gets trivially turned into a dma_addr_t by the non-VFIO code path because there's currently no translation there.

-Scott
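The split described above (store the physical address, derive the bus address where needed) could look roughly like the following. This is a userspace sketch with stand-in typedefs; `struct msi_bank`, `msi_bank_bus_addr()` and `msi_bank_map_target()` are hypothetical names, and the identity conversion only holds under the stated assumption that there is no translation on the non-VFIO path.

```c
#include <stdint.h>

/* Stand-ins for the kernel typedefs; the 64-bit width is an assumption. */
typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

struct msi_bank {
    phys_addr_t msiir;  /* physical address of MSIIR, taken from the device tree */
};

/* Non-VFIO path: no IOMMU translation assumed, so bus address == physical address. */
dma_addr_t msi_bank_bus_addr(const struct msi_bank *bank)
{
    return (dma_addr_t)bank->msiir;
}

/* VFIO path: the raw physical address is the destination of an IOMMU mapping. */
phys_addr_t msi_bank_map_target(const struct msi_bank *bank)
{
    return bank->msiir;
}
```

Keeping the two accessors separate leaves a single place to change if the non-VFIO path ever starts going through the DMA API.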
Re: [PATCH 2/3 v13] iommu/fsl: Add additional iommu attributes required by the PAMU driver.
On 04/22/2013 12:31:55 AM, Varun Sethi wrote:
> Added the following domain attributes for the FSL PAMU driver:
> 1. Added new iommu stash attribute, which allows setting of the
>    LIODN specific stash id parameter through IOMMU API.
> 2. Added an attribute for enabling/disabling DMA to a particular
>    memory window.
> 3. Added domain attribute to check for PAMUV1 specific constraints.
>
> Signed-off-by: Varun Sethi varun.se...@freescale.com
> ---
> v13 changes:
> - created a new file include/linux/fsl_pamu_stash.h for stash
>   attributes.
> v12 changes:
> - Moved PAMU specifc stash ids and structures to PAMU header file.
> - no change in v11.
> - no change in v10.
>
>  include/linux/fsl_pamu_stash.h | 39 +++
>  include/linux/iommu.h | 16
>  2 files changed, 55 insertions(+), 0 deletions(-)
>  create mode 100644 include/linux/fsl_pamu_stash.h
>
> diff --git a/include/linux/fsl_pamu_stash.h b/include/linux/fsl_pamu_stash.h
> new file mode 100644
> index 000..caa1b21
> --- /dev/null
> +++ b/include/linux/fsl_pamu_stash.h
> @@ -0,0 +1,39 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> + *
> + * Copyright (C) 2013 Freescale Semiconductor, Inc.
> + *
> + */
> +
> +#ifndef __FSL_PAMU_STASH_H
> +#define __FSL_PAMU_STASH_H
> +
> +/* cache stash targets */
> +enum pamu_stash_target {
> +     PAMU_ATTR_CACHE_L1 = 1,
> +     PAMU_ATTR_CACHE_L2,
> +     PAMU_ATTR_CACHE_L3,
> +};
> +
> +/*
> + * This attribute allows configuring stashig specific parameters
> + * in the PAMU hardware.
> + */
> +
> +struct pamu_stash_attribute {
> +     u32 cpu;        /* cpu number */
> +     u32 cache;      /* cache to stash to: L1,L2,L3 */
> +};
> +
> +#endif /* __FSL_PAMU_STASH_H */
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 2727810..c5dc2b9 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -57,10 +57,26 @@ struct iommu_domain {
>  #define IOMMU_CAP_CACHE_COHERENCY 0x1
>  #define IOMMU_CAP_INTR_REMAP 0x2 /* isolates device intrs */
>
> +/*
> + * Following constraints are specifc to PAMUV1: FSL_PAMUV1
> + * -aperture must be power of 2, and naturally aligned
> + * -number of windows must be power of 2, and address space size
> + *  of each window is determined by aperture size / # of windows
> + * -the actual size of the mapped region of a window must be power
> + *  of 2 starting with 4KB and physical address must be naturally
> + *  aligned.
> + * DOMAIN_ATTR_FSL_PAMUV1 corresponds to the above mentioned contraints.
> + * The caller can invoke iommu_domain_get_attr to check if the underlying
> + * iommu implementation supports these constraints.
> + */
> +
>  enum iommu_attr {
>       DOMAIN_ATTR_GEOMETRY,
>       DOMAIN_ATTR_PAGING,
>       DOMAIN_ATTR_WINDOWS,
> +     DOMAIN_ATTR_PAMU_STASH,
> +     DOMAIN_ATTR_PAMU_ENABLE,
> +     DOMAIN_ATTR_FSL_PAMUV1,
>       DOMAIN_ATTR_MAX,

Please be consistent on whether PAMU gets an FSL_ namespace prefix (I'd prefer that it does).

-Scott
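The PAMUV1 constraints listed in the patch comment can be written down as a few checks. The following is a userspace sketch, not driver code: the function names are made up for illustration, and the requirement that a mapped region fit inside its window is an assumption added here, not something the patch states.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Aperture: size is a power of two and the base is aligned to that size. */
bool pamuv1_aperture_ok(uint64_t base, uint64_t size)
{
    return is_pow2(size) && (base & (size - 1)) == 0;
}

/*
 * Window count must be a power of two; each window then spans
 * aperture size / number of windows. Returns 0 for an invalid geometry.
 */
uint64_t pamuv1_window_size(uint64_t aperture_size, uint32_t nr_windows)
{
    if (!is_pow2(aperture_size) || !is_pow2(nr_windows) ||
        nr_windows > aperture_size)
        return 0;
    return aperture_size / nr_windows;
}

/*
 * Mapped region of a window: power-of-two size of at least 4KB, with a
 * naturally aligned physical address. The "fits in the window" check is
 * an assumption of this sketch.
 */
bool pamuv1_mapping_ok(uint64_t phys, uint64_t map_size, uint64_t window_size)
{
    return is_pow2(map_size) && map_size >= 4096 &&
           map_size <= window_size && (phys & (map_size - 1)) == 0;
}
```

For a 512MB aperture with 8 windows this gives 64MB windows, and a 4KB mapping at a 4KB-aligned physical address is accepted.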
Re: RFC: vfio API changes needed for powerpc (v3)
On 04/11/2013 07:56:59 AM, Joerg Roedel wrote:
> On Tue, Apr 09, 2013 at 01:22:15AM +, Yoder Stuart-B08248 wrote:
> > > What happens if a normal unmap call is done on the MSI iova? Do we need a separate unmap?
> >
> > I was thinking a normal unmap on an MSI windows would be an error...but I'm not set on that. I put the msi unmap there to make things symmetric, a normal unmap would work as well...and then we could drop the msi unmap.
>
> Hmm, this API semantic isn't very clean. When you explicitly map the MSI banks a clean API would also allow to unmap them. But that is not possible in your design because the kernel is responsible for mapping MSIs and you can't unmap a MSI bank that is in use by the kernel.

Why is it not possible to unmap them? Once they've been mapped, they're just like any other IOMMU mapping. If the user breaks MSI for their own devices by unmapping the MSI page, that's their problem.

> So since the kernel owns the MSI setup anyways it should also take care of mapping the MSI banks. What is the reason to not let the kernel allocate the MSI banks top-down from the end of the DMA window space?

It's less flexible, and possibly more complicated.

-Scott
Re: RFC: vfio API changes needed for powerpc (v3)
On 04/04/2013 05:10:27 PM, Yoder Stuart-B08248 wrote:
> /*
>  * VFIO_IOMMU_PAMU_UNMAP_MSI_BANK
>  *
>  * Unmaps the MSI bank at the specified iova.
>  * Caller provides struct vfio_pamu_msi_bank_unmap with all fields set.
>  * Operates on VFIO file descriptor (/dev/vfio/vfio).
>  * Return: 0 on success, -errno on failure
>  */
>
> struct vfio_pamu_msi_bank_unmap {
>         __u32 argsz;
>         __u32 flags; /* no flags currently */
>         __u64 iova;  /* the iova to be unmapped to */
> };
> #define VFIO_IOMMU_PAMU_UNMAP_MSI_BANK _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_msi_bank_unmap )

What happens if a normal unmap call is done on the MSI iova? Do we need a separate unmap?

-Scott
Re: RFC: vfio API changes needed for powerpc
On 04/03/2013 01:32:26 PM, Stuart Yoder wrote:
> On Tue, Apr 2, 2013 at 5:50 PM, Scott Wood scottw...@freescale.com wrote:
> > On 04/02/2013 04:38:45 PM, Alex Williamson wrote:
> > > On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
> > > > VFIO_IOMMU_MAP_MSI(iova, size)
> >
> > Not sure how you mean size to be used -- for MPIC it would be 4K per bank, and you can only map one bank at a time (which bank you're mapping should be a parameter, if only so that the kernel doesn't have to keep iteration state for you).
>
> The intent was for user space to tell the kernel which windows to use for MSI. So I envisioned a total size of window-size * msi-bank-count.

Size doesn't tell the kernel *which* banks to use, only how many. If it already knows which banks are used by the group, then it also knows how many are used. And size is misleading because the mapping is not generally going to be contiguous.

-Scott
Re: RFC: vfio API changes needed for powerpc
On 04/03/2013 02:09:45 PM, Stuart Yoder wrote:
> > Would it be possible for userspace to simply leave room for MSI bank mapping (how much room could be determined by something like VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can DMA_MAP starting at the 0x0 address of the aperture, growing up, and VFIO will map banks on demand at the top of the aperture, growing down? Wouldn't that avoid a lot of issues with userspace needing to know anything about MSI banks (other than count) and coordinating irq numbers and enabling handlers?
>
> This is basically option #A in the original proposals sent. I like this approach, in that it is simpler and keeps user space mostly out of this...which is consistent with how things are done on x86. User space just needs to know how many windows to leave at the top of the aperture. The kernel then has the flexibility to use those windows how it wants.
>
> But one question, is when should the kernel actually map (and unmap) the MSI banks.

I think userspace should explicitly request it. Userspace still wouldn't need to know anything but the count:

    count = VFIO_IOMMU_GET_MSI_BANK_COUNT
    VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
    VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
    // do other DMA maps now, or later, or not at all, doesn't matter
    for (i = 0; i < count; i++)
            VFIO_IOMMU_MAP_MSI_BANK(iova, i);
    // The kernel now knows where each bank has been mapped, and can
    // update PCI config space appropriately.

> One thing we need to do is enable the aperture...and current thinking is that is done on the first DMA_MAP.

What if there are no other mappings required?

-Scott
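In the flow above userspace chooses the iova for each bank. One possible policy, sketched below, is to place bank i in one of the last windows of the aperture. This is only an illustration: `msi_bank_iova()` is a hypothetical helper, it does not issue any ioctl, and the ioctl names in the thread are proposals rather than an existing interface.

```c
#include <stdint.h>

/*
 * Pick the iova for MSI bank `bank` out of `nr_banks`, using the last
 * nr_banks windows of the aperture (one bank per window).
 * Returns 0 on success, -1 if the geometry cannot hold the banks.
 */
int msi_bank_iova(uint64_t aperture_base, uint64_t aperture_size,
                  uint32_t nr_windows, uint32_t nr_banks, uint32_t bank,
                  uint64_t *iova)
{
    if (nr_windows == 0 || nr_banks > nr_windows || bank >= nr_banks)
        return -1;

    uint64_t window_size = aperture_size / nr_windows;
    uint32_t window = nr_windows - nr_banks + bank;

    *iova = aperture_base + (uint64_t)window * window_size;
    return 0;
}
```

With a 512MB aperture at 0, 8 windows and 3 banks, the banks land at the start of windows 5, 6 and 7. The result is what userspace would pass as the iova argument of the proposed VFIO_IOMMU_MAP_MSI_BANK(iova, i); the mapped 4K pages are a whole window apart, which is why a single "size" argument does not describe them.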
Re: RFC: vfio API changes needed for powerpc
On 04/03/2013 02:43:06 PM, Stuart Yoder wrote:
> On Wed, Apr 3, 2013 at 2:18 PM, Scott Wood scottw...@freescale.com wrote:
> > On 04/03/2013 02:09:45 PM, Stuart Yoder wrote:
> > > > Would it be possible for userspace to simply leave room for MSI bank mapping (how much room could be determined by something like VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can DMA_MAP starting at the 0x0 address of the aperture, growing up, and VFIO will map banks on demand at the top of the aperture, growing down? Wouldn't that avoid a lot of issues with userspace needing to know anything about MSI banks (other than count) and coordinating irq numbers and enabling handlers?
> > >
> > > This is basically option #A in the original proposals sent. I like this approach, in that it is simpler and keeps user space mostly out of this...which is consistent with how things are done on x86. User space just needs to know how many windows to leave at the top of the aperture. The kernel then has the flexibility to use those windows how it wants.
> > >
> > > But one question, is when should the kernel actually map (and unmap) the MSI banks.
> >
> > I think userspace should explicitly request it. Userspace still wouldn't need to know anything but the count:
> >
> >     count = VFIO_IOMMU_GET_MSI_BANK_COUNT
> >     VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
> >     VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
> >     // do other DMA maps now, or later, or not at all, doesn't matter
> >     for (i = 0; i < count; i++)
> >             VFIO_IOMMU_MAP_MSI_BANK(iova, i);
> >     // The kernel now knows where each bank has been mapped, and can
> >     // update PCI config space appropriately.
>
> And the overall aperture enable/disable would occur on the first dma/msi map() and last dma/msi unmap()?

Yes. We may want the optional ability to do an overall enable/disable for reasons we discussed a while ago, but in the absence of an explicit disable the domain would be enabled on first map.

-Scott
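The "enable on first map, disable on last unmap" rule amounts to reference counting the mappings, with DMA and MSI mappings counted alike. A minimal sketch, assuming hypothetical names (`struct domain_state`, `domain_note_map()`, `domain_note_unmap()`) and ignoring locking and the optional explicit enable/disable mentioned above:

```c
#include <stdbool.h>

/* Hypothetical per-domain bookkeeping. */
struct domain_state {
    unsigned int nr_mappings;   /* DMA and MSI mappings currently in place */
    bool enabled;
};

/* Call after a DMA or MSI mapping has been created. */
void domain_note_map(struct domain_state *d)
{
    if (d->nr_mappings++ == 0)
        d->enabled = true;      /* the first mapping enables the aperture */
}

/* Call after a mapping has been removed. Returns -1 on an unbalanced unmap. */
int domain_note_unmap(struct domain_state *d)
{
    if (d->nr_mappings == 0)
        return -1;
    if (--d->nr_mappings == 0)
        d->enabled = false;     /* the last unmap disables it again */
    return 0;
}
```

Because MSI maps are counted too, a user that needs no DMA mappings other than the MSI banks still ends up with an enabled domain.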
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 10:37:20 PM, Alex Williamson wrote: On Tue, 2013-04-02 at 17:50 -0500, Scott Wood wrote: On 04/02/2013 04:38:45 PM, Alex Williamson wrote: On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote: On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood scottw...@freescale.com wrote: C. Explicit mapping using normal DMA map. The last idea is that we would introduce a new ioctl to give user-space an fd to the MSI bank, which could be mmapped. The flow would be something like this: -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD -user space mmaps the fd, getting a vaddr -user space does a normal DMA map for desired iova This approach makes everything explicit, but adds a new ioctl applicable most likely only to the PAMU (type2 iommu). And the DMA_MAP of that mmap then allows userspace to select the window used? This one seems like a lot of overhead, adding a new ioctl, new fd, mmap, special mapping path, etc. There's going to be special stuff no matter what. This would keep it separated from the IOMMU map code. I'm not sure what you mean by overhead here... the runtime overhead of setting things up is not particularly relevant as long as it's reasonable. If you mean development and maintenance effort, keeping things well separated should help. We don't need to change DMA_MAP. If we can simply add a new type 2 ioctl that allows user space to set which windows are MSIs, it seems vastly less complex than an ioctl to supply a new fd, mmap of it, etc. So maybe 2 ioctls: VFIO_IOMMU_GET_MSI_COUNT Do you mean a count of actual MSIs or a count of MSI banks used by the whole VFIO group? I hope the latter, which would clarify how this is distinct from DEVICE_GET_IRQ_INFO. Is hotplug even on the table? Presumably dynamically adding a device could bring along additional MSI banks? 
I'm not sure -- maybe we could say that hotplug can add banks, but not remove them or change the order, so userspace would just need to check if the number of banks changed, and map the extras. The current VFIO MSI support has the host handling everything about MSI. The user never programs an MSI vector to the physical device, they set up everything through ioctl. On interrupt, we simply trigger an eventfd and leave it to things like KVM irqfd or QEMU to do the right thing in a virtual machine. Here the MSI vector has to go through a PAMU window to hit the correct MSI bank. So that means it has some component of the iova involved, which we're proposing here is controlled by userspace (whether that vector uses an offset from 0x1000 or 0x depending on which window slot is used to make the MSI bank). I assume we're still working in a model where the physical interrupt fires into the host and a host-based interrupt handler triggers an eventfd, right? Yes (subject to possible future optimizations). So that means the vector also has host components so we trigger the correct ISR. How is that coordinated? Everything but the iova component needs to come from the host MSI allocator. Would is be possible for userspace to simply leave room for MSI bank mapping (how much room could be determined by something like VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can DMA_MAP starting at the 0x0 address of the aperture, growing up, and VFIO will map banks on demand at the top of the aperture, growing down? Wouldn't that avoid a lot of issues with userspace needing to know anything about MSI banks (other than count) and coordinating irq numbers and enabling handlers? This would restrict a (possibly unlikely) use case where the user wants to map something near the top of the aperture but has another place MSIs can go (or is willing to live without MSIs). 
Otherwise it could be workable, as long as we can require an explicit MSI enabling on a device to happen after the aperture and subwindow count are set up. I'm not sure it would really buy anything over having userspace iterate over the MSI bank count, though -- it would probably be a bit more complicated. On x86 MSI count is very device specific, which means it wold be a VFIO_DEVICE_* ioctl (actually VFIO_DEVICE_GET_IRQ_INFO does this for us on x86). The trouble with it being a device ioctl is that you need to get the device FD, but the IOMMU protection needs to be established before you can get that... so there's an ordering problem if you need it from the device before configuring the IOMMU. Thanks, What do you mean by IOMMU protection needs to be established? Wouldn't we just start with no mappings in place? If no mappings blocks all DMA, sure, that's fine. Once the VFIO device FD is accessible
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 10:12:31 PM, Alex Williamson wrote:
> On Tue, 2013-04-02 at 17:44 -0500, Scott Wood wrote:
> > On 04/02/2013 04:32:04 PM, Alex Williamson wrote:
> > > On Tue, 2013-04-02 at 15:57 -0500, Scott Wood wrote:
> > > > On 04/02/2013 03:32:17 PM, Alex Williamson wrote:
> > > > > On x86 the interrupt remapper handles this transparently when MSI is enabled and userspace never gets direct access to the device MSI address/data registers.
> > > >
> > > > x86 has a totally different mechanism here, as far as I understand -- even before you get into restrictions on mappings.
> > >
> > > So what control will userspace have over programming the actual MSI vectors on PAMU?
> >
> > Not sure what you mean -- PAMU doesn't get explicitly involved in MSIs. It's just another 4K page mapping (per relevant MSI bank). If you want isolation, you need to make sure that an MSI group is only used by one VFIO group, and that you're on a chip that has alias pages with just one MSI bank register each (newer chips do, but the first chip to have a PAMU didn't).
>
> How does a user figure this out?

The user's involvement could be limited to setting a policy knob of whether that degree of isolation is required (if required and unavailable, all devices using an MSI bank would be forced into the same group). We'd need to do something with MSI allocation so that we avoid using an MSI bank with more than one IOMMU group where possible. I'm not sure about the details yet, or how practical this is. There might need to be some MSI bank assignment done as part of the VFIO device binding process, if there are going to be more VFIO groups than there are MSI banks (reserving one bank for host use).

-Scott
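A rough sketch of the bank assignment idea, under several assumptions: three banks (the figure quoted elsewhere in the thread for most chips), bank 0 reserved for the host, non-negative group ids, and an `allow_sharing` flag standing in for the policy knob. All names are hypothetical, and the sharing fallback models the alternative of letting groups share a bank rather than forcing devices into one group.

```c
#include <stdbool.h>

#define NR_MSI_BANKS 3      /* assumed bank count */
#define BANK_FREE   (-1)
#define BANK_HOST   (-2)

struct msi_bank_table {
    int owner[NR_MSI_BANKS];    /* VFIO group id (>= 0), BANK_FREE, or BANK_HOST */
    bool allow_sharing;         /* admin opted in to reduced isolation */
};

void msi_banks_init(struct msi_bank_table *t, bool allow_sharing)
{
    t->owner[0] = BANK_HOST;    /* reserve one bank for host use */
    for (int i = 1; i < NR_MSI_BANKS; i++)
        t->owner[i] = BANK_FREE;
    t->allow_sharing = allow_sharing;
}

/* Return the bank index assigned to `group`, or -1 if none is available. */
int msi_bank_assign(struct msi_bank_table *t, int group)
{
    /* Reuse a bank the group already owns. */
    for (int i = 0; i < NR_MSI_BANKS; i++) {
        if (t->owner[i] == group)
            return i;
    }
    /* Otherwise take an unowned bank exclusively. */
    for (int i = 0; i < NR_MSI_BANKS; i++) {
        if (t->owner[i] == BANK_FREE) {
            t->owner[i] = group;
            return i;
        }
    }
    /* No exclusive bank left: share a non-host bank only with explicit opt-in. */
    if (t->allow_sharing)
        return NR_MSI_BANKS - 1;
    return -1;
}
```

The host bank is never handed out, so even with sharing enabled a guest can at worst disturb another guest on the shared bank, not the host.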
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 03:32:17 PM, Alex Williamson wrote: On Tue, 2013-04-02 at 17:32 +, Yoder Stuart-B08248 wrote: 2. MSI window mappings The more problematic question is how to deal with MSIs. We need to create mappings for up to 3 MSI banks that a device may need to target to generate interrupts. The Linux MSI driver can allocate MSIs from the 3 banks any way it wants, and currently user space has no way of knowing which bank may be used for a given device. There are 3 options we have discussed and would like your direction: A. Implicit mappings -- with this approach user space would not explicitly map MSIs. User space would be required to set the geometry so that there are 3 unused windows (the last 3 windows) for MSIs, and it would be up to the kernel to create the mappings. This approach requires some specific semantics (leaving 3 windows) and it potentially gets a little weird-- when should the kernel actually create the MSI mappings? When should they be unmapped? Some convention would need to be established. VFIO would have control of SET/GET_ATTR, right? So we could reduce the number exposed to userspace on GET and transparently add MSI entries on SET. What do you mean by reduce the number exposed? Userspace decides how many entries there are, but it must be a power of two beteen 1 and 256. On x86 the interrupt remapper handles this transparently when MSI is enabled and userspace never gets direct access to the device MSI address/data registers. x86 has a totally different mechanism here, as far as I understand -- even before you get into restrictions on mappings. What kind of restrictions do you have around adding and removing windows while the aperture is enabled? Subwindows can be modified while the aperture is enabled, but the aperture size and number of subwindows cannot be changed. B. Explicit mapping using DMA map flags. The idea is that a new flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that a mapping is to be created for the supplied iova. 
No vaddr is given though. So in the above example there would be a a dma map at 0x1000 for 24KB (and no vaddr). It's up to the kernel to determine which bank gets mapped where. So, this option puts user space in control of which windows are used for MSIs and when MSIs are mapped/unmapped. There would need to be some semantics as to how this is used-- it only makes sense This could also be done as another type2 ioctl extension. Again, what is type2, specifically? If someone else is adding their own IOMMU that is kind of, sort of like PAMU, how would they know if it's close enough? What assumptions can a user make when they see that they're dealing with type2? What's the value to userspace in determining which windows are used by which banks? That depends on who programs the MSI config space address. What is important is userspace controlling which iovas will be dedicated to this, in case it wants to put something else there. It sounds like the case that there are X banks and if userspace wants to use MSI it needs to leave X windows available for that. Is this just buying userspace a few more windows to allow them the choice between MSI or RAM? Well, there could be that. But also, userspace will generally have a much better idea of the type of mappings it's creating, so it's easier to keep everything explicit at the kernel/user interface than require more complicated code in the kernel to figure things out automatically (not just for MSIs but in general). If the kernel automatically creates the MSI mappings, when does it assume that userspace is done creating its own? What if userspace doesn't need any DMA other than the MSIs? What if userspace wants to continue dynamically modifying its other mappings? C. Explicit mapping using normal DMA map. The last idea is that we would introduce a new ioctl to give user-space an fd to the MSI bank, which could be mmapped. 
>> The flow would be something like this:
>>    -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD
>>    -user space mmaps the fd, getting a vaddr
>>    -user space does a normal DMA map for desired iova
>>
>> This approach makes everything explicit, but adds a new ioctl applicable most likely only to the PAMU (type2 iommu).
>
> And the DMA_MAP of that mmap then allows userspace to select the window used? This one seems like a lot of overhead, adding a new ioctl, new fd, mmap, special mapping path, etc.

There's going to be special stuff no matter what. This would keep it separated from the IOMMU map code. I'm not sure what you mean by overhead here... the runtime overhead of setting things up is not particularly relevant as long
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 03:38:42 PM, Stuart Yoder wrote:
> On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood scottw...@freescale.com wrote:
>> On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote:
>>> Alex,
>>>
>>> We are in the process of implementing vfio-pci support for the Freescale IOMMU (PAMU). It is an aperture/window-based IOMMU and is quite different than x86, and will involve creating a 'type 2' vfio implementation.
>>>
>>> For each device's DMA mappings, PAMU has an overall aperture and a number of windows. All sizes and window counts must be a power of 2. To illustrate, below is a mapping for a 256MB guest, including guest memory (backed by 64MB huge pages) and some windows for MSIs:
>>>
>>>   Total aperture: 512MB
>>>   # of windows: 8
>>>
>>>   win                gphys/
>>>   #    iova          phys           size
>>>   ---  ----          ----           ----
>>>   0    0x00000000    0xX_XX000000   64MB
>>>   1    0x04000000    0xX_XX000000   64MB
>>>   2    0x08000000    0xX_XX000000   64MB
>>>   3    0x0C000000    0xX_XX000000   64MB
>>>   4    0x10000000    0xf_fe044000   4KB    // msi bank 1
>>>   5    0x14000000    0xf_fe045000   4KB    // msi bank 2
>>>   6    0x18000000    0xf_fe046000   4KB    // msi bank 3
>>>   7    -             -              disabled
>>>
>>> There are a couple of updates needed to the vfio user-kernel interface that we would like your feedback on.
>>>
>>> 1. IOMMU geometry
>>>
>>> The kernel IOMMU driver now has an interface (see domain_set_attr, domain_get_attr) that lets us set the domain geometry using attributes.
>>>
>>> We want to expose that to user space, so envision needing a couple of new ioctls to do this:
>>>    VFIO_IOMMU_SET_ATTR
>>>    VFIO_IOMMU_GET_ATTR
>>
>> Note that this means attributes need to be updated for user-API appropriateness, such as using fixed-size types.
>>
>>> 2. MSI window mappings
>>>
>>> The more problematic question is how to deal with MSIs. We need to create mappings for up to 3 MSI banks that a device may need to target to generate interrupts. The Linux MSI driver can allocate MSIs from the 3 banks any way it wants, and currently user space has no way of knowing which bank may be used for a given device.
>>>
>>> There are 3 options we have discussed and would like your direction:
>>>
>>> A. Implicit mappings -- with this approach user space would not explicitly map MSIs.
>>> User space would be required to set the geometry so that there are 3 unused windows (the last 3 windows)
>>
>> Where does userspace get the number 3 from? E.g. on newer chips there are 4 MSI banks. Maybe future chips have even more.
>
> Ok, then make the number 4. The chance of more MSI banks in future chips is nil,

What makes you so sure? Especially since you seem to be presenting this as not specifically an MPIC API.

> and if it ever happened user space could adjust.

What bit of API is going to tell it that it needs to adjust?

> Also, practically speaking since memory is typically allocated in powers of 2, you need to approximately double the window geometry anyway.

Only if your existing mapping needs fit exactly in a power of two.

>>> B. Explicit mapping using DMA map flags. The idea is that a new flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that a mapping is to be created for the supplied iova. No vaddr is given though. So in the above example there would be a DMA map at 0x10000000 for 24KB (and no vaddr).
>>
>> A single 24 KiB mapping wouldn't work (and why 24KB? What if only one MSI group is involved in this VFIO group? What if four MSI groups are involved?). You'd need to either have a naturally aligned, power-of-two sized mapping that covers exactly the pages you want to map and no more, or you'd need to create a separate mapping for each MSI bank, and due to PAMU subwindow alignment restrictions these mappings could not be contiguous in iova-space.
>
> You're right, a single 24KB mapping wouldn't work-- in the case of 3 MSI banks perhaps we could just do one 64MB*3 mapping to identify which windows are used for MSIs.

Where did the assumption of a 64MiB subwindow size come from?

> If only one MSI bank was involved the kernel could get clever and only enable the banks actually needed.

I'd rather see cleverness kept in userspace.

-Scott
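The geometry arithmetic in the message above can be checked with a short sketch. This is an illustrative Python model only, not kernel or PAMU driver code: the function names are made up for the example, and it assumes the aperture starts at iova 0 (as in the example table) and that the subwindows divide the aperture evenly. The 256-window limit is the one stated elsewhere in this thread.

```python
KiB = 1 << 10
MiB = 1 << 20


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... and False for everything else."""
    return n > 0 and (n & (n - 1)) == 0


def subwindow_size(aperture_size: int, window_count: int) -> int:
    """Size of each subwindow when the aperture is divided evenly."""
    if not is_power_of_two(aperture_size):
        raise ValueError("aperture size must be a power of two")
    if not is_power_of_two(window_count) or window_count > 256:
        raise ValueError("window count must be a power of two up to 256")
    return aperture_size // window_count


def window_base(aperture_size: int, window_count: int, index: int) -> int:
    """iova at which subwindow `index` starts (aperture assumed to start at 0)."""
    if not 0 <= index < window_count:
        raise ValueError("window index out of range")
    return index * subwindow_size(aperture_size, window_count)


def is_valid_single_mapping(iova: int, size: int) -> bool:
    """A single mapping must be power-of-two sized and naturally aligned."""
    return is_power_of_two(size) and iova % size == 0


# The example geometry: 512MB aperture split into 8 windows.
msi_window = window_base(512 * MiB, 8, 4)
print(hex(subwindow_size(512 * MiB, 8)))              # size of one subwindow
print(hex(msi_window))                                # start of window 4
print(is_valid_single_mapping(msi_window, 24 * KiB))  # 24 KiB is not a power of two
print(is_valid_single_mapping(msi_window, 4 * KiB))   # one 4 KiB bank is fine
```

Under these assumptions the three 4KB MSI banks land in separate 64MB subwindows, which is why they cannot be covered by one contiguous 24KB mapping.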
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 04:08:27 PM, Stuart Yoder wrote:
> On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood scottw...@freescale.com wrote:
>>> This could also be done as another type2 ioctl extension.
>>
>> Again, what is type2, specifically? If someone else is adding their own IOMMU that is kind of, sort of like PAMU, how would they know if it's close enough? What assumptions can a user make when they see that they're dealing with type2?
>
> We will define that as part of the type2 implementation. Highly unlikely anything but a PAMU will comply.

So then why not just call it pamu instead of being obfuscatory?

>> There's going to be special stuff no matter what. This would keep it separated from the IOMMU map code.
>>
>> I'm not sure what you mean by overhead here... the runtime overhead of setting things up is not particularly relevant as long as it's reasonable. If you mean development and maintenance effort, keeping things well separated should help.
>
> We don't need to change DMA_MAP. If we can simply add a new type 2 ioctl that allows user space to set which windows are MSIs,

And what specifically does that ioctl do? It causes new mappings to be created, right? So you're changing (or at least adding to) the DMA map mechanism.

> it seems vastly less complex than an ioctl to supply a new fd, mmap of it, etc.

I don't see enough complexity in the mmap approach for anything to be vastly less complex in comparison. I think you're building the mmap approach up in your head to be a lot worse than it would actually be.

-Scott
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 04:16:11 PM, Alex Williamson wrote:
> On Tue, 2013-04-02 at 15:54 -0500, Stuart Yoder wrote:
>> The number of windows is always a power of 2 (and max is 256). And to reduce PAMU cache pressure you want to use the fewest number of windows you can. So, I don't see practically how we could transparently steal entries to add the MSIs. Either user space knows to leave empty windows for MSIs and by convention the kernel knows which windows those are (as in option #A) or explicitly tell the kernel which windows (as in option #B).
>
> Ok, apparently I don't understand the API. Is it something like userspace calls GET_ATTR and finds out that there are 256 available windows, userspace determines that it needs 8 for RAM and then it has an MSI device, so it needs to call SET_ATTR and ask for 16? That seems prone to exploitation by the first userspace to allocate its aperture,

What exploitation?

It's not as if there is a pool of 256 global windows that users allocate from. The subwindow count is just how finely divided the aperture is. The only way one user will affect another is through cache contention (which is why we want the minimum number of subwindows that we can get away with).

> but I'm also not sure why userspace could specify the (non-power of 2) number of windows it needs for RAM, then VFIO would see that the devices attached have MSI and add those windows and align to a power of 2.

If you double the subwindow count without userspace knowing, you have to double the aperture as well (and you may need to grow up or down depending on alignment). This means you also need to halve the maximum aperture that userspace can request. And you need to expose a different number of maximum subwindows in the IOMMU API based on whether we might have MSIs of this type. It's ugly and awkward, and removes the possibility for userspace to place the MSIs in some unused slot in the middle, or not use MSIs at all.
-Scott
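The subwindow-count arithmetic discussed above can be sketched as follows. This is a hedged Python illustration, not an existing kernel or VFIO interface: the function names are hypothetical, and the only facts taken from the thread are that the count must be a power of two no larger than 256 and that each subwindow is the aperture size divided by the count.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be at least 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def min_subwindow_count(ram_windows: int, msi_banks: int,
                        max_windows: int = 256) -> int:
    """Fewest subwindows that hold the RAM windows plus one window per MSI bank."""
    count = next_power_of_two(ram_windows + msi_banks)
    if count > max_windows:
        raise ValueError("geometry does not fit in the maximum window count")
    return count


def aperture_for(window_count: int, subwindow_size: int) -> int:
    """Aperture size implied by a window count at a fixed subwindow size."""
    return window_count * subwindow_size


MiB = 1 << 20

# 8 RAM windows plus one MSI window no longer fit in 8, so the count doubles.
print(min_subwindow_count(8, 1))
# 4 RAM windows plus 3 MSI banks fit in 8 with one window left disabled.
print(min_subwindow_count(4, 3))
# Doubling the count at a fixed 64MB subwindow size doubles the aperture too.
print(aperture_for(8, 64 * MiB) // MiB, aperture_for(16, 64 * MiB) // MiB)
```

The second case shows that the count only doubles when the RAM windows alone already fill a power of two; the third shows why a silent doubling by the kernel would also change the aperture that userspace asked for.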
Re: RFC: vfio API changes needed for powerpc
On 04/02/2013 04:38:45 PM, Alex Williamson wrote:
> On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
>> On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood scottw...@freescale.com wrote:
>>>>> C. Explicit mapping using normal DMA map. The last idea is that we would introduce a new ioctl to give user-space an fd to the MSI bank, which could be mmapped. The flow would be something like this:
>>>>>    -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD
>>>>>    -user space mmaps the fd, getting a vaddr
>>>>>    -user space does a normal DMA map for desired iova
>>>>>
>>>>> This approach makes everything explicit, but adds a new ioctl applicable most likely only to the PAMU (type2 iommu).
>>>>
>>>> And the DMA_MAP of that mmap then allows userspace to select the window used? This one seems like a lot of overhead, adding a new ioctl, new fd, mmap, special mapping path, etc.
>>>
>>> There's going to be special stuff no matter what. This would keep it separated from the IOMMU map code.
>>>
>>> I'm not sure what you mean by overhead here... the runtime overhead of setting things up is not particularly relevant as long as it's reasonable. If you mean development and maintenance effort, keeping things well separated should help.
>>
>> We don't need to change DMA_MAP. If we can simply add a new type 2 ioctl that allows user space to set which windows are MSIs, it seems vastly less complex than an ioctl to supply a new fd, mmap of it, etc.
>
> So maybe 2 ioctls:
> VFIO_IOMMU_GET_MSI_COUNT

Do you mean a count of actual MSIs or a count of MSI banks used by the whole VFIO group?

> VFIO_IOMMU_MAP_MSI(iova, size)

Not sure how you mean size to be used -- for MPIC it would be 4K per bank, and you can only map one bank at a time (which bank you're mapping should be a parameter, if only so that the kernel doesn't have to keep iteration state for you).

> How are MSIs related to devices on PAMU?

PAMU doesn't care about MSIs. The relation of individual MSIs to a device is standard PCI stuff. Each MSI bank (which is part of the MPIC, not PAMU) can hold numerous MSIs.
The VFIO user would want to map all MSI banks that are in use by any of the devices in the group. Ideally we'd let the VFIO grouping influence the allocation of MSIs.

> On x86 MSI count is very device specific, which means it would be a VFIO_DEVICE_* ioctl (actually VFIO_DEVICE_GET_IRQ_INFO does this for us on x86). The trouble with it being a device ioctl is that you need to get the device FD, but the IOMMU protection needs to be established before you can get that... so there's an ordering problem if you need it from the device before configuring the IOMMU. Thanks,

What do you mean by IOMMU protection needs to be established? Wouldn't we just start with no mappings in place?

-Scott
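The per-bank mapping described above (one 4K mapping per MSI bank, with the bank passed explicitly) can be modelled in a few lines. This is a hypothetical Python sketch of what userspace might compute before issuing such requests: the names MsiMapRequest and plan_msi_mappings are invented for illustration, and nothing here corresponds to an ioctl that exists. It assumes the aperture starts at iova 0 and that each bank gets its own subwindow, as in the example table earlier in the thread.

```python
from typing import List, NamedTuple

MSI_BANK_SIZE = 4 * 1024  # 4K per bank for MPIC, per the message above


class MsiMapRequest(NamedTuple):
    """Hypothetical request: map one MSI bank at a userspace-chosen iova."""
    bank: int
    iova: int
    size: int


def plan_msi_mappings(subwindow_size: int, free_windows: List[int],
                      bank_count: int) -> List[MsiMapRequest]:
    """Assign each MSI bank to one free subwindow chosen by userspace."""
    if subwindow_size < MSI_BANK_SIZE:
        raise ValueError("subwindow is too small to hold an MSI bank")
    if bank_count > len(free_windows):
        raise ValueError("not enough free subwindows for the MSI banks")
    # One request per bank; the bank index is explicit, so no iteration
    # state has to be kept on the other side of the interface.
    return [
        MsiMapRequest(bank, window * subwindow_size, MSI_BANK_SIZE)
        for bank, window in zip(range(bank_count), free_windows)
    ]


# Example: 64MB subwindows, with windows 4, 5 and 6 left free for three banks.
for request in plan_msi_mappings(64 * 1024 * 1024, [4, 5, 6], 3):
    print(request.bank, hex(request.iova), request.size)
```

Because userspace picks the free windows, it could equally place the banks in unused slots in the middle of the aperture, or pass a bank count of zero and use no MSI windows at all.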