On Fri, Dec 7, 2018 at 7:17 PM Robin Murphy wrote:
>
> On 06/12/2018 18:43, Souptick Joarder wrote:
> > Convert to use vm_insert_range() to map range of kernel
> > memory to user vma.
> >
> > Signed-off-by: Souptick Joarder
> > Reviewed-by: Matthew Wilcox
> > ---
> > drivers/iommu/dma-iommu.c
On Fri, Dec 7, 2018 at 10:41 PM Matthew Wilcox wrote:
>
> On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote:
> > > +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> > > + struct page **pages, unsigned long page_count)
> > > +{
> > > + unsigned lo
Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.
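The bypass idea above can be modeled in a few lines: compare the installed ops table against the known direct-mapping one and, on a match, call the direct implementation statically rather than through the function pointer. This is a toy userspace sketch of the pattern, not the kernel's actual code; all names and the fake IOMMU translation are stand-ins.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long dma_addr_t;

struct dma_map_ops {
    dma_addr_t (*map)(void *ptr);
};

static int indirect_calls, direct_calls;

static dma_addr_t dma_direct_map(void *ptr)
{
    direct_calls++;
    return (dma_addr_t)ptr;           /* direct mapping: dma addr == cpu addr */
}

static dma_addr_t iommu_map_stub(void *ptr)
{
    indirect_calls++;
    return (dma_addr_t)ptr ^ 0x1000;  /* pretend IOMMU translation */
}

static const struct dma_map_ops dma_direct_ops = { .map = dma_direct_map };
static const struct dma_map_ops iommu_ops     = { .map = iommu_map_stub };

static dma_addr_t dma_map(const struct dma_map_ops *ops, void *ptr)
{
    if (ops == &dma_direct_ops)       /* bypass: statically dispatched call */
        return dma_direct_map(ptr);
    return ops->map(ptr);             /* slow path: retpoline-hit indirect call */
}
```

With spectre v2 mitigations, the indirect call in the else branch goes through a retpoline; the pointer comparison plus direct call avoids that cost whenever the common direct-mapping case applies.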
Signed-off-by: Christoph Hellwig
---
arch/alpha/include/asm/dma-mapping.h | 2 +-
arch/arc/mm/cache.c | 2 +-
a
With the bypass support for the direct mapping we might not always have
methods to call, so use the proper APIs instead. The only downside is
that we will create two dma-debug entries for each mapping if
CONFIG_DMA_DEBUG is enabled.
Signed-off-by: Christoph Hellwig
---
drivers/pci/controller/vm
From: Robin Murphy
Rather than checking the DMA attribute at each callsite, just pass it
through for acpi_dma_configure() to handle directly. That can then deal
with the relatively exceptional DEV_DMA_NOT_SUPPORTED case by explicitly
installing dummy DMA ops instead of just skipping setup entirel
From: Robin Murphy
The dummy DMA ops are currently used by arm64 for any device which has
an invalid ACPI description and is thus barred from using DMA due to not
knowing whether it is cache-coherent or not. Factor these out into
general dma-mapping code so that they can be referenced from other
dma_get_required_mask should really be with the rest of the DMA mapping
implementation instead of in drivers/base as a lone outlier.
Signed-off-by: Christoph Hellwig
---
drivers/base/platform.c | 31 ---
kernel/dma/mapping.c | 34 +-
This isn't exactly a slow path routine, but it is not super critical
either, and moving it out of line will help to keep the include chain
clean for the following DMA indirection bypass work.
Signed-off-by: Christoph Hellwig
---
include/linux/dma-mapping.h | 12 ++--
kernel/dma/mapping.c
All architectures except for sparc64 use the dma-direct code in some
form, and even for sparc64 we had the discussion of a direct mapping
mode a while ago. In preparation for directly calling the direct
mapping code don't bother having it optionally but always build the
code in. This is a minor h
There is no need to have all setup and coherent allocation / freeing
routines inline. Move them out of line to keep the implementation
nicely encapsulated and save some kernel text size.
Signed-off-by: Christoph Hellwig
---
arch/powerpc/include/asm/dma-mapping.h | 1 -
include/linux/dma-mapping
The two functions are exactly the same, so don't bother implementing
them twice.
Signed-off-by: Christoph Hellwig
---
include/linux/dma-mapping.h | 19 ++-
1 file changed, 6 insertions(+), 13 deletions(-)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
ind
We can just call the regular calls after adding the offset to the address instead
of reimplementing them.
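The consolidation can be sketched as follows: rather than keeping a second, near-identical page-mapping implementation, compute the target address by adding the offset and reuse the single-buffer path. The helper names here are illustrative stand-ins, not the kernel's real functions.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long dma_addr_t;

static int single_path_calls;

/* stand-in for the regular single-buffer mapping path */
static dma_addr_t map_single(void *ptr, size_t size)
{
    (void)size;
    single_path_calls++;
    return (dma_addr_t)ptr;
}

/* the page variant just adds the offset and calls the regular path */
static dma_addr_t map_page(char *page_base, size_t offset, size_t size)
{
    return map_single(page_base + offset, size);
}
```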
Signed-off-by: Christoph Hellwig
---
include/linux/dma-debug.h | 27
include/linux/dma-mapping.h | 34 +-
kernel/dma/debug.c | 42
While the dma-direct code is (relatively) clean and simple we actually
have to use the swiotlb ops for the mapping on many architectures due
to devices with addressing limits. Instead of keeping two
implementations around this commit allows the dma-direct
implementation to call the swiotlb bounce
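The fallback described above can be modeled as: try the direct mapping first, and only bounce through a copy buffer when the physical address exceeds the device's addressing limit. This is a userspace toy with invented names and sizes, not the real swiotlb code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static char bounce_buffer[64];   /* stands in for the low, reachable swiotlb pool */
static int bounced;

static uint64_t swiotlb_bounce_stub(const char *src, size_t size)
{
    memcpy(bounce_buffer, src, size);   /* copy into addressable memory */
    bounced++;
    return (uint64_t)(uintptr_t)bounce_buffer;
}

/* direct mapping with a bounce-buffer fallback for out-of-range addresses */
static uint64_t dma_direct_map_stub(uint64_t phys, const char *src,
                                    size_t size, uint64_t dev_dma_mask)
{
    if (phys + size - 1 <= dev_dma_mask)
        return phys;                        /* fast path: directly addressable */
    return swiotlb_bounce_stub(src, size);  /* fall back to bouncing */
}
```

Keeping one implementation with this branch avoids maintaining separate dma-direct and swiotlb ops tables for architectures whose devices have addressing limits.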
No need to duplicate the mapping logic.
Signed-off-by: Christoph Hellwig
---
kernel/dma/direct.c | 14 +-
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index edb24f94ea1e..d45306473c90 100644
--- a/kernel/dma/direct.c
+++ b/ke
Hi all,
a while ago Jesper reported major performance regressions due to the
spectre v2 mitigations in his XDP forwarding workloads. A large part
of that is due to the DMA mapping API indirect calls.
It turns out that the most common implementation of the DMA API is the
direct mapping case, and
Only report a DMA addressability problem once to avoid spewing the
kernel log with repeated messages. Also provide a stack trace to make it
easy to find the actual caller that caused the problem.
Last but not least move the actual check into the fast path and only
leave the error reporting i
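The report-once idea can be sketched like this: keep the cheap mask check on the fast path and emit the expensive report (with a stack trace, via WARN_ON_ONCE in the real kernel) only the first time it fires. Names and counters here are for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool reported;
static int reports;

static void report_addressability_problem(unsigned long long addr)
{
    if (reported)
        return;            /* already logged once; stay quiet */
    reported = true;
    reports++;
    fprintf(stderr, "device cannot address %#llx\n", addr);
}

/* fast-path check; error reporting stays out of line */
static int check_addr(unsigned long long addr, unsigned long long mask)
{
    if (addr <= mask)
        return 0;
    report_addressability_problem(addr);
    return -1;
}
```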
We can use DMA_MAPPING_ERROR instead, which already maps to the same
value.
Signed-off-by: Christoph Hellwig
---
drivers/xen/swiotlb-xen.c | 4 ++--
include/linux/swiotlb.h | 3 ---
kernel/dma/swiotlb.c | 4 ++--
3 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/xen/s
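The shared sentinel works because all-ones is an address no valid mapping will ever produce, so a bespoke per-subsystem error value like swiotlb's adds nothing. A minimal sketch of the generic definition and its check:

```c
#include <assert.h>

typedef unsigned long long dma_addr_t;

/* all-ones sentinel: an address no mapping operation can return */
#define DMA_MAPPING_ERROR (~(dma_addr_t)0)

static int dma_mapping_error(dma_addr_t addr)
{
    return addr == DMA_MAPPING_ERROR;
}
```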
Instead of providing a special dma_mark_clean hook just for ia64, switch
ia64 to use the normal arch_sync_dma_for_cpu hooks instead.
This means that we now also set the PG_arch_1 bit for pages in the
swiotlb buffer, which isn't strictly needed as we will never execute code
out of the swiotlb buffer
Sorry for the delay, I wanted to do a little more performance analysis
before continuing.
On 27/11/2018 18:10, Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2018 at 05:55:20PM +, Jean-Philippe Brucker wrote:
+ if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) ||
+ !virtio_has_fea
Next step: 13c1fdec5682b6e13257277fa16aa31f342d167d (powerpc/dma: move
pci_dma_dev_setup_swiotlb to fsl_pci.c)
git checkout 13c1fdec5682b6e13257277fa16aa31f342d167d
Result: The PASEMI onboard ethernet works and the X5000 boots.
— Christian
Sent from my iPhone
> On 7. Dec 2018, at 14:45, Chris
On 30/11/2018 11:14, John Garry wrote:
From: Ganapatrao Kulkarni
Hi Joerg,
A friendly reminder. Can you please let me know your position on this patch?
Cheers,
John
Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
respective device NUMA node. The ternary operator w
On 07/12/2018 17:05, Christoph Hellwig wrote:
So I'd really prefer if we had a separate dummy.c file, like in
my take on your previous patch here:
http://git.infradead.org/users/hch/misc.git/commitdiff/e01adddc1733fa414dc16cd22e8f58be9b64a025
http://git.infradead.org/users/hch/misc.git/commitdi
On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote:
> > +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> > + struct page **pages, unsigned long page_count)
> > +{
> > + unsigned long uaddr = addr;
> > + int ret = 0, i;
>
> Some of the sites bei
So I'd really prefer if we had a separate dummy.c file, like in
my take on your previous patch here:
http://git.infradead.org/users/hch/misc.git/commitdiff/e01adddc1733fa414dc16cd22e8f58be9b64a025
http://git.infradead.org/users/hch/misc.git/commitdiff/596bde76e5944a3f4beb8c2769067ca88dda127a
Oth
Hi all,
Tangential to Christoph's RFC for mitigating indirect call overhead in
common DMA mapping scenarios[1], this is a little reshuffle to prevent the
CONFIG_ACPI_CCA_REQUIRED case from getting in the way. This would best go
via the dma-mapping tree, so reviews and acks welcome.
Robin.
[1] ht
The dummy DMA ops are currently used by arm64 for any device which has
an invalid ACPI description and is thus barred from using DMA due to not
knowing whether is is cache-coherent or not. Factor these out into
general dma-mapping code so that they can be referenced from other
common code paths. In
Rather than checking the DMA attribute at each callsite, just pass it
through for acpi_dma_configure() to handle directly. That can then deal
with the relatively exceptional DEV_DMA_NOT_SUPPORTED case by explicitly
installing dummy DMA ops instead of just skipping setup entirely. This
will then fre
On Fri, 7 Dec 2018 16:44:35 +0100
Jesper Dangaard Brouer wrote:
> On Fri, 7 Dec 2018 02:21:42 +0100
> Christoph Hellwig wrote:
>
> > On Thu, Dec 06, 2018 at 08:24:38PM +, Robin Murphy wrote:
> > > On 06/12/2018 20:00, Christoph Hellwig wrote:
> > >> On Thu, Dec 06, 2018 at 06:54:17PM
On Fri, 7 Dec 2018 02:21:42 +0100
Christoph Hellwig wrote:
> On Thu, Dec 06, 2018 at 08:24:38PM +, Robin Murphy wrote:
> > On 06/12/2018 20:00, Christoph Hellwig wrote:
> >> On Thu, Dec 06, 2018 at 06:54:17PM +, Robin Murphy wrote:
> >>> I'm pretty sure we used to assign dummy_dma_ops
Hi all,
the ARM imx27/31 ports and various sh boards use
dma_declare_coherent_memory on main memory taken from the memblock
allocator.
Is there any good reason these couldn't be switched to CMA areas?
Getting rid of these magic dma_declare_coherent_memory area would
help making the dma allocator
On 06/12/2018 18:39, Souptick Joarder wrote:
Previously drivers had their own way of mapping a range of
kernel pages/memory into a user vma, and this was done by
invoking vm_insert_page() within a loop.
As this pattern is common across different drivers, it can
be generalized by creating a new functi
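The loop consolidation described above can be sketched as follows, using the vm_insert_range() signature quoted elsewhere in the thread. The kernel types and vm_insert_page() are stubbed here so the sketch is self-contained; it only models the control flow, not real page-table insertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types, for illustration only. */
struct page { int id; };
struct vm_area_struct { int dummy; };

#define PAGE_SIZE 4096UL

static int insert_count;   /* counts simulated vm_insert_page() calls */

/* stub modelling vm_insert_page(): map one page at addr */
static int vm_insert_page_stub(struct vm_area_struct *vma,
                               unsigned long addr, struct page *page)
{
    (void)vma; (void)addr; (void)page;
    insert_count++;
    return 0;
}

/*
 * Sketch of the proposed helper: map each page of the array at
 * successive addresses, stopping at the first error.
 */
static int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
                           struct page **pages, unsigned long page_count)
{
    unsigned long uaddr = addr;
    unsigned long i;
    int ret;

    for (i = 0; i < page_count; i++) {
        ret = vm_insert_page_stub(vma, uaddr, pages[i]);
        if (ret)
            return ret;
        uaddr += PAGE_SIZE;
    }
    return 0;
}
```

Each converted driver then replaces its open-coded loop with a single call, which is the deletion visible in the per-driver diffstats.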
I will work at the weekend to figure out where the problematic commit is.
— Christian
Sent from my iPhone
> On 7. Dec 2018, at 15:09, Christoph Hellwig wrote:
>
>> On Fri, Dec 07, 2018 at 11:18:18PM +1100, Michael Ellerman wrote:
>> Christoph Hellwig writes:
>>
>>> Ben / Michael,
>>>
>>> ca
On 12/7/18 7:16 AM, Nicolas Boichat wrote:
> IOMMUs using ARMv7 short-descriptor format require page tables
> (level 1 and 2) to be allocated within the first 4GB of RAM, even
> on 64-bit systems.
>
> For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> is defined (e.g. on arm64 pl
Em Fri, 7 Dec 2018 00:09:45 +0530
Souptick Joarder escreveu:
> Previously drivers had their own way of mapping a range of
> kernel pages/memory into a user vma, and this was done by
> invoking vm_insert_page() within a loop.
>
> As this pattern is common across different drivers, it can
> be generali
On Fri, Dec 07, 2018 at 11:18:18PM +1100, Michael Ellerman wrote:
> Christoph Hellwig writes:
>
> > Ben / Michael,
> >
> > can we get this one queued up for 4.21 to prepare for the DMA work later
> > on?
>
> I was hoping the PASEMI / NXP regressions could be solved before
> merging.
>
> My p502
On 06/12/2018 18:43, Souptick Joarder wrote:
Convert to use vm_insert_range() to map range of kernel
memory to user vma.
Signed-off-by: Souptick Joarder
Reviewed-by: Matthew Wilcox
---
drivers/iommu/dma-iommu.c | 13 +++--
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git
On 06 December 2018 at 11:55AM, Christian Zigotzky wrote:
On 05 December 2018 at 3:05PM, Christoph Hellwig wrote:
Thanks. Can you try a few stepping points in the tree?
First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
(the first one) applied?
Second with all commits up to 5da1
On 07/12/2018 05:49, Dongli Zhang wrote:
On 12/07/2018 12:12 AM, Joe Jin wrote:
Hi Dongli,
Maybe move d_swiotlb_usage declare into swiotlb_create_debugfs():
I assume the call of swiotlb_tbl_map_single() might be frequent in some
situations, e.g., when 'swiotlb=force'.
That's why I declare
On Thu, Dec 06, 2018 at 02:39:15PM -0700, Yu Zhao wrote:
> Fixes: aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device")
>
> Signed-off-by: Yu Zhao
> ---
> drivers/iommu/amd_iommu.c | 9 -
> 1 file changed, 8 insertions(+), 1 deletion(-)
Applied, thanks.
Christoph Hellwig writes:
> Ben / Michael,
>
> can we get this one queued up for 4.21 to prepare for the DMA work later
> on?
I was hoping the PASEMI / NXP regressions could be solved before
merging.
My p5020ds is booting fine with this series, so I'm not sure why it's
causing problems on Chris
Hi,
On Mon, Nov 26, 2018 at 07:29:45AM +, Tian, Kevin wrote:
> btw Baolu just reminded me one thing which is worthy of noting here.
> 'primary' vs. 'aux' concept makes sense only when we look from a device
> p.o.v. That binding relationship is not (*should not be*) carry-and-forwarded
> cross
On Fri, Dec 07, 2018 at 10:22:52AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 19, 2018 at 01:55:17PM -0500, Waiman Long wrote:
> > There are use cases where we want to allow nesting of one terminal lock
> > underneath another terminal-like lock. That new lock type is called
> > nestable terminal lo
On Mon, Nov 19, 2018 at 01:55:18PM -0500, Waiman Long wrote:
> By making the object hash locks nestable terminal locks, we can avoid
> a bunch of unnecessary lockdep validations as well as saving space
> in the lockdep tables.
So the 'problem'; which you've again not explained; is that debugobject
On Thu, Dec 06, 2018 at 05:42:16PM +, Robin Murphy wrote:
> For sure - although I am now wondering whether "mapped" is perhaps a little
> ambiguous in the naming, since the answer to "can I use the API" is yes even
> when the device may currently be attached to an identity/passthrough domain
>
Hi Robin,
On Tue, Dec 4, 2018 at 8:51 PM Robin Murphy wrote:
>
> On 04/12/2018 11:01, Vivek Gautam wrote:
> > Qualcomm SoCs have an additional level of cache called as
> > System cache, aka. Last level cache (LLC). This cache sits right
> > before the DDR, and is tightly coupled with the memory c
On Mon, Nov 19, 2018 at 01:55:17PM -0500, Waiman Long wrote:
> There are use cases where we want to allow nesting of one terminal lock
> underneath another terminal-like lock. That new lock type is called
> nestable terminal lock which can optionally allow the acquisition of
> no more than one regu
On Mon, Nov 19, 2018 at 01:55:16PM -0500, Waiman Long wrote:
> The db->lock is a raw spinlock and so the lock hold time is supposed
> to be short. This will not be the case when printk() is being involved
> in some of the critical sections. In order to avoid the long hold time,
> in case some messa
On Mon, Nov 19, 2018 at 01:55:12PM -0500, Waiman Long wrote:
> A terminal lock is a lock where further locking or unlocking on another
> lock is not allowed. IOW, no forward dependency is permitted.
>
> With such a restriction in place, we don't really need to do a full
> validation of the lock ch
On Fri, Dec 7, 2018 at 4:05 PM Matthew Wilcox wrote:
>
> On Fri, Dec 07, 2018 at 02:16:19PM +0800, Nicolas Boichat wrote:
> > +#ifdef CONFIG_ZONE_DMA32
> > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
>
> This name doesn't make any sense. Why not
On Fri, Dec 07, 2018 at 02:16:19PM +0800, Nicolas Boichat wrote:
> +#ifdef CONFIG_ZONE_DMA32
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
This name doesn't make any sense. Why not ARM_V7S_TABLE_SLAB_FLAGS ?
> +#else
> +#define ARM_V7S_TABLE_GFP_