Re: [PATCH 00/14] DMA-mapping framework redesign preparation
On Friday 23 December 2011, Marek Szyprowski wrote:

The solution we found is to introduce new public dma mapping functions with an additional attributes argument: dma_alloc_attrs() and dma_free_attrs(). This way all the different kinds of architecture-specific buffer mappings can be hidden behind the attributes without the need to create several versions of the dma_alloc_ function.

Since the patches are now in linux-next, we should make sure that they can actually get merged into 3.4. I've looked at all the patches again and found them to be straightforward and helpful. I hope we can get them merged next time.

Please add my

Reviewed-by: Arnd Bergmann a...@arndb.de

___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH 00/14] DMA-mapping framework redesign preparation
Hello,

To help everyone in testing and adapting our patches for his hardware platform I've rebased our patches onto the latest v3.2 Linux kernel and prepared a few GIT branches in our public repository. These branches contain our memory management related patches posted in the following threads:

[PATCHv18 0/11] Contiguous Memory Allocator:
http://www.spinics.net/lists/linux-mm/msg28125.html
later called CMAv18,

[PATCH 00/14] DMA-mapping framework redesign preparation:
http://www.spinics.net/lists/linux-sh/msg09777.html
and
[PATCH 0/8 v4] ARM: DMA-mapping framework redesign:
http://www.spinics.net/lists/arm-kernel/msg151147.html
with the following update:
http://www.spinics.net/lists/arm-kernel/msg154889.html
later called DMAv5.

These branches are available in our public GIT repository:

git://git.infradead.org/users/kmpark/linux-samsung
http://git.infradead.org/users/kmpark/linux-samsung/

The following branches are available:

1) 3.2-cma-v18
   Vanilla Linux v3.2 with fixed CMA v18 patches (first patch replaced with the one from v17 to fix SMP issues, see the respective thread).

2) 3.2-dma-v5
   Vanilla Linux v3.2 + iommu/next (IOMMU maintainer's patches) branch with DMA-preparation and DMA-mapping framework redesign patches.

3) 3.2-cma-v18-dma-v5
   Previous two branches merged together (DMA-mapping on top of CMA)

4) 3.2-cma-v18-dma-v5-exynos
   Previous branch rebased on top of iommu/next + kgene/for-next (Samsung SoC platform maintainer's patches) with new Exynos4 IOMMU driver by KyongHo Cho and relevant glue code.

5) 3.2-dma-v5-exynos
   Branch from point 2 rebased on top of iommu/next + kgene/for-next (Samsung SoC maintainer's patches) with new Exynos4 IOMMU driver by KyongHo Cho and relevant glue code.

I hope everyone will find a branch that suits his needs. :)

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
RE: [PATCH 00/14] DMA-mapping framework redesign preparation
Hello,

On Tuesday, December 27, 2011 6:53 PM James Bottomley wrote:

On Tue, 2011-12-27 at 09:25 +0100, Marek Szyprowski wrote:

[...]

Usually these drivers don't touch the buffer data at all, so the mapping in kernel virtual address space is not needed. We can introduce a DMA_ATTRIB_NO_KERNEL_MAPPING attribute which lets the kernel skip creation of the kernel virtual mapping. This way we can save precious vmalloc area and simplify some mapping operations on a few architectures.

I really think this wants to be a separate function. dma_alloc_coherent is for allocating memory to be shared between the kernel and a driver; we already have dma_map_sg for mapping userspace I/O as an alternative interface. This feels like it's something different again rather than an option to dma_alloc_coherent.

That is just a starting point for the discussion. I thought about this API a bit and came to the conclusion that there is not much difference between a dma_alloc_coherent which creates a mapping in kernel virtual space and one that does not. It is just a hint from the driver that it will not use that mapping at all. Of course this attribute makes sense only together with adding a dma_mmap_attrs() call, because otherwise drivers won't be able to get access to the buffer data.

This depends. On virtually indexed systems like PA-RISC, there are two ways of making a DMA range coherent. One is to make the range uncached. This is incredibly slow and not what we do by default, but it can be used to make multiple mappings coherent. The other is to load the virtual address up as a coherence index into the IOMMU. This makes it a full peer in the coherence process, but means we can only designate a single virtual range to be coherent (not multiple mappings unless they happen to be congruent).
Perhaps it doesn't matter that much, since I don't see a use for this on PA, but if any other architecture works the same, you'd have to designate a single mapping as the coherent one and essentially promise not to use the other mapping if we followed our normal coherence protocols. Obviously, the usual range we currently make coherent is the kernel mapping (that's actually the only virtual address we have by the time we're deep in the iommu code), so designating a different virtual address would need some surgery to the guts of the iommu code.

I see, in this case not much can be achieved by dropping the kernel mapping for the allocated buffer. I'm also not sure how to mmap the buffer into userspace so that it meets the CPU requirements. Is it possible to use a non-cached mapping in userspace together with a coherent mapping in kernel virtual space?

However, on some other architectures this attribute allows using HIGH_MEM for the allocated coherent buffer. The other possibility is to allocate it in chunks and map them contiguously into dma address space. With the NO_KERNEL_MAPPING attribute we avoid consuming vmalloc range for a newly allocated buffer for which we cannot use the linear mapping (because it is scattered).

Of course this attribute will be implemented only by the architectures where it gives some benefit. All others can simply ignore it and return a plain coherent buffer with an ordinary kernel virtual mapping. The driver will just ignore it.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
RE: [PATCH 00/14] DMA-mapping framework redesign preparation
Hello,

On Friday, December 23, 2011 5:35 PM Matthew Wilcox wrote:

On Fri, Dec 23, 2011 at 01:27:19PM +0100, Marek Szyprowski wrote:

The first issue we identified is the fact that on some platforms (again, mainly ARM) there are several functions for allocating DMA buffers: dma_alloc_coherent, dma_alloc_writecombine and dma_alloc_noncoherent

Is this write-combining from the point of view of the device (ie iommu), or from the point of view of the CPU, or both?

It is about write-combining from the CPU point of view. Right now there are no devices with a memory interface advanced enough to do write combining on the DMA side, but I believe that they might appear at some point in the future as well.

The next step in the dma mapping framework update is the introduction of the dma_mmap/dma_mmap_attrs() function. There are a number of drivers (mainly V4L2 and ALSA) that only export the DMA buffers to user space. Creating a userspace mapping with correct page attributes is not an easy task for the driver. Also the DMA-mapping framework is the only place where the complete information about the allocated pages is available, especially if the implementation uses an IOMMU controller to provide a buffer that is contiguous in DMA address space but scattered in physical memory space.

Surely we only need a helper which drivers can call from their mmap routine to solve this?

On ARM architecture it is already implemented this way and a bunch of drivers use dma_mmap_coherent/dma_mmap_writecombine calls. We would like to standardize these calls across all architectures.

Usually these drivers don't touch the buffer data at all, so the mapping in kernel virtual address space is not needed. We can introduce a DMA_ATTRIB_NO_KERNEL_MAPPING attribute which lets the kernel skip creation of the kernel virtual mapping. This way we can save precious vmalloc area and simplify some mapping operations on a few architectures.

I really think this wants to be a separate function.
dma_alloc_coherent is for allocating memory to be shared between the kernel and a driver; we already have dma_map_sg for mapping userspace I/O as an alternative interface. This feels like it's something different again rather than an option to dma_alloc_coherent.

That is just a starting point for the discussion. I thought about this API a bit and came to the conclusion that there is not much difference between a dma_alloc_coherent which creates a mapping in kernel virtual space and one that does not. It is just a hint from the driver that it will not use that mapping at all. Of course this attribute makes sense only together with adding a dma_mmap_attrs() call, because otherwise drivers won't be able to get access to the buffer data.

On coherent architectures where dma_alloc_coherent is just a simple wrapper around alloc_pages_exact(), such an attribute can simply be ignored without any impact on the drivers (that's the main idea behind dma attributes!). However, such a hint will help a lot on non-coherent architectures where additional work needs to be done to provide a coherent mapping in kernel address space. It also saves some precious kernel resources like vmalloc address range.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
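Editor's sketch: the "attribute as an ignorable hint" idea described above can be illustrated with a small userspace mock. This is not kernel code and not the patch set's implementation. Every sketch_* name, the flag value, and the use of malloc() as a stand-in for page allocation are assumptions made purely for illustration. It shows two per-device backends behind one ops table: one ignores the no-kernel-mapping hint and returns an ordinary mapping, the other honours it.

```c
/* Userspace mock only: all sketch_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag standing in for the proposed no-kernel-mapping hint. */
#define SKETCH_ATTR_NO_KERNEL_MAPPING (1u << 0)

struct sketch_buffer {
    void *pages;        /* backing memory (stand-in for allocated pages) */
    void *kernel_vaddr; /* NULL when no kernel mapping was created */
};

/* Per-device operations table, loosely modelled on the idea of dma_map_ops. */
struct sketch_dma_ops {
    bool (*alloc)(struct sketch_buffer *buf, size_t size, unsigned int attrs);
};

/* "Coherent" backend: ignores the hint and always returns a mapping. */
static bool sketch_coherent_alloc(struct sketch_buffer *buf, size_t size,
                                  unsigned int attrs)
{
    (void)attrs; /* the hint brings no benefit here, so it is ignored */
    buf->pages = malloc(size);
    buf->kernel_vaddr = buf->pages;
    return buf->pages != NULL;
}

/* "Non-coherent" backend: honours the hint and skips the kernel mapping. */
static bool sketch_noncoherent_alloc(struct sketch_buffer *buf, size_t size,
                                     unsigned int attrs)
{
    buf->pages = malloc(size);
    buf->kernel_vaddr =
        (attrs & SKETCH_ATTR_NO_KERNEL_MAPPING) ? NULL : buf->pages;
    return buf->pages != NULL;
}

static const struct sketch_dma_ops sketch_coherent_ops = {
    .alloc = sketch_coherent_alloc,
};

static const struct sketch_dma_ops sketch_noncoherent_ops = {
    .alloc = sketch_noncoherent_alloc,
};

struct sketch_device {
    const struct sketch_dma_ops *ops; /* selected per device at runtime */
};

/* Single public entry point: dispatches to the device's backend. */
static bool sketch_alloc(const struct sketch_device *dev,
                         struct sketch_buffer *buf, size_t size,
                         unsigned int attrs)
{
    assert(dev != NULL && dev->ops != NULL && dev->ops->alloc != NULL);
    return dev->ops->alloc(buf, size, attrs);
}
```

A caller passes the same flag to both kinds of device. Only the backend that benefits acts on it, so a driver that never touches kernel_vaddr behaves the same either way.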
[PATCH 00/14] DMA-mapping framework redesign preparation
Hello everyone,

At the Linaro Memory Management meeting in Budapest (May 2011) we discussed the design of the DMA mapping framework. We tried to identify its drawbacks and limitations and to provide some solutions for them. The discussion was mainly about the ARM architecture, but some of the conclusions need to be applied to cross-architecture code.

The first issue we identified is the fact that on some platforms (again, mainly ARM) there are several functions for allocating DMA buffers: dma_alloc_coherent, dma_alloc_writecombine and dma_alloc_noncoherent (not functional now). For each of them there is a matching dma_free_* function. This gives us quite a lot of functions in the public API and complicates things when we need to have several different implementations for different devices selected at runtime (if an IOMMU controller is available only for a few devices in the system). Also the drivers which use the less common variants are less portable because of the lack of dma_alloc_writecombine on other architectures.

The solution we found is to introduce new public dma mapping functions with an additional attributes argument: dma_alloc_attrs() and dma_free_attrs(). This way all the different kinds of architecture-specific buffer mappings can be hidden behind the attributes without the need to create several versions of the dma_alloc_ function.

dma_alloc_coherent() can be wrapped on top of the new dma_alloc_attrs() with a NULL attrs parameter. dma_alloc_writecombine and dma_alloc_noncoherent can be implemented as simple wrappers which set the attributes to DMA_ATTRS_WRITECOMBINE or DMA_ATTRS_NON_CONSISTENT respectively. These new attributes will be implemented only on the architectures that really support them; the others will simply ignore them, defaulting to the dma_alloc_coherent equivalent.

The next step in the dma mapping framework update is the introduction of the dma_mmap/dma_mmap_attrs() function.
There are a number of drivers (mainly V4L2 and ALSA) that only export the DMA buffers to user space. Creating a userspace mapping with correct page attributes is not an easy task for the driver. Also the DMA-mapping framework is the only place where the complete information about the allocated pages is available, especially if the implementation uses an IOMMU controller to provide a buffer that is contiguous in DMA address space but scattered in physical memory space.

Usually these drivers don't touch the buffer data at all, so the mapping in kernel virtual address space is not needed. We can introduce a DMA_ATTRIB_NO_KERNEL_MAPPING attribute which lets the kernel skip creation of the kernel virtual mapping. This way we can save precious vmalloc area and simplify some mapping operations on a few architectures.

This patch series is a preparation for the above changes in the public dma mapping API. The main goal is to modify the dma_map_ops structure and let all users use it to implement the new public functions.

The proof-of-concept patches for the ARM architecture have already been posted a few times and now they are working reasonably well. They perform the conversion to a dma_map_ops based implementation and add support for a generic IOMMU-based dma mapping implementation. To get them merged we first need to get acceptance for the changes in the common, cross-architecture structures. More information about these patches can be found in the following threads:

http://www.spinics.net/lists/linux-mm/msg19856.html
http://www.spinics.net/lists/linux-mm/msg21241.html
http://lists.linaro.org/pipermail/linaro-mm-sig/2011-September/000571.html
http://lists.linaro.org/pipermail/linaro-mm-sig/2011-September/000577.html
http://www.spinics.net/lists/linux-mm/msg25490.html

The patches are prepared on top of Linux Kernel v3.2-rc6. I would appreciate any comments and help with getting this patch series into the linux-next tree.
The ideas applied in this patch set have also been presented during the Kernel Summit 2011 and ELC-E 2011 in Prague, in the presentation 'ARM DMA-Mapping Framework Redesign and IOMMU integration'.

I'm really sorry if I missed any of the relevant architecture mailing lists. I did my best to include everyone. Feel free to forward this patchset to all interested developers and maintainers. I already feel like a nasty spammer.

Best regards
Marek Szyprowski
Samsung Poland R&D Center

Patch summary:

Andrzej Pietrasiewicz (9):
  X86: adapt for dma_map_ops changes
  MIPS: adapt for dma_map_ops changes
  PowerPC: adapt for dma_map_ops changes
  IA64: adapt for dma_map_ops changes
  SPARC: adapt for dma_map_ops changes
  Alpha: adapt for dma_map_ops changes
  SH: adapt for dma_map_ops changes
  Microblaze: adapt for dma_map_ops changes
  Unicore32: adapt for dma_map_ops changes

Marek Szyprowski (5):
  common: dma-mapping: introduce alloc_attrs and free_attrs methods
  common: dma-mapping: remove old alloc_coherent and free_coherent methods
  common: dma-mapping: introduce mmap method
  common: DMA-mapping: add WRITE_COMBINE
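Editor's sketch: the wrapper scheme from the cover letter (one attribute-taking allocator, with the older variants layered on top of it) can be mocked in userspace as follows. This is an illustration under stated assumptions, not the code from the patches. All sketch_* names, the flag values, and the malloc()-based allocation are hypothetical, and the real functions also take a device, a DMA handle and GFP flags that are left out here.

```c
/* Userspace mock only: all sketch_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical attribute bits standing in for the attributes in the thread. */
#define SKETCH_ATTR_WRITE_COMBINE  (1u << 0)
#define SKETCH_ATTR_NON_CONSISTENT (1u << 1)

struct sketch_dma_attrs {
    unsigned int flags;
};

/* Records what the last allocation asked for, for illustration only. */
static unsigned int sketch_last_flags;

/* Single entry point: every allocation variant funnels through here. */
static void *sketch_dma_alloc_attrs(size_t size,
                                    const struct sketch_dma_attrs *attrs)
{
    assert(size > 0);
    /* A NULL attrs pointer means "plain coherent buffer". */
    sketch_last_flags = attrs ? attrs->flags : 0;
    return malloc(size);
}

static void sketch_dma_free_attrs(void *cpu_addr,
                                  const struct sketch_dma_attrs *attrs)
{
    (void)attrs; /* the mock frees every kind of buffer the same way */
    free(cpu_addr);
}

/* The coherent variant is the attrs call with no attributes at all. */
static void *sketch_dma_alloc_coherent(size_t size)
{
    return sketch_dma_alloc_attrs(size, NULL);
}

/* The write-combine variant only sets one attribute and delegates. */
static void *sketch_dma_alloc_writecombine(size_t size)
{
    struct sketch_dma_attrs attrs = { .flags = SKETCH_ATTR_WRITE_COMBINE };
    return sketch_dma_alloc_attrs(size, &attrs);
}

/* The non-consistent variant follows the same pattern. */
static void *sketch_dma_alloc_noncoherent(size_t size)
{
    struct sketch_dma_attrs attrs = { .flags = SKETCH_ATTR_NON_CONSISTENT };
    return sketch_dma_alloc_attrs(size, &attrs);
}
```

The point of the design is that an implementation has only one allocation routine to provide. An architecture that does not support a given attribute can disregard the flag and fall back to the coherent behaviour.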
Re: [PATCH 00/14] DMA-mapping framework redesign preparation
On Fri, Dec 23, 2011 at 01:27:19PM +0100, Marek Szyprowski wrote:

The first issue we identified is the fact that on some platforms (again, mainly ARM) there are several functions for allocating DMA buffers: dma_alloc_coherent, dma_alloc_writecombine and dma_alloc_noncoherent

Is this write-combining from the point of view of the device (ie iommu), or from the point of view of the CPU, or both?

The next step in the dma mapping framework update is the introduction of the dma_mmap/dma_mmap_attrs() function. There are a number of drivers (mainly V4L2 and ALSA) that only export the DMA buffers to user space. Creating a userspace mapping with correct page attributes is not an easy task for the driver. Also the DMA-mapping framework is the only place where the complete information about the allocated pages is available, especially if the implementation uses an IOMMU controller to provide a buffer that is contiguous in DMA address space but scattered in physical memory space.

Surely we only need a helper which drivers can call from their mmap routine to solve this?

Usually these drivers don't touch the buffer data at all, so the mapping in kernel virtual address space is not needed. We can introduce a DMA_ATTRIB_NO_KERNEL_MAPPING attribute which lets the kernel skip creation of the kernel virtual mapping. This way we can save precious vmalloc area and simplify some mapping operations on a few architectures.

I really think this wants to be a separate function. dma_alloc_coherent is for allocating memory to be shared between the kernel and a driver; we already have dma_map_sg for mapping userspace I/O as an alternative interface. This feels like it's something different again rather than an option to dma_alloc_coherent.

--
Matthew Wilcox
Intel Open Source Technology Centre
Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step.
Re: [PATCH 00/14] DMA-mapping framework redesign preparation
On Fri, 2011-12-23 at 09:35 -0700, Matthew Wilcox wrote:

I really think this wants to be a separate function. dma_alloc_coherent is for allocating memory to be shared between the kernel and a driver; we already have dma_map_sg for mapping userspace I/O as an alternative interface. This feels like it's something different again rather than an option to dma_alloc_coherent.

Depends. There can be some interesting issues with some of the ARM stuff out there (and to a lesser extent older ppc embedded stuff). For example, some devices really want a physically contiguous chunk, and are not cache coherent. In that case, you can't keep the linear mapping around. But you also don't want to waste your precious kernel virtual space creating a separate non-cachable mapping for those.

In general, dma mapping attributes as a generic feature make sense, whether this specific attribute does or not. And we probably want space for platform-specific attributes. For example, FSL embedded iommu's have interesting features for directing data toward a specific core cache etc... that we might want to expose using such attributes.

Cheers,
Ben.