Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
On Thu, Jul 22, 2010 at 08:51:51AM +0100, Russell King - ARM Linux wrote: On Wed, Jul 21, 2010 at 08:50:26PM -0700, Zach Pfeffer wrote: On Wed, Jul 14, 2010 at 10:59:43AM +0900, FUJITA Tomonori wrote: On Tue, 13 Jul 2010 10:02:23 +0100 Zach Pfeffer said this new VCM infrastructure can be useful for video4linux. However, I don't think we need another 3,000-line abstraction layer to solve video4linux's issue nicely. It's only 3,000 lines because I haven't converted the code to use function pointers. I don't understand - you've made this claim a couple of times. I can't see how converting the code to use function pointers (presumably to eliminate those switch statements) would reduce the number of lines of code. Please explain (or show via new patches) how converting this to function pointers significantly reduces the number of lines of code. We might then be able to put just _one_ of these issues to bed. Aye. It's being worked on. Once it's done I'll push it. Getting back to the point: there is no API that can handle large buffer allocation and sharing with low-level attribute control for virtual address spaces outside the CPU. I think we've dealt with the attribute issue to death now. Shall we repeat it again? I think the only point of agreement is that all mappings must have compatible attributes; the issue of multiple mappings is still outstanding, as is needing more fine-grained control of the attributes of a set of compatible mappings (I still need to digest your examples a little). The DMA API et al. take a CPU-centric view of virtual space management, sharing has to be explicitly written and external virtual space management is left up to device driver writers. I think I've also shown that not to be the case with example code. The code behind the DMA API can be changed on a per-device basis (currently on ARM we haven't supported that because no one's asked for it yet) so that it can support multiple IOMMUs, even of multiple different types. I'm seeing that now. As I become more familiar with the DMA API the way forward may become more clear to me. I certainly appreciate the time you've spent discussing things and the code examples you've listed. For example, it's fairly clear how I can use a scatterlist to describe a mapping of big buffers. I can start down this path and see what shakes out.
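To make that last point concrete, here is a minimal sketch (not from the original mail) of describing a large, physically discontiguous buffer to the streaming DMA API with a scatterlist; the device pointer, page array, count and scatterlist storage are assumed to be supplied by the caller.

    /*
     * Hedged sketch only: map an array of pages as one scatter-gather
     * request through the streaming DMA API. The DMA API takes care of
     * CPU caches and any IOMMU sitting behind 'dev'.
     */
    #include <linux/dma-mapping.h>
    #include <linux/scatterlist.h>

    static int map_big_buffer(struct device *dev, struct page **pages,
                              int npages, struct scatterlist *sgl)
    {
            int i, nents;

            sg_init_table(sgl, npages);
            for (i = 0; i < npages; i++)
                    sg_set_page(&sgl[i], pages[i], PAGE_SIZE, 0);

            nents = dma_map_sg(dev, sgl, npages, DMA_TO_DEVICE);
            if (!nents)
                    return -ENOMEM;

            /* sg_dma_address(&sgl[0]) etc. are the cookies the device uses. */
            return nents;
    }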
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Thu, Jul 22, 2010 at 08:34:55AM +0100, Russell King - ARM Linux wrote: On Wed, Jul 21, 2010 at 09:25:28PM -0700, Zach Pfeffer wrote: Yes it is a problem, as Russell has brought up, but there's something I probably haven't communicated well. I'll use the following example: There are 3 devices: a CPU, a decoder and a video output device. All 3 devices need to map the same 12 MB buffer at the same time. Why do you need the same buffer mapped by the CPU? Let's take your example of a video decoder and video output device. Surely the CPU doesn't want to be writing to the same memory region used for the output picture as the decoder is writing to. So what's the point of mapping that memory into the CPU's address space? It may, especially if you're doing some software post-processing. Also, by mapping all the buffers it's extremely fast to pass the buffers around in this scenario - the buffer passing becomes a simple signal. Surely the video output device doesn't need to see the input data to the decoder either? No, but other devices may (like the CPU). Surely, all you need is: 1. a mapping for the CPU for a chunk of memory to pass data to the decoder. 2. a mapping for the decoder to see the chunk of memory to receive data from the CPU. 3. a mapping for the decoder to see a chunk of memory used for the output video buffer. 4. a mapping for the output device to see the video buffer. So I don't see why everything needs to be mapped by everything else. That's fair, but we do share buffers and we do have many, very large mappings, and we do need to pull these from separate pools because they need to exhibit a particular allocation profile. I agree with you that things should work like you've listed, but with Qualcomm's ARM multimedia engines we're seeing some different usage scenarios. It's the giant buffers, needing to use our own buffer allocator, the need to share and the need to swap out virtual IOMMU space (which we haven't talked about much) which make the DMA API seem like a mismatch. (We haven't even talked about graphics usage ;) ).
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Thu, Jul 22, 2010 at 08:39:17AM +0100, Russell King - ARM Linux wrote: On Wed, Jul 21, 2010 at 09:30:34PM -0700, Zach Pfeffer wrote: This goes to the nub of the issue. We need a lot of 1 MB physically contiguous chunks. The system is going to fragment and we'll never get the 12 1 MB chunks that we'll need; since the DMA API allocator uses the system pool it will never succeed. By the DMA API allocator I assume you mean the coherent DMA interface. The DMA coherent API and DMA streaming APIs are two separate sub-interfaces of the DMA API and are not dependent on each other. I didn't know that, but yes. As far as I can tell they both allocate memory from the VM. We'd need a way to hook in our own minimized mapping allocator.
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
On Thu, Jul 22, 2010 at 01:47:36PM +0900, FUJITA Tomonori wrote: On Wed, 21 Jul 2010 20:50:26 -0700 Zach Pfeffer zpfef...@codeaurora.org wrote: On Wed, Jul 14, 2010 at 10:59:43AM +0900, FUJITA Tomonori wrote: On Tue, 13 Jul 2010 10:02:23 +0100 Zach Pfeffer said this new VCM infrastructure can be useful for video4linux. However, I don't think we need another 3,000-line abstraction layer to solve video4linux's issue nicely. It's only 3,000 lines because I haven't converted the code to use function pointers. The main point is that this adds a new abstraction that doesn't provide a huge benefit. I disagree. In its current form the API may not be appropriate for inclusion into the kernel, but it provides a common framework for handling a class of problems that have been solved many times in the kernel: large buffer management, IOMMU interoperation and fine-grained mapping control.
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Thu, Jul 22, 2010 at 01:43:26PM +0900, FUJITA Tomonori wrote: On Wed, 21 Jul 2010 21:30:34 -0700 Zach Pfeffer zpfef...@codeaurora.org wrote: On Wed, Jul 21, 2010 at 10:44:37AM +0900, FUJITA Tomonori wrote: On Tue, 20 Jul 2010 15:20:01 -0700 Zach Pfeffer zpfef...@codeaurora.org wrote: I'm not saying that it's reasonable to pass (or even allocate) a 1MB buffer via the DMA API. But given a bunch of large chunks of memory, is there any API that can manage them (asked this on the other thread as well)? What is the problem about mapping a 1MB buffer with the DMA API? Possibly, an IOMMU can't find space for 1MB but it's not the problem of the DMA API. This goes to the nub of the issue. We need a lot of 1 MB physically contiguous chunks. The system is going to fragment and we'll never get our 12 1 MB chunks that we'll need, since the DMA API allocator uses the system pool it will never succeed. For this reason we reserve a pool of 1 MB chunks (and 16 MB, 64 KB etc...) to satisfy our requests. This same use case is seen on most embedded media engines that are getting built today. We don't need a new abstraction to reserve some memory. If you want pre-allocated memory pool per device (and share them with some), the DMA API can for coherent memory (see dma_alloc_from_coherent). You can extend the DMA API if necessary. That function won't work for us. We can't use bitmap_find_free_region(), we need to use our own allocator. If anything we need a dma_alloc_from_custom(my_allocator). Take a look at: mm: iommu: A physical allocator for the VCMM vcm_alloc_max_munch() -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
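For reference, the existing hook Fujita points at is the per-device coherent pool: a driver can declare a pre-reserved, physically contiguous region for one device so that dma_alloc_coherent() is satisfied from it (internally via dma_alloc_from_coherent()) instead of the system pool. The sketch below is illustrative only; the base address is an assumed platform carve-out, and the exact flags and return convention of dma_declare_coherent_memory() have varied between kernel versions.

    /* Hedged sketch: give one engine its own coherent pool. */
    #include <linux/dma-mapping.h>
    #include <asm/sizes.h>

    #define ENGINE_POOL_BASE 0x40000000	/* assumed, platform specific */
    #define ENGINE_POOL_SIZE SZ_16M

    static int declare_engine_pool(struct device *dev)
    {
            if (!dma_declare_coherent_memory(dev, ENGINE_POOL_BASE,
                                             ENGINE_POOL_BASE,
                                             ENGINE_POOL_SIZE,
                                             DMA_MEMORY_MAP))
                    return -ENOMEM;
            return 0;
    }

    /*
     * Afterwards, dma_alloc_coherent(dev, SZ_1M, &handle, GFP_KERNEL)
     * draws its 1 MB chunks from the device's own pool. The thread's
     * objection is that the chunk placement inside that pool is done
     * by bitmap_find_free_region(), not by a driver-chosen allocator.
     */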
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
On Wed, Jul 14, 2010 at 10:59:43AM +0900, FUJITA Tomonori wrote: On Tue, 13 Jul 2010 10:02:23 +0100 Zach Pfeffer said this new VCM infrastructure can be useful for video4linux. However, I don't think we need another 3,000-line abstraction layer to solve video4linux's issue nicely. It's only 3,000 lines because I haven't converted the code to use function pointers. I can't find any reasonable reason that we need to merge VCM; it seems that the combination of the current APIs (or with some small extensions) can work for the issues that VCM tries to solve. Getting back to the point: there is no API that can handle large buffer allocation and sharing with low-level attribute control for virtual address spaces outside the CPU. At this point, if you need to work with big buffers, 1 MB and 16 MB etc., and map those big buffers to non-CPU virtual spaces, you need to explicitly carve them out and set up the mappings and sharing by hand. It's reasonable to have an API that can do this, especially since IOMMUs are going to become more prevalent. The DMA API et al. take a CPU-centric view of virtual space management, sharing has to be explicitly written and external virtual space management is left up to device driver writers. Given a system where each device has an IOMMU or an MMU, the whole concept of a scatterlist goes away. The VCM API gets a jump on it.
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Tue, Jul 20, 2010 at 09:44:12PM -0400, Timothy Meade wrote: On Tue, Jul 20, 2010 at 8:44 PM, Zach Pfeffer zpfef...@codeaurora.org wrote: On Mon, Jul 19, 2010 at 05:21:35AM -0400, Tim HRM wrote: On Fri, Jul 16, 2010 at 8:01 PM, Larry Bassel lbas...@codeaurora.org wrote: On 16 Jul 10 08:58, Russell King - ARM Linux wrote: On Thu, Jul 15, 2010 at 08:48:36PM -0400, Tim HRM wrote: Interesting, since I seem to remember the MSM devices mostly conduct IO through regions of normal RAM, largely accomplished through ioremap() calls. Without more public domain documentation of the MSM chips and AMSS interfaces I wouldn't know how to avoid this, but I can imagine it creates a bit of urgency for Qualcomm developers as they attempt to upstream support for this most interesting SoC. As the patch has been out for RFC since early April on the linux-arm-kernel mailing list (Subject: [RFC] Prohibit ioremap() on kernel managed RAM), and no comments have come back from Qualcomm folk. We are investigating the impact of this change on us, and I will send out more detailed comments next week. The restriction on creation of multiple V:P mappings with differing attributes is also fairly hard to miss in the ARM architecture specification when reading the sections about caches. Larry Bassel -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. Hi Larry and Qualcomm people. I'm curious what your reason for introducing this new API (or adding to dma) is. Specifically, how would this be used to make the memory mapping of the MSM chip dynamic, in contrast to the fixed _PHYS defines in the Android and Codeaurora trees? The MSM has many integrated engines that allow offloading a variety of workloads. These engines have always addressed memory using physical addresses; because of this we had to reserve large (tens of MB) buffers at boot. These buffers are never freed regardless of whether an engine is actually using them. As you can imagine, needing to reserve memory for all time on a device that doesn't have a lot of memory in the first place is not ideal, because that memory could be used for other things, running apps, etc. To solve this problem we put IOMMUs in front of a lot of the engines. IOMMUs allow us to map physically discontiguous memory into a virtually contiguous address range. This means that we could ask the OS for 10 MB of pages and map all of these into our IOMMU space and the engine would still see a contiguous range. I see. Much like I suspected, this is used to replace the static regime of the earliest Android kernel. You mention placing IOMMUs in front of the A11 engines; are you involved in this architecture as an engineer or similar? I'm involved to the extent of designing and implementing VCM and, finding it useful for this class of problems, trying to push it upstream. Is there a reason a cooperative approach using RPC or another mechanism is not used for memory reservation? This is something that can be accomplished fully on the APPS side. It can be accomplished a few ways. At this point we let the application processor manage the buffers. Other cooperative approaches have been talked about. As you can see in the short but voluminous canon of MSM Linux support, there is a degree of RPC used to communicate with other nodes in the system. As time progresses the canon of code shows this usage going down.
In reality, limitations in the hardware meant that we needed to map memory using larger mappings to minimize the number of TLB misses. This, plus the number of IOMMUs and the extreme use cases we needed to design for, led us to a generic design. This generic design solved our problem and the general mapping problem. We thought other people who had this same big-buffer interoperation problem would also appreciate a common API built with their needs in mind, so we pushed our idea up. I'm also interested in how this ability to map memory regions as files for devices like KGSL/DRI or PMEM might work and why this is better suited to that purpose than existing methods, where this fits into camera preview and other issues that have been dealt with in these trees in novel ways (from my perspective). The file-based approach was driven by Android's buffer passing scheme and the need to write userspace drivers for multimedia, etc... So the Android file-backed approach is obviated by GEM and other mechanisms? Aye. Thank you for your help, Timothy Meade -tmzt #htc-linux (facebook.com/HTCLinux)
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Mon, Jul 19, 2010 at 12:44:49AM -0700, Eric W. Biederman wrote: Zach Pfeffer zpfef...@codeaurora.org writes: On Thu, Jul 15, 2010 at 09:55:35AM +0100, Russell King - ARM Linux wrote: On Wed, Jul 14, 2010 at 06:29:58PM -0700, Zach Pfeffer wrote: The VCM ensures that all mappings that map a given physical buffer: IOMMU mappings, CPU mappings and one-to-one device mappings all map that buffer using the same (or compatible) attributes. At this point the only attribute that users can pass is CACHED. In the absence of CACHED all accesses go straight through to the physical memory. So what you're saying is that if I have a buffer in kernel space which I already have its virtual address, I can pass this to VCM and tell it !CACHED, and it'll setup another mapping which is not cached for me? Not quite. The existing mapping will be represented by a reservation from the prebuilt VCM of the VM. This reservation has been marked non-cached. Another reservation on a IOMMU VCM, also marked non-cached will be backed with the same physical memory. This is legal in ARM, allowing the vcm_back call to succeed. If you instead passed cached on the second mapping, the first mapping would be non-cached and the second would be cached. If the underlying architecture supported this than the vcm_back would go through. How does this compare with the x86 pat code? First, thanks for asking this question. I wasn't aware of the x86 pat code and I got to read about it. From my initial read the VCM differs in 2 ways: 1. The attributes are explicitly set on virtual address ranges. These reservations can then map physical memory with these attributes. 2. We explicitly allow multiple mappings (as long as the attributes are compatible). One such mapping may come from a IOMMU's virtual address space while another comes from the CPUs virtual address space. These mappings may exist at the same time. You are aware that multiple V:P mappings for the same physical page with different attributes are being outlawed with ARMv6 and ARMv7 due to speculative prefetching. The cache can be searched even for a mapping specified as 'normal, uncached' and you can get cache hits because the data has been speculatively loaded through a separate cached mapping of the same physical page. I didn't know that. Thanks for the heads up. FYI, during the next merge window, I will be pushing a patch which makes ioremap() of system RAM fail, which should be the last core code creator of mappings with different memory types. This behaviour has been outlawed (as unpredictable) in the architecture specification and does cause problems on some CPUs. That's fair enough, but it seems like it should only be outlawed for those processors on which it breaks. To my knowledge mismatch of mapping attributes is a problem on most cpus on every architecture. I don't see it making sense to encourage coding constructs that will fail in the strangest most difficult to debug ways. Yes it is a problem, as Russell has brought up, but there's something I probably haven't communicated well. I'll use the following example: There are 3 devices: A CPU, a decoder and a video output device. All 3 devices need to map the same 12 MB buffer at the same time. Once this buffer has served its purpose it gets freed and goes back into the pool of big buffers. When the same usage case exists again the buffer needs to get reallocated and the same devices need to map to it. This usage case does exist, not only for Qualcomm but for all of these SoC media engines that have started running Linux. 
The VCM API attempts to cover this case for the Linux kernel.
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Wed, Jul 21, 2010 at 10:44:37AM +0900, FUJITA Tomonori wrote: On Tue, 20 Jul 2010 15:20:01 -0700 Zach Pfeffer zpfef...@codeaurora.org wrote: I'm not saying that it's reasonable to pass (or even allocate) a 1MB buffer via the DMA API. But given a bunch of large chunks of memory, is there any API that can manage them (asked this on the other thread as well)? What is the problem about mapping a 1MB buffer with the DMA API? Possibly, an IOMMU can't find space for 1MB but it's not the problem of the DMA API. This goes to the nub of the issue. We need a lot of 1 MB physically contiguous chunks. The system is going to fragment and we'll never get our 12 1 MB chunks that we'll need, since the DMA API allocator uses the system pool it will never succeed. For this reason we reserve a pool of 1 MB chunks (and 16 MB, 64 KB etc...) to satisfy our requests. This same use case is seen on most embedded media engines that are getting built today. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Tue, Jul 20, 2010 at 09:54:33PM +0100, Russell King - ARM Linux wrote: On Tue, Jul 20, 2010 at 01:45:17PM -0700, Zach Pfeffer wrote: You can also conflict in access permissions which can and do conflict (which are what multiple mappings are all about...some buffer can get some access, while others get different access). Access permissions don't conflict between mappings - each mapping has unique access permissions. Yes. Bad choice of words. The VCM API allows the same memory to be mapped as long as it makes sense and allows those attributes that can change to be specified. It could be the alternative, globally applicable approach you're looking for and request in your patch. I very much doubt it - there's virtually no call for creating an additional mapping of existing kernel memory with different permissions. The only time kernel memory gets remapped is with vmalloc(), where we want to create a virtually contiguous mapping from a collection of (possibly) non-contiguous pages. Such allocations are always created with R/W permissions. There are some cases where the vmalloc APIs are used to create mappings with different memory properties, but as already covered, this has become illegal with ARMv6 and v7 architectures. So no, VCM doesn't help because there's nothing that could be solved here. Creating read-only mappings is pointless, and creating mappings with different memory type, sharability or cache attributes is illegal. I don't think it's pointless; it may have limited utility but things like read-only mappings can be useful. Without the VCM API (or something like it) there will just be a bunch of duplicated code that's basically doing ioremap. This code will probably fail to configure its mappings correctly, in which case your patch is a bad idea because it'll spawn bugs all over the place instead of at a known location. We could instead change ioremap to match the attributes of System RAM if that's what it's mapping. And as I say, what is the point of creating another identical mapping to the one we already have? As you say, probably not much. We do still have a problem (and other people have it as well): we need to map in large contiguous buffers with various attributes and point the kernel and various engines at them. This seems like something that would be globally useful. The feedback I've gotten is that we should just keep our usage private to our mach-msm branch. I've got a couple of questions: Do you think a global solution to this problem is appropriate? What would that solution need to look like, transparent huge pages? How should people change various mapping attributes for these large sections of memory?
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Mon, Jul 19, 2010 at 05:21:35AM -0400, Tim HRM wrote: On Fri, Jul 16, 2010 at 8:01 PM, Larry Bassel lbas...@codeaurora.org wrote: On 16 Jul 10 08:58, Russell King - ARM Linux wrote: On Thu, Jul 15, 2010 at 08:48:36PM -0400, Tim HRM wrote: Interesting, since I seem to remember the MSM devices mostly conduct IO through regions of normal RAM, largely accomplished through ioremap() calls. Without more public domain documentation of the MSM chips and AMSS interfaces I wouldn't know how to avoid this, but I can imagine it creates a bit of urgency for Qualcomm developers as they attempt to upstream support for this most interesting SoC. As the patch has been out for RFC since early April on the linux-arm-kernel mailing list (Subject: [RFC] Prohibit ioremap() on kernel managed RAM), and no comments have come back from Qualcomm folk. We are investigating the impact of this change on us, and I will send out more detailed comments next week. The restriction on creation of multiple V:P mappings with differing attributes is also fairly hard to miss in the ARM architecture specification when reading the sections about caches. Larry Bassel -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. Hi Larry and Qualcomm people. I'm curious what your reason for introducing this new api (or adding to dma) is. Specifically how this would be used to make the memory mapping of the MSM chip dynamic in contrast to the fixed _PHYS defines in the Android and Codeaurora trees. The MSM has many integrated engines that allow offloading a variety of workloads. These engines have always addressed memory using physical addresses, because of this we had to reserve large (10's MB) buffers at boot. These buffers are never freed regardless of whether an engine is actually using them. As you can imagine, needing to reserve memory for all time on a device that doesn't have a lot of memory in the first place is not ideal because that memory could be used for other things, running apps, etc. To solve this problem we put IOMMUs in front of a lot of the engines. IOMMUs allow us to map physically discontiguous memory into a virtually contiguous address range. This means that we could ask the OS for 10 MB of pages and map all of these into our IOMMU space and the engine would still see a contiguous range. In reality, limitations in the hardware meant that we needed to map memory using larger mappings to minimize the number of TLB misses. This, plus the number of IOMMUs and the extreme use cases we needed to design for led us to a generic design. This generic design solved our problem and the general mapping problem. We thought other people, who had this same big-buffer interoperation problem would also appreciate a common API that was built with their needs in mind so we pushed our idea up. I'm also interested in how this ability to map memory regions as files for devices like KGSL/DRI or PMEM might work and why this is better suited to that purpose than existing methods, where this fits into camera preview and other issues that have been dealt with in these trees in novel ways (from my perspective). The file based approach was driven by Android's buffer passing scheme and the need to write userspace drivers for multimedia, etc... -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
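As an aside, a minimal sketch (not part of the original mail) of what mapping discontiguous pages into one contiguous device address range looks like with the existing include/linux/iommu.h interface. The device, page list and IOVA base are assumed to come from the caller, and the iommu_map() signature shown is the order-based one from this era of the kernel.

    /* Hedged sketch: build a contiguous IOVA window over scattered pages. */
    #include <linux/iommu.h>

    static struct iommu_domain *map_contig_window(struct device *dev,
                                                  struct page **pages,
                                                  int npages,
                                                  unsigned long iova_base)
    {
            struct iommu_domain *domain = iommu_domain_alloc();
            int i;

            if (!domain)
                    return NULL;
            if (iommu_attach_device(domain, dev))
                    goto err;

            for (i = 0; i < npages; i++) {
                    /* order 0 == 4 KB pages in the 2010-era interface */
                    if (iommu_map(domain, iova_base + i * PAGE_SIZE,
                                  page_to_phys(pages[i]), 0,
                                  IOMMU_READ | IOMMU_WRITE))
                            goto err;
            }
            return domain;
    err:
            iommu_domain_free(domain);
            return NULL;
    }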
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Thu, Jul 15, 2010 at 09:55:35AM +0100, Russell King - ARM Linux wrote: On Wed, Jul 14, 2010 at 06:29:58PM -0700, Zach Pfeffer wrote: The VCM ensures that all mappings that map a given physical buffer: IOMMU mappings, CPU mappings and one-to-one device mappings all map that buffer using the same (or compatible) attributes. At this point the only attribute that users can pass is CACHED. In the absence of CACHED all accesses go straight through to the physical memory. So what you're saying is that if I have a buffer in kernel space whose virtual address I already have, I can pass this to VCM and tell it !CACHED, and it'll set up another mapping which is not cached for me? Not quite. The existing mapping will be represented by a reservation from the prebuilt VCM of the VM. This reservation has been marked non-cached. Another reservation on an IOMMU VCM, also marked non-cached, will be backed with the same physical memory. This is legal in ARM, allowing the vcm_back call to succeed. If you instead passed cached on the second mapping, the first mapping would be non-cached and the second would be cached. If the underlying architecture supported this then the vcm_back would go through. You are aware that multiple V:P mappings for the same physical page with different attributes are being outlawed with ARMv6 and ARMv7 due to speculative prefetching. The cache can be searched even for a mapping specified as 'normal, uncached' and you can get cache hits because the data has been speculatively loaded through a separate cached mapping of the same physical page. I didn't know that. Thanks for the heads up. FYI, during the next merge window, I will be pushing a patch which makes ioremap() of system RAM fail, which should be the last core code creator of mappings with different memory types. This behaviour has been outlawed (as unpredictable) in the architecture specification and does cause problems on some CPUs. That's fair enough, but it seems like it should only be outlawed for those processors on which it breaks. We've also the issue of multiple mappings with differing cache attributes which needs addressing too... The VCM has been architected to handle these things.
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Wed, Jul 14, 2010 at 10:59:38AM +0900, FUJITA Tomonori wrote: On Tue, 13 Jul 2010 05:14:21 -0700 Zach Pfeffer zpfef...@codeaurora.org wrote: You mean that you want to specify this alignment attribute every time you create an IOMMU mapping? Then you can set segment_boundary_mask every time you create an IOMMU mapping. It's odd but it should work. Kinda. I want to forget about IOMMUs, devices and CPUs. I just want to create a mapping that has the alignment I specify, regardless of the mapper. The mapping is created on a VCM and the VCM is associated with a mapper: a CPU, an IOMMU'd device or a direct-mapped device. Sounds like you can do the above with the combination of the current APIs: create a virtual address and then an I/O address. Yes, and that's what the implementation does - and all the other implementations that need to do this same thing. Why not solve the problem once? The above can't be a reason to add a new infrastructure that includes more than 3,000 lines. Right now it's 3,000 lines because I haven't converted to a function-pointer-based implementation. Once I do that the size of the implementation will shrink and the code will act as a lib. Users pass buffer mappers and the lib will ease the management of those buffers. Another possible solution is extending struct dma_attrs. We could add the alignment attribute to it. That may be useful, but in the current DMA-API it may be seen as redundant info. If there is a real requirement, we can extend the DMA-API. If the DMA-API contained functions to allocate virtual space separate from physical space and reworked how chained buffers functioned it would probably work - but then things start to look like the VCM API which does graph-based map management.
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
On Wed, Jul 14, 2010 at 09:34:03PM +0200, Joerg Roedel wrote: On Mon, Jul 12, 2010 at 10:21:05PM -0700, Zach Pfeffer wrote: Joerg Roedel wrote: The DMA-API already does this with the help of IOMMUs if they are present. What is the benefit of your approach over that? The grist to the DMA-API mill is the opaque scatterlist. Each scatterlist element brings together a physical address and a bus address that may be different. The set of scatterlist elements constitute both the set of physical buffers and the mappings to those buffers. My approach separates these two things into a struct physmem which contains the set of physical buffers and a struct reservation which contains the set of bus addresses (or device addresses). Each element in the struct physmem may be of various lengths (without resorting to chaining). A map call maps the one set to the other. Okay, thats a different concept, where is the benefit? The benefit is that virtual address space and physical address space are managed independently. This may be useful if you want to reuse the same set of physical buffers, a user simply maps them when they're needed. It also means that different physical memories could be targeted and a virtual allocation could map those memories without worrying about where they were. This whole concept is just a logical extension of the already existing separation between pages and page frames... in fact the separation between physical memory and what is mapped to that memory is fundamental to the Linux kernel. This approach just says that arbitrary long buffers should work the same way. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Wed, Jul 14, 2010 at 11:05:36PM +0100, Russell King - ARM Linux wrote: On Wed, Jul 14, 2010 at 01:11:49PM -0700, Zach Pfeffer wrote: If the DMA-API contained functions to allocate virtual space separate from physical space and reworked how chained buffers functioned it would probably work - but then things start to look like the VCM API which does graph based map management. Every additional virtual mapping of a physical buffer results in additional cache aliases on aliasing caches, and more workload for developers to sort out the cache aliasing issues. What does VCM to do mitigate that? The VCM ensures that all mappings that map a given physical buffer: IOMMU mappings, CPU mappings and one-to-one device mappings all map that buffer using the same (or compatible) attributes. At this point the only attribute that users can pass is CACHED. In the absence of CACHED all accesses go straight through to the physical memory. The architecture of the VCM allows these sorts of consistency checks to be made since all mappers of a given physical resource are tracked. This is feasible because the physical resources we're tracking are typically large. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Wed, Jul 14, 2010 at 06:29:58PM -0700, Zach Pfeffer wrote: On Wed, Jul 14, 2010 at 11:05:36PM +0100, Russell King - ARM Linux wrote: On Wed, Jul 14, 2010 at 01:11:49PM -0700, Zach Pfeffer wrote: If the DMA-API contained functions to allocate virtual space separate from physical space and reworked how chained buffers functioned it would probably work - but then things start to look like the VCM API which does graph based map management. Every additional virtual mapping of a physical buffer results in additional cache aliases on aliasing caches, and more workload for developers to sort out the cache aliasing issues. What does VCM to do mitigate that? The VCM ensures that all mappings that map a given physical buffer: IOMMU mappings, CPU mappings and one-to-one device mappings all map that buffer using the same (or compatible) attributes. At this point the only attribute that users can pass is CACHED. In the absence of CACHED all accesses go straight through to the physical memory. The architecture of the VCM allows these sorts of consistency checks to be made since all mappers of a given physical resource are tracked. This is feasible because the physical resources we're tracking are typically large. A few more things... In addition to CACHED, the VCMM can support different cache policies as long as the architecture can support it - they get passed down through the device map call. In addition, handling physical mappings in the VCMM enables it to perform refcounting on the physical chunks (ie, to see how many virtual spaces it's been mapped to, including the kernel's). This allows it to turn on any coherency protocols that are available in hardware (ie, setting the shareable bit on something that is mapped to more than one virtual space). That same attribute can be left off on a buffer that has only one virtual mapping (ie, scratch buffers used by one device only). It is then up to the underlying system to deal with that shared attribute - to enable redirection if it's supported, or to force something to be non-cacheable, etc. Doing it all through the VCMM allows all these mechanisms be worked out once per architecture and then reused. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Wed, Jul 14, 2010 at 06:47:34PM -0700, Eric W. Biederman wrote: Zach Pfeffer zpfef...@codeaurora.org writes: On Wed, Jul 14, 2010 at 11:05:36PM +0100, Russell King - ARM Linux wrote: On Wed, Jul 14, 2010 at 01:11:49PM -0700, Zach Pfeffer wrote: If the DMA-API contained functions to allocate virtual space separate from physical space and reworked how chained buffers functioned it would probably work - but then things start to look like the VCM API which does graph based map management. Every additional virtual mapping of a physical buffer results in additional cache aliases on aliasing caches, and more workload for developers to sort out the cache aliasing issues. What does VCM to do mitigate that? The VCM ensures that all mappings that map a given physical buffer: IOMMU mappings, CPU mappings and one-to-one device mappings all map that buffer using the same (or compatible) attributes. At this point the only attribute that users can pass is CACHED. In the absence of CACHED all accesses go straight through to the physical memory. The architecture of the VCM allows these sorts of consistency checks to be made since all mappers of a given physical resource are tracked. This is feasible because the physical resources we're tracking are typically large. On x86 this is implemented in the pat code, and could reasonably be generalized to be cross platform. This is controlled by HAVE_PFNMAP_TRACKING and with entry points like track_pfn_vma_new. Given that we already have an implementation that tracks the cached vs non-cached attribute using the dma api. I don't see that the API has to change. An implementation of the cached vs non-cached status for arm and other architectures is probably appropriate. It is definitely true that getting your mapping caching attributes out of sync can be a problem. Sure, but we're still stuck with needing lots of scatterlist list elements and needing to copy them to share physical buffers. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
On Tue, Jul 13, 2010 at 02:59:08PM +0900, FUJITA Tomonori wrote: On Mon, 12 Jul 2010 22:46:59 -0700 Zach Pfeffer zpfef...@codeaurora.org wrote: Joerg Roedel wrote: On Fri, Jul 02, 2010 at 12:33:51AM -0700, Zach Pfeffer wrote: Daniel Walker wrote: So if we include this code which map implementations could you collapse into this implementations ? Generally , what currently existing code can VCMM help to eliminate? In theory, it can eliminate all code the interoperates between IOMMU, CPU and non-IOMMU based devices and all the mapping code, alignment, mapping attribute and special block size support that's been implemented. Thats a very abstract statement. Can you point to particular code files and give a rough sketch how it could be improved using VCMM? I can. Not to single out a particular subsystem, but the video4linux code contains interoperation code to abstract the difference between sg buffers, vmalloc buffers and physically contiguous buffers. The VCMM is an attempt to provide a framework where these and all the other buffer types can be unified. Why video4linux can't use the DMA API? Doing DMA with vmalloc'ed buffers is a thing that we should avoid (there are some exceptions like xfs though). I'm not sure, but I know that it makes the distinction. From video4linux/videobuf: media/videobuf-dma-sg.h /* Physically scattered */ media/videobuf-vmalloc.h /* vmalloc() buffers*/ media/videobuf-dma-contig.h /* Physically contiguous */ -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
On Tue, Jul 13, 2010 at 03:03:25PM +0900, FUJITA Tomonori wrote: On Mon, 12 Jul 2010 22:57:06 -0700 Zach Pfeffer zpfef...@codeaurora.org wrote: FUJITA Tomonori wrote: On Thu, 08 Jul 2010 16:59:52 -0700 Zach Pfeffer zpfef...@codeaurora.org wrote: The problem I'm trying to solve boils down to this: map a set of contiguous physical buffers to an aligned IOMMU address. I need to allocate the set of physical buffers in a particular way: use 1 MB contiguous physical memory, then 64 KB, then 4 KB, etc. and I need to align the IOMMU address in a particular way. Sounds like the DMA API already supports what you want. You can set segment_boundary_mask in struct device_dma_parameters if you want to align the IOMMU address. See IOMMU implementations that support dma_get_seg_boundary() properly. That function takes the wrong argument in a VCM world: unsigned long dma_get_seg_boundary(struct device *dev); The boundary should be an attribute of the device side mapping, independent of the device. This would allow better code reuse. You mean that you want to specify this alignment attribute every time you create an IOMMU mapping? Then you can set segment_boundary_mask every time you create an IOMMU mapping. It's odd but it should work. Kinda. I want to forget about IOMMUs, devices and CPUs. I just want to create a mapping that has the alignment I specify, regardless of the mapper. The mapping is created on a VCM and the VCM is associated with a mapper: a CPU, an IOMMU'd device or a direct mapped device. Another possible solution is extending struct dma_attrs. We could add the alignment attribute to it. That may be useful, but in the current DMA-API may be seen as redundant info. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
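For reference, what Fujita's suggestion amounts to in code: a minimal sketch, assuming the driver owns the struct device_dma_parameters storage, of expressing a boundary/segment constraint through the existing DMA parameters before mapping. The 1 MB value is only an example; dma_set_seg_boundary() constrains segments from crossing that boundary rather than setting an alignment directly.

    /* Hedged sketch of constraining DMA segments via device_dma_parameters. */
    #include <linux/device.h>
    #include <linux/dma-mapping.h>
    #include <asm/sizes.h>

    static struct device_dma_parameters example_parms;

    static void constrain_segments(struct device *dev)
    {
            dev->dma_parms = &example_parms;
            /* No DMA segment may cross a 1 MB boundary... */
            dma_set_seg_boundary(dev, SZ_1M - 1);
            /* ...and no segment may be larger than 1 MB. */
            dma_set_max_seg_size(dev, SZ_1M);
    }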
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
Joerg Roedel wrote: On Fri, Jul 02, 2010 at 12:09:02AM -0700, Zach Pfeffer wrote: Hari Kanigeri wrote: He demonstrated the usage of his code in one of the emails he sent out initially. Did you go over that, and what (or how many) step would you use with the current code to do the same thing? -- So is this patch set adding layers and abstractions to help the User ? If the idea is to share some memory across multiple devices, I guess you can achieve the same by calling the map function provided by iommu module and sharing the mapped address to the 10's or 100's of devices to access the buffers. You would only need a dedicated virtual pool per IOMMU device to manage its virtual memory allocations. Yeah, you can do that. My idea is to get away from explicit addressing and encapsulate the device address to physical address link into a mapping. The DMA-API already does this with the help of IOMMUs if they are present. What is the benefit of your approach over that? The grist to the DMA-API mill is the opaque scatterlist. Each scatterlist element brings together a physical address and a bus address that may be different. The set of scatterlist elements constitute both the set of physical buffers and the mappings to those buffers. My approach separates these two things into a struct physmem which contains the set of physical buffers and a struct reservation which contains the set of bus addresses (or device addresses). Each element in the struct physmem may be of various lengths (without resorting to chaining). A map call maps the one set to the other. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
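To picture the separation being argued for, a purely illustrative sketch follows; these field and type names are assumptions for illustration, not the structures from the actual patch set.

    /* Illustrative only: the "what" (physical chunks) kept apart from
     * the "where" (a device- or CPU-side address range). A map call
     * ties one to the other, and either side can be reused on its own. */
    struct example_phys_chunk {
            unsigned long pa;       /* physical address of the chunk */
            unsigned long len;      /* 1 MB, 64 KB, 4 KB, ... */
    };

    struct example_physmem {        /* hypothetical stand-in for struct physmem */
            struct example_phys_chunk *chunks;
            int nr_chunks;
    };

    struct example_reservation {    /* hypothetical stand-in for struct reservation */
            unsigned long dev_addr; /* bus/device (or CPU virtual) base address */
            unsigned long len;
    };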
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
Joerg Roedel wrote: On Thu, Jul 01, 2010 at 03:00:17PM -0700, Zach Pfeffer wrote: Additionally, the current IOMMU interface does not allow users to associate one page table with multiple IOMMUs [...] That's not true. Multiple IOMMUs are completely handled by the IOMMU drivers. In the case of the IOMMU-API backend drivers this also includes the ability to use page-tables on multiple IOMMUs. Yeah. I see that now. Since the particular topology is run-time configurable all of these use-cases and more can be expressed without pushing the topology into the low-level IOMMU driver. The IOMMU driver has to know about the topology anyway because it needs to know which IOMMU it needs to program for a particular device. Perhaps, but why not create a VCM which can be shared across all mappers in the system? Why bury it in a device driver and make all IOMMU device drivers manage their own virtual spaces? Practically this would entail a minor refactor to the fledgling IOMMU interface: adding associate and activate ops. Already, there are ~20 different IOMMU map implementations in the kernel. Had the Linux kernel had the VCMM, many of those implementations could have leveraged the mapping and topology management of a VCMM, while focusing on a few key hardware-specific functions (map this physical address, program the page table base register). I partially agree here. All the IOMMU implementations in the Linux kernel have a lot of functionality in common where code could be shared. Work to share code has been done in the past by Fujita Tomonori but there are more places to work on. I am just not sure if a new front-end API is the right way to do this. I don't really think it's a new front-end API. It's just an API that allows easier mapping manipulation than the current APIs.
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
Joerg Roedel wrote: On Fri, Jul 02, 2010 at 12:33:51AM -0700, Zach Pfeffer wrote: Daniel Walker wrote: So if we include this code which map implementations could you collapse into this implementations ? Generally , what currently existing code can VCMM help to eliminate? In theory, it can eliminate all code the interoperates between IOMMU, CPU and non-IOMMU based devices and all the mapping code, alignment, mapping attribute and special block size support that's been implemented. Thats a very abstract statement. Can you point to particular code files and give a rough sketch how it could be improved using VCMM? I can. Not to single out a particular subsystem, but the video4linux code contains interoperation code to abstract the difference between sg buffers, vmalloc buffers and physically contiguous buffers. The VCMM is an attempt to provide a framework where these and all the other buffer types can be unified. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
Joerg Roedel wrote: On Thu, Jul 01, 2010 at 11:17:34PM -0700, Zach Pfeffer wrote: Andi Kleen wrote: Hmm? dma_map_* does not change any CPU mappings. It only sets up DMA mapping(s). Sure, but I was saying that iommu_map() doesn't just set up the IOMMU mappings, it sets up both the IOMMU and kernel buffer mappings. What do you mean by kernel buffer mappings? In-kernel mappings whose addresses can be dereferenced. That assumes that all the IOMMUs on the system support the same page table format, right? Actually no. Since the VCMM abstracts a page-table as a Virtual Contiguous Region (VCM), a VCM can be associated with any device, regardless of its individual page table format. The IOMMU-API abstracts a page-table as a domain which can also be associated with any device (behind an IOMMU). It does, but only by convention. The domain member is just a big catchall void *. It would be more useful to factor out a VCM abstraction, with associated ops. As it stands, all IOMMU device driver writers have to re-invent IOMMU virtual address management.
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
FUJITA Tomonori wrote: On Thu, 08 Jul 2010 16:59:52 -0700 Zach Pfeffer zpfef...@codeaurora.org wrote: The problem I'm trying to solve boils down to this: map a set of contiguous physical buffers to an aligned IOMMU address. I need to allocate the set of physical buffers in a particular way: use 1 MB contiguous physical memory, then 64 KB, then 4 KB, etc. and I need to align the IOMMU address in a particular way. Sounds like the DMA API already supports what you want. You can set segment_boundary_mask in struct device_dma_parameters if you want to align the IOMMU address. See IOMMU implementations that support dma_get_seg_boundary() properly. That function takes the wrong argument in a VCM world: unsigned long dma_get_seg_boundary(struct device *dev); The boundary should be an attribute of the device side mapping, independent of the device. This would allow better code reuse. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
Russell King - ARM Linux wrote: On Wed, Jul 07, 2010 at 03:44:27PM -0700, Zach Pfeffer wrote: The DMA API handles the allocation and use of DMA channels. It can configure physical transfer settings, manage scatter-gather lists, etc. You're confused about what the DMA API is. You're talking about the DMA engine subsystem (drivers/dma) not the DMA API (see Documentation/DMA-API.txt, include/linux/dma-mapping.h, and arch/arm/include/asm/dma-mapping.h) Thanks for the clarification. The VCM allows all device buffers to be passed between all devices in the system without passing those buffers through each domain's API. This means that instead of writing code to interoperate between DMA engines, IOMMU mapped spaces, CPUs and physically addressed devices the user can simply target a device with a buffer using the same API regardless of how that device maps or otherwise accesses the buffer. With the DMA API, if we have a SG list which refers to the physical pages (as a struct page, offset, length tuple), the DMA API takes care of dealing with CPU caches and IOMMUs to make the data in the buffer visible to the target device. It provides you with a set of cookies referring to the SG lists, which may be coalesced if the IOMMU can do so. If you have a kernel virtual address, the DMA API has single buffer mapping/unmapping functions to do the same thing, and provide you with a cookie to pass to the device to refer to that buffer. These cookies are whatever the device needs to be able to access the buffer - for instance, if system SDRAM is located at 0xc000 virtual, 0x8000 physical and 0x4000 as far as the DMA device is concerned, then the cookie for a buffer at 0xc000 virtual will be 0x4000 and not 0x8000. It sounds like I've got some work to do. I appreciate the feedback. The problem I'm trying to solve boils down to this: map a set of contiguous physical buffers to an aligned IOMMU address. I need to allocate the set of physical buffers in a particular way: use 1 MB contiguous physical memory, then 64 KB, then 4 KB, etc. and I need to align the IOMMU address in a particular way. I also need to swap out the IOMMU address spaces and map the buffers into the kernel. I have this all solved, but it sounds like I'll need to migrate to the DMA API to upstream it. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
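To illustrate the single-buffer case Russell describes, a hedged sketch (not from the original mail): the kernel virtual address stays the CPU's view of the buffer, and the returned dma_addr_t is whatever cookie the device needs, which may differ from the physical address when a bus offset or IOMMU is involved.

    /* Hedged sketch of streaming-mapping one kernel buffer for a device. */
    #include <linux/dma-mapping.h>

    static dma_addr_t map_one(struct device *dev, void *buf, size_t len)
    {
            dma_addr_t cookie;

            cookie = dma_map_single(dev, buf, len, DMA_BIDIRECTIONAL);
            if (dma_mapping_error(dev, cookie))
                    return 0;

            /* Hand 'cookie' to the device; dma_unmap_single() when done. */
            return cookie;
    }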
[RFC 1/3 v3] mm: iommu: An API to unify IOMMU, CPU and device memory management
This patch contains the documentation for the API, termed the Virtual Contiguous Memory Manager. Its use would allow all of the IOMMU to VM, VM to device and device to IOMMU interoperation code to be refactored into platform independent code. Comments, suggestions and criticisms are welcome and wanted. Signed-off-by: Zach Pfeffer zpfef...@codeaurora.org --- Documentation/vcm.txt | 587 + 1 files changed, 587 insertions(+), 0 deletions(-) create mode 100644 Documentation/vcm.txt diff --git a/Documentation/vcm.txt b/Documentation/vcm.txt new file mode 100644 index 000..1c6a8be --- /dev/null +++ b/Documentation/vcm.txt @@ -0,0 +1,587 @@ +What is this document about? + + +This document covers how to use the Virtual Contiguous Memory Manager +(VCMM), how the first implementation works with a specific low-level +Input/Output Memory Management Unit (IOMMU) and the way the VCMM is used +from user-space. It also contains a section that describes why something +like the VCMM is needed in the kernel. + +If anything in this document is wrong, please send patches to the +maintainer of this file, listed at the bottom of the document. + + +The Virtual Contiguous Memory Manager += + +The VCMM was built to solve the system-wide memory mapping issues that +occur when many bus-masters have IOMMUs. + +An IOMMU maps device addresses to physical addresses. It also insulates +the system from spurious or malicious device bus transactions and allows +fine-grained mapping attribute control. The Linux kernel core does not +contain a generic API to handle IOMMU mapped memory; device driver writers +must implement device specific code to interoperate with the Linux kernel +core. As the number of IOMMUs increases, coordinating the many address +spaces mapped by all discrete IOMMUs becomes difficult without in-kernel +support. + +The VCMM API enables device independent IOMMU control, virtual memory +manager (VMM) interoperation and non-IOMMU enabled device interoperation +by treating devices with or without IOMMUs and all CPUs with or without +MMUs, their mapping contexts and their mappings using common +abstractions. Physical hardware is given a generic device type and mapping +contexts are abstracted into Virtual Contiguous Memory (VCM) +regions. Users reserve memory from VCMs and back their reservations +with physical memory. + +Why the VCMM is Needed +-- + +Driver writers who control devices with IOMMUs must contend with device +control and memory management. Driver writers have a large device driver +API that they can leverage to control their devices, but they are lacking +a unified API to help them program mappings into IOMMUs and share those +mappings with other devices and CPUs in the system. + +Sharing is complicated by Linux's CPU-centric VMM. The CPU-centric model +generally makes sense because average hardware only contains a MMU for the +CPU and possibly a graphics MMU. If every device in the system has one or +more MMUs the CPU-centric memory management (MM) programming model breaks +down. + +Abstracting IOMMU device programming into a common API has already begun +in the Linux kernel. It was built to abstract the difference between AMD +and Intel IOMMUs to support x86 virtualization on both platforms. The +interface is listed in include/linux/iommu.h. It contains +interfaces for mapping and unmapping as well as domain management. 
This +interface has not gained widespread use outside the x86; PA-RISC, Alpha +and SPARC architectures and ARM and PowerPC platforms all use their own +mapping modules to control their IOMMUs. The VCMM contains an IOMMU +programming layer, but since its abstraction supports map management +independent of device control, the layer is not used directly. This +higher-level view enables a new kernel service, not just an IOMMU +interoperation layer. + +The General Idea: Map Management using Graphs +- + +Looking at mapping from a system-wide perspective reveals a general graph +problem. The VCMM's API is built to manage the general mapping graph. Each +node that talks to memory, either through an MMU or directly (physically +mapped) can be thought of as the device-end of a mapping edge. The other +edge is the physical memory (or intermediate virtual space) that is +mapped. + +In the direct-mapped case the device is assigned a one-to-one MMU. This +scheme allows direct mapped devices to participate in general graph +management. + +The CPU nodes can also be brought under the same mapping abstraction with +the use of a light overlay on the existing VMM. This light overlay allows +VMM-managed mappings to interoperate with the common API. The light +overlay enables this without substantial modifications to the existing +VMM. + +In addition to CPU nodes that are running Linux (and the VMM), remote CPU +nodes that may
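For reference, the include/linux/iommu.h interface mentioned above boils down to roughly the following usage pattern (a sketch against the interface as quoted elsewhere in this thread; the device pointer, iova and protection flags are caller-supplied placeholders):

    #include <linux/iommu.h>

    static int map_one_page(struct device *dev, struct page *page,
                            unsigned long iova)
    {
            struct iommu_domain *domain;
            int ret;

            domain = iommu_domain_alloc();          /* domain management */
            if (!domain)
                    return -ENOMEM;

            ret = iommu_attach_device(domain, dev);
            if (ret)
                    goto free_domain;

            /* Map one page (gfp_order 0) at the requested device address. */
            ret = iommu_map(domain, iova, page_to_phys(page), 0,
                            IOMMU_READ | IOMMU_WRITE);
            if (ret) {
                    iommu_detach_device(domain, dev);
                    goto free_domain;
            }
            return 0;

    free_domain:
            iommu_domain_free(domain);
            return ret;
    }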
[RFC 2/3] mm: iommu: A physical allocator for the VCMM
The Virtual Contiguous Memory Manager (VCMM) needs a physical pool to allocate from. It breaks up the pool into sub-pools of same-sized chunks. In particular, it breaks the pool it manages into sub-pools of 1 MB, 64 KB and 4 KB chunks. When a user makes a request, this allocator satisfies that request from the sub-pools using a maximum-munch strategy. This strategy attempts to satisfy a request using the largest chunk-size without over-allocating, then moving on to the next smallest size without over-allocating and finally completing the request with the smallest sized chunk, over-allocating if necessary. The maximum-munch strategy allows physical page allocation for small TLBs that need to map a given range using the minimum number of mappings. Although the allocator has been configured for 1 MB, 64 KB and 4 KB chunks, it can be easily extended to other chunk sizes. Signed-off-by: Zach Pfeffer zpfef...@codeaurora.org --- arch/arm/mm/vcm_alloc.c | 425 + include/linux/vcm_alloc.h | 70 2 files changed, 495 insertions(+), 0 deletions(-) create mode 100644 arch/arm/mm/vcm_alloc.c create mode 100644 include/linux/vcm_alloc.h diff --git a/arch/arm/mm/vcm_alloc.c b/arch/arm/mm/vcm_alloc.c new file mode 100644 index 000..e592e71 --- /dev/null +++ b/arch/arm/mm/vcm_alloc.c @@ -0,0 +1,425 @@ +/* Copyright (c) 2010, Code Aurora Forum. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 and + * only version 2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA. + */ + +#include linux/kernel.h +#include linux/slab.h +#include linux/module.h +#include linux/vcm_alloc.h +#include linux/string.h +#include asm/sizes.h + +/* Amount of memory managed by VCM */ +#define TOTAL_MEM_SIZE SZ_32M + +static unsigned int base_pa = 0x8000; +int basicalloc_init; + +int chunk_sizes[NUM_CHUNK_SIZES] = {SZ_1M, SZ_64K, SZ_4K}; +int init_num_chunks[] = { + (TOTAL_MEM_SIZE/2) / SZ_1M, + (TOTAL_MEM_SIZE/4) / SZ_64K, + (TOTAL_MEM_SIZE/4) / SZ_4K +}; +#define LAST_SZ() (ARRAY_SIZE(chunk_sizes) - 1) + +#define vcm_alloc_err(a, ...) 
\ + pr_err(ERROR %s %i a, __func__, __LINE__, ##__VA_ARGS__) + +struct phys_chunk_head { + struct list_head head; + int num; +}; + +struct phys_mem { + struct phys_chunk_head heads[ARRAY_SIZE(chunk_sizes)]; +} phys_mem; + +static int is_allocated(struct list_head *allocated) +{ + /* This should not happen under normal conditions */ + if (!allocated) { + vcm_alloc_err(no allocated\n); + return 0; + } + + if (!basicalloc_init) { + vcm_alloc_err(no basicalloc_init\n); + return 0; + } + return !list_empty(allocated); +} + +static int count_allocated_size(enum chunk_size_idx idx) +{ + int cnt = 0; + struct phys_chunk *chunk, *tmp; + + if (!basicalloc_init) { + vcm_alloc_err(no basicalloc_init\n); + return 0; + } + + list_for_each_entry_safe(chunk, tmp, +phys_mem.heads[idx].head, list) { + if (is_allocated(chunk-allocated)) + cnt++; + } + + return cnt; +} + + +int vcm_alloc_get_mem_size(void) +{ + return TOTAL_MEM_SIZE; +} +EXPORT_SYMBOL(vcm_alloc_get_mem_size); + + +int vcm_alloc_blocks_avail(enum chunk_size_idx idx) +{ + if (!basicalloc_init) { + vcm_alloc_err(no basicalloc_init\n); + return 0; + } + + return phys_mem.heads[idx].num; +} +EXPORT_SYMBOL(vcm_alloc_blocks_avail); + + +int vcm_alloc_get_num_chunks(void) +{ + return ARRAY_SIZE(chunk_sizes); +} +EXPORT_SYMBOL(vcm_alloc_get_num_chunks); + + +int vcm_alloc_all_blocks_avail(void) +{ + int i; + int cnt = 0; + + if (!basicalloc_init) { + vcm_alloc_err(no basicalloc_init\n); + return 0; + } + + for (i = 0; i ARRAY_SIZE(chunk_sizes); ++i) + cnt += vcm_alloc_blocks_avail(i); + return cnt; +} +EXPORT_SYMBOL(vcm_alloc_all_blocks_avail); + + +int vcm_alloc_count_allocated(void) +{ + int i; + int cnt = 0; + + if (!basicalloc_init) { + vcm_alloc_err(no basicalloc_init\n); + return 0; + } + + for (i = 0; i ARRAY_SIZE
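The maximum-munch strategy described in the commit message can be sketched independently of the patch itself; the helper below (illustrative only, not the allocator above) splits a request into 1 MB, 64 KB and 4 KB pieces, largest first, over-allocating only at the smallest size:

    #include <asm/sizes.h>

    /* Illustrative: split "len" bytes into per-chunk-size counts, largest first. */
    static const unsigned long munch_sizes[] = { SZ_1M, SZ_64K, SZ_4K };

    static void max_munch(unsigned long len, unsigned long counts[3])
    {
            int i;

            for (i = 0; i < 3; i++) {
                    counts[i] = len / munch_sizes[i];
                    len -= counts[i] * munch_sizes[i];
            }
            /* Over-allocate one extra 4 KB chunk if a remainder is left. */
            if (len)
                    counts[2]++;
    }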
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
Andi Kleen wrote: The standard Linux approach to such a problem is to write a library that drivers can use for common functionality, not put a middle layer in between. Libraries are much more flexible than layers. I've been thinking about this statement. It's very true. I use the genalloc lib, which is a great piece of software, to manage VCMs (domains in linux/iommu.h parlance?). On our hardware there are three things we have to do: use the minimum set of mappings to map a buffer, because of the extremely small TLBs in all the IOMMUs we have to support; use special virtual alignments; and direct various multimedia flows through certain IOMMUs. To support this we: 1. Use the genalloc lib to allocate virtual space for our IOMMUs, allowing virtual alignment to be specified. 2. Have a maximum-munch allocator that manages our own physical pool. I think I may be able to support this using the iommu interface and some util functions. The big thing that's lost is the unified topology management, but as demonstrated that may fall out from a refactor. Anyhow, sounds like a few things to try. Thanks for the feedback so far. I'll do some refactoring and see what's missing. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
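For what it's worth, the genalloc-based virtual space management described in point 1 might look roughly like the sketch below. The pool base, size and granularity are placeholders; note that the stock gen_pool_alloc() of this era takes no alignment argument, so the special virtual alignments mentioned above would need an extension to genalloc, as the text implies.

    #include <linux/genalloc.h>

    /* Hypothetical per-IOMMU device-virtual-address pool. */
    static struct gen_pool *iommu_va_pool;

    static int iommu_va_pool_init(void)
    {
            iommu_va_pool = gen_pool_create(PAGE_SHIFT, -1);  /* 4 KB granularity */
            if (!iommu_va_pool)
                    return -ENOMEM;
            /* Placeholder range: 128 MB of IOMMU virtual space at 0x40000000. */
            return gen_pool_add(iommu_va_pool, 0x40000000, SZ_128M, -1);
    }

    static unsigned long iommu_va_alloc(size_t len)
    {
            /* Returns 0 on failure; alignment beyond 4 KB is not expressed here. */
            return gen_pool_alloc(iommu_va_pool, len);
    }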
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
Andi Kleen wrote: The VCMM provides a more abstract, global view with finer-grained control of each mapping a user wants to create. For instance, the semantics of iommu_map preclude its use in setting up just the IOMMU side of a mapping. With a one-sided map, two IOMMU devices can be Hmm? dma_map_* does not change any CPU mappings. It only sets up DMA mapping(s). Sure, but I was saying that iommu_map() doesn't just set up the IOMMU mappings, its sets up both the iommu and kernel buffer mappings. Additionally, the current IOMMU interface does not allow users to associate one page table with multiple IOMMUs unless the user explicitly That assumes that all the IOMMUs on the system support the same page table format, right? Actually no. Since the VCMM abstracts a page-table as a Virtual Contiguous Region (VCM) a VCM can be associated with any device, regardless of their individual page table format. As I understand your approach would help if you have different IOMMus with an different low level interface, which just happen to have the same pte format. Is that very likely? I would assume if you have lots of copies of the same IOMMU in the system then you could just use a single driver with multiple instances that share some state for all of them. That model would fit in the current interfaces. There's no reason multiple instances couldn't share the same allocation data structure. And if you have lots of truly different IOMMUs then they likely won't be able to share PTEs at the hardware level anyways, because the formats are too different. See VCM's above. The VCMM takes the long view. Its designed for a future in which the number of IOMMUs will go up and the ways in which these IOMMUs are composed will vary from system to system, and may vary at runtime. Already, there are ~20 different IOMMU map implementations in the kernel. Had the Linux kernel had the VCMM, many of those implementations could have leveraged the mapping and topology management of a VCMM, while focusing on a few key hardware specific functions (map this physical address, program the page table base register). The standard Linux approach to such a problem is to write a library that drivers can use for common functionality, not put a middle layer in between. Libraries are much more flexible than layers. That's true up to the, is this middle layer so useful that its worth it point. The VM is a middle layer, you could make the same argument about it, the mapping code isn't too hard, just map in the memory that you need and be done with it. But the VM middle layer provides a clean separation between page frames and pages which turns out to be infinitely useful. The VCMM is built in the same spirit, It says things like, mapping is a global problem, I'm going to abstract entire virtual spaces and allow people arbitrary chuck size allocation, I'm not going to care that my device is physically mapping this buffer and this other device is a virtual, virtual device. That said I'm not sure there's all that much duplicated code anyways. A lot of the code is always IOMMU specific. The only piece which might be shareable is the mapping allocation, but I don't think that's very much of a typical driver In my old pci-gart driver the allocation was all only a few lines of code, although given it was somewhat dumb in this regard because it only managed a small remapping window. 
I agree that it's not a lot of code, and that this layer may be a bit heavy, but what I'd like to focus on is whether a global mapping view is useful and, if so, whether something like the graph management that the VCMM provides is generally useful. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
Hari Kanigeri wrote: He demonstrated the usage of his code in one of the emails he sent out initially. Did you go over that, and what (or how many) steps would you use with the current code to do the same thing? -- So is this patch set adding layers and abstractions to help the user? If the idea is to share some memory across multiple devices, I guess you can achieve the same by calling the map function provided by the iommu module and sharing the mapped address with the 10s or 100s of devices that access the buffers. You would only need a dedicated virtual pool per IOMMU device to manage its virtual memory allocations. Yeah, you can do that. My idea is to get away from explicit addressing and encapsulate the device-address-to-physical-address link into a mapping. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
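Hari's alternative can be sketched with the existing interface: give each IOMMU its own domain (and virtual pool) and map the same physical buffer into whichever domains need it. The helper below is illustrative; the domain array, device address and order come from the caller, and unwinding on failure is omitted.

    /* Illustrative: map one physical buffer into several IOMMU domains. */
    static int share_buffer(struct iommu_domain **domains, int ndomains,
                            unsigned long iova, phys_addr_t pa, int order)
    {
            int i, ret;

            for (i = 0; i < ndomains; i++) {
                    ret = iommu_map(domains[i], iova, pa, order,
                                    IOMMU_READ | IOMMU_WRITE);
                    if (ret)
                            return ret;   /* unmapping already-mapped domains omitted */
            }
            return 0;   /* every device now sees the buffer at "iova" */
    }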
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
Hari Kanigeri wrote: The VCMM takes the long view. It's designed for a future in which the number of IOMMUs will go up and the ways in which these IOMMUs are composed will vary from system to system, and may vary at runtime. Already, there are ~20 different IOMMU map implementations in the kernel. If the Linux kernel had had the VCMM, many of those implementations could have leveraged the mapping and topology management of a VCMM, while focusing on a few key hardware specific functions (map this physical address, program the page table base register). -- Sounds good. Did you think of a way to handle the case where one of the devices that is using the mapped address crashes? How is the physical address unbacked in this case? Actually the API takes care of that by design. Since the physical space is managed apart from the mapper, the mapper can crash without affecting the physical memory allocation. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
Daniel Walker wrote: On Thu, 2010-07-01 at 15:00 -0700, Zach Pfeffer wrote: Additionally, the current IOMMU interface does not allow users to associate one page table with multiple IOMMUs unless the user explicitly wrote a muxed device underneath the IOMMU interface. This also could be done, but would have to be done for every such use case. Since the particular topology is run-time configurable, all of these use-cases and more can be expressed without pushing the topology into the low-level IOMMU driver. The VCMM takes the long view. It's designed for a future in which the number of IOMMUs will go up and the ways in which these IOMMUs are composed will vary from system to system, and may vary at runtime. Already, there are ~20 different IOMMU map implementations in the kernel. If the Linux kernel had had the VCMM, many of those implementations could have leveraged the mapping and topology management of a VCMM, while focusing on a few key hardware specific functions (map this physical address, program the page table base register). So if we include this code, which map implementations could you collapse into this implementation? Generally, what currently existing code can the VCMM help to eliminate? In theory, it can eliminate all code that interoperates between IOMMU, CPU and non-IOMMU based devices, and all the mapping code, alignment, mapping attribute and special block size support that's been implemented. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
Andi Kleen wrote: On Thu, Jul 01, 2010 at 11:17:34PM -0700, Zach Pfeffer wrote: Andi Kleen wrote: The VCMM provides a more abstract, global view with finer-grained control of each mapping a user wants to create. For instance, the semantics of iommu_map preclude its use in setting up just the IOMMU side of a mapping. With a one-sided map, two IOMMU devices can be Hmm? dma_map_* does not change any CPU mappings. It only sets up DMA mapping(s). Sure, but I was saying that iommu_map() doesn't just set up the IOMMU mappings, its sets up both the iommu and kernel buffer mappings. Normally the data is already in the kernel or mappings, so why would you need another CPU mapping too? Sometimes the CPU code has to scatter-gather, but that is considered acceptable (and if it really cannot be rewritten to support sg it's better to have an explicit vmap operation) In general on larger systems with many CPUs changing CPU mappings also gets expensive (because you have to communicate with all cores), and is not a good idea on frequent IO paths. That's all true, but what a VCMM allows is for these trade-offs to be made by the user for future systems. It may not be too expensive to change the IO path around on future chips or the user may be okay with the performance penalty. A VCMM doesn't enforce a policy on the user, it lets the user make their own policy. Additionally, the current IOMMU interface does not allow users to associate one page table with multiple IOMMUs unless the user explicitly That assumes that all the IOMMUs on the system support the same page table format, right? Actually no. Since the VCMM abstracts a page-table as a Virtual Contiguous Region (VCM) a VCM can be associated with any device, regardless of their individual page table format. But then there is no real page table sharing, isn't it? The real information should be in the page tables, nowhere else. Yeah, and the implementation ensures that it. The VCMM just adds a few fields like start_addr, len and the device. The device still manages the its page-tables. The standard Linux approach to such a problem is to write a library that drivers can use for common functionality, not put a middle layer in between. Libraries are much more flexible than layers. That's true up to the, is this middle layer so useful that its worth it point. The VM is a middle layer, you could make the same argument about it, the mapping code isn't too hard, just map in the memory that you need and be done with it. But the VM middle layer provides a clean separation between page frames and pages which turns out to be Actually we use both PFNs and struct page *s in many layers up and down, there's not really any layering in that. Sure, but the PFNs and the struct page *s are the middle layer. Its just that things haven't been layered on top of them. A VCMM is the higher level abstraction, since it allows the size of the PFs to vary and the consumers of the VCM's to be determined at run-time. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
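The explicit vmap operation Andi refers to already exists in the kernel; a minimal sketch of giving the CPU a contiguous view of scattered pages (the page array is assumed to come from the driver) would be:

    #include <linux/vmalloc.h>

    static void *cpu_view(struct page **pages, int npages)
    {
            /* Build a contiguous kernel-virtual mapping over scattered pages. */
            return vmap(pages, npages, VM_MAP, PAGE_KERNEL);
    }

    /* ... and vunmap(addr) once the CPU no longer needs the view. */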
Re: [RFC 1/3] mm: iommu: An API to unify IOMMU, CPU and device memory management
Thank you for the corrections. I'm correcting them now. Some responses: Randy Dunlap wrote: +struct vcm *vcm_create(size_t start_addr, size_t len); Seems odd to use size_t for start_addr. I used size_t because I wanted to allow the start_addr the same range as len. Is there a better type to use? I see 'unsigned long' used throughout the mm code. Perhaps that's better for both the start_addr and len. +A Reservation is created and destroyed with: + +struct res *vcm_reserve(struct vcm *vcm, size_t len, uint32_t attr); s/uint32_t/u32/ ? Sure. +Associate and activate all three to their respective devices: + +avcm_iommu = vcm_assoc(vcm_iommu, dev_iommu, attr0); +avcm_onetoone = vcm_assoc(vcm_onetoone, dev_onetoone, attr1); +avcm_vmm = vcm_assoc(vcm_vmm, dev_cpu, attr2); error handling on vcm_assoc() failures? I'll add the deassociate call to the example. +res_iommu = vcm_reserve(vcm_iommu, SZ_2MB + SZ_4K, attr); +res_onetoone = vcm_reserve(vcm_onetoone, SZ_2MB + SZ_4K, attr); +res_vmm = vcm_reserve(vcm_vmm, SZ_2MB + SZ_4K, attr); error handling? I'll add it here too. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
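Folding the requested error handling into the documentation's association example might look roughly like this. It is a sketch that assumes the vcm_assoc() calls return NULL on failure and that the handles are declared elsewhere; the de-associate calls on the error paths are left as comments since, per the reply above, they are still being added to the example.

    static int example_assoc(void)
    {
            avcm_iommu = vcm_assoc(vcm_iommu, dev_iommu, attr0);
            if (!avcm_iommu)
                    return -ENODEV;

            avcm_onetoone = vcm_assoc(vcm_onetoone, dev_onetoone, attr1);
            if (!avcm_onetoone)
                    goto err_iommu;

            avcm_vmm = vcm_assoc(vcm_vmm, dev_cpu, attr2);
            if (!avcm_vmm)
                    goto err_onetoone;

            return 0;

    err_onetoone:
            /* de-associate avcm_onetoone here (call being added per review) */
    err_iommu:
            /* de-associate avcm_iommu here (call being added per review) */
            return -ENODEV;
    }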
[RFC 1v2/3] mm: iommu: An API to unify IOMMU, CPU and device memory management
This patch contains the documentation for the API, termed the Virtual Contiguous Memory Manager. Its use would allow all of the IOMMU to VM, VM to device and device to IOMMU interoperation code to be refactored into platform independent code. Comments, suggestions and criticisms are welcome and wanted. Signed-off-by: Zach Pfeffer zpfef...@codeaurora.org --- Documentation/vcm.txt | 587 + 1 files changed, 587 insertions(+), 0 deletions(-) create mode 100644 Documentation/vcm.txt diff --git a/Documentation/vcm.txt b/Documentation/vcm.txt new file mode 100644 index 000..b9029db --- /dev/null +++ b/Documentation/vcm.txt @@ -0,0 +1,587 @@ +What is this document about? + + +This document covers how to use the Virtual Contiguous Memory Manager +(VCMM), how the first implementation works with a specific low-level +Input/Output Memory Management Unit (IOMMU) and the way the VCMM is used +from user-space. It also contains a section that describes why something +like the VCMM is needed in the kernel. + +If anything in this document is wrong, please send patches to the +maintainer of this file, listed at the bottom of the document. + + +The Virtual Contiguous Memory Manager += + +The VCMM was built to solve the system-wide memory mapping issues that +occur when many bus-masters have IOMMUs. + +An IOMMU maps device addresses to physical addresses. It also insulates +the system from spurious or malicious device bus transactions and allows +fine-grained mapping attribute control. The Linux kernel core does not +contain a generic API to handle IOMMU mapped memory; device driver writers +must implement device specific code to interoperate with the Linux kernel +core. As the number of IOMMUs increases, coordinating the many address +spaces mapped by all discrete IOMMUs becomes difficult without in-kernel +support. + +The VCMM API enables device independent IOMMU control, virtual memory +manager (VMM) interoperation and non-IOMMU enabled device interoperation +by treating devices with or without IOMMUs and all CPUs with or without +MMUs, their mapping contexts and their mappings using common +abstractions. Physical hardware is given a generic device type and mapping +contexts are abstracted into Virtual Contiguous Memory (VCM) +regions. Users reserve memory from VCMs and back their reservations +with physical memory. + +Why the VCMM is Needed +-- + +Driver writers who control devices with IOMMUs must contend with device +control and memory management. Driver writers have a large device driver +API that they can leverage to control their devices, but they are lacking +a unified API to help them program mappings into IOMMUs and share those +mappings with other devices and CPUs in the system. + +Sharing is complicated by Linux's CPU-centric VMM. The CPU-centric model +generally makes sense because average hardware only contains a MMU for the +CPU and possibly a graphics MMU. If every device in the system has one or +more MMUs the CPU-centric memory management (MM) programming model breaks +down. + +Abstracting IOMMU device programming into a common API has already begun +in the Linux kernel. It was built to abstract the difference between AMD +and Intel IOMMUs to support x86 virtualization on both platforms. The +interface is listed in include/linux/iommu.h. It contains +interfaces for mapping and unmapping as well as domain management. 
This +interface has not gained widespread use outside the x86; PA-RISC, Alpha +and SPARC architectures and ARM and PowerPC platforms all use their own +mapping modules to control their IOMMUs. The VCMM contains an IOMMU +programming layer, but since its abstraction supports map management +independent of device control, the layer is not used directly. This +higher-level view enables a new kernel service, not just an IOMMU +interoperation layer. + +The General Idea: Map Management using Graphs +- + +Looking at mapping from a system-wide perspective reveals a general graph +problem. The VCMM's API is built to manage the general mapping graph. Each +node that talks to memory, either through an MMU or directly (physically +mapped) can be thought of as the device-end of a mapping edge. The other +edge is the physical memory (or intermediate virtual space) that is +mapped. + +In the direct-mapped case the device is assigned a one-to-one MMU. This +scheme allows direct mapped devices to participate in general graph +management. + +The CPU nodes can also be brought under the same mapping abstraction with +the use of a light overlay on the existing VMM. This light overlay allows +VMM-managed mappings to interoperate with the common API. The light +overlay enables this without substantial modifications to the existing +VMM. + +In addition to CPU nodes that are running Linux (and the VMM), remote CPU +nodes that may
Re: [RFC 3/3] mm: iommu: The Virtual Contiguous Memory Manager
Andi Kleen wrote: Also for me it's still quite unclear why we would want this code at all... It doesn't seem to do anything you couldn't do with the existing interfaces. I don't know all that much about what Zach's done here, but from what he's said so far it looks like this helps to manage lots of IOMMUs on a single system.. On x86 it seems like there's not all that many IOMMUs in comparison .. Zach mentioned 10 to 100 IOMMUs .. The current code can manage multiple IOMMUs fine. That's fair. The current code does manage multiple IOMMUs without issue for a static map topology. Its core function 'map' maps a physical chunk of some size into an IOMMU's address space and the kernel's address space for some domain. The VCMM provides a more abstract, global view with finer-grained control of each mapping a user wants to create. For instance, the semantics of iommu_map preclude its use in setting up just the IOMMU side of a mapping. With a one-sided map, two IOMMU devices can be pointed to the same physical memory without mapping that same memory into the kernel's address space. Additionally, the current IOMMU interface does not allow users to associate one page table with multiple IOMMUs unless the user explicitly wrote a muxed device underneath the IOMMU interface. This also could be done, but would have to be done for every such use case. Since the particular topology is run-time configurable, all of these use-cases and more can be expressed without pushing the topology into the low-level IOMMU driver. The VCMM takes the long view. It's designed for a future in which the number of IOMMUs will go up and the ways in which these IOMMUs are composed will vary from system to system, and may vary at runtime. Already, there are ~20 different IOMMU map implementations in the kernel. If the Linux kernel had had the VCMM, many of those implementations could have leveraged the mapping and topology management of a VCMM, while focusing on a few key hardware specific functions (map this physical address, program the page table base register). -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
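Concretely, the one-sided mapping described above would be expressed with the calls from the VCM documentation roughly as follows (a sketch: the base address, sizes, attribute values and the two device handles are placeholders, and the calls are used as shown elsewhere in the thread). No kernel-side reservation is ever created, so the memory is visible to both IOMMUs but never mapped by the CPU.

    /* One virtual region, two IOMMU-backed devices, no kernel mapping. */
    vcm    = vcm_create(0x10000000, SZ_128M);       /* placeholder base/size */
    avcm_a = vcm_assoc(vcm, dev_iommu_a, 0);
    avcm_b = vcm_assoc(vcm, dev_iommu_b, 0);
    vcm_activate(avcm_a);
    vcm_activate(avcm_b);

    phys = vcm_phys_alloc(MT0, SZ_16M, 0);          /* physical chunk list  */
    res  = vcm_reserve(vcm, SZ_16M, 0);             /* device-side virtual  */
    vcm_back(res, phys);                            /* both IOMMUs map it;
                                                       the kernel never does */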
[RFC 1/3] mm: iommu: An API to unify IOMMU, CPU and device memory management
This patch contains the documentation for the API, termed the Virtual Contiguous Memory Manager. Its use would allow all of the IOMMU to VM, VM to device and device to IOMMU interoperation code to be refactored into platform independent code. Comments, suggestions and criticisms are welcome and wanted. Signed-off-by: Zach Pfeffer zpfef...@codeaurora.org --- Documentation/vcm.txt | 583 + 1 files changed, 583 insertions(+), 0 deletions(-) create mode 100644 Documentation/vcm.txt diff --git a/Documentation/vcm.txt b/Documentation/vcm.txt new file mode 100644 index 000..d29c757 --- /dev/null +++ b/Documentation/vcm.txt @@ -0,0 +1,583 @@ +What is this document about? + + +This document covers how to use the Virtual Contiguous Memory Manager +(VCMM), how the first implmentation works with a specific low-level +Input/Output Memory Management Unit (IOMMU) and the way the VCMM is used +from user-space. It also contains a section that describes why something +like the VCMM is needed in the kernel. + +If anything in this document is wrong please send patches to the +maintainer of this file, listed at the bottom of the document. + + +The Virtual Contiguous Memory Manager += + +The VCMM was built to solve the system-wide memory mapping issues that +occur when many bus-masters have IOMMUs. + +An IOMMU maps device addresses to physical addresses. It also insulates +the system from spurious or malicious device bus transactions and allows +fine-grained mapping attribute control. The Linux kernel core does not +contain a generic API to handle IOMMU mapped memory; device driver writers +must implement device specific code to interoperate with the Linux kernel +core. As the number of IOMMUs increases, coordinating the many address +spaces mapped by all discrete IOMMUs becomes difficult without in-kernel +support. + +The VCMM API enables device independent IOMMU control, virtual memory +manager (VMM) interoperation and non-IOMMU enabled device interoperation +by treating devices with or without IOMMUs and all CPUs with or without +MMUs, their mapping contexts and their mappings using common +abstractions. Physical hardware is given a generic device type and mapping +contexts are abstracted into Virtual Contiguous Memory (VCM) +regions. Users reserve memory from VCMs and back their reservations +with physical memory. + +Why the VCMM is Needed +-- + +Driver writers who control devices with IOMMUs must contend with device +control and memory management. Driver writers have a large device driver +API that they can leverage to control their devices, but they are lacking +a unified API to help them program mappings into IOMMUs and share those +mappings with other devices and CPUs in the system. + +Sharing is complicated by Linux's CPU centric VMM. The CPU centric model +generally makes sense because average hardware only contains a MMU for the +CPU and possibly a graphics MMU. If every device in the system has one or +more MMUs the CPU centric memory management (MM) programming model breaks +down. + +Abstracting IOMMU device programming into a common API has already begun +in the Linux kernel. It was built to abstract the difference between AMDs +and Intels IOMMUs to support x86 virtualization on both platforms. The +interface is listed in kernel/include/linux/iommu.h. It contains +interfaces for mapping and unmapping as well as domain management. 
This +interface has not gained widespread use outside the x86; PA-RISC, Alpha +and SPARC architectures and ARM and PowerPC platforms all use their own +mapping modules to control their IOMMUs. The VCMM contains an IOMMU +programming layer, but since its abstraction supports map management +independent of device control, the layer is not used directly. This +higher-level view enables a new kernel service, not just an IOMMU +interoperation layer. + +The General Idea: Map Management using Graphs +- + +Looking at mapping from a system-wide perspective reveals a general graph +problem. The VCMMs API is built to manage the general mapping graph. Each +node that talks to memory, either through an MMU or directly (physically +mapped) can be thought of as the device-end of a mapping edge. The other +edge is the physical memory (or intermediate virtual space) that is +mapped. + +In the direct mapped case the device is assigned a one-to-one MMU. This +scheme allows direct mapped devices to participate in general graph +management. + +The CPU nodes can also be brought under the same mapping abstraction with +the use of a light overlay on the existing VMM. This light overlay allows +VMM managed mappings to interoperate with the common API. The light +overlay enables this without substantial modifications to the existing +VMM. + +In addition to CPU nodes that are running Linux (and the VMM), remote CPU +nodes that may
Re: [RFC] mm: iommu: An API to unify IOMMU, CPU and device memory management
FUJITA Tomonori wrote: On Thu, 24 Jun 2010 23:48:50 -0700 Zach Pfeffer zpfef...@codeaurora.org wrote: Andi Kleen wrote: Zach Pfeffer zpfef...@codeaurora.org writes: This patch contains the documentation for and the main header file of the API, termed the Virtual Contiguous Memory Manager. Its use would allow all of the IOMMU to VM, VM to device and device to IOMMU interoperation code to be refactored into platform independent code. I read all the description and it's still unclear what advantage this all has over the current architecture? At least all the benefits mentioned seem to be rather nebulous. Can you describe a concrete use case that is improved by this code directly? Sure. On a SoC with many IOMMUs (10-100), where each IOMMU may have its own set of page-tables or share page-tables, and where devices with and without IOMMUs and CPUs with or without MMUS want to communicate, an abstraction like the VCM helps manage all conceivable mapping topologies. In the same way that the Linux MM manages pages apart from page-frames, the VCMM allows the Linux MM to manage ideal memory regions, VCMs, apart from the actual memory region. One real scenario would be video playback from a file on a memory card. To read and display the video, a DMA engine would read blocks of data from the memory card controller into memory. These would typically be managed using a scatter-gather list. This list would be mapped into a contiguous buffer of the video decoder's IOMMU. The video decoder would write into a buffer mapped by the display engine's IOMMU as well as the CPU (if the kernel needed to intercept the buffers). In this instance, the video decoder's IOMMU and the display engine's IOMMU use different page-table formats. Using the VCM API, this topology can be created without worrying about the device's IOMMUs or how to map the buffers into the kernel, or how to interoperate with the scatter-gather list. The call flow would would go: Can you explain how you can't do the above with the existing API? Sure. You can do the same thing with the current API, but the VCM takes a wider view; the mapper is a parameter. Taking include/linux/iommu.h as a common interface, the key function is iommu_map(). This function maps a physical memory region, paddr, of gfp_order, to a virtual region starting at iova: extern int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, int gfp_order, int prot); Users who call this, kvm_iommu_map_pages() for instance, run similar loops: foreach page iommu_map(domain, va(page), ...) The VCM encapsulates this as vcm_back(). This function iterates over a set of physical regions and maps those physical regions to a virtual address space that has been associated with a mapper at run-time. The loop above, and the other loops (and other associated IOMMU software) that don't use the common interface like arch/powerpc/kernel/vio.c all do similar work. In the end the VCM's dynamic virtual region association mechanism (and multihomed physical memory targeting) allows all IOMMU mapping code in the system to use the same API. This may seem like syntactic sugar, but treating devices with IOMMUs (bus-masters), device with MMUs (CPUs) and devices without MMUs (DMA engines) as endpoints in a mapping graph allows new features to be developed. One such feature is system-wide memory migration (including memory that devices map). 
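To make the comparison above concrete, the two styles might be put side by side as below (illustrative only; the domain, page array, reservation and physical allocation are assumed to have been set up as in the rest of the thread):

    /* Today: every caller owns a loop of the kvm_iommu_map_pages() variety. */
    for (i = 0; i < npages; i++)
            iommu_map(domain, iova + i * PAGE_SIZE,
                      page_to_phys(pages[i]), 0, prot);

    /* With the VCM the same work is one call; the loop lives in one place. */
    vcm_back(res, buf);     /* res: virtual reservation, buf: physical chunks */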
With a common API a loop like this can be written one place: foreach mapper of pa_region remap(mapper, new_pa_region) It could also be used for better power-management: foreach mapper of soon_to_be_powered_off_pa_region ask(mapper, soon_to_be_powered_off_pa_region) The VCM is just the first step. More concretely, the way the VCM works allows the transparent use and interoperation of different mapping chunk sizes. This is important in multimedia devices because IOMMU TLB misses may cause multimedia devices to miss their performance goals. Multi-chunk size support has been added for IOMMU mappers and wouldn't be hard to add to CPU mappers (CPU mappers still use 4KB). The general point of the VCMM is to allow users a higher level API than the current IOMMU abstraction provides that solves the general mapping problem. This means that all of the common mapping code would be written once. In addition, the API allows all the low level details of IOMMU programing and VM interoperation to be handled at the right level. Eventually the following functions could all be reworked and their users could call VCM functions. There are more IOMMUs (e.g. x86 has calgary, gart too). And what is the point of converting old IOMMUs (the majority of the below)? are there any potential users of your API for such old IOMMUs? That's a good question. I gave the list of the current IOMMU mapping functions to bring awareness to the fact that the general system-wide mapping
Re: [RFC] mm: iommu: An API to unify IOMMU, CPU and device memory management
Andi Kleen wrote: Zach Pfeffer zpfef...@codeaurora.org writes: This patch contains the documentation for and the main header file of the API, termed the Virtual Contiguous Memory Manager. Its use would allow all of the IOMMU to VM, VM to device and device to IOMMU interoperation code to be refactored into platform independent code. I read all the description and it's still unclear what advantage this all has over the current architecture? At least all the benefits mentioned seem to be rather nebulous. Can you describe a concrete use case that is improved by this code directly? Sure. On a SoC with many IOMMUs (10-100), where each IOMMU may have its own set of page-tables or share page-tables, and where devices with and without IOMMUs and CPUs with or without MMUS want to communicate, an abstraction like the VCM helps manage all conceivable mapping topologies. In the same way that the Linux MM manages pages apart from page-frames, the VCMM allows the Linux MM to manage ideal memory regions, VCMs, apart from the actual memory region. One real scenario would be video playback from a file on a memory card. To read and display the video, a DMA engine would read blocks of data from the memory card controller into memory. These would typically be managed using a scatter-gather list. This list would be mapped into a contiguous buffer of the video decoder's IOMMU. The video decoder would write into a buffer mapped by the display engine's IOMMU as well as the CPU (if the kernel needed to intercept the buffers). In this instance, the video decoder's IOMMU and the display engine's IOMMU use different page-table formats. Using the VCM API, this topology can be created without worrying about the device's IOMMUs or how to map the buffers into the kernel, or how to interoperate with the scatter-gather list. The call flow would would go: 1. Establish a memory region for the video decoder and the display engine that's 128 MB and starts at 0x1000. vcm_out = vcm_create(0x1000, SZ_128M); 2. Associate the memory region with the video decoder's IOMMU and the display engine's IOMMU. avcm_dec = vcm_assoc(vcm_out, video_dec_dev, 0); avcm_disp = vcm_assoc(vcm_out, disp_dev, 0); The 2 dev_ids, video_dec_dev and disp_dev allow the right IOMMU low-level functions to be called underneath. 3. Actually program the underlying IOMMUs. vcm_activate(avcm_dec); vcm_activate(avcm_disp); 4. Allocate 2 physical buffers that the DMA engine and video decoder will use. Make sure each buffer is 64 KB contiguous. buf_64k = vcm_phys_alloc(MT0, 2*SZ_64K, VCM_64KB); 5. Allocate a 16 MB buffer for the output of the video decoder and the input of the display engine. Use 1MB, 64KB and 4KB blocks to map the buffer. buf_frame = vcm_phys_alloc(MT0, SZ_16M); 6. Program the DMA controller. buf = vcm_get_next_phys_addr(buf_64k, NULL, len); while (buf) { dma_prg(buf); buf = vcm_get_next_phys_addr(buf_64k, NULL, len); } 7. Create virtual memory regions for the DMA buffers and the video decoder output from the vcm_out region. Make sure the buffers are aligned to the buffer size. res_64k = vcm_reserve(vcm_out, 8*SZ_64K, VCM_ALIGN_64K); res_16M = vcm_reserve(vcm_out, SZ_16M, VCM_ALIGN_16M); 8. Connect the virtual reservations with the physical allocations. vcm_back(res_64k, buf_64k); vcm_back(res_16M, buf_frame); 9. Program the decoder and the display engine with addresses from the IOMMU side of the mapping: base_64k = vcm_get_dev_addr(res_64k); base_16M = vcm_get_dev_addr(res_16M); 10. Create a kernel mapping to read and write the 16M buffer. 
cpu_vcm = vcm_create_from_prebuilt(VCM_PREBUILT_KERNEL); 11. Create a reservation on that prebuilt VCM. Use any alignment. res_cpu_16M = vcm_reserve(cpu_vcm, SZ_16M, 0); 12. Back the reservation using the same physical memory that the decoder and the display engine are looking at. vcm_back(res_cpu_16M, buf_frame); 13. Get a pointer that kernel can dereference. base_cpu_16M = vcm_get_dev_addr(res_cpu_16M); The general point of the VCMM is to allow users a higher level API than the current IOMMU abstraction provides that solves the general mapping problem. This means that all of the common mapping code would be written once. In addition, the API allows all the low level details of IOMMU programing and VM interoperation to be handled at the right level. Eventually the following functions could all be reworked and their users could call VCM functions. arch/arm/plat-omap/iovmm.c map_iovm_area() arch/m68k/sun3/sun3dvma.c dvma_map_align() arch/alpha/kernel/pci_iommu.c pci_map_single_1() arch/powerpc/platforms/pasemi/iommu.c iobmap_build() arch/powerpc/kernel/iommu.c iommu_map_page() arch/sparc/mm/iommu.c iommu_map_dma_area() arch/sparc/kernel/pci_sun4v_asm.S ENTRY(pci_sun4v_iommu_map) arch/ia64/hp/common/sba_iommu.c sba_map_page() arch/arm/mach-omap2/iommu2.c omap2_iommu_init() arch/arm/plat-omap/iovmm.c
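Collecting the numbered steps of the walkthrough above into a single listing may make the flow easier to follow; the calls are taken verbatim from the steps (error handling omitted, step numbers in comments):

    vcm_out   = vcm_create(0x1000, SZ_128M);                  /* 1  */
    avcm_dec  = vcm_assoc(vcm_out, video_dec_dev, 0);         /* 2  */
    avcm_disp = vcm_assoc(vcm_out, disp_dev, 0);
    vcm_activate(avcm_dec);                                    /* 3  */
    vcm_activate(avcm_disp);
    buf_64k   = vcm_phys_alloc(MT0, 2*SZ_64K, VCM_64KB);      /* 4  */
    buf_frame = vcm_phys_alloc(MT0, SZ_16M);                  /* 5  */
    buf = vcm_get_next_phys_addr(buf_64k, NULL, len);         /* 6  */
    while (buf) {
            dma_prg(buf);
            buf = vcm_get_next_phys_addr(buf_64k, NULL, len);
    }
    res_64k = vcm_reserve(vcm_out, 8*SZ_64K, VCM_ALIGN_64K);  /* 7  */
    res_16M = vcm_reserve(vcm_out, SZ_16M, VCM_ALIGN_16M);
    vcm_back(res_64k, buf_64k);                                /* 8  */
    vcm_back(res_16M, buf_frame);
    base_64k = vcm_get_dev_addr(res_64k);                      /* 9  */
    base_16M = vcm_get_dev_addr(res_16M);
    cpu_vcm = vcm_create_from_prebuilt(VCM_PREBUILT_KERNEL);   /* 10 */
    res_cpu_16M = vcm_reserve(cpu_vcm, SZ_16M, 0);             /* 11 */
    vcm_back(res_cpu_16M, buf_frame);                          /* 12 */
    base_cpu_16M = vcm_get_dev_addr(res_cpu_16M);              /* 13 */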
[RFC] mm: iommu: An API to unify IOMMU, CPU and device memory management
This patch contains the documentation for and the main header file of the API, termed the Virtual Contiguous Memory Manager. Its use would allow all of the IOMMU to VM, VM to device and device to IOMMU interoperation code to be refactored into platform independent code. Comments, suggestions and criticisms are welcome and wanted. Signed-off-by: Zach Pfeffer zpfef...@codeaurora.org --- Documentation/vcm.txt | 583 include/linux/vcm.h | 1017 + 2 files changed, 1600 insertions(+), 0 deletions(-) create mode 100644 Documentation/vcm.txt create mode 100644 include/linux/vcm.h diff --git a/Documentation/vcm.txt b/Documentation/vcm.txt new file mode 100644 index 000..d29c757 --- /dev/null +++ b/Documentation/vcm.txt @@ -0,0 +1,583 @@ +What is this document about? + + +This document covers how to use the Virtual Contiguous Memory Manager +(VCMM), how the first implmentation works with a specific low-level +Input/Output Memory Management Unit (IOMMU) and the way the VCMM is used +from user-space. It also contains a section that describes why something +like the VCMM is needed in the kernel. + +If anything in this document is wrong please send patches to the +maintainer of this file, listed at the bottom of the document. + + +The Virtual Contiguous Memory Manager += + +The VCMM was built to solve the system-wide memory mapping issues that +occur when many bus-masters have IOMMUs. + +An IOMMU maps device addresses to physical addresses. It also insulates +the system from spurious or malicious device bus transactions and allows +fine-grained mapping attribute control. The Linux kernel core does not +contain a generic API to handle IOMMU mapped memory; device driver writers +must implement device specific code to interoperate with the Linux kernel +core. As the number of IOMMUs increases, coordinating the many address +spaces mapped by all discrete IOMMUs becomes difficult without in-kernel +support. + +The VCMM API enables device independent IOMMU control, virtual memory +manager (VMM) interoperation and non-IOMMU enabled device interoperation +by treating devices with or without IOMMUs and all CPUs with or without +MMUs, their mapping contexts and their mappings using common +abstractions. Physical hardware is given a generic device type and mapping +contexts are abstracted into Virtual Contiguous Memory (VCM) +regions. Users reserve memory from VCMs and back their reservations +with physical memory. + +Why the VCMM is Needed +-- + +Driver writers who control devices with IOMMUs must contend with device +control and memory management. Driver writers have a large device driver +API that they can leverage to control their devices, but they are lacking +a unified API to help them program mappings into IOMMUs and share those +mappings with other devices and CPUs in the system. + +Sharing is complicated by Linux's CPU centric VMM. The CPU centric model +generally makes sense because average hardware only contains a MMU for the +CPU and possibly a graphics MMU. If every device in the system has one or +more MMUs the CPU centric memory management (MM) programming model breaks +down. + +Abstracting IOMMU device programming into a common API has already begun +in the Linux kernel. It was built to abstract the difference between AMDs +and Intels IOMMUs to support x86 virtualization on both platforms. The +interface is listed in kernel/include/linux/iommu.h. It contains +interfaces for mapping and unmapping as well as domain management. 
This +interface has not gained widespread use outside the x86; PA-RISC, Alpha +and SPARC architectures and ARM and PowerPC platforms all use their own +mapping modules to control their IOMMUs. The VCMM contains an IOMMU +programming layer, but since its abstraction supports map management +independent of device control, the layer is not used directly. This +higher-level view enables a new kernel service, not just an IOMMU +interoperation layer. + +The General Idea: Map Management using Graphs +- + +Looking at mapping from a system-wide perspective reveals a general graph +problem. The VCMMs API is built to manage the general mapping graph. Each +node that talks to memory, either through an MMU or directly (physically +mapped) can be thought of as the device-end of a mapping edge. The other +edge is the physical memory (or intermediate virtual space) that is +mapped. + +In the direct mapped case the device is assigned a one-to-one MMU. This +scheme allows direct mapped devices to participate in general graph +management. + +The CPU nodes can also be brought under the same mapping abstraction with +the use of a light overlay on the existing VMM. This light overlay allows +VMM managed mappings to interoperate with the common API. The light +overlay enables this without substantial