Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Monday 04 July 2011, Ankita Garg wrote:
>> It still sounds to me that this can be done using the NUMA properties that Linux already understands, and teaching more subsystems about it, but maybe the memory hotplug developers have already come up with another scheme.
>>
>> The way that memory hotplug and CMA choose their memory regions certainly needs to take both into account. As far as I can see there are both conflicting and synergistic effects when you combine the two.
>
> Recently, we proposed a generic 'memory regions' framework to exploit the memory power management capabilities on embedded boards. I think some of the above CMA requirements could be met by this framework. One of the main goals of regions is to make the VM aware of hardware memory boundaries, such as banks. For managing memory power consumption, memory regions are created aligned to the hardware granularity at which power can be managed (i.e., at which power operations such as on/off can be performed). If attributes are associated with each of these regions, some of them could be marked as CMA-only, ensuring that only movable and per-bank memory is allocated.
>
> More details on the design can be found here:
>
> http://lkml.org/lkml/2011/5/27/177
> http://lkml.org/lkml/2011/6/29/202
> http://lwn.net/Articles/446493/

Thanks for the pointers, that is exactly what I was looking for.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
Hi,

On Thu, Jun 16, 2011 at 12:06:07AM +0200, Arnd Bergmann wrote:
> On Wednesday 15 June 2011 23:39:58 Larry Bassel wrote:
>> On 15 Jun 11 10:36, Marek Szyprowski wrote:
>>> On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
>>>> On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
>>>>> I've seen this split bank allocation in Qualcomm and TI SoCs; with Samsung, that makes 3 major SoC vendors (I would be surprised if Nvidia didn't also need to do this), so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by "can't do" I mean 1080P and higher decode is not functionally useful). Far from special, this would appear to be the default.
>>
>> We at Qualcomm have some platforms that have memory of different performance characteristics. Some drivers will need a way of specifying that they need fast memory for an allocation (and would prefer an error if it is not available rather than a fallback to slower memory). It would also be bad if allocators who don't need fast memory got it accidentally, depriving those who really need it.
>
> Can you describe how the memory areas differ specifically? Is there one that is always faster but very small, or are there just specific circumstances under which some memory is faster than another?
>
>>>> The possible conflicts that I still see with per-bank CMA regions are:
>>>>
>>>> * It completely destroys memory power management in cases where that is based on powering down entire memory banks.
>
> We already established that we have to know something about the banks, and your additional input makes it even clearer that we need to consider the bigger picture here: we need to describe parts of memory separately regarding general performance, device-specific allocations and hotplug characteristics.
>
> It still sounds to me that this can be done using the NUMA properties that Linux already understands, and teaching more subsystems about it, but maybe the memory hotplug developers have already come up with another scheme.
>
> The way that memory hotplug and CMA choose their memory regions certainly needs to take both into account. As far as I can see there are both conflicting and synergistic effects when you combine the two.

Recently, we proposed a generic 'memory regions' framework to exploit the memory power management capabilities on embedded boards. I think some of the above CMA requirements could be met by this framework. One of the main goals of regions is to make the VM aware of hardware memory boundaries, such as banks. For managing memory power consumption, memory regions are created aligned to the hardware granularity at which power can be managed (i.e., at which power operations such as on/off can be performed). If attributes are associated with each of these regions, some of them could be marked as CMA-only, ensuring that only movable and per-bank memory is allocated.

More details on the design can be found here:

http://lkml.org/lkml/2011/5/27/177
http://lkml.org/lkml/2011/6/29/202
http://lwn.net/Articles/446493/

--
Regards,
Ankita Garg (ank...@in.ibm.com)
Linux Technology Center
IBM India Systems Technology Labs,
Bangalore, India
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wednesday, June 15, 2011 09:37:18 Arnd Bergmann wrote:
> On Wednesday 15 June 2011 09:11:39 Marek Szyprowski wrote:
>> I see your concerns, but I really wonder how to determine the properties of the global/default CMA pool. You definitely don't want to give all available memory to CMA, because it will have a negative impact on kernel operation (the kernel really needs to allocate unmovable pages from time to time).
>
> Exactly. This is a hard problem, so I would prefer to see a solution for coming up with reasonable defaults.
>
>> The only solution I see now is to provide a Kconfig entry to determine the size of the global CMA pool, but this still has some issues, especially for multi-board kernels (each board will probably have a different amount of RAM and different memory-consuming devices available). It looks like each board's startup code still might need to tweak the size of the CMA pool. I can add a kernel command line option for it, but such a solution will also not solve all the cases (afair there was a discussion about kernel command line parameters for memory configuration and the conclusion was that it should be avoided).
>
> The command line option can be a last resort if the heuristics fail, but it's not much better than a fixed Kconfig setting.
>
> How about a Kconfig option that defines the percentage of memory to set aside for contiguous allocations?

I would actually like to see a cma_size kernel option of some sort. This would be for the global CMA pool only, as I don't think we should try to do anything more complicated here.

While it is relatively easy for embedded systems to do a recompile every time you need to change the pool size, this isn't an option on 'normal' desktop systems.

While you usually have more than enough memory on such systems and don't need CMA, there are a number of cases where you do want to reserve sufficient memory. Usually these involve lots of video capture cards in one system.

What I was wondering about is how this patch series changes the allocation in case it can't allocate from the CMA pool. Will it attempt to fall back to a 'normal' allocation?

The reason I ask is that for desktop systems you could just start with a CMA pool of size 0, and only in specific situations would you need to add a cma_size kernel parameter depending on your needs. But this scheme would require a fallback scenario in case of a global CMA pool of size 0.

Hmm, perhaps this fallback scenario is more driver specific. For SoC platform video devices you may not want a fallback, whereas for PCI(e)/USB devices you do. I don't know what's best, frankly.

Regards,

Hans
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wed, 22 Jun 2011 09:03:30 +0200, Hans Verkuil hverk...@xs4all.nl wrote:
> What I was wondering about is how this patch series changes the allocation in case it can't allocate from the CMA pool. Will it attempt to fall back to a 'normal' allocation?

Unless Marek changed something since I wrote the code, which I doubt, if CMA cannot obtain memory from the CMA region, it will fail.

Part of the reason is that CMA lacks the knowledge of where to allocate memory from. For instance, in the case of several memory banks, it does not know which memory bank to allocate from.

It is, in my opinion, a task for higher-level functions (read: the DMA layer) to try another mechanism if CMA fails.

--
Best regards,
Liege of Serenely Enlightened Majesty of Computer Science,
Michal "mina86" Nazarewicz (email/xmpp: mnazarew...@google.com)
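
The division of labour described in this message (CMA fails outright, and a higher layer decides whether to try something else) can be sketched as a small userspace model. This is an illustrative Python sketch only, not kernel code: `CmaPool`, `dma_alloc` and the `allow_fallback` flag are hypothetical names that do not appear in the patch series.

```python
from typing import Optional


class CmaPool:
    """Toy model of a CMA region: a page counter with no fallback of its own."""

    def __init__(self, total_pages: int) -> None:
        self.free_pages = total_pages

    def alloc(self, pages: int) -> Optional[str]:
        # As described in the thread, CMA simply fails when the region
        # cannot satisfy the request.
        if pages > self.free_pages:
            return None
        self.free_pages -= pages
        return "cma"


def dma_alloc(pool: CmaPool, pages: int, allow_fallback: bool) -> Optional[str]:
    """Hypothetical higher-level allocator that owns the fallback policy."""
    source = pool.alloc(pages)
    if source is None and allow_fallback:
        # For example, a PCI(e)/USB device that can live with ordinary memory.
        return "normal"
    # Either CMA succeeded, or the caller prefers an error over a fallback.
    return source
```

With a pool of size 0, as in the desktop scenario raised earlier in the thread, a caller that allows fallback always gets "normal" memory, while a caller that forbids it always gets an error. That keeps the policy per driver rather than inside CMA.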
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wednesday 22 June 2011, Hans Verkuil wrote:
>> How about a Kconfig option that defines the percentage of memory to set aside for contiguous allocations?
>
> I would actually like to see a cma_size kernel option of some sort. This would be for the global CMA pool only, as I don't think we should try to do anything more complicated here.

A command line option is probably good to override the compile-time default, yes.

We could also go further and add a runtime sysctl mechanism like the one for hugepages, where you can grow the pool at run time as long as there is enough free contiguous memory (e.g. from init scripts), or shrink it later if you want to allow larger nonmovable allocations.

My feeling is that we need to find a way to integrate the global settings for four kinds of allocations:

* nonmovable kernel pages
* hugetlb pages
* CMA
* memory hotplug

These essentially fight over the same memory (though things are slightly different with dynamic hugepages), and they all face the same basic problem of getting as much for themselves without starving the other three.

Arnd
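
The layering proposed here (a compile-time percentage as the default, with a command line value overriding it) amounts to a small piece of arithmetic. The following Python sketch is only a model under stated assumptions: `cma_pool_bytes`, `default_percent` and `cmdline_bytes` are hypothetical names, and clamping the override to the amount of RAM is an assumption of this sketch, not something stated in the thread.

```python
from typing import Optional


def cma_pool_bytes(total_ram: int, default_percent: int,
                   cmdline_bytes: Optional[int] = None) -> int:
    """Return the size of the global pool in bytes.

    total_ram       -- RAM size in bytes
    default_percent -- hypothetical compile-time default (percentage of RAM)
    cmdline_bytes   -- hypothetical command line override, None if absent
    """
    if cmdline_bytes is not None:
        # An explicit override wins, including 0 ("no pool at all").
        # Clamping to total_ram is an assumption made for this sketch.
        return min(cmdline_bytes, total_ram)
    return total_ram * default_percent // 100
```

Note that an override of 0 is distinct from "no override", which is what makes the "start with a pool of size 0 on desktops" idea from earlier in the thread expressible.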
RE: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
Hello,

On Wednesday, June 22, 2011 2:42 PM Arnd Bergmann wrote:
> On Wednesday 22 June 2011, Hans Verkuil wrote:
>>> How about a Kconfig option that defines the percentage of memory to set aside for contiguous allocations?
>>
>> I would actually like to see a cma_size kernel option of some sort. This would be for the global CMA pool only, as I don't think we should try to do anything more complicated here.
>
> A command line option is probably good to override the compile-time default, yes.
>
> We could also go further and add a runtime sysctl mechanism like the one for hugepages, where you can grow the pool at run time as long as there is enough free contiguous memory (e.g. from init scripts), or shrink it later if you want to allow larger nonmovable allocations.

Sounds really good, but it might be really hard to implement, at least for CMA, because it needs to tweak parameters of memory management internal structures very early, when the buddy allocator has not been activated yet.

> My feeling is that we need to find a way to integrate the global settings for four kinds of allocations:
>
> * nonmovable kernel pages
> * hugetlb pages
> * CMA
> * memory hotplug
>
> These essentially fight over the same memory (though things are slightly different with dynamic hugepages), and they all face the same basic problem of getting as much for themselves without starving the other three.

I'm not sure we can solve all such issues in the first version. Maybe we should first have each of the above fully working in mainline separately and then start the integration work.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wednesday 22 June 2011, Marek Szyprowski wrote:
> Sounds really good, but it might be really hard to implement, at least for CMA, because it needs to tweak parameters of memory management internal structures very early, when the buddy allocator has not been activated yet.

Why that? I would expect you can do the same as hugepages (used to) do and just attempt high-order allocations. If they succeed, you can add them as a CMA region and free them again into the movable set of pages; otherwise you just fail the request from user space when the memory is already fragmented.

>> These essentially fight over the same memory (though things are slightly different with dynamic hugepages), and they all face the same basic problem of getting as much for themselves without starving the other three.
>
> I'm not sure we can solve all such issues in the first version. Maybe we should first have each of the above fully working in mainline separately and then start the integration work.

Yes, makes sense. We just need to be careful not to introduce user-visible interfaces in the process that we cannot change any more.

Arnd
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wednesday, June 22, 2011 2:42 PM Arnd Bergmann wrote:
>> We could also go further and add a runtime sysctl mechanism like the one for hugepages, where you can grow the pool at run time as long as there is enough free contiguous memory (e.g. from init scripts), or shrink it later if you want to allow larger nonmovable allocations.

On Wed, 22 Jun 2011 15:15:35 +0200, Marek Szyprowski wrote:
> Sounds really good, but it might be really hard to implement, at least for CMA, because it needs to tweak parameters of memory management internal structures very early, when the buddy allocator has not been activated yet.

If you are able to allocate a pageblock of free memory from the buddy system, you should be able to convert it to CMA memory with no problems.

Also, if you want to convert CMA memory back to regular memory, you should be able to do that even if some of the memory is used by CMA (it just won't be available right away but only when CMA frees it).

It is important to note that, because of the use of migration types, all such conversions have to be performed on a pageblock basis.

I don't think this is a feature we should consider for the first patch though. We started with an overgrown idea about what CMA might do and it didn't get us far. Let's first get the basics right and then start implementing features as they become needed.

--
Best regards,
Liege of Serenely Enlightened Majesty of Computer Science,
Michal "mina86" Nazarewicz (email/xmpp: mnazarew...@google.com)
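
The pageblock-granularity constraint mentioned in this message can be illustrated with a little page-frame arithmetic. The Python sketch below is a model only: the function names are hypothetical, and the pageblock size is passed in as a parameter because the thread does not state it (the value used in the usage note is an arbitrary example).

```python
from typing import Tuple


def is_pageblock_aligned(start_pfn: int, nr_pages: int,
                         pageblock_pages: int) -> bool:
    """True if the range starts and ends on pageblock boundaries."""
    return start_pfn % pageblock_pages == 0 and nr_pages % pageblock_pages == 0


def covering_pageblocks(start_pfn: int, nr_pages: int,
                        pageblock_pages: int) -> Tuple[int, int]:
    """Return (first_pfn, end_pfn) of the whole pageblocks touched by a range.

    Because the migration type is a per-pageblock property in this model,
    converting any page of a block affects the entire block.
    """
    first = (start_pfn // pageblock_pages) * pageblock_pages
    end_pfn = start_pfn + nr_pages
    # Round the end up to the next pageblock boundary.
    end = ((end_pfn + pageblock_pages - 1) // pageblock_pages) * pageblock_pages
    return first, end
```

For example, assuming 1024 pages per pageblock, a 100-page range starting at page frame 1000 straddles a boundary, so `covering_pageblocks(1000, 100, 1024)` spans two full pageblocks; a conversion request for that range would have to be rejected or widened.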
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wed, 22 Jun 2011 15:39:23 +0200, Arnd Bergmann a...@arndb.de wrote:
> Why that? I would expect you can do the same that hugepages (used to) do and just attempt high-order allocations. If they succeed, you can add them as a CMA region and free them again, into the movable set of pages, otherwise you just fail the request from user space when the memory is already fragmented.

The problem with that is that CMA needs to have whole pageblocks allocated, and buddy can allocate at most half a pageblock.

--
Best regards,
Liege of Serenely Enlightened Majesty of Computer Science,
Michal "mina86" Nazarewicz (email/xmpp: mnazarew...@google.com)
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Thursday 16 June 2011 19:01:33 Larry Bassel wrote:
>> Can you describe how the memory areas differ specifically? Is there one that is always faster but very small, or are there just specific circumstances under which some memory is faster than another?
>
> One is always faster, but very small (generally 2-10% the size of normal memory).

Ok, that sounds like the SRAM regions that we are handling on some ARM platforms using the various interfaces. It should probably remain outside of the regular allocator, but we can try to generalize the SRAM support further. There are many possible uses for it.

Arnd
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wednesday 15 June 2011, Daniel Vetter wrote:
> On Tue, Jun 14, 2011 at 20:30, Arnd Bergmann a...@arndb.de wrote:
>> On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
>>> Ah yes, I forgot that separate regions for different purposes could decrease fragmentation.
>>
>> That is indeed a good point, but having a good allocator algorithm could also solve this. I don't know too much about these allocation algorithms, but there are probably multiple working approaches to this.
>
> imo no allocator algorithm is gonna help if you have comparably large, variable-sized contiguous allocations out of a restricted address range. It might work well enough if there are only a few sizes and/or there's decent headroom. But for really generic workloads this would require sync objects and eviction callbacks (i.e. what Thomas Hellstrom pushed with ttm).

The requirements are quite different depending on what system you look at. In a lot of cases, the constraints are not that tight at all, and CMA will easily help to turn "works sometimes" into "works almost always". Let's get there first and then look into the harder problems.

Unfortunately, memory allocation gets nondeterministic in the corner cases: you can simply get the system into a state where you don't have enough memory when you try to do too many things at once. This may sound like a platitude but it's really what is behind all this:

If we had unlimited amounts of RAM, we would never need CMA; we could simply set aside a lot of memory at boot time. Having one CMA area with movable page eviction lets you build systems capable of doing the same thing with less RAM than without CMA. Adding more complexity lets you reduce that amount further.

The other aspects that have been mentioned about bank affinity and SRAM are pretty orthogonal to the allocation, so we should also treat them separately.

> So if this is only a requirement on very few platforms and can be cheaply fixed with multiple cma allocation areas (heck, we have slabs for the same reasons in the kernel), it might be a sensible compromise.

Yes, we can probably add it later when we find out what the limits of the generic approach are. I don't really mind having the per-device pointers to CMA areas; we just need to come up with a good way to initialize them.

Arnd
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Thursday 16 June 2011 02:48:12 Philip Balister wrote:
> On 06/15/2011 12:37 AM, Arnd Bergmann wrote:
>> On Wednesday 15 June 2011 09:11:39 Marek Szyprowski wrote:
>>> I see your concerns, but I really wonder how to determine the properties of the global/default CMA pool. You definitely don't want to give all available memory to CMA, because it will have a negative impact on kernel operation (the kernel really needs to allocate unmovable pages from time to time).
>>
>> Exactly. This is a hard problem, so I would prefer to see a solution for coming up with reasonable defaults.
>
> Is this a situation where passing the information from device tree might help? I know this does not help short term, but I am trying to understand the sorts of problems device tree can help solve.

The device tree is a good place to describe any hardware properties such as 'this device will need 32 MB of contiguous allocations on the memory bank described in that other device node'.

It is however not a good place to describe user settings such as 'I want to give this device a 200 MB pool for large allocations so I can run application X efficiently', because that would require knowledge in the boot loader about local policy, which it should generally not care about.

Arnd
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On 16 Jun 11 00:06, Arnd Bergmann wrote:
> On Wednesday 15 June 2011 23:39:58 Larry Bassel wrote:
>> On 15 Jun 11 10:36, Marek Szyprowski wrote:
>>> On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
>>>> On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
>>>>> I've seen this split bank allocation in Qualcomm and TI SoCs; with Samsung, that makes 3 major SoC vendors (I would be surprised if Nvidia didn't also need to do this), so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by "can't do" I mean 1080P and higher decode is not functionally useful). Far from special, this would appear to be the default.
>>
>> We at Qualcomm have some platforms that have memory of different performance characteristics. Some drivers will need a way of specifying that they need fast memory for an allocation (and would prefer an error if it is not available rather than a fallback to slower memory). It would also be bad if allocators who don't need fast memory got it accidentally, depriving those who really need it.
>
> Can you describe how the memory areas differ specifically? Is there one that is always faster but very small, or are there just specific circumstances under which some memory is faster than another?

One is always faster, but very small (generally 2-10% the size of normal memory).

Larry

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
RE: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
Hello,

On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
> On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
>> I've seen this split bank allocation in Qualcomm and TI SoCs; with Samsung, that makes 3 major SoC vendors (I would be surprised if Nvidia didn't also need to do this), so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by "can't do" I mean 1080P and higher decode is not functionally useful). Far from special, this would appear to be the default.
>
> Thanks for the insight, that's a much better argument than 'something may need it'. Are those all chips without an IOMMU or do we also need to solve the IOMMU case with split bank allocation?
>
> I think I'd still prefer to see the support for multiple regions split out into one of the later patches, especially since that would defer the question of how to do the initialization for this case and make sure we first get a generic way.
>
> You've convinced me that we need to solve the problem of allocating memory from a specific bank eventually, but separating it from the one at hand (contiguous allocation) should help getting the important groundwork in at first.
>
> The possible conflicts that I still see with per-bank CMA regions are:
>
> * It completely destroys memory power management in cases where that is based on powering down entire memory banks.

I don't think that per-bank CMA regions destroy memory power management more than the global CMA pool does. Please note that contiguous buffers (or dma-buffers in general) are currently unmovable, so they don't fit well into memory power management.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tue, 14 Jun 2011 22:42:24 +0200, Arnd Bergmann a...@arndb.de wrote:
> * We still need to solve the same problem in case of IOMMU mappings at some point, even if today's hardware doesn't have this combination. It would be good to use the same solution for both.

I don't think I follow. What does IOMMU have to do with CMA?

--
Best regards,
Liege of Serenely Enlightened Majesty of Computer Science,
Michal "mina86" Nazarewicz (email/xmpp: mnazarew...@google.com)
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wednesday 15 June 2011, Michal Nazarewicz wrote:
> On Tue, 14 Jun 2011 22:42:24 +0200, Arnd Bergmann a...@arndb.de wrote:
>> * We still need to solve the same problem in case of IOMMU mappings at some point, even if today's hardware doesn't have this combination. It would be good to use the same solution for both.
>
> I don't think I follow. What does IOMMU have to do with CMA?

The point is that in the higher-level device drivers, we want to hide the presence of CMA and/or IOMMU behind the DMA mapping API, but the device drivers do need to know about the bank properties.

If we want to solve the problem of allocating per-bank memory inside of CMA, we also need to solve it inside of the IOMMU code, using the same device driver interface.

Arnd
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tuesday 14 June 2011, Jordan Crouse wrote:
> On 06/14/2011 02:42 PM, Arnd Bergmann wrote:
>> On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
>>> I've seen this split bank allocation in Qualcomm and TI SoCs; with Samsung, that makes 3 major SoC vendors (I would be surprised if Nvidia didn't also need to do this), so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by "can't do" I mean 1080P and higher decode is not functionally useful). Far from special, this would appear to be the default.
>>
>> Thanks for the insight, that's a much better argument than 'something may need it'. Are those all chips without an IOMMU or do we also need to solve the IOMMU case with split bank allocation?
>
> Yes. The IOMMU case with split bank allocation is key, especially for shared buffers. Consider the case where video is using a certain bank for performance purposes and that frame is shared with the GPU.

Could we use the non-uniform memory access (NUMA) code for this? That code does more than what we've been talking about, and we're currently thinking only of a degenerate case (one CPU node with multiple memory nodes), but my feeling is that we can still build on top of it.

The NUMA code can describe relations between different areas of memory and how they interact with devices and processes, so you can attach a device to a specific node and have all allocations done from there. You can also set policy in user space, e.g. to have a video decoder process running on the bank that is not used by the GPU.

In the DMA mapping API, that would mean we add another dma_attr to dma_alloc_* that lets you pass a node identifier.

Arnd
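
The "attach a device to a node and allocate from there" idea can be pictured with a toy model of the degenerate case described above (one CPU node, several memory nodes). This Python sketch is illustrative only and does not reflect the kernel's NUMA or DMA mapping interfaces: `Bank`, `alloc_for_device` and the device names are hypothetical.

```python
from typing import Dict, Optional


class Bank:
    """Toy memory node: one bank with a count of free pages."""

    def __init__(self, free_pages: int) -> None:
        self.free_pages = free_pages


def alloc_for_device(banks: Dict[int, Bank], device_node: Dict[str, int],
                     device: str, pages: int) -> Optional[int]:
    """Allocate from the node the device is attached to, or fail.

    Returns the node id on success and None on failure. There is
    deliberately no spill-over to another bank, matching the preference
    voiced in the thread for an error over a silent fallback.
    """
    node = device_node[device]
    bank = banks[node]
    if bank.free_pages < pages:
        return None
    bank.free_pages -= pages
    return node
```

A usage example under the same assumptions: with `banks = {0: Bank(16), 1: Bank(4)}` and `device_node = {"video": 0, "gpu": 1}`, a video allocation is always served from bank 0, and an oversized GPU request fails instead of consuming bank 0.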
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wed, 15 Jun 2011 13:20:42 +0200, Arnd Bergmann a...@arndb.de wrote:
> The point is that on the higher level device drivers, we want to hide the presence of CMA and/or IOMMU behind the dma mapping API, but the device drivers do need to know about the bank properties.

Gotcha, thanks.

--
Best regards,
Liege of Serenely Enlightened Majesty of Computer Science,
Michal "mina86" Nazarewicz (email/xmpp: mnazarew...@google.com)
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tue, Jun 14, 2011 at 20:30, Arnd Bergmann a...@arndb.de wrote:
> On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
>> Ah yes, I forgot that separate regions for different purposes could decrease fragmentation.
>
> That is indeed a good point, but having a good allocator algorithm could also solve this. I don't know too much about these allocation algorithms, but there are probably multiple working approaches to this.

imo no allocator algorithm is gonna help if you have comparably large, variable-sized contiguous allocations out of a restricted address range. It might work well enough if there are only a few sizes and/or there's decent headroom. But for really generic workloads this would require sync objects and eviction callbacks (i.e. what Thomas Hellstrom pushed with ttm).

So if this is only a requirement on very few platforms and can be cheaply fixed with multiple cma allocation areas (heck, we have slabs for the same reasons in the kernel), it might be a sensible compromise.

-Daniel
--
Daniel Vetter
daniel.vet...@ffwll.ch - +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On 06/15/2011 01:53 PM, Daniel Vetter wrote:
> On Tue, Jun 14, 2011 at 20:30, Arnd Bergmann a...@arndb.de wrote:
>> On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
>>> Ah yes, I forgot that separate regions for different purposes could decrease fragmentation.
>>
>> That is indeed a good point, but having a good allocator algorithm could also solve this. I don't know too much about these allocation algorithms, but there are probably multiple working approaches to this.
>
> imo no allocator algorithm is gonna help if you have comparably large, variable-sized contiguous allocations out of a restricted address range. It might work well enough if there are only a few sizes and/or there's decent headroom. But for really generic workloads this would require sync objects and eviction callbacks (i.e. what Thomas Hellstrom pushed with ttm).

Indeed, IIRC at the meeting I pointed out that there is no way to generically solve the fragmentation problem without movable buffers. (I'd do it as a simple CMA backend to TTM.) This is exactly the same problem as trying to fit buffers in a limited VRAM area.

/Thomas
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On 15 Jun 11 10:36, Marek Szyprowski wrote:
> Hello,
>
> On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
>> On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
>>> I've seen this split bank allocation in Qualcomm and TI SoCs, with Samsung, that makes 3 major SoC vendors (I would be surprised if Nvidia didn't also need to do this) - so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by can't do I mean 1080P and higher decode is not functionally useful). Far from special, this would appear to be the default.

We at Qualcomm have some platforms that have memory of different performance characteristics, some drivers will need a way of specifying that they need fast memory for an allocation (and would prefer an error if it is not available rather than a fallback to slower memory). It would also be bad if allocators who don't need fast memory got it accidentally, depriving those who really need it.

>> Thanks for the insight, that's a much better argument than 'something may need it'. Are those all chips without an IOMMU or do we also need to solve the IOMMU case with split bank allocation?
>>
>> I think I'd still prefer to see the support for multiple regions split out into one of the later patches, especially since that would defer the question of how to do the initialization for this case and make sure we first get a generic way.
>>
>> You've convinced me that we need to solve the problem of allocating memory from a specific bank eventually, but separating it from the one at hand (contiguous allocation) should help getting the important groundwork in at first.
>>
>> The possible conflicts that I still see with per-bank CMA regions are:
>>
>> * It completely destroys memory power management in cases where that is based on powering down entire memory banks.
>
> I don't think that per-bank CMA regions destroy memory power management more than the global CMA pool. Please note that the contiguous buffers (or in general dma-buffers) right now are unmovable so they don't fit well into memory power management.

We also have platforms where a well-defined part of the memory can be powered off, and other parts can't (or won't). We need a way to steer the place allocations come from to the memory that won't be turned off (so that CMA allocations are not an obstacle to memory hotremove).

> Best regards
> --
> Marek Szyprowski
> Samsung Poland R&D Center

Larry Bassel
--
Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
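The two requirements stated in this message (fast memory only for callers that ask for it, with an error instead of a fallback, and no unmovable buffers in memory that may be powered off) can be sketched as a bank-selection policy. This is a hypothetical model, not an existing kernel interface: `Bank`, `pick_bank`, and the attribute names `fast` and `can_power_off` are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Bank:
    name: str
    free: int                    # free units left in this bank
    fast: bool = False           # higher-performance memory
    can_power_off: bool = False  # bank may be switched off or hot-removed


def pick_bank(banks, size, need_fast=False):
    """Pick a bank for an unmovable buffer; return None rather than fall back."""
    for bank in banks:
        if bank.can_power_off:
            continue  # never pin a buffer in memory that may be turned off
        if bank.fast != need_fast:
            continue  # fast memory goes only to callers that asked for it
        if bank.free >= size:
            bank.free -= size
            return bank.name
    return None  # the caller gets an error, not slower memory


banks = [
    Bank("bank0", free=64),
    Bank("bank1", free=16, fast=True),
    Bank("bank2", free=128, can_power_off=True),
]
print(pick_bank(banks, 8, need_fast=True))   # served from the fast bank
print(pick_bank(banks, 16, need_fast=True))  # fast bank now too full: None
print(pick_bank(banks, 100))                 # only bank2 is big enough: None
```

The strict `bank.fast != need_fast` test is a deliberate choice in this sketch: it keeps ordinary allocations out of the fast bank so they cannot deprive the drivers that need it.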
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Wednesday 15 June 2011 23:39:58 Larry Bassel wrote:
> On 15 Jun 11 10:36, Marek Szyprowski wrote:
>> On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
>>> On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
>>>> I've seen this split bank allocation in Qualcomm and TI SoCs, with Samsung, that makes 3 major SoC vendors (I would be surprised if Nvidia didn't also need to do this) - so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by can't do I mean 1080P and higher decode is not functionally useful). Far from special, this would appear to be the default.
>
> We at Qualcomm have some platforms that have memory of different performance characteristics, some drivers will need a way of specifying that they need fast memory for an allocation (and would prefer an error if it is not available rather than a fallback to slower memory). It would also be bad if allocators who don't need fast memory got it accidentally, depriving those who really need it.

Can you describe how the memory areas differ specifically? Is there one that is always faster but very small, or are there just specific circumstances under which some memory is faster than another?

>>> The possible conflicts that I still see with per-bank CMA regions are:
>>>
>>> * It completely destroys memory power management in cases where that is based on powering down entire memory banks.
>>
>> I don't think that per-bank CMA regions destroy memory power management more than the global CMA pool. Please note that the contiguous buffers (or in general dma-buffers) right now are unmovable so they don't fit well into memory power management.
>
> We also have platforms where a well-defined part of the memory can be powered off, and other parts can't (or won't). We need a way to steer the place allocations come from to the memory that won't be turned off (so that CMA allocations are not an obstacle to memory hotremove).

We already established that we have to know something about the banks, and your additional input makes it even clearer that we need to consider the bigger picture here: We need to describe parts of memory separately regarding general performance, device specific allocations and hotplug characteristics.

It still sounds to me that this can be done using the NUMA properties that Linux already understands, and teaching more subsystems about it, but maybe the memory hotplug developers have already come up with another scheme. The way that memory hotplug and CMA choose their memory regions certainly needs to take both into account. As far as I can see there are both conflicting and synergistic effects when you combine the two.

Arnd
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On 06/15/2011 12:37 AM, Arnd Bergmann wrote:
> On Wednesday 15 June 2011 09:11:39 Marek Szyprowski wrote:
>> I see your concerns, but I really wonder how to determine the properties of the global/default cma pool. You definitely don't want to give all available memory to CMA, because it will have negative impact on kernel operation (kernel really needs to allocate unmovable pages from time to time).
>
> Exactly. This is a hard problem, so I would prefer to see a solution for coming up with reasonable defaults.

Is this a situation where passing the information from device tree might help? I know this does not help short term, but I am trying to understand the sorts of problems device tree can help solve.

Philip

>> The only solution I see now is to provide a Kconfig entry to determine the size of the global CMA pool, but this still has some issues, especially for multi-board kernels (each board probably will have different amount of RAM and different memory-consuming devices available). It looks like each board startup code still might need to tweak the size of CMA pool. I can add a kernel command line option for it, but such a solution also will not solve all the cases (afair there was a discussion about kernel command line parameters for memory configuration and the conclusion was that it should be avoided).
>
> The command line option can be a last resort if the heuristics fail, but it's not much better than a fixed Kconfig setting. How about a Kconfig option that defines the percentage of memory to set aside for contiguous allocations?
>
> Arnd
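The percentage idea can be sketched as a small sizing rule. This is an illustration under stated assumptions, not the patch's behaviour: the function name `default_cma_bytes`, the 4 MiB alignment, the 10% figure, and the optional cap are all made up here to show why a percentage adapts to multi-board kernels where a fixed size does not.

```python
MiB = 1 << 20


def default_cma_bytes(total_ram, percent, align=4 * MiB, cap=None):
    """Size a default contiguous pool as a percentage of RAM.

    The result is optionally capped and then rounded down to `align`.
    All parameters are hypothetical tuning knobs for this sketch.
    """
    pool = total_ram * percent // 100
    if cap is not None:
        pool = min(pool, cap)
    return pool - pool % align


# The same setting gives each board a pool proportional to its RAM.
for ram_mib in (256, 512, 2048):
    size = default_cma_bytes(ram_mib * MiB, percent=10, cap=128 * MiB)
    print(ram_mib, "MiB RAM ->", size // MiB, "MiB pool")
```

A board-specific override (startup code or a command line option, as discussed above) would still be needed when the memory-consuming devices differ between boards with the same amount of RAM.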
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On 15 June 2011 16:39, Larry Bassel lbas...@codeaurora.org wrote:
> On 15 Jun 11 10:36, Marek Szyprowski wrote:
>> Hello,
>>
>> On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
>>> On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
>>>> I've seen this split bank allocation in Qualcomm and TI SoCs, with Samsung, that makes 3 major SoC vendors (I would be surprised if Nvidia didn't also need to do this) - so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by can't do I mean 1080P and higher decode is not functionally useful). Far from special, this would appear to be the default.
>
> We at Qualcomm have some platforms that have memory of different performance characteristics, some drivers will need a way of specifying that they need fast memory for an allocation (and would prefer an error if it is not available rather than a fallback to slower memory). It would also be bad if allocators who don't need fast memory got it accidentally, depriving those who really need it.

I think this statement actually applies to all the SoCs that are coming out now and in the future from TI, Samsung, Nvidia, Freescale, ST Ericsson and others. It seems that in all cases users will want to:

1. Allocate memory with a per-SoC physical memory mapping policy that is usually manually specified, i.e. use this physical memory bank set for this allocation or nothing.
2. Be able to easily pass a token to this memory between various userspace processes and the kernel.
3. Be able to easily and explicitly access attributes of an allocation from all contexts.
4. Be able to save and reload this memory without giving up the virtual address allocation.

In essence they want an architecture-independent map object that can bounce around the system with a unique handle.

>>> Thanks for the insight, that's a much better argument than 'something may need it'. Are those all chips without an IOMMU or do we also need to solve the IOMMU case with split bank allocation?
>>>
>>> I think I'd still prefer to see the support for multiple regions split out into one of the later patches, especially since that would defer the question of how to do the initialization for this case and make sure we first get a generic way.
>>>
>>> You've convinced me that we need to solve the problem of allocating memory from a specific bank eventually, but separating it from the one at hand (contiguous allocation) should help getting the important groundwork in at first.
>>>
>>> The possible conflicts that I still see with per-bank CMA regions are:
>>>
>>> * It completely destroys memory power management in cases where that is based on powering down entire memory banks.
>>
>> I don't think that per-bank CMA regions destroy memory power management more than the global CMA pool. Please note that the contiguous buffers (or in general dma-buffers) right now are unmovable so they don't fit well into memory power management.
>
> We also have platforms where a well-defined part of the memory can be powered off, and other parts can't (or won't). We need a way to steer the place allocations come from to the memory that won't be turned off (so that CMA allocations are not an obstacle to memory hotremove).
>
>> Best regards
>> --
>> Marek Szyprowski
>> Samsung Poland R&D Center
>
> Larry Bassel
> --
> Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
Hi,

On Tue, Jun 14, 2011 at 06:03:00PM +0200, Arnd Bergmann wrote:
> On Tuesday 14 June 2011, Michal Nazarewicz wrote:
>> On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann a...@arndb.de wrote:
>>> Please explain the exact requirements that lead you to defining multiple contexts.
>>
>> Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes.
>
> For all I know, that is something that is only true for a few very special Samsung devices, and is completely unrelated to the need for contiguous allocations, so this approach becomes pointless as soon as the next generation of that chip grows an IOMMU, where we don't handle the special bank attributes. Also, the way I understood the situation for the Samsung SoC during the Budapest discussion, it's only a performance hack, not a functional requirement, unless you count '1080p playback' as a functional requirement.

Hm, I think that was something similar but not quite the same: talking about having allocations split to lie between two banks of RAM to maximise the read/write speed for performance reasons. That's something that can be handled in the allocator, rather than an API constraint, as this is.

Not that I know of any hardware which is limited as such, but eh.

Cheers,
Daniel
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On 14 June 2011 12:01, Daniel Stone dani...@collabora.com wrote:
> Hi,
>
> On Tue, Jun 14, 2011 at 06:03:00PM +0200, Arnd Bergmann wrote:
>> On Tuesday 14 June 2011, Michal Nazarewicz wrote:
>>> On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann a...@arndb.de wrote:
>>>> Please explain the exact requirements that lead you to defining multiple contexts.
>>>
>>> Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes.
>>
>> For all I know, that is something that is only true for a few very special Samsung devices, and is completely unrelated to the need for contiguous allocations, so this approach becomes pointless as soon as the next generation of that chip grows an IOMMU, where we don't handle the special bank attributes. Also, the way I understood the situation for the Samsung SoC during the Budapest discussion, it's only a performance hack, not a functional requirement, unless you count '1080p playback' as a functional requirement.

Coming in mid topic...

I've seen this split bank allocation in Qualcomm and TI SoCs, with Samsung, that makes 3 major SoC vendors (I would be surprised if Nvidia didn't also need to do this) - so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by can't do I mean 1080P and higher decode is not functionally useful). Far from special, this would appear to be the default.

> Hm, I think that was something similar but not quite the same: talking about having allocations split to lie between two banks of RAM to maximise the read/write speed for performance reasons. That's something that can be handled in the allocator, rather than an API constraint, as this is.
>
> Not that I know of any hardware which is limited as such, but eh.
>
> Cheers,
> Daniel
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
> I've seen this split bank allocation in Qualcomm and TI SoCs, with Samsung, that makes 3 major SoC vendors (I would be surprised if Nvidia didn't also need to do this) - so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by can't do I mean 1080P and higher decode is not functionally useful). Far from special, this would appear to be the default.

Thanks for the insight, that's a much better argument than 'something may need it'. Are those all chips without an IOMMU or do we also need to solve the IOMMU case with split bank allocation?

I think I'd still prefer to see the support for multiple regions split out into one of the later patches, especially since that would defer the question of how to do the initialization for this case and make sure we first get a generic way.

You've convinced me that we need to solve the problem of allocating memory from a specific bank eventually, but separating it from the one at hand (contiguous allocation) should help getting the important groundwork in at first.

The possible conflicts that I still see with per-bank CMA regions are:

* It completely destroys memory power management in cases where that is based on powering down entire memory banks.
* We still need to solve the same problem in case of IOMMU mappings at some point, even if today's hardware doesn't have this combination. It would be good to use the same solution for both.

Arnd
Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added
On 06/14/2011 02:42 PM, Arnd Bergmann wrote:
> On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
>> I've seen this split bank allocation in Qualcomm and TI SoCs, with Samsung, that makes 3 major SoC vendors (I would be surprised if Nvidia didn't also need to do this) - so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by can't do I mean 1080P and higher decode is not functionally useful). Far from special, this would appear to be the default.
>
> Thanks for the insight, that's a much better argument than 'something may need it'. Are those all chips without an IOMMU or do we also need to solve the IOMMU case with split bank allocation?

Yes. The IOMMU case with split bank allocation is key, especially for shared buffers. Consider the case where video is using a certain bank for performance purposes and that frame is shared with the GPU.

Jordan