RE: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
On Wed, 2011-08-17 at 15:06 +0200, Marek Szyprowski wrote:
[...]
> > > Maybe for the first version a static pool with reasonably small size (like 128KiB) will be more than enough? This size can be even board dependent or changed with kernel command line for systems that really need more memory.
> >
> > For a first version that sounds good enough. Maybe we could use a fraction of the CONSISTENT_DMA_SIZE as an estimate?
>
> Ok, good. For the initial values I will probably use 1/8 of CONSISTENT_DMA_SIZE for coherent allocations. Writecombine atomic allocations are extremely rare and rather ARM specific. 1/32 of CONSISTENT_DMA_SIZE should be more than enough for them.

For people who aren't aware, we have a patch to remove the define CONSISTENT_DMA_SIZE and replace it with a runtime call to an initialisation function [1]. I don't believe this fundamentally changes anything being discussed though.

--
Tixy

[1] http://www.spinics.net/lists/arm-kernel/msg135589.html

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
Hello,

On Tuesday, August 16, 2011 4:26 PM Arnd Bergmann wrote:
> On Tuesday 16 August 2011, Russell King - ARM Linux wrote:
> > On Tue, Aug 16, 2011 at 03:28:48PM +0200, Arnd Bergmann wrote:
> > > Hmm, I don't remember the point about dynamically sizing the pool for ARMv6K, but that can well be an oversight on my part. I do remember the part about taking that memory pool from the CMA region as you say.
> >
> > If you're setting aside a pool of pages, then you have to dynamically size it. I did mention this during our discussion.
> >
> > The problem with a pool of fixed size is twofold: you need it to be sufficiently large that it can satisfy all allocations which come along in atomic context. Yet, we don't want the pool to be too large because then it prevents the memory being used for other purposes.
> >
> > Basically, the total number of pages in the pool can be a fixed size, but as they are depleted through allocation, they need to be re-populated from CMA to re-build the reserve for future atomic allocations. If the pool becomes larger via frees, then obviously we need to give pages back.
>
> Ok, thanks for the reminder. I must have completely missed this part of the discussion.
>
> When I briefly considered this problem, my own conclusion was that the number of atomic DMA allocations would always be very low because they tend to be short-lived (e.g. incoming network packets), so we could ignore this problem and just use a smaller reservation size.
>
> While this seems to be true in general (see git grep -w -A3 dma_alloc_coherent | grep ATOMIC), there is one very significant case that we cannot ignore, which is pci_alloc_consistent. This function is still called by hundreds of PCI drivers and always does dma_alloc_coherent(..., GFP_ATOMIC), even for long-lived allocations and those that are too large to be ignored.
>
> So at least for the case where we have PCI devices, I agree that we need to have the dynamic pool.

Do we really need the dynamic pool for the first version? I would like to know how much memory can be allocated in GFP_ATOMIC context. What are the typical sizes of such allocations?

Maybe for the first version a static pool with reasonably small size (like 128KiB) will be more than enough? This size can be even board dependent or changed with kernel command line for systems that really need more memory.

I noticed one more problem. The size of the CMA managed area must be a multiple of 16MiB (MAX_ORDER+1). This means that the smallest CMA area is 16MiB. These values come from the internals of the kernel memory management design, and page blocks are the only entities that can be managed with page migration code.

I'm not sure if all ARMv6+ boards have at least 32MiB of memory to be able to create a CMA area.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
On Wednesday 17 August 2011, Marek Szyprowski wrote:
> Do we really need the dynamic pool for the first version? I would like to know how much memory can be allocated in GFP_ATOMIC context. What are the typical sizes of such allocations?

I think this highly depends on the board and on the use case. We know that 2 MB is usually enough, because that is the current CONSISTENT_DMA_SIZE on most platforms. Most likely something a lot smaller will be ok in practice. CONSISTENT_DMA_SIZE is currently used for both atomic and non-atomic allocations.

> Maybe for the first version a static pool with reasonably small size (like 128KiB) will be more than enough? This size can be even board dependent or changed with kernel command line for systems that really need more memory.

For a first version that sounds good enough. Maybe we could use a fraction of the CONSISTENT_DMA_SIZE as an estimate?

For the long-term solution, I see two options:

1. make the preallocated pool rather small so we normally don't need it.
2. make it large enough so we can also fulfill most nonatomic allocations from that pool to avoid the TLB flushes and going through the CMA code. Only use the real CMA region when the pool allocation fails.

In either case, there should be some method for balancing the pool size.

> I noticed one more problem. The size of the CMA managed area must be a multiple of 16MiB (MAX_ORDER+1). This means that the smallest CMA area is 16MiB. These values come from the internals of the kernel memory management design, and page blocks are the only entities that can be managed with page migration code.
>
> I'm not sure if all ARMv6+ boards have at least 32MiB of memory to be able to create a CMA area.

My guess is that you can assume 64 MB or more on ARMv6 running Linux, but other people may have more accurate data. Also, there is the option of setting a lower value for FORCE_MAX_ZONEORDER for some platforms if it becomes a problem.

Arnd
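Editorial sketch of the second long-term option described above: serve requests from the preallocated pool first and touch the CMA region only when the pool cannot satisfy them. This is a page-counting model, not kernel code. All names (sketch_state, sketch_alloc, the enum values) are hypothetical, and the rule that an atomic request may not fall back to CMA is an assumption of the model rather than something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: only page counts are tracked, no memory is mapped. */
enum sketch_source { SKETCH_FROM_POOL, SKETCH_FROM_CMA, SKETCH_FAILED };

struct sketch_state {
    size_t pool_free_pages; /* pages held in the preallocated pool */
    size_t cma_free_pages;  /* pages still available in the CMA region */
};

/* Try the pool first; fall back to CMA only for non-atomic requests. */
static enum sketch_source sketch_alloc(struct sketch_state *s, size_t pages,
                                       bool atomic)
{
    assert(s != NULL);

    if (pages <= s->pool_free_pages) {
        s->pool_free_pages -= pages;
        return SKETCH_FROM_POOL;
    }

    /* Assumption of this model: the CMA path is not usable atomically. */
    if (atomic)
        return SKETCH_FAILED;

    if (pages <= s->cma_free_pages) {
        s->cma_free_pages -= pages;
        return SKETCH_FROM_CMA;
    }

    return SKETCH_FAILED;
}
```

In this model a larger pool means fewer trips through the CMA path, at the cost of more memory held in reserve, which is the trade-off between the two options.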
RE: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
Hello,

On Wednesday, August 17, 2011 10:01 AM Marek Szyprowski wrote:
> On Tuesday, August 16, 2011 4:26 PM Arnd Bergmann wrote:
> > On Tuesday 16 August 2011, Russell King - ARM Linux wrote:
> > > On Tue, Aug 16, 2011 at 03:28:48PM +0200, Arnd Bergmann wrote:
> > > > Hmm, I don't remember the point about dynamically sizing the pool for ARMv6K, but that can well be an oversight on my part. I do remember the part about taking that memory pool from the CMA region as you say.
> > >
> > > If you're setting aside a pool of pages, then you have to dynamically size it. I did mention this during our discussion.
> > >
> > > The problem with a pool of fixed size is twofold: you need it to be sufficiently large that it can satisfy all allocations which come along in atomic context. Yet, we don't want the pool to be too large because then it prevents the memory being used for other purposes.
> > >
> > > Basically, the total number of pages in the pool can be a fixed size, but as they are depleted through allocation, they need to be re-populated from CMA to re-build the reserve for future atomic allocations. If the pool becomes larger via frees, then obviously we need to give pages back.
> >
> > Ok, thanks for the reminder. I must have completely missed this part of the discussion.
> >
> > When I briefly considered this problem, my own conclusion was that the number of atomic DMA allocations would always be very low because they tend to be short-lived (e.g. incoming network packets), so we could ignore this problem and just use a smaller reservation size.
> >
> > While this seems to be true in general (see git grep -w -A3 dma_alloc_coherent | grep ATOMIC), there is one very significant case that we cannot ignore, which is pci_alloc_consistent. This function is still called by hundreds of PCI drivers and always does dma_alloc_coherent(..., GFP_ATOMIC), even for long-lived allocations and those that are too large to be ignored.
> >
> > So at least for the case where we have PCI devices, I agree that we need to have the dynamic pool.
>
> Do we really need the dynamic pool for the first version? I would like to know how much memory can be allocated in GFP_ATOMIC context. What are the typical sizes of such allocations?
>
> Maybe for the first version a static pool with reasonably small size (like 128KiB) will be more than enough? This size can be even board dependent or changed with kernel command line for systems that really need more memory.
>
> I noticed one more problem. The size of the CMA managed area must be a multiple of 16MiB (MAX_ORDER+1). This means that the smallest CMA area is 16MiB. These values come from the internals of the kernel memory management design, and page blocks are the only entities that can be managed with page migration code.

I'm really sorry for the confusion. This 16MiB value worried me, so I checked the code once again and found that this MAX_ORDER+1 value was a miscalculation, which appeared in v11 of the patches. The true minimal CMA area size is 8MiB for the ARM architecture. I believe this shouldn't be an issue for the current ARMv6+ based machines.

I've checked it with the mem=16M cma=8M kernel arguments. The system booted fine and the CMA area was successfully created.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
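Editorial sketch of the arithmetic behind the 16MiB and 8MiB figures. The thread does not state the page size or the value of MAX_ORDER; the sketch assumes 4KiB pages and MAX_ORDER = 11 because those values reproduce both numbers. The SKETCH_ constants and helper names are hypothetical and do not come from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, not taken from the thread: 4 KiB pages, MAX_ORDER of 11. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_MAX_ORDER  11u

/* Hypothetical helper: number of bytes covered by 2^order pages. */
static size_t sketch_block_bytes(unsigned int order)
{
    assert(order + SKETCH_PAGE_SHIFT < sizeof(size_t) * 8u);
    return (size_t)1 << (SKETCH_PAGE_SHIFT + order);
}

/* Round a requested CMA size up to a multiple of the minimal area size. */
static size_t sketch_cma_round_up(size_t bytes, unsigned int order)
{
    size_t unit = sketch_block_bytes(order);

    return (bytes + unit - 1) / unit * unit;
}
```

Under these assumptions, order MAX_ORDER gives 4KiB << 11 = 8MiB (the corrected minimum), while MAX_ORDER+1 gives 16MiB (the earlier, miscalculated figure). A cma=8M request is then already a whole multiple of the minimum, consistent with the boot test above.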
RE: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
Hello,

On Wednesday, August 17, 2011 2:29 PM Arnd Bergmann wrote:
> On Wednesday 17 August 2011, Marek Szyprowski wrote:
> > Do we really need the dynamic pool for the first version? I would like to know how much memory can be allocated in GFP_ATOMIC context. What are the typical sizes of such allocations?
>
> I think this highly depends on the board and on the use case. We know that 2 MB is usually enough, because that is the current CONSISTENT_DMA_SIZE on most platforms. Most likely something a lot smaller will be ok in practice. CONSISTENT_DMA_SIZE is currently used for both atomic and non-atomic allocations.

Ok. The platforms that increased CONSISTENT_DMA_SIZE usually did that to enable support for framebuffer or other multimedia devices, which won't be allocated in ATOMIC context anyway.

> > Maybe for the first version a static pool with reasonably small size (like 128KiB) will be more than enough? This size can be even board dependent or changed with kernel command line for systems that really need more memory.
>
> For a first version that sounds good enough. Maybe we could use a fraction of the CONSISTENT_DMA_SIZE as an estimate?

Ok, good. For the initial values I will probably use 1/8 of CONSISTENT_DMA_SIZE for coherent allocations. Writecombine atomic allocations are extremely rare and rather ARM specific. 1/32 of CONSISTENT_DMA_SIZE should be more than enough for them.

> For the long-term solution, I see two options:
>
> 1. make the preallocated pool rather small so we normally don't need it.
> 2. make it large enough so we can also fulfill most nonatomic allocations from that pool to avoid the TLB flushes and going through the CMA code. Only use the real CMA region when the pool allocation fails.
>
> In either case, there should be some method for balancing the pool size.

Right. The most obvious method is to use an additional kernel thread which will periodically call the balance function. In the implementation both usage scenarios are very similar, so this can even be a kernel parameter or Kconfig option, but let's leave this for future versions.

> > I noticed one more problem. The size of the CMA managed area must be a multiple of 16MiB (MAX_ORDER+1). This means that the smallest CMA area is 16MiB. These values come from the internals of the kernel memory management design, and page blocks are the only entities that can be managed with page migration code.
> >
> > I'm not sure if all ARMv6+ boards have at least 32MiB of memory to be able to create a CMA area.
>
> My guess is that you can assume 64 MB or more on ARMv6 running Linux, but other people may have more accurate data. Also, there is the option of setting a lower value for FORCE_MAX_ZONEORDER for some platforms if it becomes a problem.

Ok. I found an error in the above calculation, so 8MiB is the smallest CMA area size. Assuming that there are at least 32MiB of memory available, this is not an issue anymore.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
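Editorial sketch of the proposed initial pool sizes: 1/8 of CONSISTENT_DMA_SIZE for coherent atomic allocations and 1/32 for writecombine ones. The 2MiB input is the value Arnd quotes for most platforms and is used here only as an example. The struct and function names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Example input only: the 2 MiB figure quoted in the thread. */
#define SKETCH_CONSISTENT_DMA_SIZE ((size_t)2 * 1024 * 1024)

/* Hypothetical container for the two atomic pool sizes, in bytes. */
struct sketch_atomic_pools {
    size_t coherent_bytes;
    size_t writecombine_bytes;
};

/* Apply the 1/8 and 1/32 fractions proposed for the first version. */
static struct sketch_atomic_pools
sketch_initial_pool_sizes(size_t consistent_dma_size)
{
    struct sketch_atomic_pools pools;

    assert(consistent_dma_size >= 32); /* keep both fractions non-zero */
    pools.coherent_bytes = consistent_dma_size / 8;
    pools.writecombine_bytes = consistent_dma_size / 32;
    return pools;
}
```

For a 2MiB CONSISTENT_DMA_SIZE this works out to a 256KiB coherent pool and a 64KiB writecombine pool, so the coherent pool would be twice the 128KiB figure floated earlier in the thread.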
RE: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
Hello,

On Friday, August 12, 2011 2:53 PM Arnd Bergmann wrote:
> On Friday 12 August 2011, Marek Szyprowski wrote:
> > From: Russell King rmk+ker...@arm.linux.org.uk
> >
> > Steal memory from the kernel to provide coherent DMA memory to drivers. This avoids the problem with multiple mappings with differing attributes on later CPUs.
> >
> > Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk
> > [m.szyprowski: rebased onto 3.1-rc1]
> > Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
>
> Hi Marek,
>
> Is this the same patch that Russell had to revert because it didn't work on some of the older machines, in particular those using dmabounce?

Yes.

> I thought that our discussion ended with the plan to use this only for ARMv6+ (which has a problem with double mapping) but not on ARMv5 and below (which don't have this problem but might need dmabounce).

Ok, my fault. I forgot to mention that this patch was almost ready during the Linaro meeting, but I didn't manage to post it at that time. Of course it doesn't fulfill all the agreements from that discussion. I was only unsure whether we should care about the case where CMA is not enabled for ARMv6+. This patch was prepared on the assumption that dma_alloc_coherent should work in both cases, with and without CMA. Now I assume that for ARMv6+ the CMA should be enabled unconditionally.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
On Sunday 14 August 2011, Russell King - ARM Linux wrote:
> On Fri, Aug 12, 2011 at 02:53:05PM +0200, Arnd Bergmann wrote:
> > I thought that our discussion ended with the plan to use this only for ARMv6+ (which has a problem with double mapping) but not on ARMv5 and below (which don't have this problem but might need dmabounce).
>
> I thought we'd decided to have a pool of available CMA memory on ARMv6K to satisfy atomic allocations, which can grow and shrink in size, rather than setting aside a fixed amount of contiguous system memory.

Hmm, I don't remember the point about dynamically sizing the pool for ARMv6K, but that can well be an oversight on my part. I do remember the part about taking that memory pool from the CMA region as you say.

> ARMv6 and ARMv7+ could use CMA directly, and <= ARMv5 can use the existing allocation method.
>
> Has something changed?

Nothing has changed regarding <= ARMv5. There was a small side discussion about ARMv6 and ARMv7+ based on the idea that they can either use CMA directly (doing TLB flushes for every allocation) or they could use the same method as ARMv6K by setting aside a pool of pages for atomic allocation. The first approach would consume less memory because it requires no special pool, the second approach would be simpler because it unifies the ARMv6K and ARMv6/ARMv7+ cases and also would be slightly more efficient for atomic allocations because it avoids the expensive TLB flush.

I didn't have a strong opinion either way, so IIRC Marek said he'd try out both approaches and then send out the one that looked better, leaning towards the second for simplicity of having fewer compile-time options.

Arnd
Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
On Tue, Aug 16, 2011 at 03:28:48PM +0200, Arnd Bergmann wrote:
> On Sunday 14 August 2011, Russell King - ARM Linux wrote:
> > On Fri, Aug 12, 2011 at 02:53:05PM +0200, Arnd Bergmann wrote:
> > > I thought that our discussion ended with the plan to use this only for ARMv6+ (which has a problem with double mapping) but not on ARMv5 and below (which don't have this problem but might need dmabounce).
> >
> > I thought we'd decided to have a pool of available CMA memory on ARMv6K to satisfy atomic allocations, which can grow and shrink in size, rather than setting aside a fixed amount of contiguous system memory.
>
> Hmm, I don't remember the point about dynamically sizing the pool for ARMv6K, but that can well be an oversight on my part. I do remember the part about taking that memory pool from the CMA region as you say.

If you're setting aside a pool of pages, then you have to dynamically size it. I did mention this during our discussion.

The problem with a pool of fixed size is twofold: you need it to be sufficiently large that it can satisfy all allocations which come along in atomic context. Yet, we don't want the pool to be too large because then it prevents the memory being used for other purposes.

Basically, the total number of pages in the pool can be a fixed size, but as they are depleted through allocation, they need to be re-populated from CMA to re-build the reserve for future atomic allocations. If the pool becomes larger via frees, then obviously we need to give pages back.
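Editorial sketch of the rebalancing scheme described above: the pool keeps a fixed target number of pages, atomic allocations draw it down, frees can push it over the target, and a separate balance step tops it up from CMA or hands the surplus back. This is a page-counting model only. The struct, the function names and the idea of a single balance function are assumptions made for illustration, not the implementation that was posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: page counts only, no real memory is touched. */
struct sketch_pool {
    size_t target_pages;   /* fixed reserve size the pool should hold */
    size_t free_pages;     /* pages currently ready for atomic use */
    size_t cma_free_pages; /* pages still available in the CMA region */
};

/* Atomic path: may only take what is already sitting in the pool. */
static bool sketch_pool_alloc_atomic(struct sketch_pool *p, size_t pages)
{
    assert(p != NULL);
    if (pages > p->free_pages)
        return false;
    p->free_pages -= pages;
    return true;
}

/* Free path: pages return to the pool, which may now exceed its target. */
static void sketch_pool_free(struct sketch_pool *p, size_t pages)
{
    assert(p != NULL);
    p->free_pages += pages;
}

/* Balance step, assumed to run in a context where sleeping is allowed. */
static void sketch_pool_balance(struct sketch_pool *p)
{
    assert(p != NULL);
    if (p->free_pages < p->target_pages) {
        /* Re-populate the reserve from CMA, as far as CMA allows. */
        size_t want = p->target_pages - p->free_pages;
        size_t got = want < p->cma_free_pages ? want : p->cma_free_pages;

        p->cma_free_pages -= got;
        p->free_pages += got;
    } else {
        /* Give surplus pages back so they can be used for other purposes. */
        size_t surplus = p->free_pages - p->target_pages;

        p->free_pages -= surplus;
        p->cma_free_pages += surplus;
    }
}
```

The model separates the two contexts on purpose: the atomic path never grows the pool, so the reserve has to be rebuilt by the balance step between bursts of atomic allocations.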
Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
On Tuesday 16 August 2011, Russell King - ARM Linux wrote:
> On Tue, Aug 16, 2011 at 03:28:48PM +0200, Arnd Bergmann wrote:
> > Hmm, I don't remember the point about dynamically sizing the pool for ARMv6K, but that can well be an oversight on my part. I do remember the part about taking that memory pool from the CMA region as you say.
>
> If you're setting aside a pool of pages, then you have to dynamically size it. I did mention this during our discussion.
>
> The problem with a pool of fixed size is twofold: you need it to be sufficiently large that it can satisfy all allocations which come along in atomic context. Yet, we don't want the pool to be too large because then it prevents the memory being used for other purposes.
>
> Basically, the total number of pages in the pool can be a fixed size, but as they are depleted through allocation, they need to be re-populated from CMA to re-build the reserve for future atomic allocations. If the pool becomes larger via frees, then obviously we need to give pages back.

Ok, thanks for the reminder. I must have completely missed this part of the discussion.

When I briefly considered this problem, my own conclusion was that the number of atomic DMA allocations would always be very low because they tend to be short-lived (e.g. incoming network packets), so we could ignore this problem and just use a smaller reservation size.

While this seems to be true in general (see git grep -w -A3 dma_alloc_coherent | grep ATOMIC), there is one very significant case that we cannot ignore, which is pci_alloc_consistent. This function is still called by hundreds of PCI drivers and always does dma_alloc_coherent(..., GFP_ATOMIC), even for long-lived allocations and those that are too large to be ignored.

So at least for the case where we have PCI devices, I agree that we need to have the dynamic pool.

Arnd
Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
On Fri, Aug 12, 2011 at 02:53:05PM +0200, Arnd Bergmann wrote:
> On Friday 12 August 2011, Marek Szyprowski wrote:
> > From: Russell King rmk+ker...@arm.linux.org.uk
> >
> > Steal memory from the kernel to provide coherent DMA memory to drivers. This avoids the problem with multiple mappings with differing attributes on later CPUs.
> >
> > Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk
> > [m.szyprowski: rebased onto 3.1-rc1]
> > Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com
>
> Hi Marek,
>
> Is this the same patch that Russell had to revert because it didn't work on some of the older machines, in particular those using dmabounce?
>
> I thought that our discussion ended with the plan to use this only for ARMv6+ (which has a problem with double mapping) but not on ARMv5 and below (which don't have this problem but might need dmabounce).

I thought we'd decided to have a pool of available CMA memory on ARMv6K to satisfy atomic allocations, which can grow and shrink in size, rather than setting aside a fixed amount of contiguous system memory.

ARMv6 and ARMv7+ could use CMA directly, and <= ARMv5 can use the existing allocation method.

Has something changed?
Re: [PATCH 7/9] ARM: DMA: steal memory for DMA coherent mappings
On Friday 12 August 2011, Marek Szyprowski wrote:
> From: Russell King rmk+ker...@arm.linux.org.uk
>
> Steal memory from the kernel to provide coherent DMA memory to drivers. This avoids the problem with multiple mappings with differing attributes on later CPUs.
>
> Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk
> [m.szyprowski: rebased onto 3.1-rc1]
> Signed-off-by: Marek Szyprowski m.szyprow...@samsung.com

Hi Marek,

Is this the same patch that Russell had to revert because it didn't work on some of the older machines, in particular those using dmabounce?

I thought that our discussion ended with the plan to use this only for ARMv6+ (which has a problem with double mapping) but not on ARMv5 and below (which don't have this problem but might need dmabounce).

Arnd