Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-07-04 Thread Arnd Bergmann
On Monday 04 July 2011, Ankita Garg wrote:
  It still sounds to me that this can be done using the NUMA properties
  that Linux already understands, and teaching more subsystems about it,
  but maybe the memory hotplug developers have already come up with
  another scheme. The way that memory hotplug and CMA choose their
  memory regions certainly needs to take both into account. As far as
  I can see there are both conflicting and synergistic effects when
  you combine the two.
  
 
 Recently, we proposed a generic 'memory regions' framework to exploit
 the memory power management capabilities on the embedded boards. I think
 some of the above CMA requirements could be met by this framework.
 One of the main goals of regions is to make the VM aware of the hardware
 memory boundaries, such as banks. For managing memory power consumption,
 memory regions are created aligned to the hardware granularity at which
 the power can be managed (i.e., at which memory power operations
 like on/off can be performed). If attributes are associated with each of
 these regions, some of these regions could be marked as CMA-only,
 ensuring that only movable and per-bank memory is allocated. More
 details on the design can be found here:
 
 http://lkml.org/lkml/2011/5/27/177
 http://lkml.org/lkml/2011/6/29/202
 http://lwn.net/Articles/446493/

Thanks for the pointers, that is exactly what I was looking for.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-07-03 Thread Ankita Garg
Hi,

On Thu, Jun 16, 2011 at 12:06:07AM +0200, Arnd Bergmann wrote:
 On Wednesday 15 June 2011 23:39:58 Larry Bassel wrote:
  On 15 Jun 11 10:36, Marek Szyprowski wrote:
   On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
   
On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
 I've seen this split bank allocation in Qualcomm and TI SoCs, with
 Samsung, that makes 3 major SoC vendors (I would be surprised if
 Nvidia didn't also need to do this) - so I think some configurable
 method to control allocations is necessary. The chips can't do
 decode without it (and by can't do I mean 1080P and higher decode is
 not functionally useful). Far from special, this would appear to be
 the default.
  
  We at Qualcomm have some platforms that have memory of different
  performance characteristics, some drivers will need a way of
  specifying that they need fast memory for an allocation (and would prefer
  an error if it is not available rather than a fallback to slower
  memory). It would also be bad if allocators who don't need fast
  memory got it accidentally, depriving those who really need it.
 
 Can you describe how the memory areas differ specifically?
 Is there one that is always faster but very small, or are there
 just specific circumstances under which some memory is faster than
 another?
 
The possible conflicts that I still see with per-bank CMA regions are:

* It completely destroys memory power management in cases where that
  is based on powering down entire memory banks.
   
 We already established that we have to know something about the banks,
 and your additional input makes it even clearer that we need to consider
 the bigger picture here: We need to describe parts of memory separately
 regarding general performance, device specific allocations and hotplug
 characteristics.
 
 It still sounds to me that this can be done using the NUMA properties
 that Linux already understands, and teaching more subsystems about it,
 but maybe the memory hotplug developers have already come up with
 another scheme. The way that memory hotplug and CMA choose their
 memory regions certainly needs to take both into account. As far as
 I can see there are both conflicting and synergistic effects when
 you combine the two.
 

Recently, we proposed a generic 'memory regions' framework to exploit
the memory power management capabilities on the embedded boards. I think
some of the above CMA requirements could be met by this framework.
One of the main goals of regions is to make the VM aware of the hardware
memory boundaries, such as banks. For managing memory power consumption,
memory regions are created aligned to the hardware granularity at which
the power can be managed (i.e., at which memory power operations
like on/off can be performed). If attributes are associated with each of
these regions, some of these regions could be marked as CMA-only,
ensuring that only movable and per-bank memory is allocated. More
details on the design can be found here:

http://lkml.org/lkml/2011/5/27/177
http://lkml.org/lkml/2011/6/29/202
http://lwn.net/Articles/446493/
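
To make the idea concrete, here is a minimal userspace sketch of regions that are aligned to the power-management granularity and carry attributes. It is not code from the posted patches: the struct, the REGION_ATTR_CMA_ONLY flag and both helper names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute bit; not taken from the memory regions patches. */
#define REGION_ATTR_CMA_ONLY (1u << 0)	/* only movable/CMA pages allowed */

struct mem_region {
	uint64_t start;		/* physical start, aligned to the bank size */
	uint64_t size;		/* equal to the power-management granularity */
	unsigned int attrs;	/* REGION_ATTR_* bits */
};

/*
 * Split [base, base + total) into bank-sized regions.
 * Returns the number of regions written to r (at most max).
 */
static size_t regions_init(struct mem_region *r, size_t max,
			   uint64_t base, uint64_t total, uint64_t bank_size)
{
	size_t n = 0;

	assert(bank_size != 0 && base % bank_size == 0);
	while (n < max && total >= bank_size) {
		r[n].start = base + n * bank_size;
		r[n].size = bank_size;
		r[n].attrs = 0;
		total -= bank_size;
		n++;
	}
	return n;
}

/* Unmovable kernel allocations have to stay out of CMA-only regions. */
static int region_allows_unmovable(const struct mem_region *r)
{
	return !(r->attrs & REGION_ATTR_CMA_ONLY);
}
```

With three 256 MB banks starting at 0x80000000, marking the middle region as CMA-only would keep unmovable pages out of that bank while leaving the other two unrestricted.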

-- 
Regards,
Ankita Garg (ank...@in.ibm.com)
Linux Technology Center
IBM India Systems & Technology Labs,
Bangalore, India


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-22 Thread Hans Verkuil
On Wednesday, June 15, 2011 09:37:18 Arnd Bergmann wrote:
 On Wednesday 15 June 2011 09:11:39 Marek Szyprowski wrote:
  I see your concerns, but I really wonder how to determine the properties
  of the global/default cma pool. You definitely don't want to give all
  available memory to CMA, because it will have a negative impact on kernel
  operation (kernel really needs to allocate unmovable pages from time to
  time). 
 
 Exactly. This is a hard problem, so I would prefer to see a solution for
 coming up with reasonable defaults.
 
  The only solution I see now is to provide a Kconfig entry to determine
  the size of the global CMA pool, but this still has some issues,
  especially for multi-board kernels (each board will probably have a
  different amount of RAM and different memory-consuming devices
  available). It looks like each board's startup code still might need to
  tweak the size of the CMA pool. I can add a kernel command line option for
  it, but such a solution also will not solve all the cases (AFAIR there
  was a discussion about kernel command line parameters for memory
  configuration and the conclusion was that it should be avoided).
 
 The command line option can be a last resort if the heuristics fail,
 but it's not much better than a fixed Kconfig setting.
 
 How about a Kconfig option that defines the percentage of memory
 to set aside for contiguous allocations?

I would actually like to see a cma_size kernel option of some sort. This would
be for the global CMA pool only as I don't think we should try to do anything
more complicated here.

While it is relatively easy for embedded systems to do a recompile every time
you need to change the pool size, this isn't an option on 'normal' desktop
systems.

While usually you have more than enough memory on such systems and don't need
CMA, there are a number of cases where you do want to reserve sufficient
memory. Usually these involve lots of video capture cards in one system.

What I was wondering about is how this patch series changes the allocation
in case it can't allocate from the CMA pool. Will it attempt to fall back
to a 'normal' allocation?

The reason I ask is that for desktop systems you could just start with a CMA
pool of size 0. And only in specific situations would you need to add a
cma_size kernel parameter depending on your needs. But this scheme would
require a fallback scenario in case of a global CMA pool of size 0.

Hmm, perhaps this fallback scenario is more driver specific. For SoC platform
video devices you may not want a fallback, whereas for PCI(e)/USB devices you
do. I don't know what's best, frankly.

Regards,

Hans


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-22 Thread Michal Nazarewicz
On Wed, 22 Jun 2011 09:03:30 +0200, Hans Verkuil hverk...@xs4all.nl wrote:
What I was wondering about is how this patch series changes the
allocation in case it can't allocate from the CMA pool. Will it
attempt to fall back to a 'normal' allocation?


Unless Marek changed something since I wrote the code, which I doubt,
if CMA cannot obtain memory from CMA region, it will fail.

Part of the reason is that CMA lacks the knowledge where to allocate
memory from.  For instance, with the case of several memory banks,
it does not know which memory bank to allocate from.

It is, in my opinion, a task for higher-level functions (read:
the DMA layer) to try another mechanism if CMA fails.
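
As a rough illustration of that split of responsibilities, the sketch below keeps the CMA-like pool strict (it only ever fails) and puts the fallback decision in a higher-level wrapper. Everything here is a userspace stand-in under stated assumptions: cma_pool_alloc(), dma_layer_alloc() and the allow_fallback flag are hypothetical names, and malloc() merely stands in for a 'normal' allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a CMA region: a fixed budget of bytes. */
struct cma_pool {
	size_t free_bytes;
};

/* Strict allocator: returns NULL when the pool cannot satisfy the request. */
static void *cma_pool_alloc(struct cma_pool *pool, size_t size)
{
	void *buf;

	if (size == 0 || size > pool->free_bytes)
		return NULL;
	buf = malloc(size);
	if (buf)
		pool->free_bytes -= size;
	return buf;
}

/*
 * Hypothetical higher-level entry point. The caller (for example a PCI or
 * USB driver) decides whether a fallback is acceptable; a SoC video device
 * would pass allow_fallback = 0 and get an error instead.
 */
static void *dma_layer_alloc(struct cma_pool *pool, size_t size,
			     int allow_fallback, int *used_fallback)
{
	void *buf;

	assert(used_fallback != NULL);
	*used_fallback = 0;

	buf = cma_pool_alloc(pool, size);
	if (!buf && allow_fallback) {
		buf = malloc(size);	/* stands in for a 'normal' allocation */
		*used_fallback = (buf != NULL);
	}
	return buf;
}
```

With a pool of size 0, a caller that allows fallback still gets a buffer, while a caller that forbids it gets NULL, which is the desktop scenario Hans describes.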

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michal mina86 Nazarewicz(o o)
ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo--


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-22 Thread Arnd Bergmann
On Wednesday 22 June 2011, Hans Verkuil wrote:
  How about a Kconfig option that defines the percentage of memory
  to set aside for contiguous allocations?
 
 I would actually like to see a cma_size kernel option of some sort. This would
 be for the global CMA pool only as I don't think we should try to do anything
 more complicated here.

A command line option is probably good to override the compile-time default, yes.
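
A minimal sketch of how such an override could be resolved, assuming a hypothetical "cma_size=" parameter and a hypothetical compile-time default expressed as a percentage of RAM. The size parser is loosely modelled on the kernel's memparse() but is a simplified userspace version, not the kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "<number>[K|M|G]" into bytes. Returns 0 on success, -1 on error. */
static int parse_size(const char *s, uint64_t *out)
{
	char *end;
	uint64_t v;

	if (!s || *s < '0' || *s > '9')
		return -1;
	v = strtoull(s, &end, 10);
	switch (*end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		end++;
		break;
	default:
		break;
	}
	if (*end != '\0')
		return -1;
	*out = v;
	return 0;
}

/*
 * cmdline_arg: value of the hypothetical "cma_size=" option, or NULL if
 * it was not given. default_percent: hypothetical compile-time default.
 * A valid command line value wins and is clamped to the amount of RAM.
 */
static uint64_t cma_pool_size(const char *cmdline_arg, uint64_t total_ram,
			      unsigned int default_percent)
{
	uint64_t size;

	assert(default_percent <= 100);
	if (cmdline_arg && parse_size(cmdline_arg, &size) == 0)
		return size < total_ram ? size : total_ram;
	return total_ram / 100 * default_percent;
}
```

Passing "0" yields an empty pool, which matches the desktop case discussed earlier in the thread; a malformed value is ignored in favour of the compile-time default.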

We could also go further and add a runtime sysctl mechanism like the one
for hugepages, where you can grow the pool at run time as long as there is
enough free contiguous memory (e.g. from init scripts), or shrink it later
if you want to allow larger nonmovable allocations.

My feeling is that we need to find a way to integrate the global settings
for four kinds of allocations:

* nonmovable kernel pages
* hugetlb pages
* CMA
* memory hotplug

These essentially fight over the same memory (though things are slightly
different with dynamic hugepages), and they all face the same basic problem
of getting as much for themselves without starving the other three.

Arnd



RE: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-22 Thread Marek Szyprowski
Hello,

On Wednesday, June 22, 2011 2:42 PM Arnd Bergmann wrote:

 On Wednesday 22 June 2011, Hans Verkuil wrote:
   How about a Kconfig option that defines the percentage of memory
   to set aside for contiguous allocations?
 
  I would actually like to see a cma_size kernel option of some sort. This would
  be for the global CMA pool only as I don't think we should try to do anything
  more complicated here.
 
 A command line option is probably good to override the compile-time default, yes.
 
 We could also go further and add a runtime sysctl mechanism like the one
 for hugepages, where you can grow the pool at run time as long as there is
 enough free contiguous memory (e.g. from init scripts), or shrink it later
 if you want to allow larger nonmovable allocations.

Sounds really good, but it might be really hard to implement, at least for
CMA, because it needs to tweak parameters of memory management internal
structures very early, when the buddy allocator has not been activated yet.

 My feeling is that we need to find a way to integrate the global settings
 for four kinds of allocations:
 
 * nonmovable kernel pages
 * hugetlb pages
 * CMA
 * memory hotplug
 
 These essentially fight over the same memory (though things are slightly
 different with dynamic hugepages), and they all face the same basic problem
 of getting as much for themselves without starving the other three.

I'm not sure we can solve all such issues in the first version. Maybe we should
first have each of the above fully working in mainline separately and then
start the integration work.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center





Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-22 Thread Arnd Bergmann
On Wednesday 22 June 2011, Marek Szyprowski wrote:
 Sounds really good, but it might be really hard to implement, at least for
 CMA, because it needs to tweak parameters of memory management internal
 structures very early, when the buddy allocator has not been activated yet.

Why is that? I would expect you can do the same thing that hugepages (used to)
do and just attempt high-order allocations. If they succeed, you can add them
as a CMA region and free them again into the movable set of pages; otherwise
you just fail the request from user space when the memory is already
fragmented.
 
  These essentially fight over the same memory (though things are slightly
  different with dynamic hugepages), and they all face the same basic problem
  of getting as much for themselves without starving the other three.
 
 I'm not sure we can solve all such issues in the first version. Maybe we should
 first have each of the above fully working in mainline separately and then
 start the integration work.

Yes, makes sense. We just need to be careful not to introduce user-visible
interfaces that we cannot change any more in the process.

Arnd


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-22 Thread Michal Nazarewicz

On Wednesday, June 22, 2011 2:42 PM Arnd Bergmann wrote:

We could also go further and add a runtime sysctl mechanism like the
one for hugepages, where you can grow the pool at run time as long
as there is enough free contiguous memory (e.g. from init scripts),
or shrink it later if you want to allow larger nonmovable allocations.


On Wed, 22 Jun 2011 15:15:35 +0200, Marek Szyprowski wrote:

Sounds really good, but it might be really hard to implement, at
least for CMA, because it needs to tweak parameters of memory
management internal structures very early, when buddy allocator
has not been activated yet.


If you are able to allocate a pageblock of free memory from buddy system,
you should be able to convert it to CMA memory with no problems.

Also, if you want to convert CMA memory back to regular memory you
should be able to do that even if some of the memory is used by CMA
(it just won't be available right away but only when CMA frees it).

It is important to note that, because of the use of migration types,
all such conversions have to be performed on a pageblock basis.
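
The pageblock constraint boils down to alignment arithmetic: a range can only be converted in whole pageblocks, so a partial pageblock at either end has to be left alone. The sketch below shows that calculation in userspace. The constants are assumptions for illustration only (4 KiB pages and a pageblock of 1024 pages); the real values depend on the architecture and kernel configuration, and pageblocks_in_range() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; not universal. */
#define PAGE_SHIFT	12
#define PAGEBLOCK_ORDER	10
#define PAGEBLOCK_BYTES	((uint64_t)1 << (PAGE_SHIFT + PAGEBLOCK_ORDER))

/*
 * Shrink [start, end) inward to whole pageblocks. Returns how many
 * pageblocks could be converted and stores the aligned start address.
 */
static uint64_t pageblocks_in_range(uint64_t start, uint64_t end,
				    uint64_t *aligned_start)
{
	uint64_t first, last;

	assert(start <= end);
	first = (start + PAGEBLOCK_BYTES - 1) & ~(PAGEBLOCK_BYTES - 1);
	last = end & ~(PAGEBLOCK_BYTES - 1);

	*aligned_start = first;
	return last > first ? (last - first) / PAGEBLOCK_BYTES : 0;
}
```

Under these assumptions a pageblock is 4 MiB, so a range from 1 MiB to 16 MiB contains three convertible pageblocks starting at 4 MiB, and a range from 1 MiB to 3 MiB contains none.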

I don't think this is a feature we should consider for the first patch
though.  We started with an overgrown idea about what CMA might do
and it didn't get us far.  Let's first get the basics right and
then start implementing features as they become needed.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michal mina86 Nazarewicz(o o)
ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo--


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-22 Thread Michal Nazarewicz

On Wed, 22 Jun 2011 15:39:23 +0200, Arnd Bergmann a...@arndb.de wrote:

Why is that? I would expect you can do the same thing that hugepages (used to)
do and just attempt high-order allocations. If they succeed, you can add
them as a CMA region and free them again into the movable set of pages;
otherwise you just fail the request from user space when the memory is
already fragmented.


The problem with that is that CMA needs to have whole pageblocks allocated
and the buddy allocator can allocate at most half a pageblock.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michal mina86 Nazarewicz(o o)
ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo--


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-17 Thread Arnd Bergmann
On Thursday 16 June 2011 19:01:33 Larry Bassel wrote:
  Can you describe how the memory areas differ specifically?
  Is there one that is always faster but very small, or are there
  just specific circumstances under which some memory is faster than
  another?
 
 One is always faster, but very small (generally 2-10% the size
 of normal memory).
 

Ok, that sounds like the SRAM regions that we are handling on some
ARM platforms using the various interfaces. It should probably
remain outside of the regular allocator, but we can try to generalize
the SRAM support further. There are many possible uses for it.

Arnd


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-17 Thread Arnd Bergmann
On Wednesday 15 June 2011, Daniel Vetter wrote:
 On Tue, Jun 14, 2011 at 20:30, Arnd Bergmann a...@arndb.de wrote:
  On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
  Ah yes, I forgot that separate regions for different purposes could
  decrease fragmentation.
 
  That is indeed a good point, but having a good allocator algorithm
  could also solve this. I don't know too much about these allocation
  algorithms, but there are probably multiple working approaches to this.
 
 imo no allocator algorithm is gonna help if you have comparably large,
 variable-sized contiguous allocations out of a restricted address range.
 It might work well enough if there are only a few sizes and/or there's
 decent headroom. But for really generic workloads this would require
 sync objects and eviction callbacks (i.e. what Thomas Hellstrom pushed
 with ttm).

The requirements are quite different depending on what system you
look at. In a lot of cases, the constraints are not that tight at all,
and CMA will easily help to turn "works sometimes" into "works almost
always". Let's get there first and then look into the harder problems.

Unfortunately, memory allocation gets nondeterministic in the corner
cases, you can simply get the system into a state where you don't
have enough memory when you try to do too many things at once. This
may sound like a platitude but it's really what is behind all this:

If we had unlimited amounts of RAM, we would never need CMA, we could
simply set aside a lot of memory at boot time. Having one CMA area
with movable page eviction lets you build systems capable of doing
the same thing with less RAM than without CMA. Adding more complexity
lets you reduce that amount further.

The other aspects that have been mentioned about bank affinity and
SRAM are pretty orthogonal to the allocation, so we should also
treat them separately.

 So if this is only a requirement on very few platforms and can be
 cheaply fixed with multiple cma allocation areas (heck, we have
 slabs for the same reasons in the kernel), it might be a sensible
 compromise.

Yes, we can probably add it later when we find out what the limits
of the generic approach are. I don't really mind having the per-device
pointers to CMA areas, we just need to come up with a good way to
initialize them.

Arnd


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-16 Thread Arnd Bergmann
On Thursday 16 June 2011 02:48:12 Philip Balister wrote:
 On 06/15/2011 12:37 AM, Arnd Bergmann wrote:
  On Wednesday 15 June 2011 09:11:39 Marek Szyprowski wrote:
  I see your concerns, but I really wonder how to determine the properties
  of the global/default cma pool. You definitely don't want to give all
  available memory to CMA, because it will have a negative impact on kernel
  operation (kernel really needs to allocate unmovable pages from time to
  time).
 
  Exactly. This is a hard problem, so I would prefer to see a solution for
  coming up with reasonable defaults.
 
 Is this a situation where passing the information from device tree might 
 help? I know this does not help short term, but I am trying to 
 understand the sorts of problems device tree can help solve.

The device tree is a good place to describe any hardware properties such
as 'this device will need 32 MB of contiguous allocations on the memory
bank described in that other device node'.

It is however not a good place to describe user settings such as 'I want
to give this device a 200 MB pool for large allocations so I can run
application X efficiently', because that would require knowledge in the
boot loader about local policy, which it should generally not care about.

Arnd


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-16 Thread Larry Bassel
On 16 Jun 11 00:06, Arnd Bergmann wrote:
 On Wednesday 15 June 2011 23:39:58 Larry Bassel wrote:
  On 15 Jun 11 10:36, Marek Szyprowski wrote:
   On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
   
On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
 I've seen this split bank allocation in Qualcomm and TI SoCs, with
 Samsung, that makes 3 major SoC vendors (I would be surprised if
 Nvidia didn't also need to do this) - so I think some configurable
 method to control allocations is necessary. The chips can't do
 decode without it (and by can't do I mean 1080P and higher decode is
 not functionally useful). Far from special, this would appear to be
 the default.
  
  We at Qualcomm have some platforms that have memory of different
  performance characteristics, some drivers will need a way of
  specifying that they need fast memory for an allocation (and would prefer
  an error if it is not available rather than a fallback to slower
  memory). It would also be bad if allocators who don't need fast
  memory got it accidentally, depriving those who really need it.
 
 Can you describe how the memory areas differ specifically?
 Is there one that is always faster but very small, or are there
 just specific circumstances under which some memory is faster than
 another?

One is always faster, but very small (generally 2-10% the size
of normal memory).

Larry

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


RE: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-15 Thread Marek Szyprowski
Hello,

On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:

 On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
  I've seen this split bank allocation in Qualcomm and TI SoCs, with
  Samsung, that makes 3 major SoC vendors (I would be surprised if
  Nvidia didn't also need to do this) - so I think some configurable
  method to control allocations is necessary. The chips can't do
  decode without it (and by can't do I mean 1080P and higher decode is
  not functionally useful). Far from special, this would appear to be
  the default.
 
 Thanks for the insight, that's a much better argument than 'something
 may need it'. Are those all chips without an IOMMU or do we also
 need to solve the IOMMU case with split bank allocation?
 
 I think I'd still prefer to see the support for multiple regions split
 out into one of the later patches, especially since that would defer
 the question of how to do the initialization for this case and make
 sure we first get a generic way.
 
 You've convinced me that we need to solve the problem of allocating
 memory from a specific bank eventually, but separating it from the
 one at hand (contiguous allocation) should help getting the important
 groundwork in at first.

 The possible conflicts that I still see with per-bank CMA regions are:
 
 * It completely destroys memory power management in cases where that
   is based on powering down entire memory banks.

I don't think that per-bank CMA regions destroy memory power management
more than the global CMA pool. Please note that the contiguous buffers
(or in general dma-buffers) right now are unmovable so they don't fit
well into memory power management.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-15 Thread Michal Nazarewicz

On Tue, 14 Jun 2011 22:42:24 +0200, Arnd Bergmann a...@arndb.de wrote:

* We still need to solve the same problem in case of IOMMU mappings
  at some point, even if today's hardware doesn't have this combination.
  It would be good to use the same solution for both.


I don't think I follow.  What does IOMMU have to do with CMA?

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michal mina86 Nazarewicz(o o)
ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo--


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-15 Thread Arnd Bergmann
On Wednesday 15 June 2011, Michal Nazarewicz wrote:
 On Tue, 14 Jun 2011 22:42:24 +0200, Arnd Bergmann a...@arndb.de wrote:
  * We still need to solve the same problem in case of IOMMU mappings
    at some point, even if today's hardware doesn't have this combination.
    It would be good to use the same solution for both.
 
 I don't think I follow.  What does IOMMU have to do with CMA?

The point is that on the higher level device drivers, we want to
hide the presence of CMA and/or IOMMU behind the dma mapping API,
but the device drivers do need to know about the bank properties.

If we want to solve the problem of allocating per-bank memory inside
of CMA, we also need to solve it inside of the IOMMU code, using
the same device driver interface.

Arnd


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-15 Thread Arnd Bergmann
On Tuesday 14 June 2011, Jordan Crouse wrote:
 
 On 06/14/2011 02:42 PM, Arnd Bergmann wrote:
  On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
  I've seen this split bank allocation in Qualcomm and TI SoCs, with
  Samsung, that makes 3 major SoC vendors (I would be surprised if
  Nvidia didn't also need to do this) - so I think some configurable
  method to control allocations is necessary. The chips can't do
  decode without it (and by can't do I mean 1080P and higher decode is
  not functionally useful). Far from special, this would appear to be
  the default.
 
  Thanks for the insight, that's a much better argument than 'something
  may need it'. Are those all chips without an IOMMU or do we also
  need to solve the IOMMU case with split bank allocation?
 
 Yes. The IOMMU case with split bank allocation is key, especially for shared
 buffers. Consider the case where video is using a certain bank for performance
 purposes and that frame is shared with the GPU.

Could we use the non-uniform memory access (NUMA) code for this? That code
does more than what we've been talking about, and we're currently thinking
only of a degenerate case (one CPU node with multiple memory nodes), but my
feeling is that we can still build on top of it.

The NUMA code can describe relations between different areas of memory
and how they interact with devices and processes, so you can attach a
device to a specific node and have all allocations done from there.
You can also set policy in user space, e.g. to have a video decoder
process running on the bank that is not used by the GPU.

In the DMA mapping API, that would mean we add another dma_attr to
dma_alloc_* that lets you pass a node identifier.
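
To illustrate the shape of that idea only: the sketch below models a node identifier carried in the allocation attributes, with a request for a specific node never spilling over to another one. None of this is the real DMA mapping API. struct alloc_attrs, NODE_ANY and alloc_on_node() are hypothetical names, and the per-node byte counters are a toy replacement for real memory banks.

```c
#include <assert.h>
#include <stddef.h>

#define NODE_ANY (-1)	/* hypothetical: caller has no bank preference */
#define NR_NODES 2

/* Toy per-node accounting: bytes still free on each memory node (bank). */
struct node_pools {
	size_t free_bytes[NR_NODES];
};

/* Hypothetical attributes; the real dma_alloc_* interface is not modelled. */
struct alloc_attrs {
	int node;
};

/*
 * Returns the node the allocation was charged to, or -1 on failure.
 * A request for a specific node fails rather than using another node.
 */
static int alloc_on_node(struct node_pools *p, size_t size,
			 const struct alloc_attrs *attrs)
{
	int first = 0, last = NR_NODES - 1, n;

	assert(attrs->node == NODE_ANY ||
	       (attrs->node >= 0 && attrs->node < NR_NODES));
	if (attrs->node != NODE_ANY)
		first = last = attrs->node;

	for (n = first; n <= last; n++) {
		if (p->free_bytes[n] >= size) {
			p->free_bytes[n] -= size;
			return n;
		}
	}
	return -1;
}
```

A video decoder pinned to node 0 would get an error once that bank is exhausted, while a caller passing NODE_ANY would be served from whichever node still has room.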

Arnd


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-15 Thread Michal Nazarewicz

On Wed, 15 Jun 2011 13:20:42 +0200, Arnd Bergmann a...@arndb.de wrote:

The point is that on the higher level device drivers, we want to
hide the presence of CMA and/or IOMMU behind the dma mapping API,
but the device drivers do need to know about the bank properties.


Gotcha, thanks.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michal mina86 Nazarewicz(o o)
ooo +-email/xmpp: mnazarew...@google.com-ooO--(_)--Ooo--


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-15 Thread Daniel Vetter
On Tue, Jun 14, 2011 at 20:30, Arnd Bergmann a...@arndb.de wrote:
 On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
 Ah yes, I forgot that separate regions for different purposes could
 decrease fragmentation.

 That is indeed a good point, but having a good allocator algorithm
 could also solve this. I don't know too much about these allocation
 algorithms, but there are probably multiple working approaches to this.

imo no allocator algorithm is gonna help if you have comparably large,
variable-sized contiguous allocations out of a restricted address range.
It might work well enough if there are only a few sizes and/or there's
decent headroom. But for really generic workloads this would require
sync objects and eviction callbacks (i.e. what Thomas Hellstrom pushed
with ttm).

So if this is only a requirement on very few platforms and can be
cheaply fixed with multiple cma allocation areas (heck, we have
slabs for the same reasons in the kernel), it might be a sensible
compromise.
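
The fragmentation problem is easy to reproduce with a toy model. The sketch below is a plain first-fit allocator over a 16-page range, written for illustration only; it is not CMA or TTM code, and all names are hypothetical. After four 4-page buffers are allocated and the first and third are freed, half the range is free, yet an 8-page request cannot be satisfied because the buffers in between cannot move.

```c
#include <assert.h>
#include <stddef.h>

#define AREA_PAGES 16

/* used[i] != 0 means page i of the restricted range is allocated. */
struct area {
	unsigned char used[AREA_PAGES];
};

/* First-fit search for 'count' contiguous free pages; -1 if none. */
static int area_alloc(struct area *a, size_t count)
{
	size_t start, run = 0, i;

	assert(count > 0);
	for (i = 0; i < AREA_PAGES; i++) {
		run = a->used[i] ? 0 : run + 1;
		if (run == count) {
			start = i + 1 - count;
			for (i = start; i < start + count; i++)
				a->used[i] = 1;
			return (int)start;
		}
	}
	return -1;
}

/* Release 'count' pages starting at page 'start'. */
static void area_free(struct area *a, size_t start, size_t count)
{
	size_t i;

	for (i = start; i < start + count && i < AREA_PAGES; i++)
		a->used[i] = 0;
}

/* Total number of free pages, contiguous or not. */
static size_t area_free_pages(const struct area *a)
{
	size_t i, n = 0;

	for (i = 0; i < AREA_PAGES; i++)
		n += !a->used[i];
	return n;
}
```

Separate areas per buffer size avoid this particular pattern, which is the cheap fix suggested above; the general case needs buffers that can be moved or evicted.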
-Daniel
-- 
Daniel Vetter
daniel.vet...@ffwll.ch - +41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-15 Thread Thomas Hellstrom

On 06/15/2011 01:53 PM, Daniel Vetter wrote:

On Tue, Jun 14, 2011 at 20:30, Arnd Bergmann a...@arndb.de wrote:
   

On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
 

Ah yes, I forgot that separate regions for different purposes could
decrease fragmentation.
   

That is indeed a good point, but having a good allocator algorithm
could also solve this. I don't know too much about these allocation
algorithms, but there are probably multiple working approaches to this.
 

imo no allocator algorithm is gonna help if you have comparably large,
variable-sized contiguous allocations out of a restricted address range.
It might work well enough if there are only a few sizes and/or there's
decent headroom. But for really generic workloads this would require
sync objects and eviction callbacks (i.e. what Thomas Hellstrom pushed
with ttm).
   


Indeed, IIRC at the meeting I pointed out that there is no way to 
generically solve the fragmentation problem without movable buffers. 
(I'd do it as a simple CMA backend to TTM). This is exactly the same 
problem as trying to fit buffers in a limited VRAM area.


/Thomas




Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-15 Thread Larry Bassel
On 15 Jun 11 10:36, Marek Szyprowski wrote:
 Hello,
 
 On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
 
  On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
   I've seen this split bank allocation in Qualcomm and TI SoCs, with
   Samsung, that makes 3 major SoC vendors (I would be surprised if
   Nvidia didn't also need to do this) - so I think some configurable
   method to control allocations is necessary. The chips can't do
   decode without it (and by can't do I mean 1080P and higher decode is
   not functionally useful). Far from special, this would appear to be
   the default.

We at Qualcomm have some platforms that have memory of different
performance characteristics; some drivers will need a way of
specifying that they need fast memory for an allocation (and would prefer
an error if it is not available rather than a fallback to slower
memory). It would also be bad if allocators who don't need fast
memory got it accidentally, depriving those who really need it.
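
A minimal sketch of that policy, assuming a simple per-bank free-page counter (the names `Bank`, `alloc_pages` and `OutOfBankMemory` are hypothetical and not an existing kernel or CMA interface): a request for fast memory is served from a fast bank or fails, and ordinary requests never consume fast memory.

```python
from dataclasses import dataclass


class OutOfBankMemory(Exception):
    """Raised instead of silently falling back to another kind of memory."""


@dataclass
class Bank:
    name: str
    fast: bool
    free_pages: int


def alloc_pages(banks, count, need_fast=False):
    """Take `count` pages from the first bank of the requested kind.

    Ordinary requests only look at ordinary banks, so they cannot deprive
    users that really need fast memory; fast requests never fall back.
    """
    for bank in banks:
        if bank.fast == need_fast and bank.free_pages >= count:
            bank.free_pages -= count
            return bank.name
    kind = "fast" if need_fast else "ordinary"
    raise OutOfBankMemory(f"no {kind} bank has {count} free pages")
```

Returning an error here is deliberate: the caller can then decide whether running from slower memory is acceptable at all, rather than finding out through degraded performance.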

  
  Thanks for the insight, that's a much better argument than 'something
  may need it'. Are those all chips without an IOMMU or do we also
  need to solve the IOMMU case with split bank allocation?
  
  I think I'd still prefer to see the support for multiple regions split
  out into one of the later patches, especially since that would defer
  the question of how to do the initialization for this case and make
  sure we first get a generic way.
  
  You've convinced me that we need to solve the problem of allocating
  memory from a specific bank eventually, but separating it from the
  one at hand (contiguous allocation) should help getting the important
  groundwork in at first.
 
  The possible conflicts that I still see with per-bank CMA regions are:
  
  * It completely destroys memory power management in cases where that
    is based on powering down entire memory banks.
 
 I don't think that per-bank CMA regions destroy memory power management
 more than the global CMA pool. Please note that the contiguous buffers
 (or in general dma-buffers) right now are unmovable so they don't fit
 well into memory power management.

We also have platforms where a well-defined part of the memory
can be powered off, and other parts can't (or won't). We need a way
to steer the place allocations come from to the memory that won't be
turned off (so that CMA allocations are not an obstacle to memory
hotremove).

 
 Best regards
 -- 
 Marek Szyprowski
 Samsung Poland RD Center
 
 
 
 ___
 Linaro-mm-sig mailing list
 linaro-mm-...@lists.linaro.org
 http://lists.linaro.org/mailman/listinfo/linaro-mm-sig

Larry Bassel

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-15 Thread Arnd Bergmann
On Wednesday 15 June 2011 23:39:58 Larry Bassel wrote:
 On 15 Jun 11 10:36, Marek Szyprowski wrote:
  On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
  
   On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
    I've seen this split bank allocation in Qualcomm and TI SoCs, with
    Samsung, that makes 3 major SoC vendors (I would be surprised if
    Nvidia didn't also need to do this) - so I think some configurable
    method to control allocations is necessary. The chips can't do
    decode without it (and by can't do I mean 1080P and higher decode is
    not functionally useful). Far from special, this would appear to be
    the default.
 
 We at Qualcomm have some platforms that have memory of different
 performance characteristics; some drivers will need a way of
 specifying that they need fast memory for an allocation (and would prefer
 an error if it is not available rather than a fallback to slower
 memory). It would also be bad if allocators who don't need fast
 memory got it accidentally, depriving those who really need it.

Can you describe how the memory areas differ specifically?
Is there one that is always faster but very small, or are there
just specific circumstances under which some memory is faster than
another?

   The possible conflicts that I still see with per-bank CMA regions are:
   
   * It completely destroys memory power management in cases where that
     is based on powering down entire memory banks.
  
  I don't think that per-bank CMA regions destroy memory power management
  more than the global CMA pool. Please note that the contiguous buffers
  (or in general dma-buffers) right now are unmovable so they don't fit
  well into memory power management.
 
 We also have platforms where a well-defined part of the memory
 can be powered off, and other parts can't (or won't). We need a way
 to steer the place allocations come from to the memory that won't be
 turned off (so that CMA allocations are not an obstacle to memory
 hotremove).

We already established that we have to know something about the banks,
and your additional input makes it even clearer that we need to consider
the bigger picture here: We need to describe parts of memory separately
regarding general performance, device specific allocations and hotplug
characteristics.

It still sounds to me that this can be done using the NUMA properties
that Linux already understands, and teaching more subsystems about it,
but maybe the memory hotplug developers have already come up with
another scheme. The way that memory hotplug and CMA choose their
memory regions certainly needs to take both into account. As far as
I can see there are both conflicting and synergistic effects when
you combine the two.
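
As a rough illustration of describing memory along those three axes at once (a sketch only; `Region` and `candidate_regions` are hypothetical names, not the NUMA or memory hotplug interfaces):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    fast: bool                      # general performance
    removable: bool                 # may be powered off or hot-removed
    devices: frozenset = frozenset()  # empty set: usable by any device


def candidate_regions(regions, device, need_fast=False, pinned=False):
    """Names of the regions a device may allocate from.

    `pinned` marks buffers that cannot be migrated, which therefore must
    stay out of regions that can be powered off or hot-removed.
    """
    result = []
    for region in regions:
        if region.devices and device not in region.devices:
            continue  # region is reserved for other devices
        if need_fast and not region.fast:
            continue
        if pinned and region.removable:
            continue
        result.append(region.name)
    return result
```

Each constraint only narrows the candidate list, so conflicting requirements show up as an empty result instead of a surprising placement.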

Arnd


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-15 Thread Philip Balister

On 06/15/2011 12:37 AM, Arnd Bergmann wrote:

On Wednesday 15 June 2011 09:11:39 Marek Szyprowski wrote:

I see your concerns, but I really wonder how to determine the properties
of the global/default cma pool. You definitely don't want to give all
available memory to CMA, because it will have a negative impact on kernel
operation (the kernel really needs to allocate unmovable pages from time to
time).


Exactly. This is a hard problem, so I would prefer to see a solution for
coming up with reasonable defaults.


Is this a situation where passing the information from device tree might 
help? I know this does not help short term, but I am trying to 
understand the sorts of problems device tree can help solve.


Philip




The only solution I see now is to provide a Kconfig entry to determine
the size of the global CMA pool, but this still has some issues,
especially for multi-board kernels (each board will probably have
a different amount of RAM and different memory-consuming devices
available). It looks like each board's startup code still might need to
tweak the size of the CMA pool. I can add a kernel command line option for
it, but such a solution also will not solve all the cases (afair there
was a discussion about kernel command line parameters for memory
configuration and the conclusion was that it should be avoided).


The command line option can be a last resort if the heuristics fail,
but it's not much better than a fixed Kconfig setting.

How about a Kconfig option that defines the percentage of memory
to set aside for contiguous allocations?
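
The arithmetic behind such a percentage option could look like the sketch below. The function name and the 4 MiB alignment are assumptions made for illustration; nothing in this thread fixes either.

```python
MIB = 1024 * 1024


def cma_pool_bytes(total_ram_bytes, percent, align_bytes=4 * MIB):
    """Size of the default contiguous pool as a percentage of RAM.

    The result is rounded down to a multiple of `align_bytes` (assumed
    value) so the pool never exceeds the configured share of memory.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    size = total_ram_bytes * percent // 100
    return size - size % align_bytes
```

With 10 percent configured, a 512 MiB board gets a 48 MiB pool and a 1 GiB board gets 100 MiB, so a single setting scales across boards in a multi-board kernel.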

Arnd



Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-15 Thread Zach Pfeffer
On 15 June 2011 16:39, Larry Bassel lbas...@codeaurora.org wrote:
 On 15 Jun 11 10:36, Marek Szyprowski wrote:
 Hello,

 On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:

  On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
   I've seen this split bank allocation in Qualcomm and TI SoCs, with
   Samsung, that makes 3 major SoC vendors (I would be surprised if
   Nvidia didn't also need to do this) - so I think some configurable
   method to control allocations is necessary. The chips can't do
   decode without it (and by can't do I mean 1080P and higher decode is
   not functionally useful). Far from special, this would appear to be
   the default.

 We at Qualcomm have some platforms that have memory of different
 performance characteristics; some drivers will need a way of
 specifying that they need fast memory for an allocation (and would prefer
 an error if it is not available rather than a fallback to slower
 memory). It would also be bad if allocators who don't need fast
 memory got it accidentally, depriving those who really need it.

I think this statement actually applies to all the SoCs that are
coming out now and in the future from TI, Samsung, Nvidia, Freescale,
ST Ericsson and others. It seems that in all cases users will want to:

1. Allocate memory with a per-SoC physical memory mapping policy that
is usually manually specified, i.e. use this physical memory bank set
for this allocation or nothing.
2. Be able to easily pass a token to this memory between various
userspace processes and the kernel.
3. Be able to easily and explicitly access attributes of an allocation
from all contexts.
4. Be able to save and reload this memory without giving up the
virtual address allocation.

In essence they want an architecture-independent map object that can
bounce around the system with a unique handle.
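
A very rough sketch of what such a handle-based object could look like, covering only points 2 and 3 above (the names `Buffer` and `BufferRegistry` are hypothetical; this is not an existing kernel interface):

```python
import itertools
from dataclasses import dataclass


@dataclass
class Buffer:
    bank: str    # physical bank set chosen by the per-SoC policy
    size: int    # size in bytes
    attrs: dict  # attributes readable from every context


class BufferRegistry:
    """Maps small integer handles to buffer descriptions."""

    def __init__(self):
        self._next_handle = itertools.count(1)
        self._buffers = {}

    def register(self, bank, size, **attrs):
        # The handle is the token that gets passed between processes
        # and the kernel instead of an address.
        handle = next(self._next_handle)
        self._buffers[handle] = Buffer(bank, size, dict(attrs))
        return handle

    def lookup(self, handle):
        # Any holder of the handle can read the allocation's attributes.
        return self._buffers[handle]
```

Only the handle crosses context boundaries here; the bank policy and attributes stay with the registry.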
 
  Thanks for the insight, that's a much better argument than 'something
  may need it'. Are those all chips without an IOMMU or do we also
  need to solve the IOMMU case with split bank allocation?
 
  I think I'd still prefer to see the support for multiple regions split
  out into one of the later patches, especially since that would defer
  the question of how to do the initialization for this case and make
  sure we first get a generic way.
 
  You've convinced me that we need to solve the problem of allocating
  memory from a specific bank eventually, but separating it from the
  one at hand (contiguous allocation) should help getting the important
  groundwork in at first.
 
  The possible conflicts that I still see with per-bank CMA regions are:
 
  * It completely destroys memory power management in cases where that
    is based on powering down entire memory banks.

 I don't think that per-bank CMA regions destroy memory power management
 more than the global CMA pool. Please note that the contiguous buffers
 (or in general dma-buffers) right now are unmovable so they don't fit
 well into memory power management.

 We also have platforms where a well-defined part of the memory
 can be powered off, and other parts can't (or won't). We need a way
 to steer the place allocations come from to the memory that won't be
 turned off (so that CMA allocations are not an obstacle to memory
 hotremove).


 Best regards
 --
 Marek Szyprowski
 Samsung Poland RD Center




 Larry Bassel

 --
 Sent by an employee of the Qualcomm Innovation Center, Inc.
 The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.



Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-14 Thread Daniel Stone
Hi,

On Tue, Jun 14, 2011 at 06:03:00PM +0200, Arnd Bergmann wrote:
 On Tuesday 14 June 2011, Michal Nazarewicz wrote:
  On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann a...@arndb.de wrote:
   Please explain the exact requirements that lead you to defining multiple
   contexts.
  
  Some devices may have access only to some banks of memory.  Some devices
  may use different banks of memory for different purposes.
 
 For all I know, that is something that is only true for a few very special
 Samsung devices, and is completely unrelated to the need for contiguous
 allocations, so this approach becomes pointless as soon as the next
 generation of that chip grows an IOMMU, where we don't handle the special
 bank attributes. Also, the way I understood the situation for the Samsung
 SoC during the Budapest discussion, it's only a performance hack, not a
 functional requirement, unless you count '1080p playback' as a functional
 requirement.

Hm, I think that was something similar but not quite the same: talking
about having allocations split to lie between two banks of RAM to
maximise the read/write speed for performance reasons.  That's something
that can be handled in the allocator, rather than an API constraint, as
this is.

Not that I know of any hardware which is limited as such, but eh.

Cheers,
Daniel


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-14 Thread Zach Pfeffer
On 14 June 2011 12:01, Daniel Stone dani...@collabora.com wrote:
 Hi,

 On Tue, Jun 14, 2011 at 06:03:00PM +0200, Arnd Bergmann wrote:
 On Tuesday 14 June 2011, Michal Nazarewicz wrote:
  On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann a...@arndb.de wrote:
   Please explain the exact requirements that lead you to defining multiple
   contexts.
 
  Some devices may have access only to some banks of memory.  Some devices
  may use different banks of memory for different purposes.

 For all I know, that is something that is only true for a few very special
 Samsung devices, and is completely unrelated to the need for contiguous
 allocations, so this approach becomes pointless as soon as the next
 generation of that chip grows an IOMMU, where we don't handle the special
 bank attributes. Also, the way I understood the situation for the Samsung
 SoC during the Budapest discussion, it's only a performance hack, not a
 functional requirement, unless you count '1080p playback' as a functional
 requirement.

Coming in mid topic...

I've seen this split bank allocation in Qualcomm and TI SoCs, with
Samsung, that makes 3 major SoC vendors (I would be surprised if
Nvidia didn't also need to do this) - so I think some configurable
method to control allocations is necessary. The chips can't do
decode without it (and by can't do I mean 1080P and higher decode is
not functionally useful). Far from special, this would appear to be
the default.

 Hm, I think that was something similar but not quite the same: talking
 about having allocations split to lie between two banks of RAM to
 maximise the read/write speed for performance reasons.  That's something
 that can be handled in the allocator, rather than an API constraint, as
 this is.

 Not that I know of any hardware which is limited as such, but eh.

 Cheers,
 Daniel



Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-14 Thread Arnd Bergmann
On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
 I've seen this split bank allocation in Qualcomm and TI SoCs, with
 Samsung, that makes 3 major SoC vendors (I would be surprised if
 Nvidia didn't also need to do this) - so I think some configurable
 method to control allocations is necessary. The chips can't do
 decode without it (and by can't do I mean 1080P and higher decode is
 not functionally useful). Far from special, this would appear to be
 the default.

Thanks for the insight, that's a much better argument than 'something
may need it'. Are those all chips without an IOMMU or do we also
need to solve the IOMMU case with split bank allocation?

I think I'd still prefer to see the support for multiple regions split
out into one of the later patches, especially since that would defer
the question of how to do the initialization for this case and make
sure we first get a generic way.

You've convinced me that we need to solve the problem of allocating
memory from a specific bank eventually, but separating it from the
one at hand (contiguous allocation) should help getting the important
groundwork in at first.

The possible conflicts that I still see with per-bank CMA regions are:

* It completely destroys memory power management in cases where that
  is based on powering down entire memory banks.

* We still need to solve the same problem in case of IOMMU mappings
  at some point, even if today's hardware doesn't have this combination.
  It would be good to use the same solution for both.

Arnd


Re: [Linaro-mm-sig] [PATCH 08/10] mm: cma: Contiguous Memory Allocator added

2011-06-14 Thread Jordan Crouse

On 06/14/2011 02:42 PM, Arnd Bergmann wrote:

On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:

I've seen this split bank allocation in Qualcomm and TI SoCs, with
Samsung, that makes 3 major SoC vendors (I would be surprised if
Nvidia didn't also need to do this) - so I think some configurable
method to control allocations is necessary. The chips can't do
decode without it (and by can't do I mean 1080P and higher decode is
not functionally useful). Far from special, this would appear to be
the default.


Thanks for the insight, that's a much better argument than 'something
may need it'. Are those all chips without an IOMMU or do we also
need to solve the IOMMU case with split bank allocation?


Yes. The IOMMU case with split bank allocation is key, especially for shared
buffers. Consider the case where video is using a certain bank for performance
purposes and that frame is shared with the GPU.

Jordan