[PATCH] drm: Add support for two-ended allocation, v2

2014-04-02 Thread Lauri Kasanen
On Wed, 02 Apr 2014 13:00:00 +0200
Christian K?nig  wrote:

> Nice idea, but I wouldn't put the decision where to place the buffer 
> into TTM based on it's size.
> 
> Instead please make that a proper TTM placement flag because for example 
> for VM page tables we want them to be at the end of VRAM, not because 
> they are big (which they are anyway) but because they never accessed by 
> the CPU.

I think there aren't many special cases like that, and they could
already be handled by setting the first and last pages for the bo?

- Lauri


[PATCH] drm: Add support for two-ended allocation, v2

2014-04-02 Thread Christian König
Am 02.04.2014 16:54, schrieb Lauri Kasanen:
> On Wed, 02 Apr 2014 13:00:00 +0200
> Christian K?nig  wrote:
>
>> Nice idea, but I wouldn't put the decision where to place the buffer
>> into TTM based on it's size.
>>
>> Instead please make that a proper TTM placement flag because for example
>> for VM page tables we want them to be at the end of VRAM, not because
>> they are big (which they are anyway) but because they never accessed by
>> the CPU.
> I think there aren't many special cases like that, and they could
> already be handled by setting the first and last pages for the bo?

Nope, that would require the driver to find a certain position for the 
buffer by itself.

The placement flag should just move allocation from preferring the start 
of the address space to preferring the end of it.

What I wanted to note is that this way the driver could use this feature 
in a much more flexible way, not only to change placement of buffers 
based on the size, but also on other criteria as well (and yes I have 
couple of use cases for this in mind).

Additional to that you could also separate the patch into adding this 
functionality and enabling it for radeon, which in the end will allow 
people to easier bisect it in the case something is wrong.

Apart from that the patch is a really nice piece of work,
Christian.

>
> - Lauri



[PATCH] drm: Add support for two-ended allocation, v2

2014-04-02 Thread Christian König
Nice idea, but I wouldn't put the decision where to place the buffer 
into TTM based on it's size.

Instead please make that a proper TTM placement flag because for example 
for VM page tables we want them to be at the end of VRAM, not because 
they are big (which they are anyway) but because they never accessed by 
the CPU.

Christian.

Am 02.04.2014 11:08, schrieb Lauri Kasanen:
> Clients like i915 need to segregate cache domains within the GTT which
> can lead to small amounts of fragmentation. By allocating the uncached
> buffers from the bottom and the cacheable buffers from the top, we can
> reduce the amount of wasted space and also optimize allocation of the
> mappable portion of the GTT to only those buffers that require CPU
> access through the GTT.
>
> For other drivers, allocating small bos from one end and large ones
> from the other helps improve the quality of fragmentation.
>
> Only Radeon has behavioral changes in this patch, as I have no Intel hw.
>
> Based on drm_mm work by Chris Wilson.
>
> --
>
> Radeon uses a 512kb threshold.
>
> This decreases eviction by up to 20%, by improving the fragmentation
> quality. No harm in normal cases that fit VRAM fully (PTS gaming suite).
>
> In some cases, even the VRAM-fitting cases improved slightly (openarena, 
> urban terror).
>
> 512kb was measured as the most optimal threshold for 3d workloads common to 
> radeon.
> Other drivers may need different thresholds according to their workloads.
>
> v2: Updated kerneldoc in more places
>
> Cc: Chris Wilson 
> Cc: Ben Widawsky 
> Signed-off-by: Lauri Kasanen 
> ---
>   drivers/gpu/drm/ast/ast_ttm.c |  3 +-
>   drivers/gpu/drm/bochs/bochs_mm.c  |  3 +-
>   drivers/gpu/drm/cirrus/cirrus_ttm.c   |  3 +-
>   drivers/gpu/drm/drm_mm.c  | 66 
> ++-
>   drivers/gpu/drm/i915/i915_gem.c   |  3 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.c   |  3 +-
>   drivers/gpu/drm/mgag200/mgag200_ttm.c |  3 +-
>   drivers/gpu/drm/nouveau/nouveau_ttm.c |  2 +-
>   drivers/gpu/drm/qxl/qxl_ttm.c |  2 +-
>   drivers/gpu/drm/radeon/radeon_ttm.c   |  3 +-
>   drivers/gpu/drm/ttm/ttm_bo.c  |  4 ++-
>   drivers/gpu/drm/ttm/ttm_bo_manager.c  | 16 +++--
>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   |  2 +-
>   include/drm/drm_mm.h  | 32 ++---
>   include/drm/ttm/ttm_bo_driver.h   |  9 -
>   15 files changed, 118 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c
> index b824622..61f9e39 100644
> --- a/drivers/gpu/drm/ast/ast_ttm.c
> +++ b/drivers/gpu/drm/ast/ast_ttm.c
> @@ -262,7 +262,8 @@ int ast_mm_init(struct ast_private *ast)
>_bo_driver,
>dev->anon_inode->i_mapping,
>DRM_FILE_PAGE_OFFSET,
> -  true);
> +  true,
> +  0);
>   if (ret) {
>   DRM_ERROR("Error initialising bo driver; %d\n", ret);
>   return ret;
> diff --git a/drivers/gpu/drm/bochs/bochs_mm.c 
> b/drivers/gpu/drm/bochs/bochs_mm.c
> index f488be5..9dfd24a 100644
> --- a/drivers/gpu/drm/bochs/bochs_mm.c
> +++ b/drivers/gpu/drm/bochs/bochs_mm.c
> @@ -228,7 +228,8 @@ int bochs_mm_init(struct bochs_device *bochs)
>_bo_driver,
>bochs->dev->anon_inode->i_mapping,
>DRM_FILE_PAGE_OFFSET,
> -  true);
> +  true,
> +  0);
>   if (ret) {
>   DRM_ERROR("Error initialising bo driver; %d\n", ret);
>   return ret;
> diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c 
> b/drivers/gpu/drm/cirrus/cirrus_ttm.c
> index 92e6b77..74e8e21 100644
> --- a/drivers/gpu/drm/cirrus/cirrus_ttm.c
> +++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c
> @@ -262,7 +262,8 @@ int cirrus_mm_init(struct cirrus_device *cirrus)
>_bo_driver,
>dev->anon_inode->i_mapping,
>DRM_FILE_PAGE_OFFSET,
> -  true);
> +  true,
> +  0);
>   if (ret) {
>   DRM_ERROR("Error initialising bo driver; %d\n", ret);
>   return ret;
> diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> index a2d45b74..8f64be4 100644
> --- a/drivers/gpu/drm/drm_mm.c
> +++ b/drivers/gpu/drm/drm_mm.c
> @@ -82,6 +82,10 @@
>* this to implement guard pages between incompatible caching domains in the
>* graphics TT.
>*
> + * Two behaviors are supported for searching and allocating: bottom-up and 
> top-down.
> + * The default is bottom-up. Top-down allocation can be used if the memory 
> area
> + * has different restrictions, or just to reduce fragmentation.
> + *

[PATCH] drm: Add support for two-ended allocation, v2

2014-04-02 Thread Lauri Kasanen
Clients like i915 need to segregate cache domains within the GTT which
can lead to small amounts of fragmentation. By allocating the uncached
buffers from the bottom and the cacheable buffers from the top, we can
reduce the amount of wasted space and also optimize allocation of the
mappable portion of the GTT to only those buffers that require CPU
access through the GTT.

For other drivers, allocating small bos from one end and large ones
from the other helps improve the quality of fragmentation.

Only Radeon has behavioral changes in this patch, as I have no Intel hw.

Based on drm_mm work by Chris Wilson.

--

Radeon uses a 512kb threshold.

This decreases eviction by up to 20%, by improving the fragmentation
quality. No harm in normal cases that fit VRAM fully (PTS gaming suite).

In some cases, even the VRAM-fitting cases improved slightly (openarena, urban 
terror).

512kb was measured as the most optimal threshold for 3d workloads common to 
radeon.
Other drivers may need different thresholds according to their workloads.

v2: Updated kerneldoc in more places

Cc: Chris Wilson 
Cc: Ben Widawsky 
Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/ast/ast_ttm.c |  3 +-
 drivers/gpu/drm/bochs/bochs_mm.c  |  3 +-
 drivers/gpu/drm/cirrus/cirrus_ttm.c   |  3 +-
 drivers/gpu/drm/drm_mm.c  | 66 ++-
 drivers/gpu/drm/i915/i915_gem.c   |  3 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c   |  3 +-
 drivers/gpu/drm/mgag200/mgag200_ttm.c |  3 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c |  2 +-
 drivers/gpu/drm/qxl/qxl_ttm.c |  2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c   |  3 +-
 drivers/gpu/drm/ttm/ttm_bo.c  |  4 ++-
 drivers/gpu/drm/ttm/ttm_bo_manager.c  | 16 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   |  2 +-
 include/drm/drm_mm.h  | 32 ++---
 include/drm/ttm/ttm_bo_driver.h   |  9 -
 15 files changed, 118 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c
index b824622..61f9e39 100644
--- a/drivers/gpu/drm/ast/ast_ttm.c
+++ b/drivers/gpu/drm/ast/ast_ttm.c
@@ -262,7 +262,8 @@ int ast_mm_init(struct ast_private *ast)
 _bo_driver,
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index f488be5..9dfd24a 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -228,7 +228,8 @@ int bochs_mm_init(struct bochs_device *bochs)
 _bo_driver,
 bochs->dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c 
b/drivers/gpu/drm/cirrus/cirrus_ttm.c
index 92e6b77..74e8e21 100644
--- a/drivers/gpu/drm/cirrus/cirrus_ttm.c
+++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c
@@ -262,7 +262,8 @@ int cirrus_mm_init(struct cirrus_device *cirrus)
 _bo_driver,
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index a2d45b74..8f64be4 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -82,6 +82,10 @@
  * this to implement guard pages between incompatible caching domains in the
  * graphics TT.
  *
+ * Two behaviors are supported for searching and allocating: bottom-up and 
top-down.
+ * The default is bottom-up. Top-down allocation can be used if the memory area
+ * has different restrictions, or just to reduce fragmentation.
+ *
  * Finally iteration helpers to walk all nodes and all holes are provided as 
are
  * some basic allocator dumpers for debugging.
  */
@@ -102,7 +106,8 @@ static struct drm_mm_node 
*drm_mm_search_free_in_range_generic(const struct drm_
 static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 struct drm_mm_node *node,
 unsigned long size, unsigned alignment,
-unsigned long color)
+unsigned long color,
+