Re: [Intel-gfx] [PATCH] intel: Add support for (possibly) unsynchronized maps.

2012-02-27 Thread Ben Widawsky
On Sun, 26 Feb 2012 12:28:09 -0800
Kenneth Graunke kenn...@whitecape.org wrote:

> On 02/25/2012 03:00 AM, Daniel Vetter wrote:
> > On Fri, Feb 24, 2012 at 07:53:22PM -0800, Eric Anholt wrote:
> > > This improves the performance of Mesa's GL_MAP_UNSYNCHRONIZED_BIT path
> > > in GL_ARB_map_buffer_range.  Improves Unigine Tropics performance at
> > > 1024x768 by 2.06236% +/- 0.50272% (n=11).
> > > ---
> >
> > A few questions:
> > - iirc Ben's non-blocking stuff also worked for non-llc machines - I guess
> >   you haven't looked into this because we don't have a non-llc platform
> >   that runs Unigine?
>
> Tropics works on Ironlake, too, it's just slow.  Haven't tried earlier.

iirc, Eric even started to review those patches.

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] intel: Add support for (possibly) unsynchronized maps.

2012-02-27 Thread Eric Anholt
On Sat, 25 Feb 2012 12:00:07 +0100, Daniel Vetter dan...@ffwll.ch wrote:
> On Fri, Feb 24, 2012 at 07:53:22PM -0800, Eric Anholt wrote:
> > This improves the performance of Mesa's GL_MAP_UNSYNCHRONIZED_BIT path
> > in GL_ARB_map_buffer_range.  Improves Unigine Tropics performance at
> > 1024x768 by 2.06236% +/- 0.50272% (n=11).
> > ---
>
> A few questions:
> - iirc Ben's non-blocking stuff also worked for non-llc machines - I guess
>   you haven't looked into this because we don't have a non-llc platform
>   that runs Unigine?

I gave up on Ben's non-blocking stuff for being way too many patches all
smashed into one.  I wanted a simple fix that could be extended later if
other apps have other problems, while completely fixing this app (a
~21ms wait shortly after starting a new frame).

> - in my pwrite experience, writing through cpu maps beats writing through
>   the gtt on llc machines. This has the added benefit that it reduces
>   pressure on the mappable gtt. Have you tried that, too?

I haven't played with that, but it would be fun to at some point once we
get there.  Right now, the CPU overhead of the app isn't in this path.




Re: [Intel-gfx] [PATCH] intel: Add support for (possibly) unsynchronized maps.

2012-02-26 Thread Kenneth Graunke

On 02/25/2012 03:00 AM, Daniel Vetter wrote:
> On Fri, Feb 24, 2012 at 07:53:22PM -0800, Eric Anholt wrote:
> > This improves the performance of Mesa's GL_MAP_UNSYNCHRONIZED_BIT path
> > in GL_ARB_map_buffer_range.  Improves Unigine Tropics performance at
> > 1024x768 by 2.06236% +/- 0.50272% (n=11).
> > ---
>
> A few questions:
> - iirc Ben's non-blocking stuff also worked for non-llc machines - I guess
>   you haven't looked into this because we don't have a non-llc platform
>   that runs Unigine?

Tropics works on Ironlake, too, it's just slow.  Haven't tried earlier.

> - in my pwrite experience, writing through cpu maps beats writing through
>   the gtt on llc machines. This has the added benefit that it reduces
>   pressure on the mappable gtt. Have you tried that, too?
>
> Cheers, Daniel




Re: [Intel-gfx] [PATCH] intel: Add support for (possibly) unsynchronized maps.

2012-02-25 Thread Chris Wilson
On Fri, 24 Feb 2012 19:53:22 -0800, Eric Anholt e...@anholt.net wrote:
> This improves the performance of Mesa's GL_MAP_UNSYNCHRONIZED_BIT path
> in GL_ARB_map_buffer_range.  Improves Unigine Tropics performance at
> 1024x768 by 2.06236% +/- 0.50272% (n=11).

Oh well, weakly coherent wins.
Reviewed-by: Chris Wilson ch...@chris-wilson.co.uk
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH] intel: Add support for (possibly) unsynchronized maps.

2012-02-25 Thread Paul Menzel
Dear Eric,


Am Freitag, den 24.02.2012, 19:53 -0800 schrieb Eric Anholt:

[…]

> +/**
> + * Performs a mapping of the buffer object like the normal GTT
> + * mapping, but avoiding waiting for the GPU to be done reading from

s/avoiding/avoids/?

> + * or rendering to the buffer.
> + *
> + * This is used in the implementation of GL_ARB_map_buffer_range: The
> + * user asks to create a buffer, then does a mapping, fills some
> + * space, runs a drawing command, then asks to map it again without
> + * synchronizing because it guarantees that it won't write over the
> + * data that the GPU is busy using (or, more specifically, that if it
> + * does write over the data, it acknowledges that rendering is
> + * undefined).
> + */
> +
> +int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)

[…]


Thanks,

Paul
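
The usage pattern described in the quoted comment (fill some space, draw, then map again and write only where the GPU is not reading) can be sketched as a small CPU-only simulation. Everything here is hypothetical and for illustration: `stream_buf`, `stream_append`, and the counters are not libdrm or Mesa names, and the "wait" on wrap-around is only counted, not performed. The assumption is that appending past everything already submitted is safe without synchronization, while wrapping back to the start would overwrite data that may still be in use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STREAM_SIZE 64

/* Hypothetical streaming buffer; stands in for a mapped buffer object. */
struct stream_buf {
	unsigned char data[STREAM_SIZE];
	size_t head;		/* first byte not yet handed to the GPU */
	int sync_maps;		/* mappings that would have to wait */
	int unsync_maps;	/* mappings that can skip the wait */
};

/*
 * Append len bytes and return the offset they were written at,
 * or (size_t)-1 if len can never fit in the buffer.
 */
size_t stream_append(struct stream_buf *sb, const void *src, size_t len)
{
	size_t off;

	if (len > STREAM_SIZE)
		return (size_t)-1;

	if (sb->head + len > STREAM_SIZE) {
		/* Wrapping would overwrite possibly in-flight data,
		 * so this is where a synchronized map is needed. */
		sb->sync_maps++;
		sb->head = 0;
	} else {
		/* Fresh space past everything submitted so far:
		 * an unsynchronized map is enough. */
		sb->unsync_maps++;
	}

	off = sb->head;
	memcpy(sb->data + off, src, len);
	sb->head += len;
	return off;
}
```

With a zero-initialized `struct stream_buf`, three 24-byte appends land at offsets 0, 24, and 0 again: the first two skip the wait and the third, which has to wrap, is the only one counted as synchronized.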




Re: [Intel-gfx] [PATCH] intel: Add support for (possibly) unsynchronized maps.

2012-02-25 Thread Daniel Vetter
On Fri, Feb 24, 2012 at 07:53:22PM -0800, Eric Anholt wrote:
> This improves the performance of Mesa's GL_MAP_UNSYNCHRONIZED_BIT path
> in GL_ARB_map_buffer_range.  Improves Unigine Tropics performance at
> 1024x768 by 2.06236% +/- 0.50272% (n=11).
> ---

A few questions:
- iirc Ben's non-blocking stuff also worked for non-llc machines - I guess
  you haven't looked into this because we don't have a non-llc platform
  that runs Unigine?

- in my pwrite experience, writing through cpu maps beats writing through
  the gtt on llc machines. This has the added benefit that it reduces
  pressure on the mappable gtt. Have you tried that, too?

Cheers, Daniel
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


Re: [Intel-gfx] [PATCH] intel: Add support for (possibly) unsynchronized maps.

2012-02-25 Thread Chris Wilson
On Sat, 25 Feb 2012 12:00:07 +0100, Daniel Vetter dan...@ffwll.ch wrote:
> - in my pwrite experience, writing through cpu maps beats writing through
>   the gtt on llc machines. This has the added benefit that it reduces
>   pressure on the mappable gtt. Have you tried that, too?

Speaking of which, those wonderful pwrite patches are still MIA?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Intel-gfx] [PATCH] intel: Add support for (possibly) unsynchronized maps.

2012-02-24 Thread Eric Anholt
This improves the performance of Mesa's GL_MAP_UNSYNCHRONIZED_BIT path
in GL_ARB_map_buffer_range.  Improves Unigine Tropics performance at
1024x768 by 2.06236% +/- 0.50272% (n=11).
---
 intel/intel_bufmgr.h |2 +
 intel/intel_bufmgr_gem.c |   72 +
 2 files changed, 67 insertions(+), 7 deletions(-)
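
The restructuring in this patch follows a common pattern: the mapping work moves into a static helper that expects the lock to be held, and each public entry point takes the lock, calls the helper, and only the synchronized one goes on to the step that may stall. A minimal sketch of that shape, using hypothetical names (`toy_bo`, `toy_map_locked`, `toy_map_sync`, `toy_map_unsync`) rather than the libdrm types, and a counter in place of the real set-domain ioctl:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a buffer object; not a libdrm type. */
struct toy_bo {
	pthread_mutex_t lock;
	int map_count;	/* outstanding mappings */
	int wait_count;	/* times the caller "waited for the GPU" */
};

void toy_bo_init(struct toy_bo *bo)
{
	pthread_mutex_init(&bo->lock, NULL);
	bo->map_count = 0;
	bo->wait_count = 0;
}

/* Shared mapping work. The caller must already hold bo->lock. */
static int toy_map_locked(struct toy_bo *bo)
{
	assert(bo->map_count >= 0);
	bo->map_count++;
	return 0;
}

/* Synchronized entry point: map, then do the step that may stall. */
int toy_map_sync(struct toy_bo *bo)
{
	int ret;

	pthread_mutex_lock(&bo->lock);
	ret = toy_map_locked(bo);
	if (ret) {
		pthread_mutex_unlock(&bo->lock);
		return ret;
	}
	bo->wait_count++;	/* stands in for the domain change */
	pthread_mutex_unlock(&bo->lock);
	return 0;
}

/* Unsynchronized entry point: same mapping work, no wait. */
int toy_map_unsync(struct toy_bo *bo)
{
	int ret;

	pthread_mutex_lock(&bo->lock);
	ret = toy_map_locked(bo);
	pthread_mutex_unlock(&bo->lock);
	return ret;
}
```

Keeping the lock and unlock calls in the entry points means the helper's error paths never have to release the lock themselves, which is why the patch can drop the unlock calls from the error branches it moves into the helper.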

diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index 85da8b9..e852eab 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -148,8 +148,10 @@ void drm_intel_bufmgr_gem_enable_reuse(drm_intel_bufmgr *bufmgr);
 void drm_intel_bufmgr_gem_enable_fenced_relocs(drm_intel_bufmgr *bufmgr);
 void drm_intel_bufmgr_gem_set_vma_cache_size(drm_intel_bufmgr *bufmgr,
 int limit);
+int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo);
 int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo);
 int drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo);
+
 int drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo);
 void drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start);
 void drm_intel_gem_bo_start_gtt_access(drm_intel_bo *bo, int write_enable);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index 187e8ec..12641e1 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -1150,15 +1150,13 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 	return 0;
 }
 
-int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo)
+static int
+map_gtt(drm_intel_bo *bo)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
 	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
-	struct drm_i915_gem_set_domain set_domain;
 	int ret;
 
-	pthread_mutex_lock(&bufmgr_gem->lock);
-
 	if (bo_gem->map_count++ == 0)
 		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
 
@@ -1184,7 +1182,6 @@ int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo)
 			    strerror(errno));
 			if (--bo_gem->map_count == 0)
 				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
-			pthread_mutex_unlock(&bufmgr_gem->lock);
 			return ret;
 		}
 
@@ -1201,7 +1198,6 @@ int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo)
 			    strerror(errno));
 			if (--bo_gem->map_count == 0)
 				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
-			pthread_mutex_unlock(&bufmgr_gem->lock);
 			return ret;
 		}
 	}
@@ -1211,7 +1207,33 @@ int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo)
 	DBG("bo_map_gtt: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
 	    bo_gem->gtt_virtual);
 
-	/* Now move it to the GTT domain so that the CPU caches are flushed */
+	return 0;
+}
+
+int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_gtt(bo);
+	if (ret) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
+	/* Now move it to the GTT domain so that the GPU and CPU
+	 * caches are flushed and the GPU isn't actively using the
+	 * buffer.
+	 *
+	 * The pagefault handler does this domain change for us when
+	 * it has unbound the BO from the GTT, but it's up to us to
+	 * tell it when we're about to use things if we had done
+	 * rendering and it still happens to be bound to the GTT.
+	 */
 	set_domain.handle = bo_gem->gem_handle;
 	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
 	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
@@ -1229,6 +1251,42 @@ int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo)
 	return 0;
 }
 
+/**
+ * Performs a mapping of the buffer object like the normal GTT
+ * mapping, but avoiding waiting for the GPU to be done reading from
+ * or rendering to the buffer.
+ *
+ * This is used in the implementation of GL_ARB_map_buffer_range: The
+ * user asks to create a buffer, then does a mapping, fills some
+ * space, runs a drawing command, then asks to map it again without
+ * synchronizing because it guarantees that it won't write over the
+ * data that the GPU is busy using (or, more specifically, that if it
+ * does write over the data, it acknowledges that rendering is
+ * undefined).
+ */
+
+int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	int ret;
+
+	/* If the CPU cache isn't coherent with the GTT, then use a
+	 * regular synchronized mapping.  The problem is that we don't
+	 * track where the buffer was last used on the CPU