Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB

2013-04-04 Thread Daniel Vetter
On Wed, Mar 06, 2013 at 04:28:09PM +, Chris Wilson wrote:
 From: Ville Syrjälä ville.syrj...@linux.intel.com
 
 Currently all scanout buffers must be uncached because the
 display controller doesn't snoop the LLC. SNB introduced another
 method to guarantee coherency for the display controller. It's
 called the GFDT or graphics data type.
 
 Pages that have the GFDT bit enabled in their PTEs get flushed
 all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is
 issued with the synchronize GFDT bit set.
 
 So rather than making all scanout buffers uncached, set the GFDT
 bit in their PTEs, and modify the ring flush functions to enable
 the synchronize GFDT bit.
 
 On HSW the GFDT bit was removed from the PTE, and it's only present in
 surface state, so we can't really set it from the kernel. Also the docs
 state that the hardware isn't actually guaranteed to respect the GFDT
 bit. So it looks like GFDT isn't all that useful on HSW.
 
 So far I've tried this very quickly on an IVB machine, and
 it seems to be working as advertised. No idea if it does any
 good though.
 
 On an i5-2520m (laptop) running gnome-shell at 1366x768:
   padman  140.78 - 145.98 fps
   openarena   183.72 - 186.87 fps
   gtkperf ComboBoxEntry   20.27 - 22.14s
   gtkperf pixbuf   1.12 -  1.47s
   x11perf -aa10text   13.40 - 13.20 Mglyphs
 which are well within the throttling noise.
 
 v2 [ickle]: adapt to comply with existing userspace guarantees
 
 Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk

Oops, this one here somehow fell a bit through the cracks.

Two bikesheds and one real issue:
- s/bool gfdt/unsigned caching_flags/ for the gtt pte punchers. I expect
  more fun to come here ;-)
- ring-flush has a pile of arguments, I guess we should coalesce
  invalidate/flush and the new internal into a simple flags array. I don't
  expect that we'll every switch invalidate/flush back to domain arrays
  from the current it's just a bool, really usage.

Also, the above should be done in separate prep patches imo.

The real deal is flushing cpu access, since that probably does not set the
gfdt flag. So cpu reads are fine and don't require any flushing, but cpu
writes must be clflushed I think. In a way gfdt works as if the gpu is
doing write-through caching (safe that we have to manually initiate the
write-through with the gfdt flush). But from the pov of cpu access it's
the same, and hence requires the same amount of clflushing.

Hence I'm leaning towards adding a new I915_CACHING_WT cache_level. Ofc
for gfdt we still need to keep track of the gfdt_dirty state, but all the
cpu side flushing business should be much clearer.

Stumbled over this while checking whether sw_finish_ioctl would still do
the right thing, and noticed that the clflush is a noop when gfdt is
treated like the current CACHE_LLC.

Cheers, Daniel

PS: Ben Widawsky noticed that on ivb we set a stupid ppgtt pte caching
override, which results in pte cachelines being tagged with gfdt. Hsw has
an equivalent mode to allow the gpu to write through. I have no idea
what's the point of this and we never write ptes from the gpu. So can you
please throw the relevant patch on top to disable gfdt for ppgtt ptes in
ECOCHK?
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB

2013-04-04 Thread Ville Syrjälä
On Thu, Apr 04, 2013 at 11:01:02AM +0200, Daniel Vetter wrote:
 On Wed, Mar 06, 2013 at 04:28:09PM +, Chris Wilson wrote:
  From: Ville Syrjälä ville.syrj...@linux.intel.com
  
  Currently all scanout buffers must be uncached because the
  display controller doesn't snoop the LLC. SNB introduced another
  method to guarantee coherency for the display controller. It's
  called the GFDT or graphics data type.
  
  Pages that have the GFDT bit enabled in their PTEs get flushed
  all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is
  issued with the synchronize GFDT bit set.
  
  So rather than making all scanout buffers uncached, set the GFDT
  bit in their PTEs, and modify the ring flush functions to enable
  the synchronize GFDT bit.
  
  On HSW the GFDT bit was removed from the PTE, and it's only present in
  surface state, so we can't really set it from the kernel. Also the docs
  state that the hardware isn't actually guaranteed to respect the GFDT
  bit. So it looks like GFDT isn't all that useful on HSW.
  
  So far I've tried this very quickly on an IVB machine, and
  it seems to be working as advertised. No idea if it does any
  good though.
  
  On an i5-2520m (laptop) running gnome-shell at 1366x768:
padman140.78 - 145.98 fps
openarena 183.72 - 186.87 fps
gtkperf ComboBoxEntry 20.27 - 22.14s
gtkperf pixbuf 1.12 -  1.47s
x11perf -aa10text 13.40 - 13.20 Mglyphs
  which are well within the throttling noise.
  
  v2 [ickle]: adapt to comply with existing userspace guarantees
  
  Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
  Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 
 Oops, this one here somehow fell a bit through the cracks.
 
 Two bikesheds and one real issue:
 - s/bool gfdt/unsigned caching_flags/ for the gtt pte punchers. I expect
   more fun to come here ;-)
 - ring-flush has a pile of arguments, I guess we should coalesce
   invalidate/flush and the new internal into a simple flags array. I don't
   expect that we'll every switch invalidate/flush back to domain arrays
   from the current it's just a bool, really usage.
 
 Also, the above should be done in separate prep patches imo.
 
 The real deal is flushing cpu access, since that probably does not set the
 gfdt flag. So cpu reads are fine and don't require any flushing, but cpu
 writes must be clflushed I think. In a way gfdt works as if the gpu is
 doing write-through caching (safe that we have to manually initiate the
 write-through with the gfdt flush). But from the pov of cpu access it's
 the same, and hence requires the same amount of clflushing.
 
 Hence I'm leaning towards adding a new I915_CACHING_WT cache_level. Ofc
 for gfdt we still need to keep track of the gfdt_dirty state, but all the
 cpu side flushing business should be much clearer.
 
 Stumbled over this while checking whether sw_finish_ioctl would still do
 the right thing, and noticed that the clflush is a noop when gfdt is
 treated like the current CACHE_LLC.

My original patch had a change to clflushing where pinned objects w/ gfdt
were also flushed. I didn't really read v2 well enough to figure out why
that got dropped.

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB

2013-04-04 Thread Chris Wilson
On Thu, Apr 04, 2013 at 02:27:12PM +0300, Ville Syrjälä wrote:
 On Thu, Apr 04, 2013 at 11:01:02AM +0200, Daniel Vetter wrote:
  On Wed, Mar 06, 2013 at 04:28:09PM +, Chris Wilson wrote:
   From: Ville Syrjälä ville.syrj...@linux.intel.com
   
   Currently all scanout buffers must be uncached because the
   display controller doesn't snoop the LLC. SNB introduced another
   method to guarantee coherency for the display controller. It's
   called the GFDT or graphics data type.
   
   Pages that have the GFDT bit enabled in their PTEs get flushed
   all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is
   issued with the synchronize GFDT bit set.
   
   So rather than making all scanout buffers uncached, set the GFDT
   bit in their PTEs, and modify the ring flush functions to enable
   the synchronize GFDT bit.
   
   On HSW the GFDT bit was removed from the PTE, and it's only present in
   surface state, so we can't really set it from the kernel. Also the docs
   state that the hardware isn't actually guaranteed to respect the GFDT
   bit. So it looks like GFDT isn't all that useful on HSW.
   
   So far I've tried this very quickly on an IVB machine, and
   it seems to be working as advertised. No idea if it does any
   good though.
   
   On an i5-2520m (laptop) running gnome-shell at 1366x768:
 padman  140.78 - 145.98 fps
 openarena   183.72 - 186.87 fps
 gtkperf ComboBoxEntry   20.27 - 22.14s
 gtkperf pixbuf   1.12 -  1.47s
 x11perf -aa10text   13.40 - 13.20 Mglyphs
   which are well within the throttling noise.
   
   v2 [ickle]: adapt to comply with existing userspace guarantees
   
   Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
   Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
  
  Oops, this one here somehow fell a bit through the cracks.
  
  Two bikesheds and one real issue:
  - s/bool gfdt/unsigned caching_flags/ for the gtt pte punchers. I expect
more fun to come here ;-)
  - ring-flush has a pile of arguments, I guess we should coalesce
invalidate/flush and the new internal into a simple flags array. I don't
expect that we'll every switch invalidate/flush back to domain arrays
from the current it's just a bool, really usage.
  
  Also, the above should be done in separate prep patches imo.
  
  The real deal is flushing cpu access, since that probably does not set the
  gfdt flag. So cpu reads are fine and don't require any flushing, but cpu
  writes must be clflushed I think. In a way gfdt works as if the gpu is
  doing write-through caching (safe that we have to manually initiate the
  write-through with the gfdt flush). But from the pov of cpu access it's
  the same, and hence requires the same amount of clflushing.
  
  Hence I'm leaning towards adding a new I915_CACHING_WT cache_level. Ofc
  for gfdt we still need to keep track of the gfdt_dirty state, but all the
  cpu side flushing business should be much clearer.
  
  Stumbled over this while checking whether sw_finish_ioctl would still do
  the right thing, and noticed that the clflush is a noop when gfdt is
  treated like the current CACHE_LLC.
 
 My original patch had a change to clflushing where pinned objects w/ gfdt
 were also flushed. I didn't really read v2 well enough to figure out why
 that got dropped.

Because it was the wrong approach. The API where the scanout is to be
made coherent with CPU mmaps is sw_finish. The only user of that is early
pre-gen5 DDX. So imo the original patch was abusing a generic flag and
applying the flushes at the wrong points, and I simply didn't care about
badly designed API such as sw_finish.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB

2013-03-06 Thread Chris Wilson
From: Ville Syrjälä ville.syrj...@linux.intel.com

Currently all scanout buffers must be uncached because the
display controller doesn't snoop the LLC. SNB introduced another
method to guarantee coherency for the display controller. It's
called the GFDT or graphics data type.

Pages that have the GFDT bit enabled in their PTEs get flushed
all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is
issued with the synchronize GFDT bit set.

So rather than making all scanout buffers uncached, set the GFDT
bit in their PTEs, and modify the ring flush functions to enable
the synchronize GFDT bit.

On HSW the GFDT bit was removed from the PTE, and it's only present in
surface state, so we can't really set it from the kernel. Also the docs
state that the hardware isn't actually guaranteed to respect the GFDT
bit. So it looks like GFDT isn't all that useful on HSW.

So far I've tried this very quickly on an IVB machine, and
it seems to be working as advertised. No idea if it does any
good though.

On an i5-2520m (laptop) running gnome-shell at 1366x768:
  padman140.78 - 145.98 fps
  openarena 183.72 - 186.87 fps
  gtkperf ComboBoxEntry 20.27 - 22.14s
  gtkperf pixbuf 1.12 -  1.47s
  x11perf -aa10text 13.40 - 13.20 Mglyphs
which are well within the throttling noise.

v2 [ickle]: adapt to comply with existing userspace guarantees

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_drv.c|4 +
 drivers/gpu/drm/i915/i915_drv.h|   16 +++-
 drivers/gpu/drm/i915/i915_gem.c|  111 
 drivers/gpu/drm/i915/i915_gem_context.c|5 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   12 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c|   38 ++
 drivers/gpu/drm/i915/i915_reg.h|2 +
 drivers/gpu/drm/i915/i915_trace.h  |   10 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.c|   79 +---
 drivers/gpu/drm/i915/intel_ringbuffer.h|5 +-
 10 files changed, 225 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 66d06ac..ff935f1 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -235,6 +235,7 @@ static const struct intel_device_info 
intel_sandybridge_d_info = {
.has_bsd_ring = 1,
.has_blt_ring = 1,
.has_llc = 1,
+   .has_gfdt = 1,
.has_force_wake = 1,
 };
 
@@ -245,6 +246,7 @@ static const struct intel_device_info 
intel_sandybridge_m_info = {
.has_bsd_ring = 1,
.has_blt_ring = 1,
.has_llc = 1,
+   .has_gfdt = 1,
.has_force_wake = 1,
 };
 
@@ -254,6 +256,7 @@ static const struct intel_device_info 
intel_ivybridge_d_info = {
.has_bsd_ring = 1,
.has_blt_ring = 1,
.has_llc = 1,
+   .has_gfdt = 1,
.has_force_wake = 1,
 };
 
@@ -264,6 +267,7 @@ static const struct intel_device_info 
intel_ivybridge_m_info = {
.has_bsd_ring = 1,
.has_blt_ring = 1,
.has_llc = 1,
+   .has_gfdt = 1,
.has_force_wake = 1,
 };
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 535bf29..9841dd7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -367,6 +367,7 @@ struct intel_device_info {
u8 has_bsd_ring:1;
u8 has_blt_ring:1;
u8 has_llc:1;
+   u8 has_gfdt:1;
 };
 
 enum i915_cache_level {
@@ -409,7 +410,8 @@ struct i915_gtt {
void (*gtt_insert_entries)(struct drm_device *dev,
   struct sg_table *st,
   unsigned int pg_start,
-  enum i915_cache_level cache_level);
+  enum i915_cache_level cache_level,
+  bool gfdt);
 };
 #define gtt_total_entries(gtt) ((gtt).total  PAGE_SHIFT)
 
@@ -430,7 +432,8 @@ struct i915_hw_ppgtt {
void (*insert_entries)(struct i915_hw_ppgtt *ppgtt,
   struct sg_table *st,
   unsigned int pg_start,
-  enum i915_cache_level cache_level);
+  enum i915_cache_level cache_level,
+  bool gfdt);
void (*cleanup)(struct i915_hw_ppgtt *ppgtt);
 };
 
@@ -1170,6 +1173,8 @@ struct drm_i915_gem_object {
unsigned int fenced_gpu_access:1;
 
unsigned int cache_level:2;
+   unsigned int gfdt:1;
+   unsigned int gfdt_dirty:1;
 
unsigned int has_aliasing_ppgtt_mapping:1;
unsigned int has_global_gtt_mapping:1;
@@ -1328,6 +1333,9 @@ struct drm_i915_file_private {
 #define HAS_LLC(dev)(INTEL_INFO(dev)-has_llc)
 #define I915_NEED_GFX_HWS(dev) (INTEL_INFO(dev)-need_gfx_hws)
 
+/* Only SNB and IVB have GFDT in PTEs 

Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB

2013-03-04 Thread Ville Syrjälä
On Sun, Mar 03, 2013 at 05:39:09PM +, Chris Wilson wrote:
 On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote:
  The other thing was that I didn't manage to get things to work properly,
  leaving some random cachline dirt on the screen. Looking at your code, you
  add the gfdt flush to every ring_flush, whereas I've tried to be clever
  and only flushed after batches rendering to the frontbuffer. So probably a
  bug in my code, or a flush on a given ring doesn't flush out caches for
  one of the other engines.
 
 Right, we have a formalized interface for flushes to scanout rather than
 always flushing GFDT.

We do? Where's that?

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB

2013-03-04 Thread Chris Wilson
On Mon, Mar 04, 2013 at 03:51:11PM +0200, Ville Syrjälä wrote:
 On Sun, Mar 03, 2013 at 05:39:09PM +, Chris Wilson wrote:
  On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote:
   The other thing was that I didn't manage to get things to work properly,
   leaving some random cachline dirt on the screen. Looking at your code, you
   add the gfdt flush to every ring_flush, whereas I've tried to be clever
   and only flushed after batches rendering to the frontbuffer. So probably a
   bug in my code, or a flush on a given ring doesn't flush out caches for
   one of the other engines.
  
  Right, we have a formalized interface for flushes to scanout rather than
  always flushing GFDT.
 
 We do? Where's that?

A side-effect of the API for busy_ioctl() is that all caches for the bo
are flushed and any outstanding requests queued in order for the
bo to become unbusy in the near-future. This semantic was required for
GL query objects to eventually complete without futher interaction. It
also solved the problem of having to flush the scanout periodically on
gen4+ (and on gen3 if we enable the optimisation bit) and is so
enshrined into lore.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB

2013-03-04 Thread Ville Syrjälä
On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote:
 On Fri, Mar 01, 2013 at 08:32:57PM +0200, ville.syrj...@linux.intel.com wrote:
  From: Ville Syrjälä ville.syrj...@linux.intel.com
  
  Currently all scanout buffers must be uncached because the
  display controller doesn't snoop the LLC. SNB introduced another
  method to guarantee coherency for the display controller. It's
  called the GFDT or graphics data type.
  
  Pages that have the GFDT bit enabled in their PTEs get flushed
  all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is
  issued with the synchronize GFDT bit set.
  
  So rather than making all scanout buffers uncached, set the GFDT
  bit in their PTEs, and modify the ring flush functions to enable
  the synchronize GFDT bit.
  
  On HSW the GFDT bit was removed from the PTE, and it's only present in
  surface state, so we can't really set it from the kernel. Also the docs
  state that the hardware isn't actually guaranteed to respect the GFDT
  bit. So it looks like GFDT isn't all that useful on HSW.
  
  So far I've tried this very quickly on an IVB machine, and
  it seems to be working as advertised. No idea if it does any
  good though.
  
  TODO:
  - make sure there are no missing flushes (CPU access doesn't
respect GFDT after all).
  - measure it and see if there's some real benefit
  - maybe we can track whether synchronize GFDT needs to be
issued, and skip it when possible. needs some numbers to
determine if it's worthwile.
  
  Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
 
 Iirc when I've tried this out a while back it regressed a few benchmarks.
 Chrisme suspected cache trahsing, but hard to tell without proper
 instrumentation support. Chris played around with a few tricks to mark
 other giant bos as uncacheable, but he couldn't find any improved
 workloads.

I see. I didn't realize this was tried already. Not that I really
planned to implement this in the first place. I was just studying the
code and figured I'd learn better by trying to change things a bit
;) But if there's interest I can of course try to improve it further.

 In short I think this needs to come with decent amounts of numbers
 attached, like the TODO says ;-)
 
 The other thing was that I didn't manage to get things to work properly,
 leaving some random cachline dirt on the screen. Looking at your code, you
 add the gfdt flush to every ring_flush, whereas I've tried to be clever
 and only flushed after batches rendering to the frontbuffer. So probably a
 bug in my code, or a flush on a given ring doesn't flush out caches for
 one of the other engines.

I had a bug in my first attempt where I forgot the initial clflush. That
left some random crap on the screen until I rendered the next frame w/
GFDT already enabled. Other than that I didn't see any corruption except
when I intentionally left out the flushes. But as stated my code does
too many flushes probably.

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB

2013-03-03 Thread Daniel Vetter
On Fri, Mar 01, 2013 at 08:32:57PM +0200, ville.syrj...@linux.intel.com wrote:
 From: Ville Syrjälä ville.syrj...@linux.intel.com
 
 Currently all scanout buffers must be uncached because the
 display controller doesn't snoop the LLC. SNB introduced another
 method to guarantee coherency for the display controller. It's
 called the GFDT or graphics data type.
 
 Pages that have the GFDT bit enabled in their PTEs get flushed
 all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is
 issued with the synchronize GFDT bit set.
 
 So rather than making all scanout buffers uncached, set the GFDT
 bit in their PTEs, and modify the ring flush functions to enable
 the synchronize GFDT bit.
 
 On HSW the GFDT bit was removed from the PTE, and it's only present in
 surface state, so we can't really set it from the kernel. Also the docs
 state that the hardware isn't actually guaranteed to respect the GFDT
 bit. So it looks like GFDT isn't all that useful on HSW.
 
 So far I've tried this very quickly on an IVB machine, and
 it seems to be working as advertised. No idea if it does any
 good though.
 
 TODO:
 - make sure there are no missing flushes (CPU access doesn't
   respect GFDT after all).
 - measure it and see if there's some real benefit
 - maybe we can track whether synchronize GFDT needs to be
   issued, and skip it when possible. needs some numbers to
   determine if it's worthwile.
 
 Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com

Iirc when I've tried this out a while back it regressed a few benchmarks.
Chrisme suspected cache trahsing, but hard to tell without proper
instrumentation support. Chris played around with a few tricks to mark
other giant bos as uncacheable, but he couldn't find any improved
workloads.

In short I think this needs to come with decent amounts of numbers
attached, like the TODO says ;-)

The other thing was that I didn't manage to get things to work properly,
leaving some random cachline dirt on the screen. Looking at your code, you
add the gfdt flush to every ring_flush, whereas I've tried to be clever
and only flushed after batches rendering to the frontbuffer. So probably a
bug in my code, or a flush on a given ring doesn't flush out caches for
one of the other engines.

And finally a bikeshed: I'd vote for a slightly more generice unsigned
long cache_flags instead of bool gfdt.

Cheers, Daniel

 ---
  drivers/gpu/drm/i915/i915_drv.h|  18 +++--
  drivers/gpu/drm/i915/i915_gem.c| 109 
 -
  drivers/gpu/drm/i915/i915_gem_context.c|   3 +-
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +--
  drivers/gpu/drm/i915/i915_gem_gtt.c|  38 ++
  drivers/gpu/drm/i915/i915_reg.h|   2 +
  drivers/gpu/drm/i915/intel_display.c   |   4 +-
  drivers/gpu/drm/i915/intel_overlay.c   |   2 +-
  drivers/gpu/drm/i915/intel_ringbuffer.c|  25 +++
  9 files changed, 167 insertions(+), 44 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
 index e95337c..537b344 100644
 --- a/drivers/gpu/drm/i915/i915_drv.h
 +++ b/drivers/gpu/drm/i915/i915_drv.h
 @@ -408,7 +408,8 @@ struct i915_gtt {
   void (*gtt_insert_entries)(struct drm_device *dev,
  struct sg_table *st,
  unsigned int pg_start,
 -enum i915_cache_level cache_level);
 +enum i915_cache_level cache_level,
 +bool gfdt);
  };
  #define gtt_total_entries(gtt) ((gtt).total  PAGE_SHIFT)
  
 @@ -429,7 +430,8 @@ struct i915_hw_ppgtt {
   void (*insert_entries)(struct i915_hw_ppgtt *ppgtt,
  struct sg_table *st,
  unsigned int pg_start,
 -enum i915_cache_level cache_level);
 +enum i915_cache_level cache_level,
 +bool gfdt);
   void (*cleanup)(struct i915_hw_ppgtt *ppgtt);
  };
  
 @@ -1169,6 +1171,7 @@ struct drm_i915_gem_object {
   unsigned int fenced_gpu_access:1;
  
   unsigned int cache_level:2;
 + unsigned int gfdt:1;
  
   unsigned int has_aliasing_ppgtt_mapping:1;
   unsigned int has_global_gtt_mapping:1;
 @@ -1310,6 +1313,10 @@ struct drm_i915_file_private {
  #define HAS_LLC(dev)(INTEL_INFO(dev)-has_llc)
  #define I915_NEED_GFX_HWS(dev)   (INTEL_INFO(dev)-need_gfx_hws)
  
 +/* Only SNB and IVB have GFDT in PTEs */
 +#define HAS_GFDT(dev)(HAS_LLC(dev)  !IS_HASWELL(dev)  \
 +  (IS_GEN6(dev) || IS_GEN7(dev)))
 +
  #define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)-gen = 6)
  #define HAS_ALIASING_PPGTT(dev)  (INTEL_INFO(dev)-gen =6  
 !IS_VALLEYVIEW(dev))
  
 @@ -1639,7 +1646,8 @@ i915_gem_object_set_to_gtt_domain(struct 
 drm_i915_gem_object *obj,
  int __must_check
  i915_gem_object_set_to_cpu_domain(struct 

Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB

2013-03-03 Thread Chris Wilson
On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote:
 The other thing was that I didn't manage to get things to work properly,
 leaving some random cachline dirt on the screen. Looking at your code, you
 add the gfdt flush to every ring_flush, whereas I've tried to be clever
 and only flushed after batches rendering to the frontbuffer. So probably a
 bug in my code, or a flush on a given ring doesn't flush out caches for
 one of the other engines.

Right, we have a formalized interface for flushes to scanout rather than
always flushing GFDT.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB

2013-03-01 Thread ville . syrjala
From: Ville Syrjälä ville.syrj...@linux.intel.com

Currently all scanout buffers must be uncached because the
display controller doesn't snoop the LLC. SNB introduced another
method to guarantee coherency for the display controller. It's
called the GFDT or graphics data type.

Pages that have the GFDT bit enabled in their PTEs get flushed
all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is
issued with the synchronize GFDT bit set.

So rather than making all scanout buffers uncached, set the GFDT
bit in their PTEs, and modify the ring flush functions to enable
the synchronize GFDT bit.

On HSW the GFDT bit was removed from the PTE, and it's only present in
surface state, so we can't really set it from the kernel. Also the docs
state that the hardware isn't actually guaranteed to respect the GFDT
bit. So it looks like GFDT isn't all that useful on HSW.

So far I've tried this very quickly on an IVB machine, and
it seems to be working as advertised. No idea if it does any
good though.

TODO:
- make sure there are no missing flushes (CPU access doesn't
  respect GFDT after all).
- measure it and see if there's some real benefit
- maybe we can track whether synchronize GFDT needs to be
  issued, and skip it when possible. needs some numbers to
  determine if it's worthwile.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 drivers/gpu/drm/i915/i915_drv.h|  18 +++--
 drivers/gpu/drm/i915/i915_gem.c| 109 -
 drivers/gpu/drm/i915/i915_gem_context.c|   3 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +--
 drivers/gpu/drm/i915/i915_gem_gtt.c|  38 ++
 drivers/gpu/drm/i915/i915_reg.h|   2 +
 drivers/gpu/drm/i915/intel_display.c   |   4 +-
 drivers/gpu/drm/i915/intel_overlay.c   |   2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c|  25 +++
 9 files changed, 167 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e95337c..537b344 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -408,7 +408,8 @@ struct i915_gtt {
void (*gtt_insert_entries)(struct drm_device *dev,
   struct sg_table *st,
   unsigned int pg_start,
-  enum i915_cache_level cache_level);
+  enum i915_cache_level cache_level,
+  bool gfdt);
 };
 #define gtt_total_entries(gtt) ((gtt).total  PAGE_SHIFT)
 
@@ -429,7 +430,8 @@ struct i915_hw_ppgtt {
void (*insert_entries)(struct i915_hw_ppgtt *ppgtt,
   struct sg_table *st,
   unsigned int pg_start,
-  enum i915_cache_level cache_level);
+  enum i915_cache_level cache_level,
+  bool gfdt);
void (*cleanup)(struct i915_hw_ppgtt *ppgtt);
 };
 
@@ -1169,6 +1171,7 @@ struct drm_i915_gem_object {
unsigned int fenced_gpu_access:1;
 
unsigned int cache_level:2;
+   unsigned int gfdt:1;
 
unsigned int has_aliasing_ppgtt_mapping:1;
unsigned int has_global_gtt_mapping:1;
@@ -1310,6 +1313,10 @@ struct drm_i915_file_private {
 #define HAS_LLC(dev)(INTEL_INFO(dev)-has_llc)
 #define I915_NEED_GFX_HWS(dev) (INTEL_INFO(dev)-need_gfx_hws)
 
+/* Only SNB and IVB have GFDT in PTEs */
+#define HAS_GFDT(dev)  (HAS_LLC(dev)  !IS_HASWELL(dev)  \
+(IS_GEN6(dev) || IS_GEN7(dev)))
+
 #define HAS_HW_CONTEXTS(dev)   (INTEL_INFO(dev)-gen = 6)
 #define HAS_ALIASING_PPGTT(dev)(INTEL_INFO(dev)-gen =6  
!IS_VALLEYVIEW(dev))
 
@@ -1639,7 +1646,8 @@ i915_gem_object_set_to_gtt_domain(struct 
drm_i915_gem_object *obj,
 int __must_check
 i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write);
 int __must_check
-i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
+i915_gem_object_pin_to_display_plane(struct drm_device *dev,
+struct drm_i915_gem_object *obj,
 u32 alignment,
 struct intel_ring_buffer *pipelined);
 int i915_gem_attach_phys_object(struct drm_device *dev,
@@ -1681,14 +1689,14 @@ int i915_gem_context_destroy_ioctl(struct drm_device 
*dev, void *data,
 void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev);
 void i915_ppgtt_bind_object(struct i915_hw_ppgtt *ppgtt,
struct drm_i915_gem_object *obj,
-   enum i915_cache_level cache_level);
+   enum i915_cache_level cache_level, bool gfdt);
 void i915_ppgtt_unbind_object(struct i915_hw_ppgtt *ppgtt,
  struct drm_i915_gem_object *obj);
 
 void i915_gem_restore_gtt_mappings(struct drm_device *dev);
 int