Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB
On Wed, Mar 06, 2013 at 04:28:09PM +, Chris Wilson wrote: From: Ville Syrjälä ville.syrj...@linux.intel.com Currently all scanout buffers must be uncached because the display controller doesn't snoop the LLC. SNB introduced another method to guarantee coherency for the display controller. It's called the GFDT or graphics data type. Pages that have the GFDT bit enabled in their PTEs get flushed all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is issued with the synchronize GFDT bit set. So rather than making all scanout buffers uncached, set the GFDT bit in their PTEs, and modify the ring flush functions to enable the synchronize GFDT bit. On HSW the GFDT bit was removed from the PTE, and it's only present in surface state, so we can't really set it from the kernel. Also the docs state that the hardware isn't actually guaranteed to respect the GFDT bit. So it looks like GFDT isn't all that useful on HSW. So far I've tried this very quickly on an IVB machine, and it seems to be working as advertised. No idea if it does any good though. On an i5-2520m (laptop) running gnome-shell at 1366x768: padman 140.78 - 145.98 fps openarena 183.72 - 186.87 fps gtkperf ComboBoxEntry 20.27 - 22.14s gtkperf pixbuf 1.12 - 1.47s x11perf -aa10text 13.40 - 13.20 Mglyphs which are well within the throttling noise. v2 [ickle]: adapt to comply with existing userspace guarantees Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Oops, this one here somehow fell a bit through the cracks. Two bikesheds and one real issue: - s/bool gfdt/unsigned caching_flags/ for the gtt pte punchers. I expect more fun to come here ;-) - ring-flush has a pile of arguments, I guess we should coalesce invalidate/flush and the new internal into a simple flags array. I don't expect that we'll every switch invalidate/flush back to domain arrays from the current it's just a bool, really usage. Also, the above should be done in separate prep patches imo. The real deal is flushing cpu access, since that probably does not set the gfdt flag. So cpu reads are fine and don't require any flushing, but cpu writes must be clflushed I think. In a way gfdt works as if the gpu is doing write-through caching (safe that we have to manually initiate the write-through with the gfdt flush). But from the pov of cpu access it's the same, and hence requires the same amount of clflushing. Hence I'm leaning towards adding a new I915_CACHING_WT cache_level. Ofc for gfdt we still need to keep track of the gfdt_dirty state, but all the cpu side flushing business should be much clearer. Stumbled over this while checking whether sw_finish_ioctl would still do the right thing, and noticed that the clflush is a noop when gfdt is treated like the current CACHE_LLC. Cheers, Daniel PS: Ben Widawsky noticed that on ivb we set a stupid ppgtt pte caching override, which results in pte cachelines being tagged with gfdt. Hsw has an equivalent mode to allow the gpu to write through. I have no idea what's the point of this and we never write ptes from the gpu. So can you please throw the relevant patch on top to disable gfdt for ppgtt ptes in ECOCHK? -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB
On Thu, Apr 04, 2013 at 11:01:02AM +0200, Daniel Vetter wrote: On Wed, Mar 06, 2013 at 04:28:09PM +, Chris Wilson wrote: From: Ville Syrjälä ville.syrj...@linux.intel.com Currently all scanout buffers must be uncached because the display controller doesn't snoop the LLC. SNB introduced another method to guarantee coherency for the display controller. It's called the GFDT or graphics data type. Pages that have the GFDT bit enabled in their PTEs get flushed all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is issued with the synchronize GFDT bit set. So rather than making all scanout buffers uncached, set the GFDT bit in their PTEs, and modify the ring flush functions to enable the synchronize GFDT bit. On HSW the GFDT bit was removed from the PTE, and it's only present in surface state, so we can't really set it from the kernel. Also the docs state that the hardware isn't actually guaranteed to respect the GFDT bit. So it looks like GFDT isn't all that useful on HSW. So far I've tried this very quickly on an IVB machine, and it seems to be working as advertised. No idea if it does any good though. On an i5-2520m (laptop) running gnome-shell at 1366x768: padman140.78 - 145.98 fps openarena 183.72 - 186.87 fps gtkperf ComboBoxEntry 20.27 - 22.14s gtkperf pixbuf 1.12 - 1.47s x11perf -aa10text 13.40 - 13.20 Mglyphs which are well within the throttling noise. v2 [ickle]: adapt to comply with existing userspace guarantees Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Oops, this one here somehow fell a bit through the cracks. Two bikesheds and one real issue: - s/bool gfdt/unsigned caching_flags/ for the gtt pte punchers. I expect more fun to come here ;-) - ring-flush has a pile of arguments, I guess we should coalesce invalidate/flush and the new internal into a simple flags array. I don't expect that we'll every switch invalidate/flush back to domain arrays from the current it's just a bool, really usage. Also, the above should be done in separate prep patches imo. The real deal is flushing cpu access, since that probably does not set the gfdt flag. So cpu reads are fine and don't require any flushing, but cpu writes must be clflushed I think. In a way gfdt works as if the gpu is doing write-through caching (safe that we have to manually initiate the write-through with the gfdt flush). But from the pov of cpu access it's the same, and hence requires the same amount of clflushing. Hence I'm leaning towards adding a new I915_CACHING_WT cache_level. Ofc for gfdt we still need to keep track of the gfdt_dirty state, but all the cpu side flushing business should be much clearer. Stumbled over this while checking whether sw_finish_ioctl would still do the right thing, and noticed that the clflush is a noop when gfdt is treated like the current CACHE_LLC. My original patch had a change to clflushing where pinned objects w/ gfdt were also flushed. I didn't really read v2 well enough to figure out why that got dropped. -- Ville Syrjälä Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB
On Thu, Apr 04, 2013 at 02:27:12PM +0300, Ville Syrjälä wrote: On Thu, Apr 04, 2013 at 11:01:02AM +0200, Daniel Vetter wrote: On Wed, Mar 06, 2013 at 04:28:09PM +, Chris Wilson wrote: From: Ville Syrjälä ville.syrj...@linux.intel.com Currently all scanout buffers must be uncached because the display controller doesn't snoop the LLC. SNB introduced another method to guarantee coherency for the display controller. It's called the GFDT or graphics data type. Pages that have the GFDT bit enabled in their PTEs get flushed all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is issued with the synchronize GFDT bit set. So rather than making all scanout buffers uncached, set the GFDT bit in their PTEs, and modify the ring flush functions to enable the synchronize GFDT bit. On HSW the GFDT bit was removed from the PTE, and it's only present in surface state, so we can't really set it from the kernel. Also the docs state that the hardware isn't actually guaranteed to respect the GFDT bit. So it looks like GFDT isn't all that useful on HSW. So far I've tried this very quickly on an IVB machine, and it seems to be working as advertised. No idea if it does any good though. On an i5-2520m (laptop) running gnome-shell at 1366x768: padman 140.78 - 145.98 fps openarena 183.72 - 186.87 fps gtkperf ComboBoxEntry 20.27 - 22.14s gtkperf pixbuf 1.12 - 1.47s x11perf -aa10text 13.40 - 13.20 Mglyphs which are well within the throttling noise. v2 [ickle]: adapt to comply with existing userspace guarantees Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Oops, this one here somehow fell a bit through the cracks. Two bikesheds and one real issue: - s/bool gfdt/unsigned caching_flags/ for the gtt pte punchers. I expect more fun to come here ;-) - ring-flush has a pile of arguments, I guess we should coalesce invalidate/flush and the new internal into a simple flags array. I don't expect that we'll every switch invalidate/flush back to domain arrays from the current it's just a bool, really usage. Also, the above should be done in separate prep patches imo. The real deal is flushing cpu access, since that probably does not set the gfdt flag. So cpu reads are fine and don't require any flushing, but cpu writes must be clflushed I think. In a way gfdt works as if the gpu is doing write-through caching (safe that we have to manually initiate the write-through with the gfdt flush). But from the pov of cpu access it's the same, and hence requires the same amount of clflushing. Hence I'm leaning towards adding a new I915_CACHING_WT cache_level. Ofc for gfdt we still need to keep track of the gfdt_dirty state, but all the cpu side flushing business should be much clearer. Stumbled over this while checking whether sw_finish_ioctl would still do the right thing, and noticed that the clflush is a noop when gfdt is treated like the current CACHE_LLC. My original patch had a change to clflushing where pinned objects w/ gfdt were also flushed. I didn't really read v2 well enough to figure out why that got dropped. Because it was the wrong approach. The API where the scanout is to be made coherent with CPU mmaps is sw_finish. The only user of that is early pre-gen5 DDX. So imo the original patch was abusing a generic flag and applying the flushes at the wrong points, and I simply didn't care about badly designed API such as sw_finish. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB
From: Ville Syrjälä ville.syrj...@linux.intel.com Currently all scanout buffers must be uncached because the display controller doesn't snoop the LLC. SNB introduced another method to guarantee coherency for the display controller. It's called the GFDT or graphics data type. Pages that have the GFDT bit enabled in their PTEs get flushed all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is issued with the synchronize GFDT bit set. So rather than making all scanout buffers uncached, set the GFDT bit in their PTEs, and modify the ring flush functions to enable the synchronize GFDT bit. On HSW the GFDT bit was removed from the PTE, and it's only present in surface state, so we can't really set it from the kernel. Also the docs state that the hardware isn't actually guaranteed to respect the GFDT bit. So it looks like GFDT isn't all that useful on HSW. So far I've tried this very quickly on an IVB machine, and it seems to be working as advertised. No idea if it does any good though. On an i5-2520m (laptop) running gnome-shell at 1366x768: padman140.78 - 145.98 fps openarena 183.72 - 186.87 fps gtkperf ComboBoxEntry 20.27 - 22.14s gtkperf pixbuf 1.12 - 1.47s x11perf -aa10text 13.40 - 13.20 Mglyphs which are well within the throttling noise. v2 [ickle]: adapt to comply with existing userspace guarantees Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- drivers/gpu/drm/i915/i915_drv.c|4 + drivers/gpu/drm/i915/i915_drv.h| 16 +++- drivers/gpu/drm/i915/i915_gem.c| 111 drivers/gpu/drm/i915/i915_gem_context.c|5 +- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 12 ++- drivers/gpu/drm/i915/i915_gem_gtt.c| 38 ++ drivers/gpu/drm/i915/i915_reg.h|2 + drivers/gpu/drm/i915/i915_trace.h | 10 ++- drivers/gpu/drm/i915/intel_ringbuffer.c| 79 +--- drivers/gpu/drm/i915/intel_ringbuffer.h|5 +- 10 files changed, 225 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 66d06ac..ff935f1 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -235,6 +235,7 @@ static const struct intel_device_info intel_sandybridge_d_info = { .has_bsd_ring = 1, .has_blt_ring = 1, .has_llc = 1, + .has_gfdt = 1, .has_force_wake = 1, }; @@ -245,6 +246,7 @@ static const struct intel_device_info intel_sandybridge_m_info = { .has_bsd_ring = 1, .has_blt_ring = 1, .has_llc = 1, + .has_gfdt = 1, .has_force_wake = 1, }; @@ -254,6 +256,7 @@ static const struct intel_device_info intel_ivybridge_d_info = { .has_bsd_ring = 1, .has_blt_ring = 1, .has_llc = 1, + .has_gfdt = 1, .has_force_wake = 1, }; @@ -264,6 +267,7 @@ static const struct intel_device_info intel_ivybridge_m_info = { .has_bsd_ring = 1, .has_blt_ring = 1, .has_llc = 1, + .has_gfdt = 1, .has_force_wake = 1, }; diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 535bf29..9841dd7 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -367,6 +367,7 @@ struct intel_device_info { u8 has_bsd_ring:1; u8 has_blt_ring:1; u8 has_llc:1; + u8 has_gfdt:1; }; enum i915_cache_level { @@ -409,7 +410,8 @@ struct i915_gtt { void (*gtt_insert_entries)(struct drm_device *dev, struct sg_table *st, unsigned int pg_start, - enum i915_cache_level cache_level); + enum i915_cache_level cache_level, + bool gfdt); }; #define gtt_total_entries(gtt) ((gtt).total PAGE_SHIFT) @@ -430,7 +432,8 @@ struct i915_hw_ppgtt { void (*insert_entries)(struct i915_hw_ppgtt *ppgtt, struct sg_table *st, unsigned int pg_start, - enum i915_cache_level cache_level); + enum i915_cache_level cache_level, + bool gfdt); void (*cleanup)(struct i915_hw_ppgtt *ppgtt); }; @@ -1170,6 +1173,8 @@ struct drm_i915_gem_object { unsigned int fenced_gpu_access:1; unsigned int cache_level:2; + unsigned int gfdt:1; + unsigned int gfdt_dirty:1; unsigned int has_aliasing_ppgtt_mapping:1; unsigned int has_global_gtt_mapping:1; @@ -1328,6 +1333,9 @@ struct drm_i915_file_private { #define HAS_LLC(dev)(INTEL_INFO(dev)-has_llc) #define I915_NEED_GFX_HWS(dev) (INTEL_INFO(dev)-need_gfx_hws) +/* Only SNB and IVB have GFDT in PTEs
Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB
On Sun, Mar 03, 2013 at 05:39:09PM +, Chris Wilson wrote: On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote: The other thing was that I didn't manage to get things to work properly, leaving some random cachline dirt on the screen. Looking at your code, you add the gfdt flush to every ring_flush, whereas I've tried to be clever and only flushed after batches rendering to the frontbuffer. So probably a bug in my code, or a flush on a given ring doesn't flush out caches for one of the other engines. Right, we have a formalized interface for flushes to scanout rather than always flushing GFDT. We do? Where's that? -- Ville Syrjälä Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB
On Mon, Mar 04, 2013 at 03:51:11PM +0200, Ville Syrjälä wrote: On Sun, Mar 03, 2013 at 05:39:09PM +, Chris Wilson wrote: On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote: The other thing was that I didn't manage to get things to work properly, leaving some random cachline dirt on the screen. Looking at your code, you add the gfdt flush to every ring_flush, whereas I've tried to be clever and only flushed after batches rendering to the frontbuffer. So probably a bug in my code, or a flush on a given ring doesn't flush out caches for one of the other engines. Right, we have a formalized interface for flushes to scanout rather than always flushing GFDT. We do? Where's that? A side-effect of the API for busy_ioctl() is that all caches for the bo are flushed and any outstanding requests queued in order for the bo to become unbusy in the near-future. This semantic was required for GL query objects to eventually complete without futher interaction. It also solved the problem of having to flush the scanout periodically on gen4+ (and on gen3 if we enable the optimisation bit) and is so enshrined into lore. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB
On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote: On Fri, Mar 01, 2013 at 08:32:57PM +0200, ville.syrj...@linux.intel.com wrote: From: Ville Syrjälä ville.syrj...@linux.intel.com Currently all scanout buffers must be uncached because the display controller doesn't snoop the LLC. SNB introduced another method to guarantee coherency for the display controller. It's called the GFDT or graphics data type. Pages that have the GFDT bit enabled in their PTEs get flushed all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is issued with the synchronize GFDT bit set. So rather than making all scanout buffers uncached, set the GFDT bit in their PTEs, and modify the ring flush functions to enable the synchronize GFDT bit. On HSW the GFDT bit was removed from the PTE, and it's only present in surface state, so we can't really set it from the kernel. Also the docs state that the hardware isn't actually guaranteed to respect the GFDT bit. So it looks like GFDT isn't all that useful on HSW. So far I've tried this very quickly on an IVB machine, and it seems to be working as advertised. No idea if it does any good though. TODO: - make sure there are no missing flushes (CPU access doesn't respect GFDT after all). - measure it and see if there's some real benefit - maybe we can track whether synchronize GFDT needs to be issued, and skip it when possible. needs some numbers to determine if it's worthwile. Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com Iirc when I've tried this out a while back it regressed a few benchmarks. Chrisme suspected cache trahsing, but hard to tell without proper instrumentation support. Chris played around with a few tricks to mark other giant bos as uncacheable, but he couldn't find any improved workloads. I see. I didn't realize this was tried already. Not that I really planned to implement this in the first place. I was just studying the code and figured I'd learn better by trying to change things a bit ;) But if there's interest I can of course try to improve it further. In short I think this needs to come with decent amounts of numbers attached, like the TODO says ;-) The other thing was that I didn't manage to get things to work properly, leaving some random cachline dirt on the screen. Looking at your code, you add the gfdt flush to every ring_flush, whereas I've tried to be clever and only flushed after batches rendering to the frontbuffer. So probably a bug in my code, or a flush on a given ring doesn't flush out caches for one of the other engines. I had a bug in my first attempt where I forgot the initial clflush. That left some random crap on the screen until I rendered the next frame w/ GFDT already enabled. Other than that I didn't see any corruption except when I intentionally left out the flushes. But as stated my code does too many flushes probably. -- Ville Syrjälä Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB
On Fri, Mar 01, 2013 at 08:32:57PM +0200, ville.syrj...@linux.intel.com wrote: From: Ville Syrjälä ville.syrj...@linux.intel.com Currently all scanout buffers must be uncached because the display controller doesn't snoop the LLC. SNB introduced another method to guarantee coherency for the display controller. It's called the GFDT or graphics data type. Pages that have the GFDT bit enabled in their PTEs get flushed all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is issued with the synchronize GFDT bit set. So rather than making all scanout buffers uncached, set the GFDT bit in their PTEs, and modify the ring flush functions to enable the synchronize GFDT bit. On HSW the GFDT bit was removed from the PTE, and it's only present in surface state, so we can't really set it from the kernel. Also the docs state that the hardware isn't actually guaranteed to respect the GFDT bit. So it looks like GFDT isn't all that useful on HSW. So far I've tried this very quickly on an IVB machine, and it seems to be working as advertised. No idea if it does any good though. TODO: - make sure there are no missing flushes (CPU access doesn't respect GFDT after all). - measure it and see if there's some real benefit - maybe we can track whether synchronize GFDT needs to be issued, and skip it when possible. needs some numbers to determine if it's worthwile. Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com Iirc when I've tried this out a while back it regressed a few benchmarks. Chrisme suspected cache trahsing, but hard to tell without proper instrumentation support. Chris played around with a few tricks to mark other giant bos as uncacheable, but he couldn't find any improved workloads. In short I think this needs to come with decent amounts of numbers attached, like the TODO says ;-) The other thing was that I didn't manage to get things to work properly, leaving some random cachline dirt on the screen. Looking at your code, you add the gfdt flush to every ring_flush, whereas I've tried to be clever and only flushed after batches rendering to the frontbuffer. So probably a bug in my code, or a flush on a given ring doesn't flush out caches for one of the other engines. And finally a bikeshed: I'd vote for a slightly more generice unsigned long cache_flags instead of bool gfdt. Cheers, Daniel --- drivers/gpu/drm/i915/i915_drv.h| 18 +++-- drivers/gpu/drm/i915/i915_gem.c| 109 - drivers/gpu/drm/i915/i915_gem_context.c| 3 +- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +-- drivers/gpu/drm/i915/i915_gem_gtt.c| 38 ++ drivers/gpu/drm/i915/i915_reg.h| 2 + drivers/gpu/drm/i915/intel_display.c | 4 +- drivers/gpu/drm/i915/intel_overlay.c | 2 +- drivers/gpu/drm/i915/intel_ringbuffer.c| 25 +++ 9 files changed, 167 insertions(+), 44 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e95337c..537b344 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -408,7 +408,8 @@ struct i915_gtt { void (*gtt_insert_entries)(struct drm_device *dev, struct sg_table *st, unsigned int pg_start, -enum i915_cache_level cache_level); +enum i915_cache_level cache_level, +bool gfdt); }; #define gtt_total_entries(gtt) ((gtt).total PAGE_SHIFT) @@ -429,7 +430,8 @@ struct i915_hw_ppgtt { void (*insert_entries)(struct i915_hw_ppgtt *ppgtt, struct sg_table *st, unsigned int pg_start, -enum i915_cache_level cache_level); +enum i915_cache_level cache_level, +bool gfdt); void (*cleanup)(struct i915_hw_ppgtt *ppgtt); }; @@ -1169,6 +1171,7 @@ struct drm_i915_gem_object { unsigned int fenced_gpu_access:1; unsigned int cache_level:2; + unsigned int gfdt:1; unsigned int has_aliasing_ppgtt_mapping:1; unsigned int has_global_gtt_mapping:1; @@ -1310,6 +1313,10 @@ struct drm_i915_file_private { #define HAS_LLC(dev)(INTEL_INFO(dev)-has_llc) #define I915_NEED_GFX_HWS(dev) (INTEL_INFO(dev)-need_gfx_hws) +/* Only SNB and IVB have GFDT in PTEs */ +#define HAS_GFDT(dev)(HAS_LLC(dev) !IS_HASWELL(dev) \ + (IS_GEN6(dev) || IS_GEN7(dev))) + #define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)-gen = 6) #define HAS_ALIASING_PPGTT(dev) (INTEL_INFO(dev)-gen =6 !IS_VALLEYVIEW(dev)) @@ -1639,7 +1646,8 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, int __must_check i915_gem_object_set_to_cpu_domain(struct
Re: [Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB
On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote: The other thing was that I didn't manage to get things to work properly, leaving some random cachline dirt on the screen. Looking at your code, you add the gfdt flush to every ring_flush, whereas I've tried to be clever and only flushed after batches rendering to the frontbuffer. So probably a bug in my code, or a flush on a given ring doesn't flush out caches for one of the other engines. Right, we have a formalized interface for flushes to scanout rather than always flushing GFDT. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: GFDT support for SNB/IVB
From: Ville Syrjälä ville.syrj...@linux.intel.com Currently all scanout buffers must be uncached because the display controller doesn't snoop the LLC. SNB introduced another method to guarantee coherency for the display controller. It's called the GFDT or graphics data type. Pages that have the GFDT bit enabled in their PTEs get flushed all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is issued with the synchronize GFDT bit set. So rather than making all scanout buffers uncached, set the GFDT bit in their PTEs, and modify the ring flush functions to enable the synchronize GFDT bit. On HSW the GFDT bit was removed from the PTE, and it's only present in surface state, so we can't really set it from the kernel. Also the docs state that the hardware isn't actually guaranteed to respect the GFDT bit. So it looks like GFDT isn't all that useful on HSW. So far I've tried this very quickly on an IVB machine, and it seems to be working as advertised. No idea if it does any good though. TODO: - make sure there are no missing flushes (CPU access doesn't respect GFDT after all). - measure it and see if there's some real benefit - maybe we can track whether synchronize GFDT needs to be issued, and skip it when possible. needs some numbers to determine if it's worthwile. Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com --- drivers/gpu/drm/i915/i915_drv.h| 18 +++-- drivers/gpu/drm/i915/i915_gem.c| 109 - drivers/gpu/drm/i915/i915_gem_context.c| 3 +- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +-- drivers/gpu/drm/i915/i915_gem_gtt.c| 38 ++ drivers/gpu/drm/i915/i915_reg.h| 2 + drivers/gpu/drm/i915/intel_display.c | 4 +- drivers/gpu/drm/i915/intel_overlay.c | 2 +- drivers/gpu/drm/i915/intel_ringbuffer.c| 25 +++ 9 files changed, 167 insertions(+), 44 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e95337c..537b344 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -408,7 +408,8 @@ struct i915_gtt { void (*gtt_insert_entries)(struct drm_device *dev, struct sg_table *st, unsigned int pg_start, - enum i915_cache_level cache_level); + enum i915_cache_level cache_level, + bool gfdt); }; #define gtt_total_entries(gtt) ((gtt).total PAGE_SHIFT) @@ -429,7 +430,8 @@ struct i915_hw_ppgtt { void (*insert_entries)(struct i915_hw_ppgtt *ppgtt, struct sg_table *st, unsigned int pg_start, - enum i915_cache_level cache_level); + enum i915_cache_level cache_level, + bool gfdt); void (*cleanup)(struct i915_hw_ppgtt *ppgtt); }; @@ -1169,6 +1171,7 @@ struct drm_i915_gem_object { unsigned int fenced_gpu_access:1; unsigned int cache_level:2; + unsigned int gfdt:1; unsigned int has_aliasing_ppgtt_mapping:1; unsigned int has_global_gtt_mapping:1; @@ -1310,6 +1313,10 @@ struct drm_i915_file_private { #define HAS_LLC(dev)(INTEL_INFO(dev)-has_llc) #define I915_NEED_GFX_HWS(dev) (INTEL_INFO(dev)-need_gfx_hws) +/* Only SNB and IVB have GFDT in PTEs */ +#define HAS_GFDT(dev) (HAS_LLC(dev) !IS_HASWELL(dev) \ +(IS_GEN6(dev) || IS_GEN7(dev))) + #define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)-gen = 6) #define HAS_ALIASING_PPGTT(dev)(INTEL_INFO(dev)-gen =6 !IS_VALLEYVIEW(dev)) @@ -1639,7 +1646,8 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, int __must_check i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write); int __must_check -i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, +i915_gem_object_pin_to_display_plane(struct drm_device *dev, +struct drm_i915_gem_object *obj, u32 alignment, struct intel_ring_buffer *pipelined); int i915_gem_attach_phys_object(struct drm_device *dev, @@ -1681,14 +1689,14 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev); void i915_ppgtt_bind_object(struct i915_hw_ppgtt *ppgtt, struct drm_i915_gem_object *obj, - enum i915_cache_level cache_level); + enum i915_cache_level cache_level, bool gfdt); void i915_ppgtt_unbind_object(struct i915_hw_ppgtt *ppgtt, struct drm_i915_gem_object *obj); void i915_gem_restore_gtt_mappings(struct drm_device *dev); int