Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
-----Original Message-----
From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel Vetter
Sent: Saturday, June 08, 2013 10:22 PM
To: Carsten Emde
Cc: Chris Wilson; Jesse Barnes; Intel Graphics Development; Bloomfield, Jon; Steven Rostedt; Christoph Mathys; stable
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde c.e...@osadl.org wrote:
On 04/08/2013 11:54 AM, Daniel Vetter wrote:
On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:
On Thu, 4 Apr 2013 21:31:03 +0100, Chris Wilson ch...@chris-wilson.co.uk wrote:

In order to fully serialize access to the fenced region and the update to the fence register we need to take extreme measures on SNB+, and manually flush writes to memory prior to writing the fence register in conjunction with the memory barriers placed around the register write.

Fixes i-g-t/gem_fence_thrash

v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
v4: Remove changes for working generations.
v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
v6: Rewrite comments to elide forgotten history.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Jon Bloomfield jon.bloomfi...@intel.com
Tested-by: Jon Bloomfield jon.bloomfi...@intel.com (v2)
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa4ea1a..8f7739e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2689,17 +2689,35 @@ static inline int fence_number(struct drm_i915_private *dev_priv,
 	return fence - dev_priv->fence_regs;
 }
 
+static void i915_gem_write_fence__ipi(void *data)
+{
+	wbinvd();
+}
+
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 					 struct drm_i915_fence_reg *fence,
 					 bool enable)
 {
-	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	int reg = fence_number(dev_priv, fence);
-
-	i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL);
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int fence_reg = fence_number(dev_priv, fence);
+
+	/* In order to fully serialize access to the fenced region and
+	 * the update to the fence register we need to take extreme
+	 * measures on SNB+. In theory, the write to the fence register
+	 * flushes all memory transactions before, and coupled with the
+	 * mb() placed around the register write we serialise all memory
+	 * operations with respect to the changes in the tiler. Yet, on
+	 * SNB+ we need to take a step further and emit an explicit wbinvd()
+	 * on each processor in order to manually flush all memory
+	 * transactions before updating the fence register.
+	 */
+	if (HAS_LLC(obj->base.dev))
+		on_each_cpu(i915_gem_write_fence__ipi, NULL, 1);
+	i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL);
 
 	if (enable) {
-		obj->fence_reg = reg;
+		obj->fence_reg = fence_reg;
 		fence->obj = obj;
 		list_move_tail(&fence->lru_list, &dev_priv->mm.fence_list);
 	} else {

I almost wish x86 just gave up and went fully weakly ordered. At least then we'd know we need barriers everywhere, rather than just in mystical places.

Reviewed-by: Jesse Barnes jbar...@virtuousgeek.org

Queued for -next, thanks for the patch. I am a bit unhappy that the testcase doesn't work too well and can't reliably reproduce this on all affected machines. But I've tried a bit to improve things and it's very fickle. I guess we just have to accept this :(

Under real-time conditions, when we expect deterministic response to external and internal events at any time within a couple of microseconds, invalidating and flushing the entire cache in a running system is unacceptable. This introduces latencies of several milliseconds, which was clearly shown in RT regression tests on a kernel with this patch applied (https://www.osadl.org/?id=1543#c7602). We therefore have to revert it in the RT patch queue - kind of a workaround of a workaround. Would simply be wonderful, if we could get rid of the hateful wbinvd().

I'm somewhat surprised people have not started to scream earlier about this one ;-) We're trying to figure out whether there's a less costly w/a (we have some benchmark where it kills performance, too) but until that's done we pretty much need to stick with it.
Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
-----Original Message-----
From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel Vetter
Sent: Monday, June 10, 2013 3:38 PM
To: Bloomfield, Jon
Cc: Carsten Emde; Chris Wilson; Jesse Barnes; Intel Graphics Development; Steven Rostedt; Christoph Mathys; stable
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

On Mon, Jun 10, 2013 at 3:17 PM, Bloomfield, Jon jon.bloomfi...@intel.com wrote:

-----Original Message-----
From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel Vetter
Sent: Saturday, June 08, 2013 10:22 PM
To: Carsten Emde
Cc: Chris Wilson; Jesse Barnes; Intel Graphics Development; Bloomfield, Jon; Steven Rostedt; Christoph Mathys; stable
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde c.e...@osadl.org wrote:
On 04/08/2013 11:54 AM, Daniel Vetter wrote:
On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:
On Thu, 4 Apr 2013 21:31:03 +0100, Chris Wilson ch...@chris-wilson.co.uk wrote:

[...]
Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
On Mon, Jun 10, 2013 at 3:17 PM, Bloomfield, Jon jon.bloomfi...@intel.com wrote:

-----Original Message-----
From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel Vetter
Sent: Saturday, June 08, 2013 10:22 PM
To: Carsten Emde
Cc: Chris Wilson; Jesse Barnes; Intel Graphics Development; Bloomfield, Jon; Steven Rostedt; Christoph Mathys; stable
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde c.e...@osadl.org wrote:
On 04/08/2013 11:54 AM, Daniel Vetter wrote:
On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:
On Thu, 4 Apr 2013 21:31:03 +0100, Chris Wilson ch...@chris-wilson.co.uk wrote:

[...]
Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
On Mon, Jun 10, 2013 at 4:47 PM, Bloomfield, Jon jon.bloomfi...@intel.com wrote:

[...]

Oh, the pinning isn't a suitable w/a for upstream nor for general consumption. But I guess for the realtime linux folks it is suitable (at least for now), since they tend not to care too much about gfx throughput anyway.

[Aside: The patch is only for the -rt tree for now, so not something that will land upstream.]

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
On 04/08/2013 11:54 AM, Daniel Vetter wrote:
On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:
On Thu, 4 Apr 2013 21:31:03 +0100, Chris Wilson ch...@chris-wilson.co.uk wrote:

[...]

Queued for -next, thanks for the patch. I am a bit unhappy that the testcase doesn't work too well and can't reliably reproduce this on all affected machines. But I've tried a bit to improve things and it's very fickle. I guess we just have to accept this :(

Under real-time conditions, when we expect deterministic response to external and internal events at any time within a couple of microseconds, invalidating and flushing the entire cache in a running system is unacceptable. This introduces latencies of several milliseconds, which was clearly shown in RT regression tests on a kernel with this patch applied (https://www.osadl.org/?id=1543#c7602). We therefore have to revert it in the RT patch queue - kind of a workaround of a workaround.

Would simply be wonderful, if we could get rid of the hateful wbinvd().

-Carsten.
Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde c.e...@osadl.org wrote:
On 04/08/2013 11:54 AM, Daniel Vetter wrote:
On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:
On Thu, 4 Apr 2013 21:31:03 +0100, Chris Wilson ch...@chris-wilson.co.uk wrote:

[...]

Under real-time conditions, when we expect deterministic response to external and internal events at any time within a couple of microseconds, invalidating and flushing the entire cache in a running system is unacceptable. This introduces latencies of several milliseconds, which was clearly shown in RT regression tests on a kernel with this patch applied (https://www.osadl.org/?id=1543#c7602). We therefore have to revert it in the RT patch queue - kind of a workaround of a workaround. Would simply be wonderful, if we could get rid of the hateful wbinvd().

I'm somewhat surprised people have not started to scream earlier about this one ;-) We're trying to figure out whether there's a less costly w/a (we have some benchmark where it kills performance, too) but until that's done we pretty much need to stick with it. If you want to avoid any bad side-effects due to that w/a you can alternatively pin all gpu rendering tasks to the same cpu core/thread.

-Daniel
Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:
On Thu, 4 Apr 2013 21:31:03 +0100, Chris Wilson ch...@chris-wilson.co.uk wrote:

[...]

I almost wish x86 just gave up and went fully weakly ordered. At least then we'd know we need barriers everywhere, rather than just in mystical places.

Reviewed-by: Jesse Barnes jbar...@virtuousgeek.org

Queued for -next, thanks for the patch. I am a bit unhappy that the testcase doesn't work too well and can't reliably reproduce this on all affected machines. But I've tried a bit to improve things and it's very fickle. I guess we just have to accept this :(

-Daniel
Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
On Thu, 4 Apr 2013 21:31:03 +0100, Chris Wilson ch...@chris-wilson.co.uk wrote:

In order to fully serialize access to the fenced region and the update to the fence register we need to take extreme measures on SNB+, and manually flush writes to memory prior to writing the fence register in conjunction with the memory barriers placed around the register write.

Fixes i-g-t/gem_fence_thrash

v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
v4: Remove changes for working generations.
v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
v6: Rewrite comments to elide forgotten history.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Jon Bloomfield jon.bloomfi...@intel.com
Tested-by: Jon Bloomfield jon.bloomfi...@intel.com (v2)
Cc: sta...@vger.kernel.org

[...]

I almost wish x86 just gave up and went fully weakly ordered. At least then we'd know we need barriers everywhere, rather than just in mystical places.

Reviewed-by: Jesse Barnes jbar...@virtuousgeek.org

--
Jesse Barnes, Intel Open Source Technology Center
[Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
In order to fully serialize access to the fenced region and the update to the fence register we need to take extreme measures on SNB+, and write the fence from each cpu taking care to serialise memory accesses on each. The usual mb(), or even a mb() on each CPU is not enough to ensure that access to the fenced region is coherent across the change in fence register.

Fixes i-g-t/gem_fence_thrash

v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
CC: Jon Bloomfield jon.bloomfi...@intel.com
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa4ea1a..13e42f6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2689,17 +2689,51 @@ static inline int fence_number(struct drm_i915_private *dev_priv,
 	return fence - dev_priv->fence_regs;
 }
 
+struct write_fence {
+	struct drm_device *dev;
+	struct drm_i915_gem_object *obj;
+	int fence;
+};
+
+static void i915_gem_write_fence__ipi(void *data)
+{
+	struct write_fence *args = data;
+
+	wbinvd();
+
+	i915_gem_write_fence(args->dev, args->fence, args->obj);
+}
+
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 					 struct drm_i915_fence_reg *fence,
 					 bool enable)
 {
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	int reg = fence_number(dev_priv, fence);
-
-	i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL);
+	struct write_fence args = {
+		.dev = obj->base.dev,
+		.fence = fence_number(dev_priv, fence),
+		.obj = enable ? obj : NULL,
+	};
+
+	/* In order to fully serialize access to the fenced region and
+	 * the update to the fence register we need to take extreme
+	 * measures on SNB+, and write the fence from each cpu taking
+	 * care to serialise memory accesses on each. The usual mb(),
+	 * or even a mb() on each CPU is not enough to ensure that access
+	 * to the fenced region is coherent across the change in fence
+	 * register.
+	 *
+	 * As it turns out for IVB, I need slightly heavier bullets and
+	 * need to do a wbinvd() per-processor to serialise memory accesses
+	 * with the fence update.
+	 */
+	if (HAS_LLC(obj->base.dev))
+		on_each_cpu(i915_gem_write_fence__ipi, &args, 1);
+	else
+		i915_gem_write_fence__ipi(&args);
 
 	if (enable) {
-		obj->fence_reg = reg;
+		obj->fence_reg = args.fence;
 		fence->obj = obj;
 		list_move_tail(&fence->lru_list, &dev_priv->mm.fence_list);
 	} else {
-- 
1.7.10.4
[Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
In order to fully serialize access to the fenced region and the update to the fence register we need to take extreme measures on SNB+, and write the fence from each cpu taking care to serialise memory accesses on each. The usual mb(), or even a mb() on each CPU is not enough to ensure that access to the fenced region is coherent across the change in fence register.

Fixes i-g-t/gem_fence_thrash

v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
v4: Remove changes for working generations.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Jon Bloomfield jon.bloomfi...@intel.com
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 42 +++
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa4ea1a..02b3a61 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2689,17 +2689,51 @@ static inline int fence_number(struct drm_i915_private *dev_priv,
 	return fence - dev_priv->fence_regs;
 }
 
+struct write_fence {
+	struct drm_device *dev;
+	struct drm_i915_gem_object *obj;
+	int fence;
+};
+
+static void i915_gem_write_fence__ipi(void *data)
+{
+	struct write_fence *args = data;
+
+	wbinvd();
+
+	i915_gem_write_fence(args->dev, args->fence, args->obj);
+}
+
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 					 struct drm_i915_fence_reg *fence,
 					 bool enable)
 {
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	int reg = fence_number(dev_priv, fence);
-
-	i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL);
+	struct write_fence args = {
+		.dev = obj->base.dev,
+		.fence = fence_number(dev_priv, fence),
+		.obj = enable ? obj : NULL,
+	};
+
+	/* In order to fully serialize access to the fenced region and
+	 * the update to the fence register we need to take extreme
+	 * measures on SNB+, and write the fence from each cpu taking
+	 * care to serialise memory accesses on each. The usual mb(),
+	 * or even a mb() on each CPU is not enough to ensure that access
+	 * to the fenced region is coherent across the change in fence
+	 * register.
+	 *
+	 * As it turns out for IVB, I need slightly heavier bullets and
+	 * need to do a wbinvd() per-processor to serialise memory acceses
+	 * with the fence update.
+	 */
+	if (HAS_LLC(obj->base.dev))
+		on_each_cpu(i915_gem_write_fence__ipi, &args, 1);
+	else
+		i915_gem_write_fence(args.dev, args.fence, args.obj);
 
 	if (enable) {
-		obj->fence_reg = reg;
+		obj->fence_reg = args.fence;
 		fence->obj = obj;
 		list_move_tail(&fence->lru_list, &dev_priv->mm.fence_list);
 	} else {
-- 
1.7.10.4
[Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
In order to fully serialize access to the fenced region and the update to the fence register we need to take extreme measures on SNB+, and write the fence from each cpu taking care to serialise memory accesses on each. The usual mb(), or even a mb() on each CPU is not enough to ensure that access to the fenced region is coherent across the change in fence register - however a full blown write-back invalidate (wbinvd) per processor is sufficient.

Fixes i-g-t/gem_fence_thrash

v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
v4: Remove changes for working generations.
v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Jon Bloomfield jon.bloomfi...@intel.com
Tested-by: Jon Bloomfield jon.bloomfi...@intel.com (v2)
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa4ea1a..632a050 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2689,17 +2689,33 @@ static inline int fence_number(struct drm_i915_private *dev_priv,
 	return fence - dev_priv->fence_regs;
 }
 
+static void i915_gem_write_fence__ipi(void *data)
+{
+	wbinvd();
+}
+
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 					 struct drm_i915_fence_reg *fence,
 					 bool enable)
 {
-	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	int reg = fence_number(dev_priv, fence);
-
-	i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL);
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int fence_reg = fence_number(dev_priv, fence);
+
+	/* In order to fully serialize access to the fenced region and
+	 * the update to the fence register we need to take extreme
+	 * measures on SNB+, and write the fence from each cpu taking
+	 * care to serialise memory accesses on each. The usual mb(),
+	 * or even a mb() on each CPU is not enough to ensure that access
+	 * to the fenced region is coherent across the change in fence
+	 * register, but a wbinvd() per processor is sufficient.
+	 */
+	if (HAS_LLC(obj->base.dev))
+		on_each_cpu(i915_gem_write_fence__ipi, NULL, 1);
+	i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL);
 
 	if (enable) {
-		obj->fence_reg = reg;
+		obj->fence_reg = fence_reg;
 		fence->obj = obj;
 		list_move_tail(&fence->lru_list, &dev_priv->mm.fence_list);
 	} else {
-- 
1.7.10.4
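The v5 simplification is worth spelling out: the cross-CPU handler no longer writes the fence at all, it only flushes, and the register write happens exactly once afterwards. A small user-space sketch of that ordering, with counters standing in for `wbinvd()` and the register write (illustration only, not the kernel code):

```c
/* Counters let us observe how many flushes and register writes occur. */
struct fence_state {
	int flushes;     /* per-CPU cache flushes performed */
	int reg_writes;  /* fence register writes performed */
};

/* Stand-in for the v5 IPI handler, which only does wbinvd(). */
static void flush_one_cpu(void *data)
{
	struct fence_state *st = data;
	st->flushes++;
}

/* The v5 shape: on LLC parts, flush every CPU's caches first,
 * then program the fence register exactly once. */
static void update_fence(struct fence_state *st, int ncpus, int has_llc)
{
	if (has_llc)
		for (int cpu = 0; cpu < ncpus; cpu++)
			flush_one_cpu(st);  /* kernel: on_each_cpu(..., wbinvd) */
	st->reg_writes++;                   /* kernel: i915_gem_write_fence() */
}
```

Compared with v1-v4, this avoids programming the same register N times from N CPUs; the heavy hammer is confined to the cache flush.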
Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
On Thu, Mar 21, 2013 at 03:30:19PM +0000, Chris Wilson wrote:
> In order to fully serialize access to the fenced region and the update
> to the fence register we need to take extreme measures on SNB+, and
> write the fence from each cpu taking care to serialise memory accesses
> on each. The usual mb(), or even a mb() on each CPU is not enough to
> ensure that access to the fenced region is coherent across the change
> in fence register.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
> Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
> Cc: sta...@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 34 ++
>  1 file changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 19fc21b..fe34240 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2684,17 +2684,43 @@ static inline int fence_number(struct drm_i915_private *dev_priv,
>  	return fence - dev_priv->fence_regs;
>  }
>  
> +struct write_fence {
> +	struct drm_device *dev;
> +	struct drm_i915_gem_object *obj;
> +	int fence;
> +};
> +
> +static void i915_gem_write_fence__ipi(void *data)
> +{
> +	struct write_fence *args = data;
> +	i915_gem_write_fence(args->dev, args->fence, args->obj);
> +}
> +
>  static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
>  					 struct drm_i915_fence_reg *fence,
>  					 bool enable)
>  {
>  	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
> -	int reg = fence_number(dev_priv, fence);
> -
> -	i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL);
> +	struct write_fence args = {
> +		.dev = obj->base.dev,
> +		.fence = fence_number(dev_priv, fence),
> +		.obj = enable ? obj : NULL,
> +	};
> +
> +	/* In order to fully serialize access to the fenced region and
> +	 * the update to the fence register we need to take extreme
> +	 * measures on SNB+, and write the fence from each cpu taking
> +	 * care to serialise memory accesses on each. The usual mb(),
> +	 * or even a mb() on each CPU is not enough to ensure that access
> +	 * to the fenced region is coherent across the change in fence
> +	 * register.
> +	 */
> +	if (!HAS_LLC(obj->base.dev) ||
> +	    on_each_cpu(i915_gem_write_fence__ipi, &args, 1) != 0)
> +		i915_gem_write_fence__ipi(&args);

I think the if condition here is a notch too clever and hides the
elefantentöter a bit too well. on_each_cpu always calls the given
function unconditionally, even when the ipi function call fails, so

if (!HAS_LLC)
	WARN_ON(on_each_cpu);
else
	i915_gem_write_fence

looks clearer to me.
-Daniel

>  	if (enable) {
> -		obj->fence_reg = reg;
> +		obj->fence_reg = args.fence;
>  		fence->obj = obj;
>  		list_move_tail(&fence->lru_list, &dev_priv->mm.fence_list);
>  	} else {
> -- 
> 1.7.10.4

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
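Daniel's objection is that `on_each_cpu()` runs the callback on every CPU regardless of whether the IPI machinery reports success, so folding its return value into the `||` condition can silently write the fence a second time. A hedged user-space sketch of the untangled control flow he suggests (the stubs below are stand-ins for `HAS_LLC`, `on_each_cpu()` and the fence write, not the kernel symbols; the branch sense follows the original patch's semantics):

```c
/* Record which path wrote the fence, so the flow is observable. */
struct call_log {
	int direct;   /* plain single write (non-LLC path) */
	int via_ipi;  /* writes performed from the IPI callback */
};

static void write_fence_cb(void *data)
{
	struct call_log *log = data;
	log->via_ipi++;  /* fence written from the per-CPU callback */
}

/* Stand-in for on_each_cpu(): runs fn unconditionally, and reports
 * IPI success separately via the return value (0 == success). */
static int on_each_cpu_stub(void (*fn)(void *), void *info)
{
	fn(info);
	return 0;
}

/* The untangled shape: explicit LLC / non-LLC branches, no double write. */
static void update_fence_untangled(struct call_log *log, int has_llc)
{
	if (!has_llc)
		log->direct++;  /* a single plain write suffices */
	else if (on_each_cpu_stub(write_fence_cb, log) != 0)
		;               /* kernel would WARN_ON() here, not retry */
}
```

The point of the restructure is purely readability: each hardware case gets its own branch, and an IPI failure becomes a loud warning instead of a hidden fallback.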
[Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
In order to fully serialize access to the fenced region and the update to the fence register we need to take extreme measures on SNB+, and write the fence from each cpu taking care to serialise memory accesses on each. The usual mb(), or even a mb() on each CPU is not enough to ensure that access to the fenced region is coherent across the change in fence register.

v2: Bring a bigger gun

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Jon Bloomfield jon.bloomfi...@intel.com
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 42 +++
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0e207e6..a92d431 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -32,6 +32,7 @@
 #include "intel_drv.h"
 #include <linux/shmem_fs.h>
 #include <linux/slab.h>
+#include <linux/stop_machine.h>
 #include <linux/swap.h>
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
@@ -2678,17 +2679,50 @@ static inline int fence_number(struct drm_i915_private *dev_priv,
 	return fence - dev_priv->fence_regs;
 }
 
+struct write_fence {
+	struct drm_device *dev;
+	struct drm_i915_gem_object *obj;
+	int fence;
+};
+
+static int i915_gem_write_fence__ipi(void *data)
+{
+	struct write_fence *args = data;
+	i915_gem_write_fence(args->dev, args->fence, args->obj);
+	return 0;
+}
+
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 					 struct drm_i915_fence_reg *fence,
 					 bool enable)
 {
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	int reg = fence_number(dev_priv, fence);
-
-	i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL);
+	struct write_fence args = {
+		.dev = obj->base.dev,
+		.fence = fence_number(dev_priv, fence),
+		.obj = enable ? obj : NULL,
+	};
+
+	/* In order to fully serialize access to the fenced region and
+	 * the update to the fence register we need to take extreme
+	 * measures on SNB+, and write the fence from each cpu taking
+	 * care to serialise memory accesses on each. The usual mb(),
+	 * or even a mb() on each CPU is not enough to ensure that access
+	 * to the fenced region is coherent across the change in fence
+	 * register.
+	 *
+	 * As it turns out for IVB, I need a bigger gun.
+	 */
+	if (HAS_LLC(obj->base.dev)) {
+		if (INTEL_INFO(obj->base.dev)->gen >= 7)
+			stop_machine(i915_gem_write_fence__ipi, &args, cpu_possible_mask);
+		else
+			on_each_cpu((void (*)(void *))i915_gem_write_fence__ipi, &args, 1);
+	} else
+		i915_gem_write_fence__ipi(&args);
 
 	if (enable) {
-		obj->fence_reg = reg;
+		obj->fence_reg = args.fence;
 		fence->obj = obj;
 		list_move_tail(&fence->lru_list, &dev_priv->mm.fence_list);
 	} else {
-- 
1.7.10.4
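This v2 revision picks between three delivery mechanisms: a plain write when there is no LLC, an IPI broadcast (`on_each_cpu`) on older LLC parts, and the far heavier `stop_machine()` (which halts every CPU) on gen7+. A tiny sketch of just that selection logic, with plain ints standing in for the `HAS_LLC` and `INTEL_INFO(...)->gen` kernel macros (illustration only):

```c
/* The three ways v2 can deliver the fence write. */
enum fence_write_path {
	FENCE_WRITE_DIRECT,       /* no LLC: one plain register write */
	FENCE_WRITE_ON_EACH_CPU,  /* LLC, gen < 7: IPI the write to every CPU */
	FENCE_WRITE_STOP_MACHINE, /* LLC, gen >= 7: halt the machine around it */
};

/* Mirrors the nested if/else dispatch in the patch. */
static enum fence_write_path pick_fence_write_path(int has_llc, int gen)
{
	if (!has_llc)
		return FENCE_WRITE_DIRECT;
	return gen >= 7 ? FENCE_WRITE_STOP_MACHINE : FENCE_WRITE_ON_EACH_CPU;
}
```

The escalation to `stop_machine()` is what later revisions walk back: v3 onward replaces it with a per-CPU `wbinvd()`, which serialises the memory traffic without freezing the whole machine.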
[Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
In order to fully serialize access to the fenced region and the update to the fence register we need to take extreme measures on SNB+, and write the fence from each cpu taking care to serialise memory accesses on each. The usual mb(), or even a mb() on each CPU is not enough to ensure that access to the fenced region is coherent across the change in fence register.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 34 ++
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 19fc21b..fe34240 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2684,17 +2684,43 @@ static inline int fence_number(struct drm_i915_private *dev_priv,
 	return fence - dev_priv->fence_regs;
 }
 
+struct write_fence {
+	struct drm_device *dev;
+	struct drm_i915_gem_object *obj;
+	int fence;
+};
+
+static void i915_gem_write_fence__ipi(void *data)
+{
+	struct write_fence *args = data;
+	i915_gem_write_fence(args->dev, args->fence, args->obj);
+}
+
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 					 struct drm_i915_fence_reg *fence,
 					 bool enable)
 {
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	int reg = fence_number(dev_priv, fence);
-
-	i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL);
+	struct write_fence args = {
+		.dev = obj->base.dev,
+		.fence = fence_number(dev_priv, fence),
+		.obj = enable ? obj : NULL,
+	};
+
+	/* In order to fully serialize access to the fenced region and
+	 * the update to the fence register we need to take extreme
+	 * measures on SNB+, and write the fence from each cpu taking
+	 * care to serialise memory accesses on each. The usual mb(),
+	 * or even a mb() on each CPU is not enough to ensure that access
+	 * to the fenced region is coherent across the change in fence
+	 * register.
+	 */
+	if (!HAS_LLC(obj->base.dev) ||
+	    on_each_cpu(i915_gem_write_fence__ipi, &args, 1) != 0)
+		i915_gem_write_fence__ipi(&args);
 
 	if (enable) {
-		obj->fence_reg = reg;
+		obj->fence_reg = args.fence;
 		fence->obj = obj;
 		list_move_tail(&fence->lru_list, &dev_priv->mm.fence_list);
 	} else {
-- 
1.7.10.4