Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-06-10 Thread Bloomfield, Jon
 -Original Message-
 From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of
 Daniel Vetter
 Sent: Saturday, June 08, 2013 10:22 PM
 To: Carsten Emde
 Cc: Chris Wilson; Jesse Barnes; Intel Graphics Development; Bloomfield, Jon;
 Steven Rostedt; Christoph Mathys; stable
 Subject: Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence
 between fences and LLC across multiple CPUs
 
 On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde c.e...@osadl.org wrote:
  On 04/08/2013 11:54 AM, Daniel Vetter wrote:
 
  On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:
 
  On Thu,  4 Apr 2013 21:31:03 +0100
  Chris Wilson ch...@chris-wilson.co.uk wrote:
 
  In order to fully serialize access to the fenced region and the
  update to the fence register we need to take extreme measures on
  SNB+, and manually flush writes to memory prior to writing the
  fence register in conjunction with the memory barriers placed around
 the register write.
 
  Fixes i-g-t/gem_fence_thrash
 
  v2: Bring a bigger gun
  v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
  v4: Remove changes for working generations.
  v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
  v6: Rewrite comments to ellide forgotten history.
 
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
  Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
  Cc: Jon Bloomfield jon.bloomfi...@intel.com
  Tested-by: Jon Bloomfield jon.bloomfi...@intel.com (v2)
  Cc: sta...@vger.kernel.org
  ---
drivers/gpu/drm/i915/i915_gem.c |   28
 +++-
1 file changed, 23 insertions(+), 5 deletions(-)
 
  diff --git a/drivers/gpu/drm/i915/i915_gem.c
  b/drivers/gpu/drm/i915/i915_gem.c index fa4ea1a..8f7739e 100644
  --- a/drivers/gpu/drm/i915/i915_gem.c
  +++ b/drivers/gpu/drm/i915/i915_gem.c
  @@ -2689,17 +2689,35 @@ static inline int fence_number(struct
  drm_i915_private *dev_priv,
  return fence - dev_priv-fence_regs;
}
 
  +static void i915_gem_write_fence__ipi(void *data) {
  +   wbinvd();
  +}
  +
static void i915_gem_object_update_fence(struct
  drm_i915_gem_object *obj,
   struct drm_i915_fence_reg
  *fence,
   bool enable)
{
  -   struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
  -   int reg = fence_number(dev_priv, fence);
  -
  -   i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
  +   struct drm_device *dev = obj-base.dev;
  +   struct drm_i915_private *dev_priv = dev-dev_private;
  +   int fence_reg = fence_number(dev_priv, fence);
  +
  +   /* In order to fully serialize access to the fenced region and
  +* the update to the fence register we need to take extreme
  +* measures on SNB+. In theory, the write to the fence register
  +* flushes all memory transactions before, and coupled with the
  +* mb() placed around the register write we serialise all memory
  +* operations with respect to the changes in the tiler. Yet, on
  +* SNB+ we need to take a step further and emit an explicit
  wbinvd()
  +* on each processor in order to manually flush all memory
  +* transactions before updating the fence register.
  +*/
  +   if (HAS_LLC(obj-base.dev))
  +   on_each_cpu(i915_gem_write_fence__ipi, NULL, 1);
  +   i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL);
 
  if (enable) {
  -   obj-fence_reg = reg;
  +   obj-fence_reg = fence_reg;
  fence-obj = obj;
  list_move_tail(fence-lru_list,
  dev_priv-mm.fence_list);
  } else {
 
 
  I almost wish x86 just gave up and went fully weakly ordered.  At
  least then we'd know we need barriers everywhere, rather than just
  in mystical places.
 
  Reviewed-by: Jesse Barnes jbar...@virtuousgeek.org
 
 
  Queued for -next, thanks for the patch. I am a bit unhappy that the
  testcase doesn't work too well and can't reliably reproduce this on
  all affected machines. But I've tried a bit to improve things and
  it's very fickle. I guess we just have to accept this :(
 
 
  Under real-time conditions when we expect deterministic response to
  external and internal events at any time within a couple of
  microseconds, invalidating and flushing the entire cache in a running
  system is unacceptable. This is introducing latencies of several
  milliseconds which was clearly shown in RT regression tests on a
  kernel with this patch applied (https://www.osadl.org/?id=1543#c7602).
  We therefore have to revert it in the RT patch queue - kind of a
  workaround of a workaround.
 
  Would simply be wonderful, if we could get rid of the hateful wbinvd().
 
 I'm somewhat surprised people have not started to scream earlier about this
 one ;-)
 
 We're trying to figure out whether there's a less costly w/a (we have some

Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-06-10 Thread Bloomfield, Jon
 -Original Message-
 From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of
 Daniel Vetter
 Sent: Monday, June 10, 2013 3:38 PM
 To: Bloomfield, Jon
 Cc: Carsten Emde; Chris Wilson; Jesse Barnes; Intel Graphics Development;
 Steven Rostedt; Christoph Mathys; stable
 Subject: Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence
 between fences and LLC across multiple CPUs
 
 On Mon, Jun 10, 2013 at 3:17 PM, Bloomfield, Jon
 jon.bloomfi...@intel.com wrote:
  -Original Message-
  From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On
  Behalf Of Daniel Vetter
  Sent: Saturday, June 08, 2013 10:22 PM
  To: Carsten Emde
  Cc: Chris Wilson; Jesse Barnes; Intel Graphics Development;
  Bloomfield, Jon; Steven Rostedt; Christoph Mathys; stable
  Subject: Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence
  between fences and LLC across multiple CPUs
 
  On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde c.e...@osadl.org
 wrote:
   On 04/08/2013 11:54 AM, Daniel Vetter wrote:
  
   On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:
  
   On Thu,  4 Apr 2013 21:31:03 +0100 Chris Wilson
   ch...@chris-wilson.co.uk wrote:
  
   In order to fully serialize access to the fenced region and the
   update to the fence register we need to take extreme measures on
   SNB+, and manually flush writes to memory prior to writing the
   fence register in conjunction with the memory barriers placed
   around
  the register write.
  
   Fixes i-g-t/gem_fence_thrash
  
   v2: Bring a bigger gun
   v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
   v4: Remove changes for working generations.
   v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
   v6: Rewrite comments to ellide forgotten history.
  
   Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
   Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
   Cc: Jon Bloomfield jon.bloomfi...@intel.com
   Tested-by: Jon Bloomfield jon.bloomfi...@intel.com (v2)
   Cc: sta...@vger.kernel.org
   ---
 drivers/gpu/drm/i915/i915_gem.c |   28
  +++-
 1 file changed, 23 insertions(+), 5 deletions(-)
  
   diff --git a/drivers/gpu/drm/i915/i915_gem.c
   b/drivers/gpu/drm/i915/i915_gem.c index fa4ea1a..8f7739e 100644
   --- a/drivers/gpu/drm/i915/i915_gem.c
   +++ b/drivers/gpu/drm/i915/i915_gem.c
   @@ -2689,17 +2689,35 @@ static inline int fence_number(struct
   drm_i915_private *dev_priv,
   return fence - dev_priv-fence_regs;
 }
  
   +static void i915_gem_write_fence__ipi(void *data) {
   +   wbinvd();
   +}
   +
 static void i915_gem_object_update_fence(struct
   drm_i915_gem_object *obj,
struct
   drm_i915_fence_reg *fence,
bool enable)
 {
   -   struct drm_i915_private *dev_priv = obj-base.dev-
 dev_private;
   -   int reg = fence_number(dev_priv, fence);
   -
   -   i915_gem_write_fence(obj-base.dev, reg, enable ? obj :
 NULL);
   +   struct drm_device *dev = obj-base.dev;
   +   struct drm_i915_private *dev_priv = dev-dev_private;
   +   int fence_reg = fence_number(dev_priv, fence);
   +
   +   /* In order to fully serialize access to the fenced region and
   +* the update to the fence register we need to take extreme
   +* measures on SNB+. In theory, the write to the fence 
   register
   +* flushes all memory transactions before, and coupled with 
   the
   +* mb() placed around the register write we serialise all
 memory
   +* operations with respect to the changes in the tiler. Yet, 
   on
   +* SNB+ we need to take a step further and emit an
   + explicit
   wbinvd()
   +* on each processor in order to manually flush all memory
   +* transactions before updating the fence register.
   +*/
   +   if (HAS_LLC(obj-base.dev))
   +   on_each_cpu(i915_gem_write_fence__ipi, NULL, 1);
   +   i915_gem_write_fence(dev, fence_reg, enable ? obj :
   + NULL);
  
   if (enable) {
   -   obj-fence_reg = reg;
   +   obj-fence_reg = fence_reg;
   fence-obj = obj;
   list_move_tail(fence-lru_list,
   dev_priv-mm.fence_list);
   } else {
  
  
   I almost wish x86 just gave up and went fully weakly ordered.  At
   least then we'd know we need barriers everywhere, rather than
   just in mystical places.
  
   Reviewed-by: Jesse Barnes jbar...@virtuousgeek.org
  
  
   Queued for -next, thanks for the patch. I am a bit unhappy that
   the testcase doesn't work too well and can't reliably reproduce
   this on all affected machines. But I've tried a bit to improve
   things and it's very fickle. I guess we just have to accept this
   :(
  
  
   Under real-time conditions when we expect deterministic response to
   external and internal events at any time within

Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-06-10 Thread Daniel Vetter
On Mon, Jun 10, 2013 at 3:17 PM, Bloomfield, Jon
jon.bloomfi...@intel.com wrote:
 -Original Message-
 From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of
 Daniel Vetter
 Sent: Saturday, June 08, 2013 10:22 PM
 To: Carsten Emde
 Cc: Chris Wilson; Jesse Barnes; Intel Graphics Development; Bloomfield, Jon;
 Steven Rostedt; Christoph Mathys; stable
 Subject: Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence
 between fences and LLC across multiple CPUs

 On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde c.e...@osadl.org wrote:
  On 04/08/2013 11:54 AM, Daniel Vetter wrote:
 
  On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:
 
  On Thu,  4 Apr 2013 21:31:03 +0100
  Chris Wilson ch...@chris-wilson.co.uk wrote:
 
  In order to fully serialize access to the fenced region and the
  update to the fence register we need to take extreme measures on
  SNB+, and manually flush writes to memory prior to writing the
  fence register in conjunction with the memory barriers placed around
 the register write.
 
  Fixes i-g-t/gem_fence_thrash
 
  v2: Bring a bigger gun
  v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
  v4: Remove changes for working generations.
  v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
  v6: Rewrite comments to ellide forgotten history.
 
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
  Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
  Cc: Jon Bloomfield jon.bloomfi...@intel.com
  Tested-by: Jon Bloomfield jon.bloomfi...@intel.com (v2)
  Cc: sta...@vger.kernel.org
  ---
drivers/gpu/drm/i915/i915_gem.c |   28
 +++-
1 file changed, 23 insertions(+), 5 deletions(-)
 
  diff --git a/drivers/gpu/drm/i915/i915_gem.c
  b/drivers/gpu/drm/i915/i915_gem.c index fa4ea1a..8f7739e 100644
  --- a/drivers/gpu/drm/i915/i915_gem.c
  +++ b/drivers/gpu/drm/i915/i915_gem.c
  @@ -2689,17 +2689,35 @@ static inline int fence_number(struct
  drm_i915_private *dev_priv,
  return fence - dev_priv-fence_regs;
}
 
  +static void i915_gem_write_fence__ipi(void *data) {
  +   wbinvd();
  +}
  +
static void i915_gem_object_update_fence(struct
  drm_i915_gem_object *obj,
   struct drm_i915_fence_reg
  *fence,
   bool enable)
{
  -   struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
  -   int reg = fence_number(dev_priv, fence);
  -
  -   i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
  +   struct drm_device *dev = obj-base.dev;
  +   struct drm_i915_private *dev_priv = dev-dev_private;
  +   int fence_reg = fence_number(dev_priv, fence);
  +
  +   /* In order to fully serialize access to the fenced region and
  +* the update to the fence register we need to take extreme
  +* measures on SNB+. In theory, the write to the fence register
  +* flushes all memory transactions before, and coupled with the
  +* mb() placed around the register write we serialise all memory
  +* operations with respect to the changes in the tiler. Yet, on
  +* SNB+ we need to take a step further and emit an explicit
  wbinvd()
  +* on each processor in order to manually flush all memory
  +* transactions before updating the fence register.
  +*/
  +   if (HAS_LLC(obj-base.dev))
  +   on_each_cpu(i915_gem_write_fence__ipi, NULL, 1);
  +   i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL);
 
  if (enable) {
  -   obj-fence_reg = reg;
  +   obj-fence_reg = fence_reg;
  fence-obj = obj;
  list_move_tail(fence-lru_list,
  dev_priv-mm.fence_list);
  } else {
 
 
  I almost wish x86 just gave up and went fully weakly ordered.  At
  least then we'd know we need barriers everywhere, rather than just
  in mystical places.
 
  Reviewed-by: Jesse Barnes jbar...@virtuousgeek.org
 
 
  Queued for -next, thanks for the patch. I am a bit unhappy that the
  testcase doesn't work too well and can't reliably reproduce this on
  all affected machines. But I've tried a bit to improve things and
  it's very fickle. I guess we just have to accept this :(
 
 
  Under real-time conditions when we expect deterministic response to
  external and internal events at any time within a couple of
  microseconds, invalidating and flushing the entire cache in a running
  system is unacceptable. This is introducing latencies of several
  milliseconds which was clearly shown in RT regression tests on a
  kernel with this patch applied (https://www.osadl.org/?id=1543#c7602).
  We therefore have to revert it in the RT patch queue - kind of a
  workaround of a workaround.
 
  Would simply be wonderful, if we could get rid of the hateful wbinvd().

 I'm somewhat surprised people have not started to scream earlier about this
 one

Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-06-10 Thread Daniel Vetter
On Mon, Jun 10, 2013 at 4:47 PM, Bloomfield, Jon
jon.bloomfi...@intel.com wrote:
 This e-mail and any attachments may contain confidential material for
 the sole use of the intended recipient(s). Any review or distribution
 by others is strictly prohibited. If you are not the intended
 recipient, please contact the sender and delete all copies.

Oh, the pinning isn't a suitable w/a for upstream nor for general
consumption. But I guess for the realtime linux folks it is suitable
(at least for now) since they tend to not care too much about gfx
throughput anyway.

[Aside: The patch is only for the -rt tree for now, so not something
that will land upstream.]
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-06-08 Thread Carsten Emde

On 04/08/2013 11:54 AM, Daniel Vetter wrote:

On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:

On Thu,  4 Apr 2013 21:31:03 +0100
Chris Wilson ch...@chris-wilson.co.uk wrote:


In order to fully serialize access to the fenced region and the update
to the fence register we need to take extreme measures on SNB+, and
manually flush writes to memory prior to writing the fence register in
conjunction with the memory barriers placed around the register write.

Fixes i-g-t/gem_fence_thrash

v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
v4: Remove changes for working generations.
v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
v6: Rewrite comments to ellide forgotten history.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Jon Bloomfield jon.bloomfi...@intel.com
Tested-by: Jon Bloomfield jon.bloomfi...@intel.com (v2)
Cc: sta...@vger.kernel.org
---
  drivers/gpu/drm/i915/i915_gem.c |   28 +++-
  1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa4ea1a..8f7739e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2689,17 +2689,35 @@ static inline int fence_number(struct drm_i915_private 
*dev_priv,
return fence - dev_priv-fence_regs;
  }

+static void i915_gem_write_fence__ipi(void *data)
+{
+   wbinvd();
+}
+
  static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 struct drm_i915_fence_reg *fence,
 bool enable)
  {
-   struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
-   int reg = fence_number(dev_priv, fence);
-
-   i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
+   struct drm_device *dev = obj-base.dev;
+   struct drm_i915_private *dev_priv = dev-dev_private;
+   int fence_reg = fence_number(dev_priv, fence);
+
+   /* In order to fully serialize access to the fenced region and
+* the update to the fence register we need to take extreme
+* measures on SNB+. In theory, the write to the fence register
+* flushes all memory transactions before, and coupled with the
+* mb() placed around the register write we serialise all memory
+* operations with respect to the changes in the tiler. Yet, on
+* SNB+ we need to take a step further and emit an explicit wbinvd()
+* on each processor in order to manually flush all memory
+* transactions before updating the fence register.
+*/
+   if (HAS_LLC(obj-base.dev))
+   on_each_cpu(i915_gem_write_fence__ipi, NULL, 1);
+   i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL);

if (enable) {
-   obj-fence_reg = reg;
+   obj-fence_reg = fence_reg;
fence-obj = obj;
list_move_tail(fence-lru_list, dev_priv-mm.fence_list);
} else {


I almost wish x86 just gave up and went fully weakly ordered.  At least
then we'd know we need barriers everywhere, rather than just in
mystical places.

Reviewed-by: Jesse Barnes jbar...@virtuousgeek.org


Queued for -next, thanks for the patch. I am a bit unhappy that the
testcase doesn't work too well and can't reliably reproduce this on all
affected machines. But I've tried a bit to improve things and it's very
fickle. I guess we just have to accept this :(


Under real-time conditions when we expect deterministic response to
external and internal events at any time within a couple of
microseconds, invalidating and flushing the entire cache in a running
system is unacceptable. This is introducing latencies of several
milliseconds which was clearly shown in RT regression tests on a kernel
with this patch applied (https://www.osadl.org/?id=1543#c7602). We
therefore have to revert it in the RT patch queue - kind of a
workaround of a workaround.

Would simply be wonderful, if we could get rid of the hateful wbinvd().

-Carsten.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-06-08 Thread Daniel Vetter
On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde c.e...@osadl.org wrote:
 On 04/08/2013 11:54 AM, Daniel Vetter wrote:

 On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:

 On Thu,  4 Apr 2013 21:31:03 +0100
 Chris Wilson ch...@chris-wilson.co.uk wrote:

 In order to fully serialize access to the fenced region and the update
 to the fence register we need to take extreme measures on SNB+, and
 manually flush writes to memory prior to writing the fence register in
 conjunction with the memory barriers placed around the register write.

 Fixes i-g-t/gem_fence_thrash

 v2: Bring a bigger gun
 v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
 v4: Remove changes for working generations.
 v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
 v6: Rewrite comments to ellide forgotten history.

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Jon Bloomfield jon.bloomfi...@intel.com
 Tested-by: Jon Bloomfield jon.bloomfi...@intel.com (v2)
 Cc: sta...@vger.kernel.org
 ---
   drivers/gpu/drm/i915/i915_gem.c |   28 +++-
   1 file changed, 23 insertions(+), 5 deletions(-)

 diff --git a/drivers/gpu/drm/i915/i915_gem.c
 b/drivers/gpu/drm/i915/i915_gem.c
 index fa4ea1a..8f7739e 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -2689,17 +2689,35 @@ static inline int fence_number(struct
 drm_i915_private *dev_priv,
 return fence - dev_priv-fence_regs;
   }

 +static void i915_gem_write_fence__ipi(void *data)
 +{
 +   wbinvd();
 +}
 +
   static void i915_gem_object_update_fence(struct drm_i915_gem_object
 *obj,
  struct drm_i915_fence_reg
 *fence,
  bool enable)
   {
 -   struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
 -   int reg = fence_number(dev_priv, fence);
 -
 -   i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
 +   struct drm_device *dev = obj-base.dev;
 +   struct drm_i915_private *dev_priv = dev-dev_private;
 +   int fence_reg = fence_number(dev_priv, fence);
 +
 +   /* In order to fully serialize access to the fenced region and
 +* the update to the fence register we need to take extreme
 +* measures on SNB+. In theory, the write to the fence register
 +* flushes all memory transactions before, and coupled with the
 +* mb() placed around the register write we serialise all memory
 +* operations with respect to the changes in the tiler. Yet, on
 +* SNB+ we need to take a step further and emit an explicit
 wbinvd()
 +* on each processor in order to manually flush all memory
 +* transactions before updating the fence register.
 +*/
 +   if (HAS_LLC(obj-base.dev))
 +   on_each_cpu(i915_gem_write_fence__ipi, NULL, 1);
 +   i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL);

 if (enable) {
 -   obj-fence_reg = reg;
 +   obj-fence_reg = fence_reg;
 fence-obj = obj;
 list_move_tail(fence-lru_list,
 dev_priv-mm.fence_list);
 } else {


 I almost wish x86 just gave up and went fully weakly ordered.  At least
 then we'd know we need barriers everywhere, rather than just in
 mystical places.

 Reviewed-by: Jesse Barnes jbar...@virtuousgeek.org


 Queued for -next, thanks for the patch. I am a bit unhappy that the
 testcase doesn't work too well and can't reliably reproduce this on all
 affected machines. But I've tried a bit to improve things and it's very
 fickle. I guess we just have to accept this :(


 Under real-time conditions when we expect deterministic response to
 external and internal events at any time within a couple of
 microseconds, invalidating and flushing the entire cache in a running
 system is unacceptable. This is introducing latencies of several
 milliseconds which was clearly shown in RT regression tests on a kernel
 with this patch applied (https://www.osadl.org/?id=1543#c7602). We
 therefore have to revert it in the RT patch queue - kind of a
 workaround of a workaround.

 Would simply be wonderful, if we could get rid of the hateful wbinvd().

I'm somewhat surprised people have not started to scream earlier about
this one ;-)

We're trying to figure out whether there's a less costly w/a (we have
some benchmark where it kills performance, too) but until that's done
we pretty much need to stick with it. If you want to avoid any bad
side-effects due to that w/a you can alternatively pin all gpu
rendering tasks to the same cpu core/thread.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-04-08 Thread Daniel Vetter
On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:
 On Thu,  4 Apr 2013 21:31:03 +0100
 Chris Wilson ch...@chris-wilson.co.uk wrote:
 
  In order to fully serialize access to the fenced region and the update
  to the fence register we need to take extreme measures on SNB+, and
  manually flush writes to memory prior to writing the fence register in
  conjunction with the memory barriers placed around the register write.
  
  Fixes i-g-t/gem_fence_thrash
  
  v2: Bring a bigger gun
  v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
  v4: Remove changes for working generations.
  v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
  v6: Rewrite comments to ellide forgotten history.
  
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
  Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
  Cc: Jon Bloomfield jon.bloomfi...@intel.com
  Tested-by: Jon Bloomfield jon.bloomfi...@intel.com (v2)
  Cc: sta...@vger.kernel.org
  ---
   drivers/gpu/drm/i915/i915_gem.c |   28 +++-
   1 file changed, 23 insertions(+), 5 deletions(-)
  
  diff --git a/drivers/gpu/drm/i915/i915_gem.c 
  b/drivers/gpu/drm/i915/i915_gem.c
  index fa4ea1a..8f7739e 100644
  --- a/drivers/gpu/drm/i915/i915_gem.c
  +++ b/drivers/gpu/drm/i915/i915_gem.c
  @@ -2689,17 +2689,35 @@ static inline int fence_number(struct 
  drm_i915_private *dev_priv,
  return fence - dev_priv-fence_regs;
   }
   
  +static void i915_gem_write_fence__ipi(void *data)
  +{
  +   wbinvd();
  +}
  +
   static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
   struct drm_i915_fence_reg *fence,
   bool enable)
   {
  -   struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
  -   int reg = fence_number(dev_priv, fence);
  -
  -   i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
  +   struct drm_device *dev = obj-base.dev;
  +   struct drm_i915_private *dev_priv = dev-dev_private;
  +   int fence_reg = fence_number(dev_priv, fence);
  +
  +   /* In order to fully serialize access to the fenced region and
  +* the update to the fence register we need to take extreme
  +* measures on SNB+. In theory, the write to the fence register
  +* flushes all memory transactions before, and coupled with the
  +* mb() placed around the register write we serialise all memory
  +* operations with respect to the changes in the tiler. Yet, on
  +* SNB+ we need to take a step further and emit an explicit wbinvd()
  +* on each processor in order to manually flush all memory
  +* transactions before updating the fence register.
  +*/
  +   if (HAS_LLC(obj-base.dev))
  +   on_each_cpu(i915_gem_write_fence__ipi, NULL, 1);
  +   i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL);
   
  if (enable) {
  -   obj-fence_reg = reg;
  +   obj-fence_reg = fence_reg;
  fence-obj = obj;
  list_move_tail(fence-lru_list, dev_priv-mm.fence_list);
  } else {
 
 I almost wish x86 just gave up and went fully weakly ordered.  At least
 then we'd know we need barriers everywhere, rather than just in
 mystical places.
 
 Reviewed-by: Jesse Barnes jbar...@virtuousgeek.org

Queued for -next, thanks for the patch. I am a bit unhappy that the
testcase doesn't work too well and can't reliably reproduce this on all
affected machines. But I've tried a bit to improve things and it's very
fickle. I guess we just have to accept this :(
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-04-05 Thread Jesse Barnes
On Thu,  4 Apr 2013 21:31:03 +0100
Chris Wilson ch...@chris-wilson.co.uk wrote:

 In order to fully serialize access to the fenced region and the update
 to the fence register we need to take extreme measures on SNB+, and
 manually flush writes to memory prior to writing the fence register in
 conjunction with the memory barriers placed around the register write.
 
 Fixes i-g-t/gem_fence_thrash
 
 v2: Bring a bigger gun
 v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
 v4: Remove changes for working generations.
 v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
 v6: Rewrite comments to ellide forgotten history.
 
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Jon Bloomfield jon.bloomfi...@intel.com
 Tested-by: Jon Bloomfield jon.bloomfi...@intel.com (v2)
 Cc: sta...@vger.kernel.org
 ---
  drivers/gpu/drm/i915/i915_gem.c |   28 +++-
  1 file changed, 23 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
 index fa4ea1a..8f7739e 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -2689,17 +2689,35 @@ static inline int fence_number(struct 
 drm_i915_private *dev_priv,
   return fence - dev_priv-fence_regs;
  }
  
 +static void i915_gem_write_fence__ipi(void *data)
 +{
 + wbinvd();
 +}
 +
  static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
struct drm_i915_fence_reg *fence,
bool enable)
  {
 - struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
 - int reg = fence_number(dev_priv, fence);
 -
 - i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
 + struct drm_device *dev = obj-base.dev;
 + struct drm_i915_private *dev_priv = dev-dev_private;
 + int fence_reg = fence_number(dev_priv, fence);
 +
 + /* In order to fully serialize access to the fenced region and
 +  * the update to the fence register we need to take extreme
 +  * measures on SNB+. In theory, the write to the fence register
 +  * flushes all memory transactions before, and coupled with the
 +  * mb() placed around the register write we serialise all memory
 +  * operations with respect to the changes in the tiler. Yet, on
 +  * SNB+ we need to take a step further and emit an explicit wbinvd()
 +  * on each processor in order to manually flush all memory
 +  * transactions before updating the fence register.
 +  */
 + if (HAS_LLC(obj-base.dev))
 + on_each_cpu(i915_gem_write_fence__ipi, NULL, 1);
 + i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL);
  
   if (enable) {
 - obj-fence_reg = reg;
 + obj-fence_reg = fence_reg;
   fence-obj = obj;
   list_move_tail(fence-lru_list, dev_priv-mm.fence_list);
   } else {

I almost wish x86 just gave up and went fully weakly ordered.  At least
then we'd know we need barriers everywhere, rather than just in
mystical places.

Reviewed-by: Jesse Barnes jbar...@virtuousgeek.org

-- 
Jesse Barnes, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-04-04 Thread Chris Wilson
In order to fully serialize access to the fenced region and the update
to the fence register we need to take extreme measures on SNB+, and
write the fence from each cpu taking care to serialise memory accesses
on each.  The usual mb(), or even a mb() on each CPU is not enough to
ensure that access to the fenced region is coherent across the change in
fence register.

Fixes i-g-t/gem_fence_thrash

v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
CC: Jon Bloomfield jon.bloomfi...@intel.com
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c |   42 +++
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa4ea1a..13e42f6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2689,17 +2689,51 @@ static inline int fence_number(struct drm_i915_private 
*dev_priv,
return fence - dev_priv-fence_regs;
 }
 
+struct write_fence {
+   struct drm_device *dev;
+   struct drm_i915_gem_object *obj;
+   int fence;
+};
+
+static void i915_gem_write_fence__ipi(void *data)
+{
+   struct write_fence *args = data;
+
+   wbinvd();
+
+   i915_gem_write_fence(args-dev, args-fence, args-obj);
+}
+
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 struct drm_i915_fence_reg *fence,
 bool enable)
 {
struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
-   int reg = fence_number(dev_priv, fence);
-
-   i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
+   struct write_fence args = {
+   .dev = obj-base.dev,
+   .fence = fence_number(dev_priv, fence),
+   .obj = enable ? obj : NULL,
+   };
+
+   /* In order to fully serialize access to the fenced region and
+* the update to the fence register we need to take extreme
+* measures on SNB+, and write the fence from each cpu taking
+* care to serialise memory accesses on each. The usual mb(),
+* or even a mb() on each CPU is not enough to ensure that access
+* to the fenced region is coherent across the change in fence
+* register.
+*
+* As it turns out for IVB, I need slightly heavier bullets and
+* need to do a wbinvd() per-processor to serialise memory acceses
+* with the fence update.
+*/
+   if (HAS_LLC(obj-base.dev))
+   on_each_cpu(i915_gem_write_fence__ipi, args, 1);
+   else
+   i915_gem_write_fence__ipi(args);
 
if (enable) {
-   obj-fence_reg = reg;
+   obj-fence_reg = args.fence;
fence-obj = obj;
list_move_tail(fence-lru_list, dev_priv-mm.fence_list);
} else {
-- 
1.7.10.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-04-04 Thread Chris Wilson
In order to fully serialize access to the fenced region and the update
to the fence register we need to take extreme measures on SNB+, and
write the fence from each cpu taking care to serialise memory accesses
on each.  The usual mb(), or even a mb() on each CPU is not enough to
ensure that access to the fenced region is coherent across the change in
fence register.

Fixes i-g-t/gem_fence_thrash

v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
v4: Remove changes for working generations.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
CC: Jon Bloomfield jon.bloomfi...@intel.com
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c |   42 +++
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa4ea1a..02b3a61 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2689,17 +2689,51 @@ static inline int fence_number(struct drm_i915_private 
*dev_priv,
return fence - dev_priv-fence_regs;
 }
 
+struct write_fence {
+   struct drm_device *dev;
+   struct drm_i915_gem_object *obj;
+   int fence;
+};
+
+static void i915_gem_write_fence__ipi(void *data)
+{
+   struct write_fence *args = data;
+
+   wbinvd();
+
+   i915_gem_write_fence(args-dev, args-fence, args-obj);
+}
+
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 struct drm_i915_fence_reg *fence,
 bool enable)
 {
struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
-   int reg = fence_number(dev_priv, fence);
-
-   i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
+   struct write_fence args = {
+   .dev = obj-base.dev,
+   .fence = fence_number(dev_priv, fence),
+   .obj = enable ? obj : NULL,
+   };
+
+   /* In order to fully serialize access to the fenced region and
+* the update to the fence register we need to take extreme
+* measures on SNB+, and write the fence from each cpu taking
+* care to serialise memory accesses on each. The usual mb(),
+* or even a mb() on each CPU is not enough to ensure that access
+* to the fenced region is coherent across the change in fence
+* register.
+*
+* As it turns out for IVB, I need slightly heavier bullets and
+* need to do a wbinvd() per-processor to serialise memory acceses
+* with the fence update.
+*/
+   if (HAS_LLC(obj-base.dev))
+   on_each_cpu(i915_gem_write_fence__ipi, args, 1);
+   else
+   i915_gem_write_fence(args.dev, args.fence, args.obj);
 
if (enable) {
-   obj-fence_reg = reg;
+   obj-fence_reg = args.fence;
fence-obj = obj;
list_move_tail(fence-lru_list, dev_priv-mm.fence_list);
} else {
-- 
1.7.10.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-04-04 Thread Chris Wilson
In order to fully serialize access to the fenced region and the update
to the fence register we need to take extreme measures on SNB+, and
write the fence from each cpu taking care to serialise memory accesses
on each.  The usual mb(), or even a mb() on each CPU is not enough to
ensure that access to the fenced region is coherent across the change in
fence register - however a full blown write-back invalidate (wbinvd) per
processor is sufficient.

Fixes i-g-t/gem_fence_thrash

v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
v4: Remove changes for working generations.
v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Jon Bloomfield jon.bloomfi...@intel.com
Tested-by: Jon Bloomfield jon.bloomfi...@intel.com (v2)
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c |   26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa4ea1a..632a050 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2689,17 +2689,33 @@ static inline int fence_number(struct drm_i915_private 
*dev_priv,
return fence - dev_priv-fence_regs;
 }
 
+static void i915_gem_write_fence__ipi(void *data)
+{
+   wbinvd();
+}
+
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 struct drm_i915_fence_reg *fence,
 bool enable)
 {
-   struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
-   int reg = fence_number(dev_priv, fence);
-
-   i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
+   struct drm_device *dev = obj-base.dev;
+   struct drm_i915_private *dev_priv = dev-dev_private;
+   int fence_reg = fence_number(dev_priv, fence);
+
+   /* In order to fully serialize access to the fenced region and
+* the update to the fence register we need to take extreme
+* measures on SNB+, and write the fence from each cpu taking
+* care to serialise memory accesses on each. The usual mb(),
+* or even a mb() on each CPU is not enough to ensure that access
+* to the fenced region is coherent across the change in fence
+* register, but a wbinvd() per processor is sufficient.
+*/
+   if (HAS_LLC(obj-base.dev))
+   on_each_cpu(i915_gem_write_fence__ipi, NULL, 1);
+   i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL);
 
if (enable) {
-   obj-fence_reg = reg;
+   obj-fence_reg = fence_reg;
fence-obj = obj;
list_move_tail(fence-lru_list, dev_priv-mm.fence_list);
} else {
-- 
1.7.10.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-03-22 Thread Daniel Vetter
On Thu, Mar 21, 2013 at 03:30:19PM +, Chris Wilson wrote:
 In order to fully serialize access to the fenced region and the update
 to the fence register we need to take extreme measures on SNB+, and
 write the fence from each cpu taking care to serialise memory accesses
 on each.  The usual mb(), or even a mb() on each CPU is not enough to
 ensure that access to the fenced region is coherent across the change in
 fence register.
 
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Cc: sta...@vger.kernel.org
 ---
  drivers/gpu/drm/i915/i915_gem.c |   34 ++
  1 file changed, 30 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
 index 19fc21b..fe34240 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -2684,17 +2684,43 @@ static inline int fence_number(struct 
 drm_i915_private *dev_priv,
   return fence - dev_priv-fence_regs;
  }
  
 +struct write_fence {
 + struct drm_device *dev;
 + struct drm_i915_gem_object *obj;
 + int fence;
 +};
 +
 +static void i915_gem_write_fence__ipi(void *data)
 +{
 + struct write_fence *args = data;
 + i915_gem_write_fence(args-dev, args-fence, args-obj);
 +}
 +
  static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
struct drm_i915_fence_reg *fence,
bool enable)
  {
   struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
 - int reg = fence_number(dev_priv, fence);
 -
 - i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
 + struct write_fence args = {
 + .dev = obj-base.dev,
 + .fence = fence_number(dev_priv, fence),
 + .obj = enable ? obj : NULL,
 + };
 +
 + /* In order to fully serialize access to the fenced region and
 +  * the update to the fence register we need to take extreme
 +  * measures on SNB+, and write the fence from each cpu taking
 +  * care to serialise memory accesses on each. The usual mb(),
 +  * or even a mb() on each CPU is not enough to ensure that access
 +  * to the fenced region is coherent across the change in fence
 +  * register.
 +  */
 + if (!HAS_LLC(obj-base.dev) ||
 + on_each_cpu(i915_gem_write_fence__ipi, args, 1) != 0)
 + i915_gem_write_fence__ipi(args);

I think the if condition here is a notch to clever and hides the
elefantentöter a bit too well. on_each_cpu always calls the given function
unconditionally, even when the ipi function call fails, so

if (!HAS_LLC)
WARN_ON(on_each_cpu);
else
i915_gem_write_fence

looks clearer to me.
-Daniel

  
   if (enable) {
 - obj-fence_reg = reg;
 + obj-fence_reg = args.fence;
   fence-obj = obj;
   list_move_tail(fence-lru_list, dev_priv-mm.fence_list);
   } else {
 -- 
 1.7.10.4
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-03-22 Thread Chris Wilson
In order to fully serialize access to the fenced region and the update
to the fence register we need to take extreme measures on SNB+, and
write the fence from each cpu taking care to serialise memory accesses
on each.  The usual mb(), or even a mb() on each CPU is not enough to
ensure that access to the fenced region is coherent across the change in
fence register.

v2: Bring a bigger gun

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
CC: Jon Bloomfield jon.bloomfi...@intel.com
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c |   42 +++
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0e207e6..a92d431 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -32,6 +32,7 @@
 #include intel_drv.h
 #include linux/shmem_fs.h
 #include linux/slab.h
+#include linux/stop_machine.h
 #include linux/swap.h
 #include linux/pci.h
 #include linux/dma-buf.h
@@ -2678,17 +2679,50 @@ static inline int fence_number(struct drm_i915_private 
*dev_priv,
return fence - dev_priv-fence_regs;
 }
 
+struct write_fence {
+   struct drm_device *dev;
+   struct drm_i915_gem_object *obj;
+   int fence;
+};
+
+static int i915_gem_write_fence__ipi(void *data)
+{
+   struct write_fence *args = data;
+   i915_gem_write_fence(args-dev, args-fence, args-obj);
+   return 0;
+}
+
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 struct drm_i915_fence_reg *fence,
 bool enable)
 {
struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
-   int reg = fence_number(dev_priv, fence);
-
-   i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
+   struct write_fence args = {
+   .dev = obj-base.dev,
+   .fence = fence_number(dev_priv, fence),
+   .obj = enable ? obj : NULL,
+   };
+
+   /* In order to fully serialize access to the fenced region and
+* the update to the fence register we need to take extreme
+* measures on SNB+, and write the fence from each cpu taking
+* care to serialise memory accesses on each. The usual mb(),
+* or even a mb() on each CPU is not enough to ensure that access
+* to the fenced region is coherent across the change in fence
+* register.
+*
+* As it turns out for IVB, I need a bigger gun.
+*/
+   if (HAS_LLC(obj-base.dev)) {
+   if (INTEL_INFO(obj-base.dev)-gen = 7)
+   stop_machine(i915_gem_write_fence__ipi, args, 
cpu_possible_mask);
+   else
+   on_each_cpu((void (*)(void 
*))i915_gem_write_fence__ipi, args, 1);
+   } else
+   i915_gem_write_fence__ipi(args);
 
if (enable) {
-   obj-fence_reg = reg;
+   obj-fence_reg = args.fence;
fence-obj = obj;
list_move_tail(fence-lru_list, dev_priv-mm.fence_list);
} else {
-- 
1.7.10.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

2013-03-21 Thread Chris Wilson
In order to fully serialize access to the fenced region and the update
to the fence register we need to take extreme measures on SNB+, and
write the fence from each cpu taking care to serialise memory accesses
on each.  The usual mb(), or even a mb() on each CPU is not enough to
ensure that access to the fenced region is coherent across the change in
fence register.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c |   34 ++
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 19fc21b..fe34240 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2684,17 +2684,43 @@ static inline int fence_number(struct drm_i915_private 
*dev_priv,
return fence - dev_priv-fence_regs;
 }
 
+struct write_fence {
+   struct drm_device *dev;
+   struct drm_i915_gem_object *obj;
+   int fence;
+};
+
+static void i915_gem_write_fence__ipi(void *data)
+{
+   struct write_fence *args = data;
+   i915_gem_write_fence(args-dev, args-fence, args-obj);
+}
+
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 struct drm_i915_fence_reg *fence,
 bool enable)
 {
struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
-   int reg = fence_number(dev_priv, fence);
-
-   i915_gem_write_fence(obj-base.dev, reg, enable ? obj : NULL);
+   struct write_fence args = {
+   .dev = obj-base.dev,
+   .fence = fence_number(dev_priv, fence),
+   .obj = enable ? obj : NULL,
+   };
+
+   /* In order to fully serialize access to the fenced region and
+* the update to the fence register we need to take extreme
+* measures on SNB+, and write the fence from each cpu taking
+* care to serialise memory accesses on each. The usual mb(),
+* or even a mb() on each CPU is not enough to ensure that access
+* to the fenced region is coherent across the change in fence
+* register.
+*/
+   if (!HAS_LLC(obj-base.dev) ||
+   on_each_cpu(i915_gem_write_fence__ipi, args, 1) != 0)
+   i915_gem_write_fence__ipi(args);
 
if (enable) {
-   obj-fence_reg = reg;
+   obj-fence_reg = args.fence;
fence-obj = obj;
list_move_tail(fence-lru_list, dev_priv-mm.fence_list);
} else {
-- 
1.7.10.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx