Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw

2014-08-26 Thread Chris Wilson
On Mon, Aug 25, 2014 at 10:18:09PM +0200, Daniel Vetter wrote:
 On Wed, Aug 20, 2014 at 04:56:41PM +0100, Chris Wilson wrote:
  On Wed, Aug 20, 2014 at 03:21:55PM +, Mcaulay, Alistair wrote:
   It is not the same. This is a special case when re-initialising the hw. 
   This flag is to allow gem_init_hw() to complete successfully during 
   reset. 
   At any other point during reset, -EAGAIN should be returned.
  
  Indeed. You've missed the point. Look closer at the reset counter and
  reset ordering.
 
 We could try to mark the gpu as reset again before starting the reinit to
 avoid this kludge. But that has the problem that if the ring init fails we
 have a bit a mess in marking the gpu terminally wedged. So I think overall
 not prettier than what we have here ... And if I'm mistaken I guess I can
 put on my idiot hat and merge the fixup ;-)

See other patches during the week. Marking the reset as complete after
performing the reset and before resuming the hardware is not ugly. If
the reset fails, you still have the wedge | in-progress. If the reset
succeeds but resume fails, you then have wedge | !in-progress.

Seriously read the request patch to see how it straightens out ring
access, full stop.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw

2014-08-25 Thread Daniel Vetter
On Wed, Aug 20, 2014 at 04:56:41PM +0100, Chris Wilson wrote:
 On Wed, Aug 20, 2014 at 03:21:55PM +, Mcaulay, Alistair wrote:
  It is not the same. This is a special case when re-initialising the hw. 
  This flag is to allow gem_init_hw() to complete successfully during reset. 
  At any other point during reset, -EAGAIN should be returned.
 
 Indeed. You've missed the point. Look closer at the reset counter and
 reset ordering.

We could try to mark the gpu as reset again before starting the reinit to
avoid this kludge. But that has the problem that if the ring init fails we
have a bit a mess in marking the gpu terminally wedged. So I think overall
not prettier than what we have here ... And if I'm mistaken I guess I can
put on my idiot hat and merge the fixup ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw

2014-08-25 Thread Daniel Vetter
On Thu, Aug 21, 2014 at 12:38:58PM +, Mcaulay, Alistair wrote:
 Hi Daniel,
 
 Is there anything else needing done before this patch can be merged?

Me getting working internet mostly. Patch merged with the comment in
check_wedge a bit extended. Also had to rebase, please double-check that I
didn't fumble it.

Thanks, Daniel

 
 Thanks,
 Alistair.
 
  -Original Message-
  From: Mika Kuoppala [mailto:mika.kuopp...@linux.intel.com]
  Sent: Tuesday, August 19, 2014 1:36 PM
  To: Mcaulay, Alistair; intel-gfx@lists.freedesktop.org
  Subject: RE: [PATCH v3] drm/i915: Rework GPU reset sequence to match
  driver load  thaw
  
  Mcaulay, Alistair alistair.mcau...@intel.com writes:
  
   Hi Mika,
  
   can you please review this patch, and verify it fixes the issues in your
  previous review.
  
   Thanks,
   Alistair.
  
   -Original Message-
   From: Mcaulay, Alistair
   Sent: Friday, August 15, 2014 6:52 PM
   To: intel-gfx@lists.freedesktop.org
   Cc: Mcaulay, Alistair
   Subject: [PATCH v3] drm/i915: Rework GPU reset sequence to match
   driver load  thaw
  
   From: McAulay, Alistair alistair.mcau...@intel.com
  
   This patch is to address Daniels concerns over different code during 
   reset:
  
   http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html
  
   The reason for aiming as hard as possible to use the exact same code
   for driver load, gpu reset and runtime pm/system resume is that we've
   simply seen too many bugs due to slight variations and unintended
  omissions.
  
   Tested using igt drv_hangman.
  
   V2: Cleaner way of preventing check_wedge returning -EAGAIN
   V3: Clean the last_context during reset, to ensure do_switch() does
   the MI_SET_CONTEXT. As per review.
   Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com
   ---
  
  Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com
  
drivers/gpu/drm/i915/i915_drv.c |  6 +++
drivers/gpu/drm/i915/i915_drv.h |  3 ++
drivers/gpu/drm/i915/i915_gem.c |  4 +-
drivers/gpu/drm/i915/i915_gem_context.c | 33 +++-
drivers/gpu/drm/i915/i915_gem_gtt.c | 67 
   +--
  --
drivers/gpu/drm/i915/i915_gem_gtt.h |  3 +-
6 files changed, 28 insertions(+), 88 deletions(-)
  
   diff --git a/drivers/gpu/drm/i915/i915_drv.c
   b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644
   --- a/drivers/gpu/drm/i915/i915_drv.c
   +++ b/drivers/gpu/drm/i915/i915_drv.c
   @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev)
!dev_priv-ums.mm_suspended) {
dev_priv-ums.mm_suspended = 0;
  
   +/* Used to prevent gem_check_wedged returning -EAGAIN
   during gpu reset */
   +dev_priv-gpu_error.reload_in_reset = true;
   +
ret = i915_gem_init_hw(dev);
   +
   +dev_priv-gpu_error.reload_in_reset = false;
   +
mutex_unlock(dev-struct_mutex);
if (ret) {
DRM_ERROR(Failed hw init on reset %d\n, ret); 
   diff
  --git
   a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
   index 991b663..116daff 100644
   --- a/drivers/gpu/drm/i915/i915_drv.h
   +++ b/drivers/gpu/drm/i915/i915_drv.h
   @@ -1217,6 +1217,9 @@ struct i915_gpu_error {
  
/* For missed irq/seqno simulation. */
unsigned int test_irq_rings;
   +
   +/* Used to prevent gem_check_wedged returning -EAGAIN during
   gpu reset   */
   +bool reload_in_reset;
};
  
enum modeset_restore {
   diff --git a/drivers/gpu/drm/i915/i915_gem.c
   b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644
   --- a/drivers/gpu/drm/i915/i915_gem.c
   +++ b/drivers/gpu/drm/i915/i915_gem.c
   @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error
   *error,
if (i915_terminally_wedged(error))
return -EIO;
  
   -return -EAGAIN;
   +/* Check if GPU Reset is in progress */
   +if (!error-reload_in_reset)
   +return -EAGAIN;
}
  
return 0;
   diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
   b/drivers/gpu/drm/i915/i915_gem_context.c
   index de72a28..9378ad8 100644
   --- a/drivers/gpu/drm/i915/i915_gem_context.c
   +++ b/drivers/gpu/drm/i915/i915_gem_context.c
   @@ -377,34 +377,17 @@ void i915_gem_context_reset(struct
  drm_device
   *dev)
struct drm_i915_private *dev_priv = dev-dev_private;
int i;
  
   -/* Prevent the hardware from restoring the last context (which
   hung) on
   - * the next switch */
for (i = 0; i  I915_NUM_RINGS; i++) {
struct intel_engine_cs *ring = dev_priv-ring[i];
   -struct intel_context *dctx = ring-default_context;
struct intel_context *lctx = 

Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw

2014-08-21 Thread Mcaulay, Alistair
Hi Daniel,

Is there anything else needing done before this patch can be merged?

Thanks,
Alistair.

 -Original Message-
 From: Mika Kuoppala [mailto:mika.kuopp...@linux.intel.com]
 Sent: Tuesday, August 19, 2014 1:36 PM
 To: Mcaulay, Alistair; intel-gfx@lists.freedesktop.org
 Subject: RE: [PATCH v3] drm/i915: Rework GPU reset sequence to match
 driver load  thaw
 
 Mcaulay, Alistair alistair.mcau...@intel.com writes:
 
  Hi Mika,
 
  can you please review this patch, and verify it fixes the issues in your
 previous review.
 
  Thanks,
  Alistair.
 
  -Original Message-
  From: Mcaulay, Alistair
  Sent: Friday, August 15, 2014 6:52 PM
  To: intel-gfx@lists.freedesktop.org
  Cc: Mcaulay, Alistair
  Subject: [PATCH v3] drm/i915: Rework GPU reset sequence to match
  driver load  thaw
 
  From: McAulay, Alistair alistair.mcau...@intel.com
 
  This patch is to address Daniels concerns over different code during reset:
 
  http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html
 
  The reason for aiming as hard as possible to use the exact same code
  for driver load, gpu reset and runtime pm/system resume is that we've
  simply seen too many bugs due to slight variations and unintended
 omissions.
 
  Tested using igt drv_hangman.
 
  V2: Cleaner way of preventing check_wedge returning -EAGAIN
  V3: Clean the last_context during reset, to ensure do_switch() does
  the MI_SET_CONTEXT. As per review.
  Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com
  ---
 
 Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com
 
   drivers/gpu/drm/i915/i915_drv.c |  6 +++
   drivers/gpu/drm/i915/i915_drv.h |  3 ++
   drivers/gpu/drm/i915/i915_gem.c |  4 +-
   drivers/gpu/drm/i915/i915_gem_context.c | 33 +++-
   drivers/gpu/drm/i915/i915_gem_gtt.c | 67 
  +--
 --
   drivers/gpu/drm/i915/i915_gem_gtt.h |  3 +-
   6 files changed, 28 insertions(+), 88 deletions(-)
 
  diff --git a/drivers/gpu/drm/i915/i915_drv.c
  b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644
  --- a/drivers/gpu/drm/i915/i915_drv.c
  +++ b/drivers/gpu/drm/i915/i915_drv.c
  @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev)
 !dev_priv-ums.mm_suspended) {
 dev_priv-ums.mm_suspended = 0;
 
  +  /* Used to prevent gem_check_wedged returning -EAGAIN
  during gpu reset */
  +  dev_priv-gpu_error.reload_in_reset = true;
  +
 ret = i915_gem_init_hw(dev);
  +
  +  dev_priv-gpu_error.reload_in_reset = false;
  +
 mutex_unlock(dev-struct_mutex);
 if (ret) {
 DRM_ERROR(Failed hw init on reset %d\n, ret); diff
 --git
  a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
  index 991b663..116daff 100644
  --- a/drivers/gpu/drm/i915/i915_drv.h
  +++ b/drivers/gpu/drm/i915/i915_drv.h
  @@ -1217,6 +1217,9 @@ struct i915_gpu_error {
 
 /* For missed irq/seqno simulation. */
 unsigned int test_irq_rings;
  +
  +  /* Used to prevent gem_check_wedged returning -EAGAIN during
  gpu reset   */
  +  bool reload_in_reset;
   };
 
   enum modeset_restore {
  diff --git a/drivers/gpu/drm/i915/i915_gem.c
  b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644
  --- a/drivers/gpu/drm/i915/i915_gem.c
  +++ b/drivers/gpu/drm/i915/i915_gem.c
  @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error
  *error,
 if (i915_terminally_wedged(error))
 return -EIO;
 
  -  return -EAGAIN;
  +  /* Check if GPU Reset is in progress */
  +  if (!error-reload_in_reset)
  +  return -EAGAIN;
 }
 
 return 0;
  diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
  b/drivers/gpu/drm/i915/i915_gem_context.c
  index de72a28..9378ad8 100644
  --- a/drivers/gpu/drm/i915/i915_gem_context.c
  +++ b/drivers/gpu/drm/i915/i915_gem_context.c
  @@ -377,34 +377,17 @@ void i915_gem_context_reset(struct
 drm_device
  *dev)
 struct drm_i915_private *dev_priv = dev-dev_private;
 int i;
 
  -  /* Prevent the hardware from restoring the last context (which
  hung) on
  -   * the next switch */
 for (i = 0; i  I915_NUM_RINGS; i++) {
 struct intel_engine_cs *ring = dev_priv-ring[i];
  -  struct intel_context *dctx = ring-default_context;
 struct intel_context *lctx = ring-last_context;
 
  -  /* Do a fake switch to the default context */
  -  if (lctx == dctx)
  -  continue;
  -
  -  if (!lctx)
  -  continue;
  +  if (lctx) {
  +  if (lctx-legacy_hw_ctx.rcs_state  i == RCS)
  +  i915_gem_object_ggtt_unpin(lctx-
  legacy_hw_ctx.rcs_state);
 
  -  if (dctx-legacy_hw_ctx.rcs_state  i == RCS) {
  -  WARN_ON(i915_gem_obj_ggtt_pin(dctx-
  legacy_hw_ctx.rcs_state,
  -
  get_context_alignment(dev), 0));
  

Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw

2014-08-20 Thread Daniel, Thomas


 -Original Message-
 From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf
 Of alistair.mcau...@intel.com
 Sent: Friday, August 15, 2014 6:52 PM
 To: intel-gfx@lists.freedesktop.org
 Subject: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to
 match driver load  thaw
 
 From: McAulay, Alistair alistair.mcau...@intel.com
 
 This patch is to address Daniels concerns over different code during reset:
 
 http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html
 
 The reason for aiming as hard as possible to use the exact same code for
 driver load, gpu reset and runtime pm/system resume is that we've simply
 seen too many bugs due to slight variations and unintended omissions.
 
 Tested using igt drv_hangman.
 
 V2: Cleaner way of preventing check_wedge returning -EAGAIN
 V3: Clean the last_context during reset, to ensure do_switch() does the
 MI_SET_CONTEXT. As per review.
 Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com
 ---
  drivers/gpu/drm/i915/i915_drv.c |  6 +++
  drivers/gpu/drm/i915/i915_drv.h |  3 ++
  drivers/gpu/drm/i915/i915_gem.c |  4 +-
  drivers/gpu/drm/i915/i915_gem_context.c | 33 +++-
  drivers/gpu/drm/i915/i915_gem_gtt.c | 67 
 +
  drivers/gpu/drm/i915/i915_gem_gtt.h |  3 +-
  6 files changed, 28 insertions(+), 88 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_drv.c
 b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644
 --- a/drivers/gpu/drm/i915/i915_drv.c
 +++ b/drivers/gpu/drm/i915/i915_drv.c
 @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev)
   !dev_priv-ums.mm_suspended) {
   dev_priv-ums.mm_suspended = 0;
 
 + /* Used to prevent gem_check_wedged returning -EAGAIN
 during gpu reset */
 + dev_priv-gpu_error.reload_in_reset = true;
 +
   ret = i915_gem_init_hw(dev);
 +
 + dev_priv-gpu_error.reload_in_reset = false;
 +
   mutex_unlock(dev-struct_mutex);
   if (ret) {
   DRM_ERROR(Failed hw init on reset %d\n, ret); diff
 --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
 index 991b663..116daff 100644
 --- a/drivers/gpu/drm/i915/i915_drv.h
 +++ b/drivers/gpu/drm/i915/i915_drv.h
 @@ -1217,6 +1217,9 @@ struct i915_gpu_error {
 
   /* For missed irq/seqno simulation. */
   unsigned int test_irq_rings;
 +
 + /* Used to prevent gem_check_wedged returning -EAGAIN during
 gpu reset   */
 + bool reload_in_reset;
  };
 
  enum modeset_restore {
 diff --git a/drivers/gpu/drm/i915/i915_gem.c
 b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error
 *error,
   if (i915_terminally_wedged(error))
   return -EIO;
 
 - return -EAGAIN;
 + /* Check if GPU Reset is in progress */
 + if (!error-reload_in_reset)
 + return -EAGAIN;
   }
 
   return 0;
 diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
 b/drivers/gpu/drm/i915/i915_gem_context.c
 index de72a28..9378ad8 100644
 --- a/drivers/gpu/drm/i915/i915_gem_context.c
 +++ b/drivers/gpu/drm/i915/i915_gem_context.c
 @@ -377,34 +377,17 @@ void i915_gem_context_reset(struct drm_device
 *dev)
   struct drm_i915_private *dev_priv = dev-dev_private;
   int i;
 
 - /* Prevent the hardware from restoring the last context (which
 hung) on
 -  * the next switch */
   for (i = 0; i  I915_NUM_RINGS; i++) {
   struct intel_engine_cs *ring = dev_priv-ring[i];
 - struct intel_context *dctx = ring-default_context;
   struct intel_context *lctx = ring-last_context;
 
 - /* Do a fake switch to the default context */
 - if (lctx == dctx)
 - continue;
 -
 - if (!lctx)
 - continue;
 + if (lctx) {
 + if (lctx-legacy_hw_ctx.rcs_state  i == RCS)
 + i915_gem_object_ggtt_unpin(lctx-
 legacy_hw_ctx.rcs_state);
 
 - if (dctx-legacy_hw_ctx.rcs_state  i == RCS) {
 - WARN_ON(i915_gem_obj_ggtt_pin(dctx-
 legacy_hw_ctx.rcs_state,
 -
 get_context_alignment(dev), 0));
 - /* Fake a finish/inactive */
 - dctx-legacy_hw_ctx.rcs_state-base.write_domain
 = 0;
 - dctx-legacy_hw_ctx.rcs_state-active = 0;
 + i915_gem_context_unreference(lctx);
 + ring-last_context = NULL;
   }
 -
 - if (lctx-legacy_hw_ctx.rcs_state  i == RCS)
 - i915_gem_object_ggtt_unpin(lctx-
 legacy_hw_ctx.rcs_state);
 -
 - i915_gem_context_unreference(lctx

Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw

2014-08-20 Thread Chris Wilson
On Wed, Aug 20, 2014 at 02:46:37PM +, Daniel, Thomas wrote:
 
 
  -Original Message-
  From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf
  Of alistair.mcau...@intel.com
  Sent: Friday, August 15, 2014 6:52 PM
  To: intel-gfx@lists.freedesktop.org
  Subject: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to
  match driver load  thaw
  
  From: McAulay, Alistair alistair.mcau...@intel.com
  
  This patch is to address Daniels concerns over different code during reset:
  
  http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html
  
  The reason for aiming as hard as possible to use the exact same code for
  driver load, gpu reset and runtime pm/system resume is that we've simply
  seen too many bugs due to slight variations and unintended omissions.
  
  Tested using igt drv_hangman.
  
  V2: Cleaner way of preventing check_wedge returning -EAGAIN
  V3: Clean the last_context during reset, to ensure do_switch() does the
  MI_SET_CONTEXT. As per review.
  Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com
  ---
   drivers/gpu/drm/i915/i915_drv.c |  6 +++
   drivers/gpu/drm/i915/i915_drv.h |  3 ++
   drivers/gpu/drm/i915/i915_gem.c |  4 +-
   drivers/gpu/drm/i915/i915_gem_context.c | 33 +++-
   drivers/gpu/drm/i915/i915_gem_gtt.c | 67 
  +
   drivers/gpu/drm/i915/i915_gem_gtt.h |  3 +-
   6 files changed, 28 insertions(+), 88 deletions(-)
  
  diff --git a/drivers/gpu/drm/i915/i915_drv.c
  b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644
  --- a/drivers/gpu/drm/i915/i915_drv.c
  +++ b/drivers/gpu/drm/i915/i915_drv.c
  @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev)
  !dev_priv-ums.mm_suspended) {
  dev_priv-ums.mm_suspended = 0;
  
  +   /* Used to prevent gem_check_wedged returning -EAGAIN
  during gpu reset */
  +   dev_priv-gpu_error.reload_in_reset = true;
  +
  ret = i915_gem_init_hw(dev);
  +
  +   dev_priv-gpu_error.reload_in_reset = false;
  +
  mutex_unlock(dev-struct_mutex);
  if (ret) {
  DRM_ERROR(Failed hw init on reset %d\n, ret); diff
  --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
  index 991b663..116daff 100644
  --- a/drivers/gpu/drm/i915/i915_drv.h
  +++ b/drivers/gpu/drm/i915/i915_drv.h
  @@ -1217,6 +1217,9 @@ struct i915_gpu_error {
  
  /* For missed irq/seqno simulation. */
  unsigned int test_irq_rings;
  +
  +   /* Used to prevent gem_check_wedged returning -EAGAIN during
  gpu reset   */
  +   bool reload_in_reset;
   };
  
   enum modeset_restore {
  diff --git a/drivers/gpu/drm/i915/i915_gem.c
  b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644
  --- a/drivers/gpu/drm/i915/i915_gem.c
  +++ b/drivers/gpu/drm/i915/i915_gem.c
  @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error
  *error,
  if (i915_terminally_wedged(error))
  return -EIO;
  
  -   return -EAGAIN;
  +   /* Check if GPU Reset is in progress */
  +   if (!error-reload_in_reset)
  +   return -EAGAIN;

This is silly. You already have the same flag above. Look closer.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw

2014-08-20 Thread Mcaulay, Alistair


 -Original Message-
 From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
 Sent: Wednesday, August 20, 2014 3:58 PM
 To: Daniel, Thomas
 Cc: Mcaulay, Alistair; intel-gfx@lists.freedesktop.org
 Subject: Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to
 match driver load  thaw
 
 On Wed, Aug 20, 2014 at 02:46:37PM +, Daniel, Thomas wrote:
 
 
   -Original Message-
   From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On
   Behalf Of alistair.mcau...@intel.com
   Sent: Friday, August 15, 2014 6:52 PM
   To: intel-gfx@lists.freedesktop.org
   Subject: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence
   to match driver load  thaw
  
   From: McAulay, Alistair alistair.mcau...@intel.com
  
   This patch is to address Daniels concerns over different code during 
   reset:
  
   http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.htm
   l
  
   The reason for aiming as hard as possible to use the exact same
   code for driver load, gpu reset and runtime pm/system resume is that
   we've simply seen too many bugs due to slight variations and unintended
 omissions.
  
   Tested using igt drv_hangman.
  
   V2: Cleaner way of preventing check_wedge returning -EAGAIN
   V3: Clean the last_context during reset, to ensure do_switch() does
   the MI_SET_CONTEXT. As per review.
   Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com
   ---
drivers/gpu/drm/i915/i915_drv.c |  6 +++
drivers/gpu/drm/i915/i915_drv.h |  3 ++
drivers/gpu/drm/i915/i915_gem.c |  4 +-
drivers/gpu/drm/i915/i915_gem_context.c | 33 +++-
drivers/gpu/drm/i915/i915_gem_gtt.c | 67 
   +--
 --
drivers/gpu/drm/i915/i915_gem_gtt.h |  3 +-
6 files changed, 28 insertions(+), 88 deletions(-)
  
   diff --git a/drivers/gpu/drm/i915/i915_drv.c
   b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644
   --- a/drivers/gpu/drm/i915/i915_drv.c
   +++ b/drivers/gpu/drm/i915/i915_drv.c
   @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev)
 !dev_priv-ums.mm_suspended) {
 dev_priv-ums.mm_suspended = 0;
  
   + /* Used to prevent gem_check_wedged returning -EAGAIN
   during gpu reset */
   + dev_priv-gpu_error.reload_in_reset = true;
   +
 ret = i915_gem_init_hw(dev);
   +
   + dev_priv-gpu_error.reload_in_reset = false;
   +
 mutex_unlock(dev-struct_mutex);
 if (ret) {
 DRM_ERROR(Failed hw init on reset %d\n, ret); diff
 --git
   a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
   index 991b663..116daff 100644
   --- a/drivers/gpu/drm/i915/i915_drv.h
   +++ b/drivers/gpu/drm/i915/i915_drv.h
   @@ -1217,6 +1217,9 @@ struct i915_gpu_error {
  
 /* For missed irq/seqno simulation. */
 unsigned int test_irq_rings;
   +
   + /* Used to prevent gem_check_wedged returning -EAGAIN during
   gpu reset   */
   + bool reload_in_reset;
};
  
enum modeset_restore {
   diff --git a/drivers/gpu/drm/i915/i915_gem.c
   b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644
   --- a/drivers/gpu/drm/i915/i915_gem.c
   +++ b/drivers/gpu/drm/i915/i915_gem.c
   @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error
   *error,
 if (i915_terminally_wedged(error))
 return -EIO;
  
   - return -EAGAIN;
   + /* Check if GPU Reset is in progress */
   + if (!error-reload_in_reset)
   + return -EAGAIN;
 
 This is silly. You already have the same flag above. Look closer.
 -Chris
 
 --
 Chris Wilson, Intel Open Source Technology Centre

It is not the same. This is a special case when re-initialising the hw. This 
flag is to allow gem_init_hw() to complete successfully during reset. 
At any other point during reset, -EAGAIN should be returned.

Alistair.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw

2014-08-20 Thread Chris Wilson
On Wed, Aug 20, 2014 at 03:21:55PM +, Mcaulay, Alistair wrote:
 It is not the same. This is a special case when re-initialising the hw. This 
 flag is to allow gem_init_hw() to complete successfully during reset. 
 At any other point during reset, -EAGAIN should be returned.

Indeed. You've missed the point. Look closer at the reset counter and
reset ordering.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw

2014-08-19 Thread Mcaulay, Alistair
Hi Mika,

can you please review this patch, and verify it fixes the issues in your 
previous review.

Thanks,
Alistair.

 -Original Message-
 From: Mcaulay, Alistair
 Sent: Friday, August 15, 2014 6:52 PM
 To: intel-gfx@lists.freedesktop.org
 Cc: Mcaulay, Alistair
 Subject: [PATCH v3] drm/i915: Rework GPU reset sequence to match driver
 load  thaw
 
 From: McAulay, Alistair alistair.mcau...@intel.com
 
 This patch is to address Daniels concerns over different code during reset:
 
 http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html
 
 The reason for aiming as hard as possible to use the exact same code for
 driver load, gpu reset and runtime pm/system resume is that we've simply
 seen too many bugs due to slight variations and unintended omissions.
 
 Tested using igt drv_hangman.
 
 V2: Cleaner way of preventing check_wedge returning -EAGAIN
 V3: Clean the last_context during reset, to ensure do_switch() does the
 MI_SET_CONTEXT. As per review.
 Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com
 ---
  drivers/gpu/drm/i915/i915_drv.c |  6 +++
  drivers/gpu/drm/i915/i915_drv.h |  3 ++
  drivers/gpu/drm/i915/i915_gem.c |  4 +-
  drivers/gpu/drm/i915/i915_gem_context.c | 33 +++-
  drivers/gpu/drm/i915/i915_gem_gtt.c | 67 
 +
  drivers/gpu/drm/i915/i915_gem_gtt.h |  3 +-
  6 files changed, 28 insertions(+), 88 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_drv.c
 b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644
 --- a/drivers/gpu/drm/i915/i915_drv.c
 +++ b/drivers/gpu/drm/i915/i915_drv.c
 @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev)
   !dev_priv-ums.mm_suspended) {
   dev_priv-ums.mm_suspended = 0;
 
 + /* Used to prevent gem_check_wedged returning -EAGAIN
 during gpu reset */
 + dev_priv-gpu_error.reload_in_reset = true;
 +
   ret = i915_gem_init_hw(dev);
 +
 + dev_priv-gpu_error.reload_in_reset = false;
 +
   mutex_unlock(dev-struct_mutex);
   if (ret) {
   DRM_ERROR(Failed hw init on reset %d\n, ret); diff
 --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
 index 991b663..116daff 100644
 --- a/drivers/gpu/drm/i915/i915_drv.h
 +++ b/drivers/gpu/drm/i915/i915_drv.h
 @@ -1217,6 +1217,9 @@ struct i915_gpu_error {
 
   /* For missed irq/seqno simulation. */
   unsigned int test_irq_rings;
 +
 + /* Used to prevent gem_check_wedged returning -EAGAIN during
 gpu reset   */
 + bool reload_in_reset;
  };
 
  enum modeset_restore {
 diff --git a/drivers/gpu/drm/i915/i915_gem.c
 b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error
 *error,
   if (i915_terminally_wedged(error))
   return -EIO;
 
 - return -EAGAIN;
 + /* Check if GPU Reset is in progress */
 + if (!error-reload_in_reset)
 + return -EAGAIN;
   }
 
   return 0;
 diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
 b/drivers/gpu/drm/i915/i915_gem_context.c
 index de72a28..9378ad8 100644
 --- a/drivers/gpu/drm/i915/i915_gem_context.c
 +++ b/drivers/gpu/drm/i915/i915_gem_context.c
 @@ -377,34 +377,17 @@ void i915_gem_context_reset(struct drm_device
 *dev)
   struct drm_i915_private *dev_priv = dev-dev_private;
   int i;
 
 - /* Prevent the hardware from restoring the last context (which
 hung) on
 -  * the next switch */
   for (i = 0; i  I915_NUM_RINGS; i++) {
   struct intel_engine_cs *ring = dev_priv-ring[i];
 - struct intel_context *dctx = ring-default_context;
   struct intel_context *lctx = ring-last_context;
 
 - /* Do a fake switch to the default context */
 - if (lctx == dctx)
 - continue;
 -
 - if (!lctx)
 - continue;
 + if (lctx) {
 + if (lctx-legacy_hw_ctx.rcs_state  i == RCS)
 + i915_gem_object_ggtt_unpin(lctx-
 legacy_hw_ctx.rcs_state);
 
 - if (dctx-legacy_hw_ctx.rcs_state  i == RCS) {
 - WARN_ON(i915_gem_obj_ggtt_pin(dctx-
 legacy_hw_ctx.rcs_state,
 -
 get_context_alignment(dev), 0));
 - /* Fake a finish/inactive */
 - dctx-legacy_hw_ctx.rcs_state-base.write_domain
 = 0;
 - dctx-legacy_hw_ctx.rcs_state-active = 0;
 + i915_gem_context_unreference(lctx);
 + ring-last_context = NULL;
   }
 -
 - if (lctx-legacy_hw_ctx.rcs_state  i == RCS)
 - i915_gem_object_ggtt_unpin(lctx-
 legacy_hw_ctx.rcs_state);
 -
 - 

Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw

2014-08-19 Thread Mika Kuoppala
Mcaulay, Alistair alistair.mcau...@intel.com writes:

 Hi Mika,

 can you please review this patch, and verify it fixes the issues in your 
 previous review.

 Thanks,
 Alistair.

 -Original Message-
 From: Mcaulay, Alistair
 Sent: Friday, August 15, 2014 6:52 PM
 To: intel-gfx@lists.freedesktop.org
 Cc: Mcaulay, Alistair
 Subject: [PATCH v3] drm/i915: Rework GPU reset sequence to match driver
 load  thaw
 
 From: McAulay, Alistair alistair.mcau...@intel.com
 
 This patch is to address Daniels concerns over different code during reset:
 
 http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html
 
 The reason for aiming as hard as possible to use the exact same code for
 driver load, gpu reset and runtime pm/system resume is that we've simply
 seen too many bugs due to slight variations and unintended omissions.
 
 Tested using igt drv_hangman.
 
 V2: Cleaner way of preventing check_wedge returning -EAGAIN
 V3: Clean the last_context during reset, to ensure do_switch() does the
 MI_SET_CONTEXT. As per review.
 Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com
 ---

Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com

  drivers/gpu/drm/i915/i915_drv.c |  6 +++
  drivers/gpu/drm/i915/i915_drv.h |  3 ++
  drivers/gpu/drm/i915/i915_gem.c |  4 +-
  drivers/gpu/drm/i915/i915_gem_context.c | 33 +++-
  drivers/gpu/drm/i915/i915_gem_gtt.c | 67 
 +
  drivers/gpu/drm/i915/i915_gem_gtt.h |  3 +-
  6 files changed, 28 insertions(+), 88 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_drv.c
 b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644
 --- a/drivers/gpu/drm/i915/i915_drv.c
 +++ b/drivers/gpu/drm/i915/i915_drv.c
 @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev)
  !dev_priv-ums.mm_suspended) {
  dev_priv-ums.mm_suspended = 0;
 
 +/* Used to prevent gem_check_wedged returning -EAGAIN
 during gpu reset */
 +dev_priv-gpu_error.reload_in_reset = true;
 +
  ret = i915_gem_init_hw(dev);
 +
 +dev_priv-gpu_error.reload_in_reset = false;
 +
  mutex_unlock(dev-struct_mutex);
  if (ret) {
  DRM_ERROR(Failed hw init on reset %d\n, ret); diff
 --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
 index 991b663..116daff 100644
 --- a/drivers/gpu/drm/i915/i915_drv.h
 +++ b/drivers/gpu/drm/i915/i915_drv.h
 @@ -1217,6 +1217,9 @@ struct i915_gpu_error {
 
  /* For missed irq/seqno simulation. */
  unsigned int test_irq_rings;
 +
 +/* Used to prevent gem_check_wedged returning -EAGAIN during
 gpu reset   */
 +bool reload_in_reset;
  };
 
  enum modeset_restore {
 diff --git a/drivers/gpu/drm/i915/i915_gem.c
 b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error
 *error,
  if (i915_terminally_wedged(error))
  return -EIO;
 
 -return -EAGAIN;
 +/* Check if GPU Reset is in progress */
 +if (!error-reload_in_reset)
 +return -EAGAIN;
  }
 
  return 0;
 diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
 b/drivers/gpu/drm/i915/i915_gem_context.c
 index de72a28..9378ad8 100644
 --- a/drivers/gpu/drm/i915/i915_gem_context.c
 +++ b/drivers/gpu/drm/i915/i915_gem_context.c
 @@ -377,34 +377,17 @@ void i915_gem_context_reset(struct drm_device
 *dev)
  struct drm_i915_private *dev_priv = dev-dev_private;
  int i;
 
 -/* Prevent the hardware from restoring the last context (which
 hung) on
 - * the next switch */
  for (i = 0; i  I915_NUM_RINGS; i++) {
  struct intel_engine_cs *ring = dev_priv-ring[i];
 -struct intel_context *dctx = ring-default_context;
  struct intel_context *lctx = ring-last_context;
 
 -/* Do a fake switch to the default context */
 -if (lctx == dctx)
 -continue;
 -
 -if (!lctx)
 -continue;
 +if (lctx) {
 +if (lctx-legacy_hw_ctx.rcs_state  i == RCS)
 +i915_gem_object_ggtt_unpin(lctx-
 legacy_hw_ctx.rcs_state);
 
 -if (dctx-legacy_hw_ctx.rcs_state  i == RCS) {
 -WARN_ON(i915_gem_obj_ggtt_pin(dctx-
 legacy_hw_ctx.rcs_state,
 -
 get_context_alignment(dev), 0));
 -/* Fake a finish/inactive */
 -dctx-legacy_hw_ctx.rcs_state-base.write_domain
 = 0;
 -dctx-legacy_hw_ctx.rcs_state-active = 0;
 +i915_gem_context_unreference(lctx);
 +ring-last_context = NULL;
  }
 -
 -if (lctx-legacy_hw_ctx.rcs_state  i == RCS)
 -

[Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw

2014-08-15 Thread alistair . mcaulay
From: McAulay, Alistair alistair.mcau...@intel.com

This patch is to address Daniels concerns over different code during reset:

http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html

The reason for aiming as hard as possible to use the exact same code for
driver load, gpu reset and runtime pm/system resume is that we've simply
seen too many bugs due to slight variations and unintended omissions.

Tested using igt drv_hangman.

V2: Cleaner way of preventing check_wedge returning -EAGAIN
V3: Clean the last_context during reset, to ensure do_switch() does the 
MI_SET_CONTEXT. As per review.
Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com
---
 drivers/gpu/drm/i915/i915_drv.c |  6 +++
 drivers/gpu/drm/i915/i915_drv.h |  3 ++
 drivers/gpu/drm/i915/i915_gem.c |  4 +-
 drivers/gpu/drm/i915/i915_gem_context.c | 33 +++-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 67 +
 drivers/gpu/drm/i915/i915_gem_gtt.h |  3 +-
 6 files changed, 28 insertions(+), 88 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5e4fefd..3bfafe6 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev)
!dev_priv-ums.mm_suspended) {
dev_priv-ums.mm_suspended = 0;
 
+   /* Used to prevent gem_check_wedged returning -EAGAIN during 
gpu reset */
+   dev_priv-gpu_error.reload_in_reset = true;
+
ret = i915_gem_init_hw(dev);
+
+   dev_priv-gpu_error.reload_in_reset = false;
+
mutex_unlock(dev-struct_mutex);
if (ret) {
DRM_ERROR(Failed hw init on reset %d\n, ret);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 991b663..116daff 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1217,6 +1217,9 @@ struct i915_gpu_error {
 
/* For missed irq/seqno simulation. */
unsigned int test_irq_rings;
+
+   /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset  
 */
+   bool reload_in_reset;
 };
 
 enum modeset_restore {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ef047bc..e7396eb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error *error,
if (i915_terminally_wedged(error))
return -EIO;
 
-   return -EAGAIN;
+   /* Check if GPU Reset is in progress */
+   if (!error-reload_in_reset)
+   return -EAGAIN;
}
 
return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index de72a28..9378ad8 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -377,34 +377,17 @@ void i915_gem_context_reset(struct drm_device *dev)
struct drm_i915_private *dev_priv = dev-dev_private;
int i;
 
-   /* Prevent the hardware from restoring the last context (which hung) on
-* the next switch */
for (i = 0; i  I915_NUM_RINGS; i++) {
struct intel_engine_cs *ring = dev_priv-ring[i];
-   struct intel_context *dctx = ring-default_context;
struct intel_context *lctx = ring-last_context;
 
-   /* Do a fake switch to the default context */
-   if (lctx == dctx)
-   continue;
-
-   if (!lctx)
-   continue;
+   if (lctx) {
+   if (lctx-legacy_hw_ctx.rcs_state  i == RCS)
+   
i915_gem_object_ggtt_unpin(lctx-legacy_hw_ctx.rcs_state);
 
-   if (dctx-legacy_hw_ctx.rcs_state  i == RCS) {
-   
WARN_ON(i915_gem_obj_ggtt_pin(dctx-legacy_hw_ctx.rcs_state,
- 
get_context_alignment(dev), 0));
-   /* Fake a finish/inactive */
-   dctx-legacy_hw_ctx.rcs_state-base.write_domain = 0;
-   dctx-legacy_hw_ctx.rcs_state-active = 0;
+   i915_gem_context_unreference(lctx);
+   ring-last_context = NULL;
}
-
-   if (lctx-legacy_hw_ctx.rcs_state  i == RCS)
-   
i915_gem_object_ggtt_unpin(lctx-legacy_hw_ctx.rcs_state);
-
-   i915_gem_context_unreference(lctx);
-   i915_gem_context_reference(dctx);
-   ring-last_context = dctx;
}
 }
 
@@ -498,10 +481,6 @@ int i915_gem_context_enable(struct drm_i915_private 
*dev_priv)
ppgtt-enable(ppgtt);
}
 
-   /* FIXME: We should make this work, even in reset