Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw
On Mon, Aug 25, 2014 at 10:18:09PM +0200, Daniel Vetter wrote: On Wed, Aug 20, 2014 at 04:56:41PM +0100, Chris Wilson wrote: On Wed, Aug 20, 2014 at 03:21:55PM +, Mcaulay, Alistair wrote: It is not the same. This is a special case when re-initialising the hw. This flag is to allow gem_init_hw() to complete successfully during reset. At any other point during reset, -EAGAIN should be returned. Indeed. You've missed the point. Look closer at the reset counter and reset ordering. We could try to mark the gpu as reset again before starting the reinit to avoid this kludge. But that has the problem that if the ring init fails we have a bit a mess in marking the gpu terminally wedged. So I think overall not prettier than what we have here ... And if I'm mistaken I guess I can put on my idiot hat and merge the fixup ;-) See other patches during the week. Marking the reset as complete after performing the reset and before resuming the hardware is not ugly. If the reset fails, you still have the wedge | in-progress. If the reset succeeds but resume fails, you then have wedge | !in-progress. Seriously read the request patch to see how it straightens out ring access, full stop. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw
On Wed, Aug 20, 2014 at 04:56:41PM +0100, Chris Wilson wrote: On Wed, Aug 20, 2014 at 03:21:55PM +, Mcaulay, Alistair wrote: It is not the same. This is a special case when re-initialising the hw. This flag is to allow gem_init_hw() to complete successfully during reset. At any other point during reset, -EAGAIN should be returned. Indeed. You've missed the point. Look closer at the reset counter and reset ordering. We could try to mark the gpu as reset again before starting the reinit to avoid this kludge. But that has the problem that if the ring init fails we have a bit a mess in marking the gpu terminally wedged. So I think overall not prettier than what we have here ... And if I'm mistaken I guess I can put on my idiot hat and merge the fixup ;-) -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw
On Thu, Aug 21, 2014 at 12:38:58PM +, Mcaulay, Alistair wrote: Hi Daniel, Is there anything else needing done before this patch can be merged? Me getting working internet mostly. Patch merged with the comment in check_wedge a bit extended. Also had to rebase, please double-check that I didn't fumble it. Thanks, Daniel Thanks, Alistair. -Original Message- From: Mika Kuoppala [mailto:mika.kuopp...@linux.intel.com] Sent: Tuesday, August 19, 2014 1:36 PM To: Mcaulay, Alistair; intel-gfx@lists.freedesktop.org Subject: RE: [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw Mcaulay, Alistair alistair.mcau...@intel.com writes: Hi Mika, can you please review this patch, and verify it fixes the issues in your previous review. Thanks, Alistair. -Original Message- From: Mcaulay, Alistair Sent: Friday, August 15, 2014 6:52 PM To: intel-gfx@lists.freedesktop.org Cc: Mcaulay, Alistair Subject: [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw From: McAulay, Alistair alistair.mcau...@intel.com This patch is to address Daniels concerns over different code during reset: http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html The reason for aiming as hard as possible to use the exact same code for driver load, gpu reset and runtime pm/system resume is that we've simply seen too many bugs due to slight variations and unintended omissions. Tested using igt drv_hangman. V2: Cleaner way of preventing check_wedge returning -EAGAIN V3: Clean the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review. Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com --- Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com drivers/gpu/drm/i915/i915_drv.c | 6 +++ drivers/gpu/drm/i915/i915_drv.h | 3 ++ drivers/gpu/drm/i915/i915_gem.c | 4 +- drivers/gpu/drm/i915/i915_gem_context.c | 33 +++- drivers/gpu/drm/i915/i915_gem_gtt.c | 67 +-- -- drivers/gpu/drm/i915/i915_gem_gtt.h | 3 +- 6 files changed, 28 insertions(+), 88 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev) !dev_priv-ums.mm_suspended) { dev_priv-ums.mm_suspended = 0; +/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ +dev_priv-gpu_error.reload_in_reset = true; + ret = i915_gem_init_hw(dev); + +dev_priv-gpu_error.reload_in_reset = false; + mutex_unlock(dev-struct_mutex); if (ret) { DRM_ERROR(Failed hw init on reset %d\n, ret); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 991b663..116daff 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1217,6 +1217,9 @@ struct i915_gpu_error { /* For missed irq/seqno simulation. */ unsigned int test_irq_rings; + +/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ +bool reload_in_reset; }; enum modeset_restore { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error *error, if (i915_terminally_wedged(error)) return -EIO; -return -EAGAIN; +/* Check if GPU Reset is in progress */ +if (!error-reload_in_reset) +return -EAGAIN; } return 0; diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index de72a28..9378ad8 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -377,34 +377,17 @@ void i915_gem_context_reset(struct drm_device *dev) struct drm_i915_private *dev_priv = dev-dev_private; int i; -/* Prevent the hardware from restoring the last context (which hung) on - * the next switch */ for (i = 0; i I915_NUM_RINGS; i++) { struct intel_engine_cs *ring = dev_priv-ring[i]; -struct intel_context *dctx = ring-default_context; struct intel_context *lctx =
Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw
Hi Daniel, Is there anything else needing done before this patch can be merged? Thanks, Alistair. -Original Message- From: Mika Kuoppala [mailto:mika.kuopp...@linux.intel.com] Sent: Tuesday, August 19, 2014 1:36 PM To: Mcaulay, Alistair; intel-gfx@lists.freedesktop.org Subject: RE: [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw Mcaulay, Alistair alistair.mcau...@intel.com writes: Hi Mika, can you please review this patch, and verify it fixes the issues in your previous review. Thanks, Alistair. -Original Message- From: Mcaulay, Alistair Sent: Friday, August 15, 2014 6:52 PM To: intel-gfx@lists.freedesktop.org Cc: Mcaulay, Alistair Subject: [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw From: McAulay, Alistair alistair.mcau...@intel.com This patch is to address Daniels concerns over different code during reset: http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html The reason for aiming as hard as possible to use the exact same code for driver load, gpu reset and runtime pm/system resume is that we've simply seen too many bugs due to slight variations and unintended omissions. Tested using igt drv_hangman. V2: Cleaner way of preventing check_wedge returning -EAGAIN V3: Clean the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review. Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com --- Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com drivers/gpu/drm/i915/i915_drv.c | 6 +++ drivers/gpu/drm/i915/i915_drv.h | 3 ++ drivers/gpu/drm/i915/i915_gem.c | 4 +- drivers/gpu/drm/i915/i915_gem_context.c | 33 +++- drivers/gpu/drm/i915/i915_gem_gtt.c | 67 +-- -- drivers/gpu/drm/i915/i915_gem_gtt.h | 3 +- 6 files changed, 28 insertions(+), 88 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev) !dev_priv-ums.mm_suspended) { dev_priv-ums.mm_suspended = 0; + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + dev_priv-gpu_error.reload_in_reset = true; + ret = i915_gem_init_hw(dev); + + dev_priv-gpu_error.reload_in_reset = false; + mutex_unlock(dev-struct_mutex); if (ret) { DRM_ERROR(Failed hw init on reset %d\n, ret); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 991b663..116daff 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1217,6 +1217,9 @@ struct i915_gpu_error { /* For missed irq/seqno simulation. */ unsigned int test_irq_rings; + + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + bool reload_in_reset; }; enum modeset_restore { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error *error, if (i915_terminally_wedged(error)) return -EIO; - return -EAGAIN; + /* Check if GPU Reset is in progress */ + if (!error-reload_in_reset) + return -EAGAIN; } return 0; diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index de72a28..9378ad8 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -377,34 +377,17 @@ void i915_gem_context_reset(struct drm_device *dev) struct drm_i915_private *dev_priv = dev-dev_private; int i; - /* Prevent the hardware from restoring the last context (which hung) on - * the next switch */ for (i = 0; i I915_NUM_RINGS; i++) { struct intel_engine_cs *ring = dev_priv-ring[i]; - struct intel_context *dctx = ring-default_context; struct intel_context *lctx = ring-last_context; - /* Do a fake switch to the default context */ - if (lctx == dctx) - continue; - - if (!lctx) - continue; + if (lctx) { + if (lctx-legacy_hw_ctx.rcs_state i == RCS) + i915_gem_object_ggtt_unpin(lctx- legacy_hw_ctx.rcs_state); - if (dctx-legacy_hw_ctx.rcs_state i == RCS) { - WARN_ON(i915_gem_obj_ggtt_pin(dctx- legacy_hw_ctx.rcs_state, - get_context_alignment(dev), 0));
Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw
-Original Message- From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf Of alistair.mcau...@intel.com Sent: Friday, August 15, 2014 6:52 PM To: intel-gfx@lists.freedesktop.org Subject: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw From: McAulay, Alistair alistair.mcau...@intel.com This patch is to address Daniels concerns over different code during reset: http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html The reason for aiming as hard as possible to use the exact same code for driver load, gpu reset and runtime pm/system resume is that we've simply seen too many bugs due to slight variations and unintended omissions. Tested using igt drv_hangman. V2: Cleaner way of preventing check_wedge returning -EAGAIN V3: Clean the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review. Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com --- drivers/gpu/drm/i915/i915_drv.c | 6 +++ drivers/gpu/drm/i915/i915_drv.h | 3 ++ drivers/gpu/drm/i915/i915_gem.c | 4 +- drivers/gpu/drm/i915/i915_gem_context.c | 33 +++- drivers/gpu/drm/i915/i915_gem_gtt.c | 67 + drivers/gpu/drm/i915/i915_gem_gtt.h | 3 +- 6 files changed, 28 insertions(+), 88 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev) !dev_priv-ums.mm_suspended) { dev_priv-ums.mm_suspended = 0; + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + dev_priv-gpu_error.reload_in_reset = true; + ret = i915_gem_init_hw(dev); + + dev_priv-gpu_error.reload_in_reset = false; + mutex_unlock(dev-struct_mutex); if (ret) { DRM_ERROR(Failed hw init on reset %d\n, ret); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 991b663..116daff 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1217,6 +1217,9 @@ struct i915_gpu_error { /* For missed irq/seqno simulation. */ unsigned int test_irq_rings; + + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + bool reload_in_reset; }; enum modeset_restore { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error *error, if (i915_terminally_wedged(error)) return -EIO; - return -EAGAIN; + /* Check if GPU Reset is in progress */ + if (!error-reload_in_reset) + return -EAGAIN; } return 0; diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index de72a28..9378ad8 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -377,34 +377,17 @@ void i915_gem_context_reset(struct drm_device *dev) struct drm_i915_private *dev_priv = dev-dev_private; int i; - /* Prevent the hardware from restoring the last context (which hung) on - * the next switch */ for (i = 0; i I915_NUM_RINGS; i++) { struct intel_engine_cs *ring = dev_priv-ring[i]; - struct intel_context *dctx = ring-default_context; struct intel_context *lctx = ring-last_context; - /* Do a fake switch to the default context */ - if (lctx == dctx) - continue; - - if (!lctx) - continue; + if (lctx) { + if (lctx-legacy_hw_ctx.rcs_state i == RCS) + i915_gem_object_ggtt_unpin(lctx- legacy_hw_ctx.rcs_state); - if (dctx-legacy_hw_ctx.rcs_state i == RCS) { - WARN_ON(i915_gem_obj_ggtt_pin(dctx- legacy_hw_ctx.rcs_state, - get_context_alignment(dev), 0)); - /* Fake a finish/inactive */ - dctx-legacy_hw_ctx.rcs_state-base.write_domain = 0; - dctx-legacy_hw_ctx.rcs_state-active = 0; + i915_gem_context_unreference(lctx); + ring-last_context = NULL; } - - if (lctx-legacy_hw_ctx.rcs_state i == RCS) - i915_gem_object_ggtt_unpin(lctx- legacy_hw_ctx.rcs_state); - - i915_gem_context_unreference(lctx
Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw
On Wed, Aug 20, 2014 at 02:46:37PM +, Daniel, Thomas wrote: -Original Message- From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf Of alistair.mcau...@intel.com Sent: Friday, August 15, 2014 6:52 PM To: intel-gfx@lists.freedesktop.org Subject: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw From: McAulay, Alistair alistair.mcau...@intel.com This patch is to address Daniels concerns over different code during reset: http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html The reason for aiming as hard as possible to use the exact same code for driver load, gpu reset and runtime pm/system resume is that we've simply seen too many bugs due to slight variations and unintended omissions. Tested using igt drv_hangman. V2: Cleaner way of preventing check_wedge returning -EAGAIN V3: Clean the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review. Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com --- drivers/gpu/drm/i915/i915_drv.c | 6 +++ drivers/gpu/drm/i915/i915_drv.h | 3 ++ drivers/gpu/drm/i915/i915_gem.c | 4 +- drivers/gpu/drm/i915/i915_gem_context.c | 33 +++- drivers/gpu/drm/i915/i915_gem_gtt.c | 67 + drivers/gpu/drm/i915/i915_gem_gtt.h | 3 +- 6 files changed, 28 insertions(+), 88 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev) !dev_priv-ums.mm_suspended) { dev_priv-ums.mm_suspended = 0; + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + dev_priv-gpu_error.reload_in_reset = true; + ret = i915_gem_init_hw(dev); + + dev_priv-gpu_error.reload_in_reset = false; + mutex_unlock(dev-struct_mutex); if (ret) { DRM_ERROR(Failed hw init on reset %d\n, ret); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 991b663..116daff 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1217,6 +1217,9 @@ struct i915_gpu_error { /* For missed irq/seqno simulation. */ unsigned int test_irq_rings; + + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + bool reload_in_reset; }; enum modeset_restore { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error *error, if (i915_terminally_wedged(error)) return -EIO; - return -EAGAIN; + /* Check if GPU Reset is in progress */ + if (!error-reload_in_reset) + return -EAGAIN; This is silly. You already have the same flag above. Look closer. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw
-Original Message- From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] Sent: Wednesday, August 20, 2014 3:58 PM To: Daniel, Thomas Cc: Mcaulay, Alistair; intel-gfx@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw On Wed, Aug 20, 2014 at 02:46:37PM +, Daniel, Thomas wrote: -Original Message- From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf Of alistair.mcau...@intel.com Sent: Friday, August 15, 2014 6:52 PM To: intel-gfx@lists.freedesktop.org Subject: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw From: McAulay, Alistair alistair.mcau...@intel.com This patch is to address Daniels concerns over different code during reset: http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.htm l The reason for aiming as hard as possible to use the exact same code for driver load, gpu reset and runtime pm/system resume is that we've simply seen too many bugs due to slight variations and unintended omissions. Tested using igt drv_hangman. V2: Cleaner way of preventing check_wedge returning -EAGAIN V3: Clean the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review. Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com --- drivers/gpu/drm/i915/i915_drv.c | 6 +++ drivers/gpu/drm/i915/i915_drv.h | 3 ++ drivers/gpu/drm/i915/i915_gem.c | 4 +- drivers/gpu/drm/i915/i915_gem_context.c | 33 +++- drivers/gpu/drm/i915/i915_gem_gtt.c | 67 +-- -- drivers/gpu/drm/i915/i915_gem_gtt.h | 3 +- 6 files changed, 28 insertions(+), 88 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev) !dev_priv-ums.mm_suspended) { dev_priv-ums.mm_suspended = 0; + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + dev_priv-gpu_error.reload_in_reset = true; + ret = i915_gem_init_hw(dev); + + dev_priv-gpu_error.reload_in_reset = false; + mutex_unlock(dev-struct_mutex); if (ret) { DRM_ERROR(Failed hw init on reset %d\n, ret); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 991b663..116daff 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1217,6 +1217,9 @@ struct i915_gpu_error { /* For missed irq/seqno simulation. */ unsigned int test_irq_rings; + + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + bool reload_in_reset; }; enum modeset_restore { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error *error, if (i915_terminally_wedged(error)) return -EIO; - return -EAGAIN; + /* Check if GPU Reset is in progress */ + if (!error-reload_in_reset) + return -EAGAIN; This is silly. You already have the same flag above. Look closer. -Chris -- Chris Wilson, Intel Open Source Technology Centre It is not the same. This is a special case when re-initialising the hw. This flag is to allow gem_init_hw() to complete successfully during reset. At any other point during reset, -EAGAIN should be returned. Alistair. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw
On Wed, Aug 20, 2014 at 03:21:55PM +, Mcaulay, Alistair wrote: It is not the same. This is a special case when re-initialising the hw. This flag is to allow gem_init_hw() to complete successfully during reset. At any other point during reset, -EAGAIN should be returned. Indeed. You've missed the point. Look closer at the reset counter and reset ordering. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw
Hi Mika, can you please review this patch, and verify it fixes the issues in your previous review. Thanks, Alistair. -Original Message- From: Mcaulay, Alistair Sent: Friday, August 15, 2014 6:52 PM To: intel-gfx@lists.freedesktop.org Cc: Mcaulay, Alistair Subject: [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw From: McAulay, Alistair alistair.mcau...@intel.com This patch is to address Daniels concerns over different code during reset: http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html The reason for aiming as hard as possible to use the exact same code for driver load, gpu reset and runtime pm/system resume is that we've simply seen too many bugs due to slight variations and unintended omissions. Tested using igt drv_hangman. V2: Cleaner way of preventing check_wedge returning -EAGAIN V3: Clean the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review. Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com --- drivers/gpu/drm/i915/i915_drv.c | 6 +++ drivers/gpu/drm/i915/i915_drv.h | 3 ++ drivers/gpu/drm/i915/i915_gem.c | 4 +- drivers/gpu/drm/i915/i915_gem_context.c | 33 +++- drivers/gpu/drm/i915/i915_gem_gtt.c | 67 + drivers/gpu/drm/i915/i915_gem_gtt.h | 3 +- 6 files changed, 28 insertions(+), 88 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev) !dev_priv-ums.mm_suspended) { dev_priv-ums.mm_suspended = 0; + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + dev_priv-gpu_error.reload_in_reset = true; + ret = i915_gem_init_hw(dev); + + dev_priv-gpu_error.reload_in_reset = false; + mutex_unlock(dev-struct_mutex); if (ret) { DRM_ERROR(Failed hw init on reset %d\n, ret); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 991b663..116daff 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1217,6 +1217,9 @@ struct i915_gpu_error { /* For missed irq/seqno simulation. */ unsigned int test_irq_rings; + + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + bool reload_in_reset; }; enum modeset_restore { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error *error, if (i915_terminally_wedged(error)) return -EIO; - return -EAGAIN; + /* Check if GPU Reset is in progress */ + if (!error-reload_in_reset) + return -EAGAIN; } return 0; diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index de72a28..9378ad8 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -377,34 +377,17 @@ void i915_gem_context_reset(struct drm_device *dev) struct drm_i915_private *dev_priv = dev-dev_private; int i; - /* Prevent the hardware from restoring the last context (which hung) on - * the next switch */ for (i = 0; i I915_NUM_RINGS; i++) { struct intel_engine_cs *ring = dev_priv-ring[i]; - struct intel_context *dctx = ring-default_context; struct intel_context *lctx = ring-last_context; - /* Do a fake switch to the default context */ - if (lctx == dctx) - continue; - - if (!lctx) - continue; + if (lctx) { + if (lctx-legacy_hw_ctx.rcs_state i == RCS) + i915_gem_object_ggtt_unpin(lctx- legacy_hw_ctx.rcs_state); - if (dctx-legacy_hw_ctx.rcs_state i == RCS) { - WARN_ON(i915_gem_obj_ggtt_pin(dctx- legacy_hw_ctx.rcs_state, - get_context_alignment(dev), 0)); - /* Fake a finish/inactive */ - dctx-legacy_hw_ctx.rcs_state-base.write_domain = 0; - dctx-legacy_hw_ctx.rcs_state-active = 0; + i915_gem_context_unreference(lctx); + ring-last_context = NULL; } - - if (lctx-legacy_hw_ctx.rcs_state i == RCS) - i915_gem_object_ggtt_unpin(lctx- legacy_hw_ctx.rcs_state); - -
Re: [Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw
Mcaulay, Alistair alistair.mcau...@intel.com writes: Hi Mika, can you please review this patch, and verify it fixes the issues in your previous review. Thanks, Alistair. -Original Message- From: Mcaulay, Alistair Sent: Friday, August 15, 2014 6:52 PM To: intel-gfx@lists.freedesktop.org Cc: Mcaulay, Alistair Subject: [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw From: McAulay, Alistair alistair.mcau...@intel.com This patch is to address Daniels concerns over different code during reset: http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html The reason for aiming as hard as possible to use the exact same code for driver load, gpu reset and runtime pm/system resume is that we've simply seen too many bugs due to slight variations and unintended omissions. Tested using igt drv_hangman. V2: Cleaner way of preventing check_wedge returning -EAGAIN V3: Clean the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review. Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com --- Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com drivers/gpu/drm/i915/i915_drv.c | 6 +++ drivers/gpu/drm/i915/i915_drv.h | 3 ++ drivers/gpu/drm/i915/i915_gem.c | 4 +- drivers/gpu/drm/i915/i915_gem_context.c | 33 +++- drivers/gpu/drm/i915/i915_gem_gtt.c | 67 + drivers/gpu/drm/i915/i915_gem_gtt.h | 3 +- 6 files changed, 28 insertions(+), 88 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev) !dev_priv-ums.mm_suspended) { dev_priv-ums.mm_suspended = 0; +/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ +dev_priv-gpu_error.reload_in_reset = true; + ret = i915_gem_init_hw(dev); + +dev_priv-gpu_error.reload_in_reset = false; + mutex_unlock(dev-struct_mutex); if (ret) { DRM_ERROR(Failed hw init on reset %d\n, ret); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 991b663..116daff 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1217,6 +1217,9 @@ struct i915_gpu_error { /* For missed irq/seqno simulation. */ unsigned int test_irq_rings; + +/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ +bool reload_in_reset; }; enum modeset_restore { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error *error, if (i915_terminally_wedged(error)) return -EIO; -return -EAGAIN; +/* Check if GPU Reset is in progress */ +if (!error-reload_in_reset) +return -EAGAIN; } return 0; diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index de72a28..9378ad8 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -377,34 +377,17 @@ void i915_gem_context_reset(struct drm_device *dev) struct drm_i915_private *dev_priv = dev-dev_private; int i; -/* Prevent the hardware from restoring the last context (which hung) on - * the next switch */ for (i = 0; i I915_NUM_RINGS; i++) { struct intel_engine_cs *ring = dev_priv-ring[i]; -struct intel_context *dctx = ring-default_context; struct intel_context *lctx = ring-last_context; -/* Do a fake switch to the default context */ -if (lctx == dctx) -continue; - -if (!lctx) -continue; +if (lctx) { +if (lctx-legacy_hw_ctx.rcs_state i == RCS) +i915_gem_object_ggtt_unpin(lctx- legacy_hw_ctx.rcs_state); -if (dctx-legacy_hw_ctx.rcs_state i == RCS) { -WARN_ON(i915_gem_obj_ggtt_pin(dctx- legacy_hw_ctx.rcs_state, - get_context_alignment(dev), 0)); -/* Fake a finish/inactive */ -dctx-legacy_hw_ctx.rcs_state-base.write_domain = 0; -dctx-legacy_hw_ctx.rcs_state-active = 0; +i915_gem_context_unreference(lctx); +ring-last_context = NULL; } - -if (lctx-legacy_hw_ctx.rcs_state i == RCS) -
[Intel-gfx] [PATCH v3] drm/i915: Rework GPU reset sequence to match driver load thaw
From: McAulay, Alistair alistair.mcau...@intel.com This patch is to address Daniels concerns over different code during reset: http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html The reason for aiming as hard as possible to use the exact same code for driver load, gpu reset and runtime pm/system resume is that we've simply seen too many bugs due to slight variations and unintended omissions. Tested using igt drv_hangman. V2: Cleaner way of preventing check_wedge returning -EAGAIN V3: Clean the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review. Signed-off-by: McAulay, Alistair alistair.mcau...@intel.com --- drivers/gpu/drm/i915/i915_drv.c | 6 +++ drivers/gpu/drm/i915/i915_drv.h | 3 ++ drivers/gpu/drm/i915/i915_gem.c | 4 +- drivers/gpu/drm/i915/i915_gem_context.c | 33 +++- drivers/gpu/drm/i915/i915_gem_gtt.c | 67 + drivers/gpu/drm/i915/i915_gem_gtt.h | 3 +- 6 files changed, 28 insertions(+), 88 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 5e4fefd..3bfafe6 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -806,7 +806,13 @@ int i915_reset(struct drm_device *dev) !dev_priv-ums.mm_suspended) { dev_priv-ums.mm_suspended = 0; + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + dev_priv-gpu_error.reload_in_reset = true; + ret = i915_gem_init_hw(dev); + + dev_priv-gpu_error.reload_in_reset = false; + mutex_unlock(dev-struct_mutex); if (ret) { DRM_ERROR(Failed hw init on reset %d\n, ret); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 991b663..116daff 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1217,6 +1217,9 @@ struct i915_gpu_error { /* For missed irq/seqno simulation. */ unsigned int test_irq_rings; + + /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ + bool reload_in_reset; }; enum modeset_restore { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ef047bc..e7396eb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1085,7 +1085,9 @@ i915_gem_check_wedge(struct i915_gpu_error *error, if (i915_terminally_wedged(error)) return -EIO; - return -EAGAIN; + /* Check if GPU Reset is in progress */ + if (!error-reload_in_reset) + return -EAGAIN; } return 0; diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index de72a28..9378ad8 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -377,34 +377,17 @@ void i915_gem_context_reset(struct drm_device *dev) struct drm_i915_private *dev_priv = dev-dev_private; int i; - /* Prevent the hardware from restoring the last context (which hung) on -* the next switch */ for (i = 0; i I915_NUM_RINGS; i++) { struct intel_engine_cs *ring = dev_priv-ring[i]; - struct intel_context *dctx = ring-default_context; struct intel_context *lctx = ring-last_context; - /* Do a fake switch to the default context */ - if (lctx == dctx) - continue; - - if (!lctx) - continue; + if (lctx) { + if (lctx-legacy_hw_ctx.rcs_state i == RCS) + i915_gem_object_ggtt_unpin(lctx-legacy_hw_ctx.rcs_state); - if (dctx-legacy_hw_ctx.rcs_state i == RCS) { - WARN_ON(i915_gem_obj_ggtt_pin(dctx-legacy_hw_ctx.rcs_state, - get_context_alignment(dev), 0)); - /* Fake a finish/inactive */ - dctx-legacy_hw_ctx.rcs_state-base.write_domain = 0; - dctx-legacy_hw_ctx.rcs_state-active = 0; + i915_gem_context_unreference(lctx); + ring-last_context = NULL; } - - if (lctx-legacy_hw_ctx.rcs_state i == RCS) - i915_gem_object_ggtt_unpin(lctx-legacy_hw_ctx.rcs_state); - - i915_gem_context_unreference(lctx); - i915_gem_context_reference(dctx); - ring-last_context = dctx; } } @@ -498,10 +481,6 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv) ppgtt-enable(ppgtt); } - /* FIXME: We should make this work, even in reset