Re: [Intel-gfx] [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands

2017-11-02 Thread Chris Wilson
Quoting Jeff McGee (2017-11-01 20:41:13)
> On Wed, Nov 01, 2017 at 01:58:04PM +, Chris Wilson wrote:
> > Quoting Michel Thierry (2017-10-31 22:53:09)
> > > [...]
> > > +		DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
> > > +				 (engine->i915->guc.execbuf_client ? "GUC ":""),
> > 
> > A bit overkill on the parentheses there ;)
> > 
> > Lgtm, can you please ping, say, Jeff or Daniele for an r-b on the guc
> > interaction?
> > -Chris
> 
> There is one small change needed in the GuC preemption protocol to make it
> compatible with GuC engine reset. I will send that shortly.
> 
> There are also a couple of corner case bugs with engine reset in our current
> firmware versions. We are planning a firmware update to address those. But
> the host-side code here is fine. So...
> 
> Reviewed-by: Jeff McGee 

Pushed this series, along with the preempt interaction fix.
Thanks,
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands

2017-11-01 Thread Jeff McGee
On Wed, Nov 01, 2017 at 01:58:04PM +, Chris Wilson wrote:
> Quoting Michel Thierry (2017-10-31 22:53:09)
> > [...]
> > +		DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
> > +				 (engine->i915->guc.execbuf_client ? "GUC ":""),
> 
> A bit overkill on the parentheses there ;)
> 
> Lgtm, can you please ping, say, Jeff or Daniele for an r-b on the guc
> interaction?
> -Chris

There is one small change needed in the GuC preemption protocol to make it
compatible with GuC engine reset. I will send that shortly.

There are also a couple of corner case bugs with engine reset in our current
firmware versions. We are planning a firmware update to address those. But
the host-side code here is fine. So...

Reviewed-by: Jeff McGee 


Re: [Intel-gfx] [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands

2017-11-01 Thread Chris Wilson
Quoting Michel Thierry (2017-10-31 22:53:09)
> [...]
> +		DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
> +				 (engine->i915->guc.execbuf_client ? "GUC ":""),

A bit overkill on the parentheses there ;)

Lgtm, can you please ping, say, Jeff or Daniele for an r-b on the guc
interaction?
-Chris


[Intel-gfx] [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands

2017-10-31 Thread Michel Thierry
This patch adds per-engine reset and recovery (TDR) support when the GuC
is used to submit workloads to the GPU.

When i915 submits directly to the ELSP, the driver manages hang detection,
recovery and resubmission. With GuC submission these tasks are shared
between the driver and the GuC: i915 is still responsible for detecting a
hang, and when it detects one it only asks the GuC to reset that engine.
The GuC internally handles acquiring forcewake and idling the engine
before resetting it.
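For illustration, the request the host hands to the GuC is just an array of dwords. The user-space sketch below fills in the seven-dword layout that intel_guc_reset_engine() uses in this patch. The sketch_* names, the simplified structs and the action value are hypothetical placeholders (the driver takes the action code from INTEL_GUC_ACTION_REQUEST_ENGINE_RESET in intel_guc_fwif.h), and actually sending the message with intel_guc_send() is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder value; the driver uses
 * INTEL_GUC_ACTION_REQUEST_ENGINE_RESET from intel_guc_fwif.h. */
#define SKETCH_ACTION_REQUEST_ENGINE_RESET 0x3u
#define SKETCH_RESET_MSG_LEN 7

/* Hypothetical, simplified stand-ins for the driver structures. */
struct sketch_engine {
	uint32_t guc_id;
};

struct sketch_client {
	uint32_t stage_id;
};

/*
 * Fill msg[] with the same dword layout that intel_guc_reset_engine()
 * builds in this patch. Returns the number of dwords written.
 */
static size_t sketch_build_reset_request(uint32_t msg[SKETCH_RESET_MSG_LEN],
					 const struct sketch_engine *engine,
					 const struct sketch_client *client,
					 uint32_t shared_data_ggtt_offset)
{
	/* Mirrors GEM_BUG_ON(!guc->execbuf_client): GuC submission only. */
	assert(client != NULL);

	msg[0] = SKETCH_ACTION_REQUEST_ENGINE_RESET; /* action code */
	msg[1] = engine->guc_id;                     /* engine to reset */
	msg[2] = 0;                                  /* zeroed, as in the patch */
	msg[3] = 0;
	msg[4] = 0;
	msg[5] = client->stage_id;                   /* execbuf client's stage id */
	msg[6] = shared_data_ggtt_offset;            /* GGTT offset of shared data */
	return SKETCH_RESET_MSG_LEN;
}
```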

Once the reset succeeds, i915 takes over again and handles resubmission.
The i915 scheduler knows which requests are pending, so after an engine
reset the pending workloads/requests are resubmitted.
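The division of labour described above (the driver detects the hang, picks the per-engine reset path, and replays pending work on success, while a failure escalates to a global reset) can be modelled in a few lines of user-space C. Everything here is a hypothetical simplification: the struct, the sketch_* helpers and the error value are illustrative only, and in the real driver the resubmission is driven by the engine's irq tasklet rather than by a direct call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the reset flow; this is not driver code. */
enum sketch_path { SKETCH_PATH_NONE, SKETCH_PATH_GT, SKETCH_PATH_GUC };

struct sketch_gpu {
	bool guc_submission;   /* stands in for guc.execbuf_client != NULL */
	bool engine_reset_ok;  /* outcome the per-engine reset will report */
	int pending;           /* requests still queued for the hung engine */
	int resubmitted;       /* requests handed back to the hardware */
	int full_resets;       /* escalations to a global reset */
	enum sketch_path path; /* which per-engine reset path was taken */
};

/* Direct (ELSP) submission: the driver resets the engine itself. */
static int sketch_gt_reset_engine(struct sketch_gpu *gpu)
{
	gpu->path = SKETCH_PATH_GT;
	return gpu->engine_reset_ok ? 0 : -EIO;
}

/* GuC submission: the driver only asks the firmware to do the reset. */
static int sketch_guc_reset_engine(struct sketch_gpu *gpu)
{
	assert(gpu->guc_submission); /* mirrors GEM_BUG_ON(!guc->execbuf_client) */
	gpu->path = SKETCH_PATH_GUC;
	return gpu->engine_reset_ok ? 0 : -EIO;
}

/* Same dispatch shape as the patched i915_reset_engine(). */
static int sketch_reset_engine(struct sketch_gpu *gpu)
{
	int ret;

	if (!gpu->guc_submission)
		ret = sketch_gt_reset_engine(gpu);
	else
		ret = sketch_guc_reset_engine(gpu);
	if (ret)
		return ret; /* caller is expected to fall back to a global reset */

	/* The host scheduler still owns the pending list, so it replays it. */
	gpu->resubmitted += gpu->pending;
	gpu->pending = 0;
	return 0;
}

/* Hang handler: try the per-engine reset first, escalate on failure. */
static void sketch_handle_hang(struct sketch_gpu *gpu)
{
	if (sketch_reset_engine(gpu) != 0)
		gpu->full_resets++; /* stands in for the full GPU reset */
}
```

Keeping both reset helpers behind one signature is the point of the v6 change: the caller's error handling and fallback stay identical whichever submission backend is active.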

v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
non-guc function names.

v3: Removed the debug message about which request the engine restarts
from, since the new baseline does so regardless of submission mode. (Chris)

v4: Rebase.

v5: Do not pass unnecessary reporting flags to the fw (Jeff);
tasklet_schedule(>irq_tasklet) handles the resubmit; rebase.

v6: Rename the existing reset engine function and share a similar
interface between guc and non-guc paths (Chris).

Signed-off-by: Michel Thierry 
Cc: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_drv.c       | 15 +++++++++++++--
 drivers/gpu/drm/i915/i915_drv.h       |  2 ++
 drivers/gpu/drm/i915/intel_guc.c      | 24 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc_fwif.h |  1 +
 drivers/gpu/drm/i915/intel_uncore.c   |  5 -----
 5 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index af745749509c..359333a423cf 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1950,6 +1950,12 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
 	goto finish;
 }
 
+static inline int intel_gt_reset_engine(struct drm_i915_private *dev_priv,
+					struct intel_engine_cs *engine)
+{
+	return intel_gpu_reset(dev_priv, intel_engine_flag(engine));
+}
+
+
 /**
  * i915_reset_engine - reset GPU engine to recover from a hang
  * @engine: engine to reset
@@ -1984,10 +1990,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
 		goto out;
 	}
 
-	ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
+	if (!engine->i915->guc.execbuf_client)
+		ret = intel_gt_reset_engine(engine->i915, engine);
+	else
+		ret = intel_guc_reset_engine(&engine->i915->guc, engine);
+
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
+		DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
+				 (engine->i915->guc.execbuf_client ? "GUC ":""),
 				 engine->name, ret);
 		goto out;
 	}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cff1b57598c3..ce2725696187 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3330,6 +3330,8 @@ extern int i915_reset_engine(struct intel_engine_cs *engine,
 
 extern bool intel_has_reset_engine(struct drm_i915_private *dev_priv);
 extern int intel_reset_guc(struct drm_i915_private *dev_priv);
+extern int intel_guc_reset_engine(struct intel_guc *guc,
+				  struct intel_engine_cs *engine);
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
 extern void intel_hangcheck_init(struct drm_i915_private *dev_priv);
 extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
index f74d50fdaeb0..9678630a1c70 100644
--- a/drivers/gpu/drm/i915/intel_guc.c
+++ b/drivers/gpu/drm/i915/intel_guc.c
@@ -24,6 +24,7 @@
 
 #include "intel_guc.h"
 #include "i915_drv.h"
+#include "i915_guc_submission.h"
 
 static void gen8_guc_raise_irq(struct intel_guc *guc)
 {
@@ -283,6 +284,29 @@ int intel_guc_suspend(struct drm_i915_private *dev_priv)
 	return intel_guc_send(guc, data, ARRAY_SIZE(data));
 }
 
+/**
+ * intel_guc_reset_engine() - ask GuC to reset an engine
+ * @guc:	intel_guc structure
+ * @engine:	engine to be reset
+ */
+int intel_guc_reset_engine(struct intel_guc *guc,
+			   struct intel_engine_cs *engine)
+{
+	u32 data[7];
+
+	GEM_BUG_ON(!guc->execbuf_client);
+
+	data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
+	data[1] = engine->guc_id;
+	data[2] = 0;
+	data[3] = 0;
+	data[4] = 0;
+	data[5] = guc->execbuf_client->stage_id;
+	data[6] = guc_ggtt_offset(guc->shared_data);
+
+	return intel_guc_send(guc, data, ARRAY_SIZE(data));
+}
+
 /**
  * intel_guc_resume() - notify GuC