Re: [Intel-gfx] [PATCH v6 14/20] drm/i915/guc: Add support for reset engine using GuC commands

2017-04-20 Thread Chris Wilson
On Wed, Apr 19, 2017 at 04:22:43PM -0700, Michel Thierry wrote:
> On 19/04/17 03:27, Chris Wilson wrote:
> >On Tue, Apr 18, 2017 at 01:23:29PM -0700, Michel Thierry wrote:
> >>This patch adds per engine reset and recovery (TDR) support when GuC is
> >>used to submit workloads to GPU.
> >>
> >>In the case of i915 direct submission to ELSP, the driver manages hang
> >>detection, recovery and resubmission. With GuC submission these tasks
> >>are shared between driver and GuC. i915 is still responsible for detecting
> >>a hang, and when it does it only requests GuC to reset that Engine. GuC
> >>internally manages acquiring forcewake and idling the engine before actually
> >>resetting it.
> >>
> >>Once the reset is successful, i915 takes over again and handles resubmission.
> >>The scheduler in i915 knows which requests are pending so after resetting
> >>an engine, pending workloads/requests are resubmitted.
> >>
> >>v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> >>non-guc function names.
> >>
> >>Signed-off-by: Arun Siluvery 
> >>Signed-off-by: Jeff McGee 
> >>Signed-off-by: Michel Thierry 
> >>---
> >>diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >>index 7df278fe492e..6295760098a1 100644
> >>--- a/drivers/gpu/drm/i915/intel_lrc.c
> >>+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >>@@ -1176,14 +1176,15 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
> >>
> >> 	/* After a GPU reset, we may have requests to replay */
> >> 	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> >>-	if (!i915.enable_guc_submission && !execlists_elsp_idle(engine)) {
> >>+	if (!execlists_elsp_idle(engine)) {
> >> 		DRM_DEBUG_DRIVER("Restarting %s from requests [0x%x, 0x%x]\n",
> >> 				 engine->name,
> >> 				 port_seqno(&engine->execlist_port[0]),
> >> 				 port_seqno(&engine->execlist_port[1]));
> >> 		engine->execlist_port[0].count = 0;
> >> 		engine->execlist_port[1].count = 0;
> >>-		execlists_submit_ports(engine);
> >>+		if (!dev_priv->guc.execbuf_client)
> >>+			execlists_submit_ports(engine);
> >
> >Not sure what you were intending to do here as this only resets the
> >submission count -- which is not used by guc dequeue. Some merit in
> >making the code look similar, certainly adds the dbg message but I think
> >it is unrelated to the rest of the patch.
> 
> Yes, it only keeps the same debug message (originally added to check
> it was taking the right path). I can remove it if you think it doesn't
> provide anything useful.

Just a small patch by itself, it is only a distraction to the larger
patch.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v6 14/20] drm/i915/guc: Add support for reset engine using GuC commands

2017-04-19 Thread Michel Thierry

On 19/04/17 03:27, Chris Wilson wrote:
>On Tue, Apr 18, 2017 at 01:23:29PM -0700, Michel Thierry wrote:
>>This patch adds per engine reset and recovery (TDR) support when GuC is
>>used to submit workloads to GPU.
>>
>>In the case of i915 direct submission to ELSP, the driver manages hang
>>detection, recovery and resubmission. With GuC submission these tasks
>>are shared between driver and GuC. i915 is still responsible for detecting
>>a hang, and when it does it only requests GuC to reset that Engine. GuC
>>internally manages acquiring forcewake and idling the engine before actually
>>resetting it.
>>
>>Once the reset is successful, i915 takes over again and handles resubmission.
>>The scheduler in i915 knows which requests are pending so after resetting
>>an engine, pending workloads/requests are resubmitted.
>>
>>v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
>>non-guc function names.
>>
>>Signed-off-by: Arun Siluvery
>>Signed-off-by: Jeff McGee
>>Signed-off-by: Michel Thierry
>>---
>>diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>index 7df278fe492e..6295760098a1 100644
>>--- a/drivers/gpu/drm/i915/intel_lrc.c
>>+++ b/drivers/gpu/drm/i915/intel_lrc.c
>>@@ -1176,14 +1176,15 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>>
>> 	/* After a GPU reset, we may have requests to replay */
>> 	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>>-	if (!i915.enable_guc_submission && !execlists_elsp_idle(engine)) {
>>+	if (!execlists_elsp_idle(engine)) {
>> 		DRM_DEBUG_DRIVER("Restarting %s from requests [0x%x, 0x%x]\n",
>> 				 engine->name,
>> 				 port_seqno(&engine->execlist_port[0]),
>> 				 port_seqno(&engine->execlist_port[1]));
>> 		engine->execlist_port[0].count = 0;
>> 		engine->execlist_port[1].count = 0;
>>-		execlists_submit_ports(engine);
>>+		if (!dev_priv->guc.execbuf_client)
>>+			execlists_submit_ports(engine);
>
>Not sure what you were intending to do here as this only resets the
>submission count -- which is not used by guc dequeue. Some merit in
>making the code look similar, certainly adds the dbg message but I think
>it is unrelated to the rest of the patch.


Yes, it only keeps the same debug message (originally added to check it 
was taking the right path). I can remove it if you think it doesn't provide 
anything useful.



Re: [Intel-gfx] [PATCH v6 14/20] drm/i915/guc: Add support for reset engine using GuC commands

2017-04-19 Thread Chris Wilson
On Tue, Apr 18, 2017 at 01:23:29PM -0700, Michel Thierry wrote:
> This patch adds per engine reset and recovery (TDR) support when GuC is
> used to submit workloads to GPU.
> 
> In the case of i915 direct submission to ELSP, the driver manages hang
> detection, recovery and resubmission. With GuC submission these tasks
> are shared between driver and GuC. i915 is still responsible for detecting
> a hang, and when it does it only requests GuC to reset that Engine. GuC
> internally manages acquiring forcewake and idling the engine before actually
> resetting it.
> 
> Once the reset is successful, i915 takes over again and handles resubmission.
> The scheduler in i915 knows which requests are pending so after resetting
> an engine, pending workloads/requests are resubmitted.
> 
> v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> non-guc function names.
> 
> Signed-off-by: Arun Siluvery 
> Signed-off-by: Jeff McGee 
> Signed-off-by: Michel Thierry 
> ---
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 7df278fe492e..6295760098a1 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1176,14 +1176,15 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>  
>  	/* After a GPU reset, we may have requests to replay */
>  	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> -	if (!i915.enable_guc_submission && !execlists_elsp_idle(engine)) {
> +	if (!execlists_elsp_idle(engine)) {
>  		DRM_DEBUG_DRIVER("Restarting %s from requests [0x%x, 0x%x]\n",
>  				 engine->name,
>  				 port_seqno(&engine->execlist_port[0]),
>  				 port_seqno(&engine->execlist_port[1]));
>  		engine->execlist_port[0].count = 0;
>  		engine->execlist_port[1].count = 0;
> -		execlists_submit_ports(engine);
> +		if (!dev_priv->guc.execbuf_client)
> +			execlists_submit_ports(engine);

Not sure what you were intending to do here as this only resets the
submission count -- which is not used by guc dequeue. Some merit in
making the code look similar, certainly adds the dbg message but I think
it is unrelated to the rest of the patch.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Intel-gfx] [PATCH v6 14/20] drm/i915/guc: Add support for reset engine using GuC commands

2017-04-18 Thread Michel Thierry
This patch adds per engine reset and recovery (TDR) support when GuC is
used to submit workloads to GPU.

In the case of i915 direct submission to ELSP, the driver manages hang
detection, recovery and resubmission. With GuC submission these tasks
are shared between driver and GuC. i915 is still responsible for detecting
a hang, and when it does it only requests GuC to reset that Engine. GuC
internally manages acquiring forcewake and idling the engine before actually
resetting it.

Once the reset is successful, i915 takes over again and handles resubmission.
The scheduler in i915 knows which requests are pending so after resetting
an engine, pending workloads/requests are resubmitted.
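
For illustration only, the split of responsibilities described above can be
sketched as a small self-contained C model. This is not code from the patch:
every toy_* name, the struct layout and the counters are hypothetical
stand-ins, and the firmware work (forcewake, idling the engine) is reduced to
a stub.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the flow in the description; not i915 code. */
enum toy_submit_mode { TOY_SUBMIT_ELSP, TOY_SUBMIT_GUC };

struct toy_engine {
	enum toy_submit_mode mode;
	bool hung;		/* set by the host-side hang detection */
	int pending;		/* requests the scheduler knows are pending */
	int resubmitted;	/* requests replayed after the last reset */
	int host_resets;	/* resets done entirely by the host driver */
	int guc_resets;		/* resets delegated to the firmware stub */
};

/* Stub for the host path: the driver idles and resets the engine itself. */
static int toy_host_reset(struct toy_engine *e)
{
	e->host_resets++;
	e->hung = false;
	return 0;
}

/* Stub for the firmware path: forcewake and idling would happen in GuC. */
static int toy_guc_reset(struct toy_engine *e)
{
	e->guc_resets++;
	e->hung = false;
	return 0;
}

/* Hang detection and resubmission stay with the host in both modes. */
static int toy_reset_engine(struct toy_engine *e)
{
	int ret;

	assert(e);	/* the model has no notion of a missing engine */

	if (!e->hung)
		return 0;

	/* Only the reset step itself depends on the submission mode. */
	ret = (e->mode == TOY_SUBMIT_GUC) ? toy_guc_reset(e)
					  : toy_host_reset(e);
	if (ret)
		return ret;

	/* Replay everything the scheduler still considers pending. */
	e->resubmitted = e->pending;
	return 0;
}
```

In this model, an engine in TOY_SUBMIT_GUC mode with hung set only bumps
guc_resets, while TOY_SUBMIT_ELSP takes the host path; in both cases
resubmitted ends up equal to pending, which mirrors the claim that
resubmission remains the driver's job.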

v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
non-guc function names.

Signed-off-by: Arun Siluvery 
Signed-off-by: Jeff McGee 
Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_drv.c            | 43 +-
 drivers/gpu/drm/i915/i915_drv.h            |  1 +
 drivers/gpu/drm/i915/i915_guc_submission.c | 48 ++
 drivers/gpu/drm/i915/intel_guc_fwif.h      |  6
 drivers/gpu/drm/i915/intel_lrc.c           |  5 ++--
 drivers/gpu/drm/i915/intel_uc.h            |  1 +
 drivers/gpu/drm/i915/intel_uncore.c        |  5
 7 files changed, 88 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 974be1fa77f9..b7e2fa8a0036 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1910,23 +1910,34 @@ int i915_reset_engine(struct intel_engine_cs *engine)
 	 */
 	i915_gem_reset_engine(engine);
 
-	/* forcing engine to idle */
-	ret = intel_reset_engine_start(engine);
-	if (ret) {
-		DRM_ERROR("Failed to disable %s\n", engine->name);
-		goto error;
-	}
+	if (!dev_priv->guc.execbuf_client) {
+		/* forcing engine to idle */
+		ret = intel_reset_engine_start(engine);
+		if (ret) {
+			DRM_ERROR("Failed to disable %s\n", engine->name);
+			goto error;
+		}
 
-	/* finally, reset engine */
-	ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
-	if (ret) {
-		DRM_ERROR("Failed to reset %s, ret=%d\n", engine->name, ret);
+		/* finally, reset engine */
+		ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
+		if (ret) {
+			DRM_ERROR("Failed to reset %s, ret=%d\n",
+				  engine->name, ret);
+			intel_reset_engine_cancel(engine);
+			goto error;
+		}
+
+		/* be sure the request reset bit gets cleared */
 		intel_reset_engine_cancel(engine);
-		goto error;
-	}
 
-	/* be sure the request reset bit gets cleared */
-	intel_reset_engine_cancel(engine);
+	} else {
+		ret = i915_guc_reset_engine(engine);
+		if (ret) {
+			DRM_ERROR("GuC failed to reset %s, ret=%d\n",
+				  engine->name, ret);
+			goto error;
+		}
+	}
 
 	i915_gem_reset_finish_engine(engine);
 
@@ -1935,6 +1946,10 @@ int i915_reset_engine(struct intel_engine_cs *engine)
 	if (ret)
 		goto error;
 
+	/* for guc too */
+	if (dev_priv->guc.execbuf_client)
+		i915_guc_submission_reenable_engine(engine);
+
 	error->reset_engine_count[engine->id]++;
 
 wakeup:
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 71c34f15be64..5f2345fbff44 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3029,6 +3029,7 @@ extern int i915_reset_engine(struct intel_engine_cs *engine);
 extern bool intel_has_reset_engine(struct drm_i915_private *dev_priv);
 extern int intel_reset_engine_start(struct intel_engine_cs *engine);
 extern void intel_reset_engine_cancel(struct intel_engine_cs *engine);
+extern int i915_guc_reset_engine(struct intel_engine_cs *engine);
 extern int intel_guc_reset(struct drm_i915_private *dev_priv);
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
 extern void intel_hangcheck_init(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index d772718861df..c8067aeab6f4 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1338,6 +1338,25 @@ void i915_guc_submission_disable(struct drm_i915_private *dev_priv)
 	guc->execbuf_client = NULL;
 }
 
+void i915_guc_submission_reenable_engine(struct intel_engine_cs *engine)
+{
+   struct