Re: [Intel-gfx] [PATCH 12/15] drm/i915/gem: Cancel non-persistent contexts on close

2019-10-14 Thread Tvrtko Ursulin


On 14/10/2019 14:34, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2019-10-14 14:10:30)


On 14/10/2019 13:21, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2019-10-14 13:11:46)


On 14/10/2019 10:07, Chris Wilson wrote:

Normally, we rely on our hangcheck to prevent persistent batches from
hogging the GPU. However, if the user disables hangcheck, this mechanism
breaks down. Despite our insistence that this is unsafe, the users are
equally insistent that they want to use endless batches and will disable
the hangcheck mechanism. We are looking at perhaps replacing hangcheck
with a softer mechanism, that sends a pulse down the engine to check if
it is well. We can use the same preemptive pulse to flush an active
persistent context off the GPU upon context close, preventing resources
being lost and unkillable requests remaining on the GPU after process
termination. To avoid changing the ABI and accidentally breaking
existing userspace, we make the persistence of a context explicit and
enable it by default (matching current ABI). Userspace can opt out of
persistent mode (forcing requests to be cancelled when the context is
closed by process termination or explicitly) by a context parameter. To
facilitate existing use-cases of disabling hangcheck, if the modparam is
disabled (i915.enable_hangcheck=0), we disable persistence mode by
default.  (Note, one of the outcomes for supporting endless mode will be
the removal of hangchecking, at which point opting into persistent mode
will be mandatory, or maybe the default perhaps controlled by cgroups.)

v2: Check for hangchecking at context termination, so that we are not
left with undying contexts from a crafty user.
v3: Force context termination even if forced-preemption is disabled.

Testcase: igt/gem_ctx_persistence
Signed-off-by: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Michał Winiarski 
Cc: Jon Bloomfield 
Reviewed-by: Jon Bloomfield 
---
drivers/gpu/drm/i915/gem/i915_gem_context.c   | 182 ++
drivers/gpu/drm/i915/gem/i915_gem_context.h   |  15 ++
.../gpu/drm/i915/gem/i915_gem_context_types.h |   1 +
.../gpu/drm/i915/gem/selftests/mock_context.c |   2 +
include/uapi/drm/i915_drm.h   |  15 ++
5 files changed, 215 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 5d8221c7ba83..70b72456e2c4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -70,6 +70,7 @@
#include 

#include "gt/intel_lrc_reg.h"

+#include "gt/intel_engine_heartbeat.h"
#include "gt/intel_engine_user.h"

#include "i915_gem_context.h"

@@ -269,6 +270,128 @@ void i915_gem_context_release(struct kref *ref)
schedule_work(>free_work);
}

+static inline struct i915_gem_engines *

+__context_engines_static(const struct i915_gem_context *ctx)
+{
+ return rcu_dereference_protected(ctx->engines, true);
+}
+
+static bool __reset_engine(struct intel_engine_cs *engine)
+{
+ struct intel_gt *gt = engine->gt;
+ bool success = false;
+
+ if (!intel_has_reset_engine(gt))
+ return false;
+
+ if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
+   >reset.flags)) {
+ success = intel_engine_reset(engine, NULL) == 0;
+ clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
+   >reset.flags);
+ }
+
+ return success;
+}
+
+static void __reset_context(struct i915_gem_context *ctx,
+ struct intel_engine_cs *engine)
+{
+ intel_gt_handle_error(engine->gt, engine->mask, 0,
+   "context closure in %s", ctx->name);
+}
+
+static bool __cancel_engine(struct intel_engine_cs *engine)
+{
+ /*
+  * Send a "high priority pulse" down the engine to cause the
+  * current request to be momentarily preempted. (If it fails to
+  * be preempted, it will be reset). As we have marked our context
+  * as banned, any incomplete request, including any running, will
+  * be skipped following the preemption.
+  */
+ if (CONFIG_DRM_I915_PREEMPT_TIMEOUT && !intel_engine_pulse(engine))
+ return true;


Maybe I lost the train of thought here.. But why not even try with the
pulse even if forced preemption is not compiled in? There is a chance it
may preempt normally, no?


If there is no reset-on-preemption failure and no hangchecking, there is
no reset and we are left with the denial-of-service that we are seeking
to close.


Because there is no mechanism to send a pulse, see if it managed to
preempt, but if it did not come come back later and reset?


What are you going to preempt with? The mechanism you describe is what
the pulse + forced-preempt is meant to be handling. (I was going to use
a 2 birds with one stone allegory for the various features all pulling
together, but it's more like a flock with a 

Re: [Intel-gfx] [PATCH 12/15] drm/i915/gem: Cancel non-persistent contexts on close

2019-10-14 Thread Chris Wilson
Quoting Tvrtko Ursulin (2019-10-14 14:10:30)
> 
> On 14/10/2019 13:21, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-10-14 13:11:46)
> >>
> >> On 14/10/2019 10:07, Chris Wilson wrote:
> >>> Normally, we rely on our hangcheck to prevent persistent batches from
> >>> hogging the GPU. However, if the user disables hangcheck, this mechanism
> >>> breaks down. Despite our insistence that this is unsafe, the users are
> >>> equally insistent that they want to use endless batches and will disable
> >>> the hangcheck mechanism. We are looking at perhaps replacing hangcheck
> >>> with a softer mechanism, that sends a pulse down the engine to check if
> >>> it is well. We can use the same preemptive pulse to flush an active
> >>> persistent context off the GPU upon context close, preventing resources
> >>> being lost and unkillable requests remaining on the GPU after process
> >>> termination. To avoid changing the ABI and accidentally breaking
> >>> existing userspace, we make the persistence of a context explicit and
> >>> enable it by default (matching current ABI). Userspace can opt out of
> >>> persistent mode (forcing requests to be cancelled when the context is
> >>> closed by process termination or explicitly) by a context parameter. To
> >>> facilitate existing use-cases of disabling hangcheck, if the modparam is
> >>> disabled (i915.enable_hangcheck=0), we disable persistence mode by
> >>> default.  (Note, one of the outcomes for supporting endless mode will be
> >>> the removal of hangchecking, at which point opting into persistent mode
> >>> will be mandatory, or maybe the default perhaps controlled by cgroups.)
> >>>
> >>> v2: Check for hangchecking at context termination, so that we are not
> >>> left with undying contexts from a crafty user.
> >>> v3: Force context termination even if forced-preemption is disabled.
> >>>
> >>> Testcase: igt/gem_ctx_persistence
> >>> Signed-off-by: Chris Wilson 
> >>> Cc: Joonas Lahtinen 
> >>> Cc: Michał Winiarski 
> >>> Cc: Jon Bloomfield 
> >>> Reviewed-by: Jon Bloomfield 
> >>> ---
> >>>drivers/gpu/drm/i915/gem/i915_gem_context.c   | 182 ++
> >>>drivers/gpu/drm/i915/gem/i915_gem_context.h   |  15 ++
> >>>.../gpu/drm/i915/gem/i915_gem_context_types.h |   1 +
> >>>.../gpu/drm/i915/gem/selftests/mock_context.c |   2 +
> >>>include/uapi/drm/i915_drm.h   |  15 ++
> >>>5 files changed, 215 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> >>> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> index 5d8221c7ba83..70b72456e2c4 100644
> >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> @@ -70,6 +70,7 @@
> >>>#include 
> >>>
> >>>#include "gt/intel_lrc_reg.h"
> >>> +#include "gt/intel_engine_heartbeat.h"
> >>>#include "gt/intel_engine_user.h"
> >>>
> >>>#include "i915_gem_context.h"
> >>> @@ -269,6 +270,128 @@ void i915_gem_context_release(struct kref *ref)
> >>>schedule_work(>free_work);
> >>>}
> >>>
> >>> +static inline struct i915_gem_engines *
> >>> +__context_engines_static(const struct i915_gem_context *ctx)
> >>> +{
> >>> + return rcu_dereference_protected(ctx->engines, true);
> >>> +}
> >>> +
> >>> +static bool __reset_engine(struct intel_engine_cs *engine)
> >>> +{
> >>> + struct intel_gt *gt = engine->gt;
> >>> + bool success = false;
> >>> +
> >>> + if (!intel_has_reset_engine(gt))
> >>> + return false;
> >>> +
> >>> + if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
> >>> +   >reset.flags)) {
> >>> + success = intel_engine_reset(engine, NULL) == 0;
> >>> + clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
> >>> +   >reset.flags);
> >>> + }
> >>> +
> >>> + return success;
> >>> +}
> >>> +
> >>> +static void __reset_context(struct i915_gem_context *ctx,
> >>> + struct intel_engine_cs *engine)
> >>> +{
> >>> + intel_gt_handle_error(engine->gt, engine->mask, 0,
> >>> +   "context closure in %s", ctx->name);
> >>> +}
> >>> +
> >>> +static bool __cancel_engine(struct intel_engine_cs *engine)
> >>> +{
> >>> + /*
> >>> +  * Send a "high priority pulse" down the engine to cause the
> >>> +  * current request to be momentarily preempted. (If it fails to
> >>> +  * be preempted, it will be reset). As we have marked our context
> >>> +  * as banned, any incomplete request, including any running, will
> >>> +  * be skipped following the preemption.
> >>> +  */
> >>> + if (CONFIG_DRM_I915_PREEMPT_TIMEOUT && !intel_engine_pulse(engine))
> >>> + return true;
> >>
> >> Maybe I lost the train of thought here.. But why not even try with the
> >> pulse even if forced preemption is not compiled in? There is a chance it
> >> may preempt normally, no?
> 

Re: [Intel-gfx] [PATCH 12/15] drm/i915/gem: Cancel non-persistent contexts on close

2019-10-14 Thread Tvrtko Ursulin


On 14/10/2019 13:21, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2019-10-14 13:11:46)


On 14/10/2019 10:07, Chris Wilson wrote:

Normally, we rely on our hangcheck to prevent persistent batches from
hogging the GPU. However, if the user disables hangcheck, this mechanism
breaks down. Despite our insistence that this is unsafe, the users are
equally insistent that they want to use endless batches and will disable
the hangcheck mechanism. We are looking at perhaps replacing hangcheck
with a softer mechanism, that sends a pulse down the engine to check if
it is well. We can use the same preemptive pulse to flush an active
persistent context off the GPU upon context close, preventing resources
being lost and unkillable requests remaining on the GPU after process
termination. To avoid changing the ABI and accidentally breaking
existing userspace, we make the persistence of a context explicit and
enable it by default (matching current ABI). Userspace can opt out of
persistent mode (forcing requests to be cancelled when the context is
closed by process termination or explicitly) by a context parameter. To
facilitate existing use-cases of disabling hangcheck, if the modparam is
disabled (i915.enable_hangcheck=0), we disable persistence mode by
default.  (Note, one of the outcomes for supporting endless mode will be
the removal of hangchecking, at which point opting into persistent mode
will be mandatory, or maybe the default perhaps controlled by cgroups.)

v2: Check for hangchecking at context termination, so that we are not
left with undying contexts from a crafty user.
v3: Force context termination even if forced-preemption is disabled.

Testcase: igt/gem_ctx_persistence
Signed-off-by: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Michał Winiarski 
Cc: Jon Bloomfield 
Reviewed-by: Jon Bloomfield 
---
   drivers/gpu/drm/i915/gem/i915_gem_context.c   | 182 ++
   drivers/gpu/drm/i915/gem/i915_gem_context.h   |  15 ++
   .../gpu/drm/i915/gem/i915_gem_context_types.h |   1 +
   .../gpu/drm/i915/gem/selftests/mock_context.c |   2 +
   include/uapi/drm/i915_drm.h   |  15 ++
   5 files changed, 215 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 5d8221c7ba83..70b72456e2c4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -70,6 +70,7 @@
   #include 
   
   #include "gt/intel_lrc_reg.h"

+#include "gt/intel_engine_heartbeat.h"
   #include "gt/intel_engine_user.h"
   
   #include "i915_gem_context.h"

@@ -269,6 +270,128 @@ void i915_gem_context_release(struct kref *ref)
   schedule_work(>free_work);
   }
   
+static inline struct i915_gem_engines *

+__context_engines_static(const struct i915_gem_context *ctx)
+{
+ return rcu_dereference_protected(ctx->engines, true);
+}
+
+static bool __reset_engine(struct intel_engine_cs *engine)
+{
+ struct intel_gt *gt = engine->gt;
+ bool success = false;
+
+ if (!intel_has_reset_engine(gt))
+ return false;
+
+ if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
+   >reset.flags)) {
+ success = intel_engine_reset(engine, NULL) == 0;
+ clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
+   >reset.flags);
+ }
+
+ return success;
+}
+
+static void __reset_context(struct i915_gem_context *ctx,
+ struct intel_engine_cs *engine)
+{
+ intel_gt_handle_error(engine->gt, engine->mask, 0,
+   "context closure in %s", ctx->name);
+}
+
+static bool __cancel_engine(struct intel_engine_cs *engine)
+{
+ /*
+  * Send a "high priority pulse" down the engine to cause the
+  * current request to be momentarily preempted. (If it fails to
+  * be preempted, it will be reset). As we have marked our context
+  * as banned, any incomplete request, including any running, will
+  * be skipped following the preemption.
+  */
+ if (CONFIG_DRM_I915_PREEMPT_TIMEOUT && !intel_engine_pulse(engine))
+ return true;


Maybe I lost the train of thought here.. But why not even try with the
pulse even if forced preemption is not compiled in? There is a chance it
may preempt normally, no?


If there is no reset-on-preemption failure and no hangchecking, there is
no reset and we are left with the denial-of-service that we are seeking
to close.


Because there is no mechanism to send a pulse, see if it managed to 
preempt, but if it did not come come back later and reset?



Hm, or from the other angle, why bother with preemption and not just
reset? What is the value in letting the closed context complete if at
the same time, if it is preemptable, we will cancel all outstanding work
anyway?


The reset is the elephant gun; it is likely to cause collateral damage.
So we try with a bit of finesse first.


How so? Isn't our per-engine 

Re: [Intel-gfx] [PATCH 12/15] drm/i915/gem: Cancel non-persistent contexts on close

2019-10-14 Thread Chris Wilson
Quoting Tvrtko Ursulin (2019-10-14 13:11:46)
> 
> On 14/10/2019 10:07, Chris Wilson wrote:
> > Normally, we rely on our hangcheck to prevent persistent batches from
> > hogging the GPU. However, if the user disables hangcheck, this mechanism
> > breaks down. Despite our insistence that this is unsafe, the users are
> > equally insistent that they want to use endless batches and will disable
> > the hangcheck mechanism. We are looking at perhaps replacing hangcheck
> > with a softer mechanism, that sends a pulse down the engine to check if
> > it is well. We can use the same preemptive pulse to flush an active
> > persistent context off the GPU upon context close, preventing resources
> > being lost and unkillable requests remaining on the GPU after process
> > termination. To avoid changing the ABI and accidentally breaking
> > existing userspace, we make the persistence of a context explicit and
> > enable it by default (matching current ABI). Userspace can opt out of
> > persistent mode (forcing requests to be cancelled when the context is
> > closed by process termination or explicitly) by a context parameter. To
> > facilitate existing use-cases of disabling hangcheck, if the modparam is
> > disabled (i915.enable_hangcheck=0), we disable persistence mode by
> > default.  (Note, one of the outcomes for supporting endless mode will be
> > the removal of hangchecking, at which point opting into persistent mode
> > will be mandatory, or maybe the default perhaps controlled by cgroups.)
> > 
> > v2: Check for hangchecking at context termination, so that we are not
> > left with undying contexts from a crafty user.
> > v3: Force context termination even if forced-preemption is disabled.
> > 
> > Testcase: igt/gem_ctx_persistence
> > Signed-off-by: Chris Wilson 
> > Cc: Joonas Lahtinen 
> > Cc: Michał Winiarski 
> > Cc: Jon Bloomfield 
> > Reviewed-by: Jon Bloomfield 
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c   | 182 ++
> >   drivers/gpu/drm/i915/gem/i915_gem_context.h   |  15 ++
> >   .../gpu/drm/i915/gem/i915_gem_context_types.h |   1 +
> >   .../gpu/drm/i915/gem/selftests/mock_context.c |   2 +
> >   include/uapi/drm/i915_drm.h   |  15 ++
> >   5 files changed, 215 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index 5d8221c7ba83..70b72456e2c4 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -70,6 +70,7 @@
> >   #include 
> >   
> >   #include "gt/intel_lrc_reg.h"
> > +#include "gt/intel_engine_heartbeat.h"
> >   #include "gt/intel_engine_user.h"
> >   
> >   #include "i915_gem_context.h"
> > @@ -269,6 +270,128 @@ void i915_gem_context_release(struct kref *ref)
> >   schedule_work(>free_work);
> >   }
> >   
> > +static inline struct i915_gem_engines *
> > +__context_engines_static(const struct i915_gem_context *ctx)
> > +{
> > + return rcu_dereference_protected(ctx->engines, true);
> > +}
> > +
> > +static bool __reset_engine(struct intel_engine_cs *engine)
> > +{
> > + struct intel_gt *gt = engine->gt;
> > + bool success = false;
> > +
> > + if (!intel_has_reset_engine(gt))
> > + return false;
> > +
> > + if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
> > +   >reset.flags)) {
> > + success = intel_engine_reset(engine, NULL) == 0;
> > + clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
> > +   >reset.flags);
> > + }
> > +
> > + return success;
> > +}
> > +
> > +static void __reset_context(struct i915_gem_context *ctx,
> > + struct intel_engine_cs *engine)
> > +{
> > + intel_gt_handle_error(engine->gt, engine->mask, 0,
> > +   "context closure in %s", ctx->name);
> > +}
> > +
> > +static bool __cancel_engine(struct intel_engine_cs *engine)
> > +{
> > + /*
> > +  * Send a "high priority pulse" down the engine to cause the
> > +  * current request to be momentarily preempted. (If it fails to
> > +  * be preempted, it will be reset). As we have marked our context
> > +  * as banned, any incomplete request, including any running, will
> > +  * be skipped following the preemption.
> > +  */
> > + if (CONFIG_DRM_I915_PREEMPT_TIMEOUT && !intel_engine_pulse(engine))
> > + return true;
> 
> Maybe I lost the train of thought here.. But why not even try with the 
> pulse even if forced preemption is not compiled in? There is a chance it 
> may preempt normally, no?

If there is no reset-on-preemption failure and no hangchecking, there is
no reset and we are left with the denial-of-service that we are seeking
to close.
 
> Hm, or from the other angle, why bother with preemption and not just 
> reset? What is the value in letting the closed context complete if at 
> the same 

Re: [Intel-gfx] [PATCH 12/15] drm/i915/gem: Cancel non-persistent contexts on close

2019-10-14 Thread Tvrtko Ursulin


On 14/10/2019 10:07, Chris Wilson wrote:

Normally, we rely on our hangcheck to prevent persistent batches from
hogging the GPU. However, if the user disables hangcheck, this mechanism
breaks down. Despite our insistence that this is unsafe, the users are
equally insistent that they want to use endless batches and will disable
the hangcheck mechanism. We are looking at perhaps replacing hangcheck
with a softer mechanism, that sends a pulse down the engine to check if
it is well. We can use the same preemptive pulse to flush an active
persistent context off the GPU upon context close, preventing resources
being lost and unkillable requests remaining on the GPU after process
termination. To avoid changing the ABI and accidentally breaking
existing userspace, we make the persistence of a context explicit and
enable it by default (matching current ABI). Userspace can opt out of
persistent mode (forcing requests to be cancelled when the context is
closed by process termination or explicitly) by a context parameter. To
facilitate existing use-cases of disabling hangcheck, if the modparam is
disabled (i915.enable_hangcheck=0), we disable persistence mode by
default.  (Note, one of the outcomes for supporting endless mode will be
the removal of hangchecking, at which point opting into persistent mode
will be mandatory, or maybe the default perhaps controlled by cgroups.)

v2: Check for hangchecking at context termination, so that we are not
left with undying contexts from a crafty user.
v3: Force context termination even if forced-preemption is disabled.

Testcase: igt/gem_ctx_persistence
Signed-off-by: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Michał Winiarski 
Cc: Jon Bloomfield 
Reviewed-by: Jon Bloomfield 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 182 ++
  drivers/gpu/drm/i915/gem/i915_gem_context.h   |  15 ++
  .../gpu/drm/i915/gem/i915_gem_context_types.h |   1 +
  .../gpu/drm/i915/gem/selftests/mock_context.c |   2 +
  include/uapi/drm/i915_drm.h   |  15 ++
  5 files changed, 215 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 5d8221c7ba83..70b72456e2c4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -70,6 +70,7 @@
  #include 
  
  #include "gt/intel_lrc_reg.h"

+#include "gt/intel_engine_heartbeat.h"
  #include "gt/intel_engine_user.h"
  
  #include "i915_gem_context.h"

@@ -269,6 +270,128 @@ void i915_gem_context_release(struct kref *ref)
schedule_work(>free_work);
  }
  
+static inline struct i915_gem_engines *

+__context_engines_static(const struct i915_gem_context *ctx)
+{
+   return rcu_dereference_protected(ctx->engines, true);
+}
+
+static bool __reset_engine(struct intel_engine_cs *engine)
+{
+   struct intel_gt *gt = engine->gt;
+   bool success = false;
+
+   if (!intel_has_reset_engine(gt))
+   return false;
+
+   if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
+ >reset.flags)) {
+   success = intel_engine_reset(engine, NULL) == 0;
+   clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
+ >reset.flags);
+   }
+
+   return success;
+}
+
+static void __reset_context(struct i915_gem_context *ctx,
+   struct intel_engine_cs *engine)
+{
+   intel_gt_handle_error(engine->gt, engine->mask, 0,
+ "context closure in %s", ctx->name);
+}
+
+static bool __cancel_engine(struct intel_engine_cs *engine)
+{
+   /*
+* Send a "high priority pulse" down the engine to cause the
+* current request to be momentarily preempted. (If it fails to
+* be preempted, it will be reset). As we have marked our context
+* as banned, any incomplete request, including any running, will
+* be skipped following the preemption.
+*/
+   if (CONFIG_DRM_I915_PREEMPT_TIMEOUT && !intel_engine_pulse(engine))
+   return true;


Maybe I lost the train of thought here.. But why not even try with the 
pulse even if forced preemption is not compiled in? There is a chance it 
may preempt normally, no?


Hm, or from the other angle, why bother with preemption and not just 
reset? What is the value in letting the closed context complete if at 
the same time, if it is preemptable, we will cancel all outstanding work 
anyway?



+
+   /* If we are unable to send a pulse, try resseting this engine. */


Typo in resetting.

Regards,

Tvrtko


+   return __reset_engine(engine);
+}
+
+static struct intel_engine_cs *
+active_engine(struct dma_fence *fence, struct intel_context *ce)
+{
+   struct i915_request *rq = to_request(fence);
+   struct intel_engine_cs *engine, *locked;
+
+   /*
+* Serialise with __i915_request_submit() so that it sees
+* is-banned?, or we know