On Mon, Dec 07, 2020 at 07:38:10PM +, Chris Wilson wrote:
> The issue with stale virtual breadcrumbs remain. Now we have the problem
> that if the irq-signaler is still referencing the stale breadcrumb as we
> transfer it to a new sibling, the list becomes spaghetti. This is a very
> small window, but that doesn't stop it being hit infrequently. To
> prevent the lists being tangled (the iterator starting on one engine's
> b->signalers but walking onto another list), always decouple the virtual
> breadcrumb on schedule-out and make sure that the walker has stepped out
> of the lists.
>
> Signed-off-by: Chris Wilson
Makes sense to me.
Reviewed-by: Matthew Brost
> ---
> drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 +++--
> drivers/gpu/drm/i915/gt/intel_lrc.c | 15 +++
> 2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 00918300f53f..63900edbde88 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -454,15 +454,16 @@ void i915_request_cancel_breadcrumb(struct i915_request
> *rq)
> {
> struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs;
> struct intel_context *ce = rq->context;
> + unsigned long flags;
> bool release;
>
> if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, >fence.flags))
> return;
>
> - spin_lock(>signal_lock);
> + spin_lock_irqsave(>signal_lock, flags);
> list_del_rcu(>signal_link);
> release = remove_signaling_context(b, ce);
> - spin_unlock(>signal_lock);
> + spin_unlock_irqrestore(>signal_lock, flags);
> if (release)
> intel_context_put(ce);
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c
> b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 0b7e86240f3b..b3db16b2a5a4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1388,6 +1388,21 @@ static inline void execlists_schedule_in(struct
> i915_request *rq, int idx)
> static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
> {
> struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> + struct intel_engine_cs *engine = rq->engine;
> +
> + /* Flush concurrent rcu iterators in signal_irq_work */
> + if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, >fence.flags)) {
> + /*
> + * After this point, the rq may be transferred to a new
> + * sibling, so before we clear ce->inflight make sure that
> + * the context has been removed from the b->signalers and
> + * furthermore we need to make sure that the concurrent
> + * iterator in signal_irq_work is no longer following
> + * ce->signal_link.
> + */
> + i915_request_cancel_breadcrumb(rq);
> + irq_work_sync(>breadcrumbs->irq_work);
> + }
>
> if (READ_ONCE(ve->request))
> tasklet_hi_schedule(>base.execlists.tasklet);
> --
> 2.20.1
>
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx