Re: [Intel-gfx] [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection

2018-03-05 Thread Chris Wilson
Quoting Chris Wilson (2018-03-05 14:34:42)
> Quoting Mika Kuoppala (2018-03-05 13:59:43)
> > Chris Wilson  writes:
> > 
> > > Similar to the staging around handling of engine->submit_request, we
> > > need to stop adding to the execlists->queue prior to calling
> > > engine->cancel_requests. cancel_requests will move requests from the
> > > queue onto the timeline, so if we add a request onto the queue after that
> > > point, it will be lost.
> > >
> > > Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> > > Signed-off-by: Chris Wilson 
> > > Cc: Mika Kuoppala 
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c | 13 +++--
> > >  drivers/gpu/drm/i915/i915_request.c |  2 ++
> > >  2 files changed, 9 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index a5bd07338b46..8d913d833ab9 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
> > >  
> > >   rq = to_request(fence);
> > >   engine = rq->engine;
> > > - if (!engine->schedule)
> > > - return;
> > >  
> > > - engine->schedule(rq, prio);
> > > + rcu_read_lock();
> > > + if (engine->schedule)
> > > + engine->schedule(rq, prio);
> > > + rcu_read_unlock();
> > >  }
> > >  
> > >  static void fence_set_priority(struct dma_fence *fence, int prio)
> > > @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> > >*/
> > >   for_each_engine(engine, i915, id) {
> > >   i915_gem_reset_prepare_engine(engine);
> > > +
> > >   engine->submit_request = nop_submit_request;
> > > + engine->schedule = NULL;
> > 
> > Why are we not using an rcu_assign_pointer and rcu_dereference pair
> > in the upper part where we check the schedule?
> 
> We are not using RCU protection. RCU here is being abused as a
> free-flowing stop-machine.

I'm sorely tempted to put it back to stop_machine as the races are just
plain weird and proving hard to fix :(
-Chris
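
The trick under discussion is worth spelling out: synchronize_rcu() waits for
every outstanding rcu_read_lock() critical section to finish, so clearing a
callback pointer and then synchronizing guarantees that no CPU is still
executing the old callback. Below is a minimal userspace sketch of that shape,
built on liburcu instead of the kernel's RCU API; the i915-flavoured names are
stand-ins for illustration, not the driver's code.

/*
 * RCU as a "free-flowing stop-machine" for a callback pointer.
 * Build: gcc sketch.c -lurcu
 */
#include <stdio.h>
#include <urcu.h>

static void (*schedule_cb)(int prio);

static void real_schedule(int prio)
{
	printf("schedule at prio %d\n", prio);
}

/* Reader side, mirroring __i915_request_add()/__fence_set_priority(): */
static void request_add(int prio)
{
	rcu_read_lock();
	if (schedule_cb)	/* may be set to NULL by the wedge path */
		schedule_cb(prio);
	rcu_read_unlock();
}

/* Writer side, mirroring i915_gem_set_wedged(): */
static void set_wedged(void)
{
	schedule_cb = NULL;

	/*
	 * Wait for every reader still inside rcu_read_lock() to leave;
	 * afterwards nobody can be running the old callback, so whatever
	 * it depended on may be torn down safely.
	 */
	synchronize_rcu();
}

int main(void)
{
	rcu_register_thread();
	schedule_cb = real_schedule;
	request_add(1);		/* runs the callback */
	set_wedged();
	request_add(2);		/* callback gone, silently skipped */
	rcu_unregister_thread();
	return 0;
}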


Re: [Intel-gfx] [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection

2018-03-05 Thread Chris Wilson
Quoting Mika Kuoppala (2018-03-05 13:59:43)
> Chris Wilson  writes:
> 
> > Similar to the staging around handling of engine->submit_request, we
> > need to stop adding to the execlists->queue prior to calling
> > engine->cancel_requests. cancel_requests will move requests from the
> > queue onto the timeline, so if we add a request onto the queue after that
> > point, it will be lost.
> >
> > Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> > Signed-off-by: Chris Wilson 
> > Cc: Mika Kuoppala 
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 13 +++--
> >  drivers/gpu/drm/i915/i915_request.c |  2 ++
> >  2 files changed, 9 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index a5bd07338b46..8d913d833ab9 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
> >  
> >   rq = to_request(fence);
> >   engine = rq->engine;
> > - if (!engine->schedule)
> > - return;
> >  
> > - engine->schedule(rq, prio);
> > + rcu_read_lock();
> > + if (engine->schedule)
> > + engine->schedule(rq, prio);
> > + rcu_read_unlock();
> >  }
> >  
> >  static void fence_set_priority(struct dma_fence *fence, int prio)
> > @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> >*/
> >   for_each_engine(engine, i915, id) {
> >   i915_gem_reset_prepare_engine(engine);
> > +
> >   engine->submit_request = nop_submit_request;
> > + engine->schedule = NULL;
> 
> Why are we not using an rcu_assign_pointer and rcu_dereference pair
> in the upper part where we check the schedule?

We are not using RCU protection. RCU here is being abused as a
free-flowing stop-machine.
 
> Further, is there a risk that we lose sync between the two
> assignments? In other words, should we combine both callbacks
> behind a single dereferenceable pointer in the engine struct?

They are only tied together by how the backend uses them, not by request
flow, so I don't think so.
-Chris
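
For comparison, Mika's suggestion of a single dereferenceable pointer would
look roughly like the sketch below: both callbacks behind one RCU-managed ops
pointer, published with rcu_assign_pointer() and read with rcu_dereference(),
so a reader always observes a matched pair. The engine_ops struct and every
name in it are hypothetical, assembled only to illustrate the idea.

#include <stddef.h>
#include <urcu.h>

struct engine_ops {
	void (*submit_request)(void *rq);
	void (*schedule)(void *rq, int prio);
};

static struct engine_ops *active_ops;	/* RCU-managed */

static void nop_submit(void *rq)
{
	/* complete the request without running it */
}

static struct engine_ops wedged_ops = {
	.submit_request	= nop_submit,
	.schedule	= NULL,
};

/* Reader: always sees a matched submit/schedule pair. */
static void request_add(void *rq, int prio)
{
	struct engine_ops *ops;

	rcu_read_lock();
	ops = rcu_dereference(active_ops);
	if (ops && ops->schedule)
		ops->schedule(rq, prio);
	rcu_read_unlock();
}

/* Writer: one pointer swap replaces both callbacks atomically. */
static void set_wedged(void)
{
	rcu_assign_pointer(active_ops, &wedged_ops);
	synchronize_rcu();
}

int main(void)
{
	static struct engine_ops live_ops;	/* would hold the real callbacks */

	rcu_register_thread();
	rcu_assign_pointer(active_ops, &live_ops);
	request_add(NULL, 0);	/* live_ops.schedule is NULL here: skipped */
	set_wedged();
	request_add(NULL, 0);	/* wedged: still skipped */
	rcu_unregister_thread();
	return 0;
}

Note, though, that the wedge sequence in the patch flips submit_request twice
(nop_submit_request, then nop_complete_submit_request) while schedule changes
once, so a combined ops pointer would still need two swaps and would not by
itself remove the ordering problem the patch is addressing.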


Re: [Intel-gfx] [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection

2018-03-05 Thread Mika Kuoppala
Chris Wilson  writes:

> Similar to the staging around handling of engine->submit_request, we
> need to stop adding to the execlists->queue prior to calling
> engine->cancel_requests. cancel_requests will move requests from the
> queue onto the timeline, so if we add a request onto the queue after that
> point, it will be lost.
>
> Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> Signed-off-by: Chris Wilson 
> Cc: Mika Kuoppala 
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 13 +++--
>  drivers/gpu/drm/i915/i915_request.c |  2 ++
>  2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a5bd07338b46..8d913d833ab9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
>  
>   rq = to_request(fence);
>   engine = rq->engine;
> - if (!engine->schedule)
> - return;
>  
> - engine->schedule(rq, prio);
> + rcu_read_lock();
> + if (engine->schedule)
> + engine->schedule(rq, prio);
> + rcu_read_unlock();
>  }
>  
>  static void fence_set_priority(struct dma_fence *fence, int prio)
> @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>*/
>   for_each_engine(engine, i915, id) {
>   i915_gem_reset_prepare_engine(engine);
> +
>   engine->submit_request = nop_submit_request;
> + engine->schedule = NULL;

Why are we not using an rcu_assign_pointer and rcu_dereference pair
in the upper part where we check the schedule?

Further, is there a risk that we lose sync between the two
assignments? In other words, should we combine both callbacks
behind a single dereferenceable pointer in the engine struct?

-Mika


>   }
> + i915->caps.scheduler = 0;
>  
>   /*
>* Make sure no one is running the old callback before we proceed with
> @@ -3233,11 +3237,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>* start to complete all requests.
>*/
>   engine->submit_request = nop_complete_submit_request;
> - engine->schedule = NULL;
>   }
>  
> - i915->caps.scheduler = 0;
> -
>   /*
>* Make sure no request can slip through without getting completed by
>* either this call here to intel_engine_init_global_seqno, or the one
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 2265bb8ff4fa..59a87afd83b6 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1081,8 +1081,10 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
>* decide whether to preempt the entire chain so that it is ready to
>* run at the earliest possible convenience.
>*/
> + rcu_read_lock();
>   if (engine->schedule)
>   engine->schedule(request, request->ctx->priority);
> + rcu_read_unlock();
>  
>   local_bh_disable();
>   i915_sw_fence_commit(&request->submit);
> -- 
> 2.16.2


[Intel-gfx] [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection

2018-03-03 Thread Chris Wilson
Similar to the staging around handling of engine->submit_request, we
need to stop adding to the execlists->queue prior to calling
engine->cancel_requests. cancel_requests will move requests from the
queue onto the timeline, so if we add a request onto the queue after that
point, it will be lost.

Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
---
 drivers/gpu/drm/i915/i915_gem.c | 13 +++--
 drivers/gpu/drm/i915/i915_request.c |  2 ++
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a5bd07338b46..8d913d833ab9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
 
rq = to_request(fence);
engine = rq->engine;
-   if (!engine->schedule)
-   return;
 
-   engine->schedule(rq, prio);
+   rcu_read_lock();
+   if (engine->schedule)
+   engine->schedule(rq, prio);
+   rcu_read_unlock();
 }
 
 static void fence_set_priority(struct dma_fence *fence, int prio)
@@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 */
for_each_engine(engine, i915, id) {
i915_gem_reset_prepare_engine(engine);
+
engine->submit_request = nop_submit_request;
+   engine->schedule = NULL;
}
+   i915->caps.scheduler = 0;
 
/*
 * Make sure no one is running the old callback before we proceed with
@@ -3233,11 +3237,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 * start to complete all requests.
 */
engine->submit_request = nop_complete_submit_request;
-   engine->schedule = NULL;
}
 
-   i915->caps.scheduler = 0;
-
/*
 * Make sure no request can slip through without getting completed by
 * either this call here to intel_engine_init_global_seqno, or the one
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 2265bb8ff4fa..59a87afd83b6 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1081,8 +1081,10 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 * decide whether to preempt the entire chain so that it is ready to
 * run at the earliest possible convenience.
 */
+   rcu_read_lock();
if (engine->schedule)
engine->schedule(request, request->ctx->priority);
+   rcu_read_unlock();
 
local_bh_disable();
i915_sw_fence_commit(&request->submit);
-- 
2.16.2
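
To make the ordering change concrete, here is a condensed, compilable model of
the two-phase wedge above, with liburcu standing in for kernel RCU and every
i915 structure stubbed out. The only point is where engine->schedule is
cleared relative to the grace period; the names follow the diff.

/* Build: gcc wedge_model.c -lurcu */
#include <stdio.h>
#include <urcu.h>

struct engine {
	void (*submit_request)(struct engine *e);
	void (*schedule)(struct engine *e, int prio);
	int queued;		/* stand-in for execlists->queue */
};

static void nop_submit_request(struct engine *e)
{
	e->queued++;		/* still queues, to be wedged later */
}

static void nop_complete_submit_request(struct engine *e)
{
	/* completes the request immediately, never touches the queue */
}

static void set_wedged(struct engine *e)
{
	/*
	 * Phase 1: neuter every path that can feed the queue. With the
	 * patch, schedule is cleared here rather than in phase 2...
	 */
	e->submit_request = nop_submit_request;
	e->schedule = NULL;

	/*
	 * ...so once this grace period ends, no submitter can still be
	 * inside the old schedule/submit callbacks.
	 */
	synchronize_rcu();

	/* Phase 2: from now on, submissions complete immediately. */
	e->submit_request = nop_complete_submit_request;

	/*
	 * The equivalent of cancel_requests() may now drain the queue
	 * onto the timeline: nothing can be added behind its back, so
	 * no request is lost.
	 */
	printf("cancelling %d queued request(s)\n", e->queued);
	e->queued = 0;
}

int main(void)
{
	struct engine e = { 0 };

	rcu_register_thread();
	set_wedged(&e);
	rcu_unregister_thread();
	return 0;
}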
