Re: [Intel-gfx] [PATCH 15/21] drm/i915: Coordinate i915_active with its own mutex

2019-09-20 Thread Chris Wilson
Quoting Tvrtko Ursulin (2019-09-20 17:14:43)
> 
> On 02/09/2019 05:02, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/gt/intel_timeline_types.h b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> > index 2b1baf2fcc8e..6d7ac129ce8a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> > @@ -63,7 +63,7 @@ struct intel_timeline {
> >   	 * the request using i915_active_request_get_request_rcu(), or hold the
> 
> Looks like a stale comment.
> 
> >* struct_mutex.
> >*/
> > - struct i915_active_request last_request;
> > + struct i915_active_fence last_request;
> 
> Worth renaming to last_fence now that request naming is otherwise gone 
> from i915_active?

Hmm, although i915_active is taking on more generic dma-fences, this is
still assumed to be an i915_request in a couple of places.

There are good arguments for either last_request or last_fence. If I throw
a GEM_BUG_ON(!is_request()) around, that probably swings the debate in
favour of last_request. At least it tries to capture that in some places
we do assume this is an i915_request.
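
Something like this, say (untested sketch, wrapper name made up on the
spot; dma_fence_is_i915() and to_request() are the existing helpers in
i915_request.h):

static inline struct i915_request *
tl_last_request(struct intel_timeline *tl)
{
	struct dma_fence *fence = rcu_access_pointer(tl->last_request.fence);

	/* The slot may be idle (NULL), but if set it must be an i915_request */
	GEM_BUG_ON(fence && !dma_fence_is_i915(fence));

	return to_request(fence);
}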

> >   void i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
> >   {
> >   	GEM_BUG_ON(i915_active_is_idle(ref));
> >   
> > -	dma_fence_get(f);
> > -
> > -	rcu_read_lock();
> > -	if (rcu_access_pointer(ref->excl)) {
> > -		struct dma_fence *old;
> > -
> > -		old = dma_fence_get_rcu_safe(&ref->excl);
> > -		if (old) {
> > -			if (dma_fence_remove_callback(old, &ref->excl_cb))
> > -				atomic_dec(&ref->count);
> > -			dma_fence_put(old);
> > -		}
> > -	}
> > -	rcu_read_unlock();
> > -
> > -	atomic_inc(&ref->count);
> > -	rcu_assign_pointer(ref->excl, f);
> > +	mutex_acquire(&ref->mutex.dep_map, 0, 0, _THIS_IP_);
> 
> Is this just a lockdep annotation? Probably deserves a comment.

Yup.
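
Something like this perhaps (comment wording off the top of my head, and
the matching mutex_release() isn't in the hunk above, so treat it as a
sketch):

void i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
{
	GEM_BUG_ON(i915_active_is_idle(ref));

	/*
	 * The exclusive slot is serialised by the caller (e.g. its
	 * timeline mutex), not by ref->mutex; we only tell lockdep that
	 * we hold ref->mutex so the annotations line up with the retire
	 * path, we never actually block here.
	 */
	mutex_acquire(&ref->mutex.dep_map, 0, 0, _THIS_IP_);
	if (!__i915_active_fence_set(&ref->excl, f))
		atomic_inc(&ref->count);
	mutex_release(&ref->mutex.dep_map, 0, _THIS_IP_);
}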

> 
> > +	if (!__i915_active_fence_set(&ref->excl, f))
> > +		atomic_inc(&ref->count);
> 
> Wouldn't refcount management be better done from inside __i915_active_fence_set()?

No, the active counting is on the i915_active; the i915_active_fence is
just a component.

E.g. other users want to know the fence that used to be in the
i915_active_fence:

int i915_active_fence_set(struct i915_active_fence *active,
			  struct i915_request *rq)
{
	struct dma_fence *fence;
	int err;

#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
	lockdep_assert_held(active->lock);
#endif

	/* Must maintain ordering wrt previous active requests */
	rcu_read_lock();
	fence = __i915_active_fence_set(active, &rq->fence);
	if (fence)
		fence = dma_fence_get_rcu(fence);
	rcu_read_unlock();

	if (fence) {
		err = i915_request_await_dma_fence(rq, fence);
		dma_fence_put(fence);
		if (err)
			return err;
	}

	return 0;
}

and similarly in __i915_request_add_to_timeline()
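
i.e. roughly (paraphrasing from memory, so just a sketch):

	struct i915_request *prev;

	/* Publish rq as the newest request on its timeline... */
	prev = to_request(__i915_active_fence_set(&timeline->last_request,
						  &rq->fence));
	if (prev && !i915_request_completed(prev)) {
		/* ...and order rq's submission after the one it displaced */
		i915_sw_fence_await_sw_fence(&rq->submit, &prev->submit,
					     &rq->submitq);
	}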

> > +struct dma_fence *
> > +__i915_active_fence_set(struct i915_active_fence *active,
> > +			struct dma_fence *fence)
> > +{
> > +	struct dma_fence *prev;
> > +	unsigned long flags;
> > +
> > +	/* NB: updates must be serialised by an outer timeline mutex */
> > +	spin_lock_irqsave(fence->lock, flags);
> > +	GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
> > +
> > +	prev = rcu_dereference_protected(active->fence, active_is_held(active));
> > +	if (prev) {
> > +		spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
> > +		__list_del_entry(&active->cb.node);
> > +		spin_unlock(prev->lock); /* serialise with prev->cb_list */
> > +		prev = rcu_access_pointer(active->fence);
> 
> Why is it important to re-read active->fence, and does it then need a 
> comment?

Because we have to serialise with the prev->cb_list. ;)

The write to active->fence is made on signaling, and that is performed
under the prev->lock. So we have to take the prev->lock ourselves to
flush any concurrent callbacks before we know whether they ran (having
taken the lock, we remove the cb from the list, so now if it has not
run, it cannot run).
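
For reference, the signalling side we are racing against is just
(roughly, as in this series):

static inline void
i915_active_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
{
	struct i915_active_fence *active =
		container_of(cb, typeof(*active), cb);

	/* Called under fence->lock as the fence is signaled */
	RCU_INIT_POINTER(active->fence, NULL);
}

so once we hold prev->lock and have unlinked our cb, either that callback
has already run (and active->fence is NULL) or it never will.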

> How does it tie in with the i915_active->count refcounting, which is 
> done on the basis of the return value from this function?

Same as above, we have to flush the signal callback to determine whether
or not we need to increment the active count.

> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 3eed2efa8d13..1aadab1cdd24 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -900,28 +900,38 @@ wait_for_timelines(struct drm_i915_private *i915,
> >   
> >   	spin_lock_irqsave(&timelines->lock, flags);
> >   	list_for_each_entry(tl, &timelines->active_list, link) {
> > -		struct 

Re: [Intel-gfx] [PATCH 15/21] drm/i915: Coordinate i915_active with its own mutex

2019-09-20 Thread Tvrtko Ursulin


On 02/09/2019 05:02, Chris Wilson wrote:

Forgo the struct_mutex serialisation for i915_active, and interpose its
own mutex handling for active/retire.

This is a multi-layered sleight-of-hand. First, we had to ensure that no
active/retire callbacks accidentally inverted the mutex ordering rules,
nor assumed that they were themselves serialised by struct_mutex. More
challenging, though, is the rule for updating elements of the active
rbtree. Instead of the whole i915_active now being serialised by
struct_mutex, allocations/rotations of the tree are serialised by the
i915_active.mutex and individual nodes are serialised by the caller
using the i915_timeline.mutex (we need to use nested spinlocks to
interact with the dma_fence callback lists).

The pain point here is that instead of a single mutex around execbuf, we
now have to take a mutex for each active tracker (one for each vma,
context, etc) and a couple of spinlocks for each fence update. The
improvement in fine-grained locking, allowing for multiple concurrent
clients (eventually!), should be worth it in typical loads.

Signed-off-by: Chris Wilson 
---
  .../gpu/drm/i915/display/intel_frontbuffer.c  |   2 +-
  drivers/gpu/drm/i915/display/intel_overlay.c  |   5 +-
  .../gpu/drm/i915/gem/i915_gem_client_blt.c|   2 +-
  drivers/gpu/drm/i915/gem/i915_gem_context.c   |   8 +-
  .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
  drivers/gpu/drm/i915/gem/i915_gem_pm.c|   9 +-
  drivers/gpu/drm/i915/gt/intel_context.c   |   6 +-
  drivers/gpu/drm/i915/gt/intel_engine_pool.c   |   2 +-
  drivers/gpu/drm/i915/gt/intel_engine_pool.h   |   2 +-
  drivers/gpu/drm/i915/gt/intel_reset.c |  10 +-
  drivers/gpu/drm/i915/gt/intel_timeline.c  |   9 +-
  .../gpu/drm/i915/gt/intel_timeline_types.h|   2 +-
  drivers/gpu/drm/i915/gt/selftest_context.c|  16 +-
  drivers/gpu/drm/i915/gt/selftest_lrc.c|  10 +-
  .../gpu/drm/i915/gt/selftests/mock_timeline.c |   2 +-
  drivers/gpu/drm/i915/gvt/scheduler.c  |   3 -
  drivers/gpu/drm/i915/i915_active.c| 291 +++-
  drivers/gpu/drm/i915/i915_active.h| 318 --
  drivers/gpu/drm/i915/i915_active_types.h  |  23 +-
  drivers/gpu/drm/i915/i915_gem.c   |  42 ++-
  drivers/gpu/drm/i915/i915_gem_gtt.c   |   3 +-
  drivers/gpu/drm/i915/i915_gpu_error.c |   4 +-
  drivers/gpu/drm/i915/i915_request.c   |  39 +--
  drivers/gpu/drm/i915/i915_request.h   |   1 -
  drivers/gpu/drm/i915/i915_vma.c   |   8 +-
  drivers/gpu/drm/i915/selftests/i915_active.c  |  36 +-
  26 files changed, 274 insertions(+), 580 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_frontbuffer.c b/drivers/gpu/drm/i915/display/intel_frontbuffer.c
index 6428b8dd70d3..84b164f31895 100644
--- a/drivers/gpu/drm/i915/display/intel_frontbuffer.c
+++ b/drivers/gpu/drm/i915/display/intel_frontbuffer.c
@@ -257,7 +257,7 @@ intel_frontbuffer_get(struct drm_i915_gem_object *obj)
 	front->obj = obj;
 	kref_init(&front->ref);
 	atomic_set(&front->bits, 0);
-	i915_active_init(i915, &front->write,
+	i915_active_init(&front->write,
 			 frontbuffer_active,
 			 i915_active_may_sleep(frontbuffer_retire));
  
diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c b/drivers/gpu/drm/i915/display/intel_overlay.c
index 4f36557b3f3b..544e953342ea 100644
--- a/drivers/gpu/drm/i915/display/intel_overlay.c
+++ b/drivers/gpu/drm/i915/display/intel_overlay.c
@@ -230,7 +230,7 @@ alloc_request(struct intel_overlay *overlay, void (*fn)(struct intel_overlay *))
 	if (IS_ERR(rq))
 		return rq;
 
-	err = i915_active_ref(&overlay->last_flip, rq->timeline, rq);
+	err = i915_active_ref(&overlay->last_flip, rq->timeline, &rq->fence);
 	if (err) {
 		i915_request_add(rq);
 		return ERR_PTR(err);
@@ -1360,8 +1360,7 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv)
 	overlay->contrast = 75;
 	overlay->saturation = 146;
 
-	i915_active_init(dev_priv,
-			 &overlay->last_flip,
+	i915_active_init(&overlay->last_flip,
 			 NULL, intel_overlay_last_flip_retire);
 
 	ret = get_registers(overlay, OVERLAY_NEEDS_PHYSICAL(dev_priv));

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index 9e72b42a86f5..ace50bb9ee1f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -160,7 +160,7 @@ static int move_to_active(struct i915_vma *vma, struct i915_request *rq)
 	if (err)
 		return err;
 
-	return i915_active_ref(&vma->active, rq->timeline, rq);
+	return i915_active_ref(&vma->active, rq->timeline, &rq->fence);
 }
  
  static void clear_pages_worker(struct work_struct *work)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c

[Intel-gfx] [PATCH 15/21] drm/i915: Coordinate i915_active with its own mutex

2019-09-01 Thread Chris Wilson
Forgo the struct_mutex serialisation for i915_active, and interpose its
own mutex handling for active/retire.

This is a multi-layered sleight-of-hand. First, we had to ensure that no
active/retire callbacks accidentally inverted the mutex ordering rules,
nor assumed that they were themselves serialised by struct_mutex. More
challenging, though, is the rule for updating elements of the active
rbtree. Instead of the whole i915_active now being serialised by
struct_mutex, allocations/rotations of the tree are serialised by the
i915_active.mutex and individual nodes are serialised by the caller
using the i915_timeline.mutex (we need to use nested spinlocks to
interact with the dma_fence callback lists).

The pain point here is that instead of a single mutex around execbuf, we
now have to take a mutex for each active tracker (one for each vma,
context, etc) and a couple of spinlocks for each fence update. The
improvement in fine-grained locking, allowing for multiple concurrent
clients (eventually!), should be worth it in typical loads.
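
To illustrate the intended nesting with a (purely illustrative, untested)
user:

static int mark_active(struct i915_active *ref, struct i915_request *rq)
{
	/* Per-node serialisation is provided by the caller's timeline. */
	lockdep_assert_held(&rq->timeline->mutex);

	/*
	 * i915_active_ref() takes ref->mutex for the rbtree, and
	 * __i915_active_fence_set() then nests prev->lock inside
	 * fence->lock (SINGLE_DEPTH_NESTING) to move the callback from
	 * the old fence to the new.
	 */
	return i915_active_ref(ref, rq->timeline, &rq->fence);
}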

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/display/intel_frontbuffer.c  |   2 +-
 drivers/gpu/drm/i915/display/intel_overlay.c  |   5 +-
 .../gpu/drm/i915/gem/i915_gem_client_blt.c|   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   8 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_pm.c|   9 +-
 drivers/gpu/drm/i915/gt/intel_context.c   |   6 +-
 drivers/gpu/drm/i915/gt/intel_engine_pool.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_engine_pool.h   |   2 +-
 drivers/gpu/drm/i915/gt/intel_reset.c |  10 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c  |   9 +-
 .../gpu/drm/i915/gt/intel_timeline_types.h|   2 +-
 drivers/gpu/drm/i915/gt/selftest_context.c|  16 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c|  10 +-
 .../gpu/drm/i915/gt/selftests/mock_timeline.c |   2 +-
 drivers/gpu/drm/i915/gvt/scheduler.c  |   3 -
 drivers/gpu/drm/i915/i915_active.c| 291 +++-
 drivers/gpu/drm/i915/i915_active.h| 318 --
 drivers/gpu/drm/i915/i915_active_types.h  |  23 +-
 drivers/gpu/drm/i915/i915_gem.c   |  42 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c   |   3 +-
 drivers/gpu/drm/i915/i915_gpu_error.c |   4 +-
 drivers/gpu/drm/i915/i915_request.c   |  39 +--
 drivers/gpu/drm/i915/i915_request.h   |   1 -
 drivers/gpu/drm/i915/i915_vma.c   |   8 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |  36 +-
 26 files changed, 274 insertions(+), 580 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_frontbuffer.c b/drivers/gpu/drm/i915/display/intel_frontbuffer.c
index 6428b8dd70d3..84b164f31895 100644
--- a/drivers/gpu/drm/i915/display/intel_frontbuffer.c
+++ b/drivers/gpu/drm/i915/display/intel_frontbuffer.c
@@ -257,7 +257,7 @@ intel_frontbuffer_get(struct drm_i915_gem_object *obj)
 	front->obj = obj;
 	kref_init(&front->ref);
 	atomic_set(&front->bits, 0);
-	i915_active_init(i915, &front->write,
+	i915_active_init(&front->write,
 			 frontbuffer_active,
 			 i915_active_may_sleep(frontbuffer_retire));
 
diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c b/drivers/gpu/drm/i915/display/intel_overlay.c
index 4f36557b3f3b..544e953342ea 100644
--- a/drivers/gpu/drm/i915/display/intel_overlay.c
+++ b/drivers/gpu/drm/i915/display/intel_overlay.c
@@ -230,7 +230,7 @@ alloc_request(struct intel_overlay *overlay, void (*fn)(struct intel_overlay *))
 	if (IS_ERR(rq))
 		return rq;
 
-	err = i915_active_ref(&overlay->last_flip, rq->timeline, rq);
+	err = i915_active_ref(&overlay->last_flip, rq->timeline, &rq->fence);
 	if (err) {
 		i915_request_add(rq);
 		return ERR_PTR(err);
@@ -1360,8 +1360,7 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv)
 	overlay->contrast = 75;
 	overlay->saturation = 146;
 
-	i915_active_init(dev_priv,
-			 &overlay->last_flip,
+	i915_active_init(&overlay->last_flip,
 			 NULL, intel_overlay_last_flip_retire);
 
 	ret = get_registers(overlay, OVERLAY_NEEDS_PHYSICAL(dev_priv));
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index 9e72b42a86f5..ace50bb9ee1f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -160,7 +160,7 @@ static int move_to_active(struct i915_vma *vma, struct i915_request *rq)
 	if (err)
 		return err;
 
-	return i915_active_ref(&vma->active, rq->timeline, rq);
+	return i915_active_ref(&vma->active, rq->timeline, &rq->fence);
 }
 
 static void clear_pages_worker(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 4f303cb94698..b8ddcf7899a1 100644
--- 

[Intel-gfx] [PATCH 15/21] drm/i915: Coordinate i915_active with its own mutex

2019-08-30 Thread Chris Wilson
Forgo the struct_mutex serialisation for i915_active, and interpose its
own mutex handling for active/retire.

This is a multi-layered sleight-of-hand. First, we had to ensure that no
active/retire callbacks accidentally inverted the mutex ordering rules,
nor assumed that they were themselves serialised by struct_mutex. More
challenging, though, is the rule for updating elements of the active
rbtree. Instead of the whole i915_active now being serialised by
struct_mutex, allocations/rotations of the tree are serialised by the
i915_active.mutex and individual nodes are serialised by the caller
using the i915_timeline.mutex (we need to use nested spinlocks to
interact with the dma_fence callback lists).

The pain point here is that instead of a single mutex around execbuf, we
now have to take a mutex for each active tracker (one for each vma,
context, etc) and a couple of spinlocks for each fence update. The
improvement in fine-grained locking, allowing for multiple concurrent
clients (eventually!), should be worth it in typical loads.

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/display/intel_frontbuffer.c  |   2 +-
 drivers/gpu/drm/i915/display/intel_overlay.c  |   5 +-
 .../gpu/drm/i915/gem/i915_gem_client_blt.c|   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   8 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_pm.c|   9 +-
 drivers/gpu/drm/i915/gt/intel_context.c   |   6 +-
 drivers/gpu/drm/i915/gt/intel_engine_pool.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_engine_pool.h   |   2 +-
 drivers/gpu/drm/i915/gt/intel_reset.c |  10 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c  |   9 +-
 .../gpu/drm/i915/gt/intel_timeline_types.h|   2 +-
 drivers/gpu/drm/i915/gt/selftest_context.c|  16 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c|  10 +-
 .../gpu/drm/i915/gt/selftests/mock_timeline.c |   2 +-
 drivers/gpu/drm/i915/gvt/scheduler.c  |   3 -
 drivers/gpu/drm/i915/i915_active.c| 291 +++-
 drivers/gpu/drm/i915/i915_active.h| 316 --
 drivers/gpu/drm/i915/i915_active_types.h  |  23 +-
 drivers/gpu/drm/i915/i915_gem.c   |  42 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c   |   3 +-
 drivers/gpu/drm/i915/i915_gpu_error.c |   4 +-
 drivers/gpu/drm/i915/i915_request.c   |  39 +--
 drivers/gpu/drm/i915/i915_request.h   |   1 -
 drivers/gpu/drm/i915/i915_vma.c   |   8 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |  36 +-
 26 files changed, 273 insertions(+), 579 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_frontbuffer.c b/drivers/gpu/drm/i915/display/intel_frontbuffer.c
index 6428b8dd70d3..84b164f31895 100644
--- a/drivers/gpu/drm/i915/display/intel_frontbuffer.c
+++ b/drivers/gpu/drm/i915/display/intel_frontbuffer.c
@@ -257,7 +257,7 @@ intel_frontbuffer_get(struct drm_i915_gem_object *obj)
 	front->obj = obj;
 	kref_init(&front->ref);
 	atomic_set(&front->bits, 0);
-	i915_active_init(i915, &front->write,
+	i915_active_init(&front->write,
 			 frontbuffer_active,
 			 i915_active_may_sleep(frontbuffer_retire));
 
diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c b/drivers/gpu/drm/i915/display/intel_overlay.c
index 4f36557b3f3b..544e953342ea 100644
--- a/drivers/gpu/drm/i915/display/intel_overlay.c
+++ b/drivers/gpu/drm/i915/display/intel_overlay.c
@@ -230,7 +230,7 @@ alloc_request(struct intel_overlay *overlay, void (*fn)(struct intel_overlay *))
 	if (IS_ERR(rq))
 		return rq;
 
-	err = i915_active_ref(&overlay->last_flip, rq->timeline, rq);
+	err = i915_active_ref(&overlay->last_flip, rq->timeline, &rq->fence);
 	if (err) {
 		i915_request_add(rq);
 		return ERR_PTR(err);
@@ -1360,8 +1360,7 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv)
 	overlay->contrast = 75;
 	overlay->saturation = 146;
 
-	i915_active_init(dev_priv,
-			 &overlay->last_flip,
+	i915_active_init(&overlay->last_flip,
 			 NULL, intel_overlay_last_flip_retire);
 
 	ret = get_registers(overlay, OVERLAY_NEEDS_PHYSICAL(dev_priv));
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index 9e72b42a86f5..ace50bb9ee1f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -160,7 +160,7 @@ static int move_to_active(struct i915_vma *vma, struct i915_request *rq)
 	if (err)
 		return err;
 
-	return i915_active_ref(&vma->active, rq->timeline, rq);
+	return i915_active_ref(&vma->active, rq->timeline, &rq->fence);
 }
 
 static void clear_pages_worker(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 378ba9a2cd8a..d3ce8b5bccc5 100644
---