Re: [Intel-gfx] [PATCH 1/2] drm/i915/breadcrumbs: Reduce signaler rbtree to a sorted list

2018-03-06 Thread Chris Wilson
Quoting Joonas Lahtinen (2018-03-05 14:25:07)
> As a pre-existing condition it's a bit surprising that intel_breadcrumbs_busy()
> kicks the signaler as a side-effect.

Since we explicitly cancel the signaling on retire now (to prevent the
accumulation of references), we can actually say that we don't need to
check the signaler for busy any more. We still need to spin over kicking
the waiters though... Hmm, though we can just walk the tree and mark
them as complete now.
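The "walk the tree and mark them as complete" idea can be modelled in userspace C. This is a hedged sketch, not the i915 code: struct waiter, mark_complete() and the sorted-list stand-in for the waiter rbtree are all invented here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a waiter; not the kernel's intel_wait. */
struct waiter {
	unsigned int seqno;	/* breadcrumb this waiter sleeps on */
	bool complete;		/* stands in for waking the task */
	struct waiter *next;	/* waiters kept sorted by seqno */
};

/*
 * Walk the (sorted) waiters and mark every one whose seqno the
 * hardware has already passed as complete, rather than kicking the
 * signaler thread as a side-effect. Returns how many were marked.
 */
static unsigned int mark_complete(struct waiter *head, unsigned int hw_seqno)
{
	unsigned int n = 0;

	for (struct waiter *w = head; w && w->seqno <= hw_seqno; w = w->next) {
		w->complete = true;
		n++;
	}
	return n;
}
```

The real driver compares seqnos with wrap-safe helpers (i915_seqno_passed()); the plain `<=` above is only to keep the model short.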
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] drm/i915/breadcrumbs: Reduce signaler rbtree to a sorted list

2018-03-05 Thread Joonas Lahtinen
Quoting Chris Wilson (2018-02-22 11:25:44)
> The goal here is to try and reduce the latency of signaling additional
> requests following the wakeup from interrupt by reducing the list of
> to-be-signaled requests from an rbtree to a sorted linked list. The
> original choice of using an rbtree was to facilitate random insertions
> of request into the signaler while maintaining a sorted list. However,
> if we assume that most new requests are added when they are submitted,
> we see those new requests in execution order making an insertion sort
> fast, and the reduction in overhead of each signaler iteration
> significant.
> 
> Since commit 56299fb7d904 ("drm/i915: Signal first fence from irq handler
> if complete"), we signal most fences directly from notify_ring() in the
> interrupt handler greatly reducing the amount of work that actually
> needs to be done by the signaler kthread. All the thread is then
> required to do is operate as the bottom-half, cleaning up after the
> interrupt handler and preparing the next waiter. This includes signaling
> all later completed fences in a saturated system, but on a mostly idle
> system we only have to rebuild the wait rbtree in time for the next
> interrupt. With this de-emphasis of the signaler's role, we want to
> rejig its data structures to reduce the amount of work we require to
> both set up the signal tree and maintain it on every interrupt.
> 
> References: 56299fb7d904 ("drm/i915: Signal first fence from irq handler if complete")
> Signed-off-by: Chris Wilson 
> Cc: Tvrtko Ursulin 
> Cc: Mika Kuoppala 
> Cc: Joonas Lahtinen 

"rb_lock" is now bit of a misleading name. We could generally improve a
lot on documenting which locks apply to which data.

As a pre-existing condition it's a bit surprising that intel_breadcrumbs_busy()
kicks the signaler as a side-effect.

Reviewed-by: Joonas Lahtinen 

Regards, Joonas


[Intel-gfx] [PATCH 1/2] drm/i915/breadcrumbs: Reduce signaler rbtree to a sorted list

2018-02-22 Thread Chris Wilson
The goal here is to try and reduce the latency of signaling additional
requests following the wakeup from interrupt by reducing the list of
to-be-signaled requests from an rbtree to a sorted linked list. The
original choice of using an rbtree was to facilitate random insertions
of request into the signaler while maintaining a sorted list. However,
if we assume that most new requests are added when they are submitted,
we see those new requests in execution order making an insertion sort
fast, and the reduction in overhead of each signaler iteration
significant.

Since commit 56299fb7d904 ("drm/i915: Signal first fence from irq handler
if complete"), we signal most fences directly from notify_ring() in the
interrupt handler greatly reducing the amount of work that actually
needs to be done by the signaler kthread. All the thread is then
required to do is operate as the bottom-half, cleaning up after the
interrupt handler and preparing the next waiter. This includes signaling
all later completed fences in a saturated system, but on a mostly idle
system we only have to rebuild the wait rbtree in time for the next
interrupt. With this de-emphasis of the signaler's role, we want to
rejig its data structures to reduce the amount of work we require to
both set up the signal tree and maintain it on every interrupt.
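The insertion-sort argument above can be illustrated with a small userspace sketch (plain C, not the kernel's struct list_head; signal_node, signal_list and insert_sorted() are names invented here, under the assumption that requests arrive mostly in seqno order):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the kernel uses struct list_head instead. */
struct signal_node {
	unsigned int seqno;
	struct signal_node *prev, *next;
};

struct signal_list {
	struct signal_node *head, *tail;
};

/*
 * Insert while keeping the list sorted by seqno. Because most
 * requests are added in submission (execution) order, the backwards
 * scan from the tail usually stops at once, so the common-case
 * insertion is O(1); only out-of-order arrivals pay for a walk.
 */
static void insert_sorted(struct signal_list *list, struct signal_node *node)
{
	struct signal_node *pos = list->tail;

	while (pos && pos->seqno > node->seqno)
		pos = pos->prev;

	node->prev = pos;
	node->next = pos ? pos->next : list->head;
	if (node->next)
		node->next->prev = node;
	else
		list->tail = node;
	if (pos)
		pos->next = node;
	else
		list->head = node;
}
```

Contrast with the rbtree: every insertion there costs O(log n) plus rebalancing, even in the common in-order case that this sketch handles in constant time.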

References: 56299fb7d904 ("drm/i915: Signal first fence from irq handler if complete")
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Cc: Mika Kuoppala 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_request.h  |   2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 261 +--
 drivers/gpu/drm/i915/intel_ringbuffer.h  |   4 +-
 3 files changed, 116 insertions(+), 151 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 74311fc53e2f..7d6eb82eeb91 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -44,8 +44,8 @@ struct intel_wait {
 };
 
 struct intel_signal_node {
-   struct rb_node node;
struct intel_wait wait;
+   struct list_head link;
 };
 
 struct i915_dependency {
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index a83690642aab..be50c2bebdf0 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -336,7 +336,8 @@ static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
	lockdep_assert_held(&b->rb_lock);
GEM_BUG_ON(b->irq_wait == wait);
 
-   /* This request is completed, so remove it from the tree, mark it as
+   /*
+* This request is completed, so remove it from the tree, mark it as
 * complete, and *then* wake up the associated task. N.B. when the
 * task wakes up, it will find the empty rb_node, discern that it
 * has already been removed from the tree and skip the serialisation
@@ -347,7 +348,8 @@ static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
	rb_erase(&wait->node, &b->waiters);
	RB_CLEAR_NODE(&wait->node);
 
-   wake_up_process(wait->tsk); /* implicit smp_wmb() */
+   if (wait->tsk->state != TASK_RUNNING)
+   wake_up_process(wait->tsk); /* implicit smp_wmb() */
 }
 
 static inline void __intel_breadcrumbs_next(struct intel_engine_cs *engine,
@@ -588,23 +590,6 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
	spin_unlock_irq(&b->rb_lock);
 }
 
-static bool signal_complete(const struct i915_request *request)
-{
-   if (!request)
-   return false;
-
-   /*
-* Carefully check if the request is complete, giving time for the
-* seqno to be visible or if the GPU hung.
-*/
-   return __i915_request_irq_complete(request);
-}
-
-static struct i915_request *to_signaler(struct rb_node *rb)
-{
-   return rb_entry(rb, struct i915_request, signaling.node);
-}
-
 static void signaler_set_rtpriority(void)
 {
 struct sched_param param = { .sched_priority = 1 };
@@ -612,78 +597,26 @@ static void signaler_set_rtpriority(void)
 sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
 }
 
-static void __intel_engine_remove_signal(struct intel_engine_cs *engine,
-					 struct i915_request *request)
-{
-   struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-   lockdep_assert_held(&b->rb_lock);
-
-   /*
-* Wake up all other completed waiters and select the
-* next bottom-half for the next user interrupt.
-*/
-   __intel_engine_remove_wait(engine, &request->signaling.wait);
-
-   /*
-* Find the next oldest signal. Note that as we have
-* not been holding the lock, another client may
-* have installed an even older signal than the one
-* we just completed - so double check