Re: [Intel-gfx] [PATCH 32/38] drm/i915: Defer request emission

2016-09-26 Thread Chris Wilson
On Mon, Sep 26, 2016 at 12:06:19PM +0300, Joonas Lahtinen wrote:
> On ma, 2016-09-26 at 10:04 +0100, Chris Wilson wrote:
> > On Mon, Sep 26, 2016 at 11:53:05AM +0300, Joonas Lahtinen wrote:
> > > 
> > > On ti, 2016-09-20 at 09:30 +0100, Chris Wilson wrote:
> > > > 
> > > > Move the actual emission of the request (the closing breadcrumb) from
> > > > i915_add_request() to the submit callback. (It can be moved later when
> > > > required.) This allows us to defer the allocation of the global_seqno
> > > > from request construction to actual submission, allowing us to emit the
> > > > requests out of order (wrt to the order of their construction, they
> > > > still will only be executed one all of their dependencies are resolved
> > > > including that all earlier requests on their timeline have been
> > > > submitted.) We have to specialise how we then emit the request in order
> > > > to write into the preallocated space, rather than at the tail of the
> > > > ringbuffer (which will have been advanced by the addition of new
> > > > requests).
> > > 
> > > No changelog, so assuming *out++ style change is the only one.
> > 
> > Yeah, it was only stylistic changes, there should have been no
> > functional changes. I went with *out++ after confirming Tvrtko's report
> > that gcc is now smart enough to emit the same code as out[0..N] using
> > *out++.
> 
> It's a nice to have the changelog for the reviewer no matter if it was
> stylistic or not :P
> 
> Anyway, does it somehow get worse if "out" is embedded in the struct
> and not passed alongside it?

I'm not a fan. We have the start of the breadcrumb already, and the
current position of the emitter is not interesting after the event, only
through the function flow. That should be more consistent once we start
only refering to a local pointer for request emission rather than
intel_ring_emit(). (Who knows maybe we won't always be emitting the
requests into the ring in future...) I think storing request->ring_out (or
whatever) just leaves a dangling pointer behind.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 32/38] drm/i915: Defer request emission

2016-09-26 Thread Joonas Lahtinen
On ma, 2016-09-26 at 10:04 +0100, Chris Wilson wrote:
> On Mon, Sep 26, 2016 at 11:53:05AM +0300, Joonas Lahtinen wrote:
> > 
> > On ti, 2016-09-20 at 09:30 +0100, Chris Wilson wrote:
> > > 
> > > Move the actual emission of the request (the closing breadcrumb) from
> > > i915_add_request() to the submit callback. (It can be moved later when
> > > required.) This allows us to defer the allocation of the global_seqno
> > > from request construction to actual submission, allowing us to emit the
> > > requests out of order (wrt to the order of their construction, they
> > > still will only be executed one all of their dependencies are resolved
> > > including that all earlier requests on their timeline have been
> > > submitted.) We have to specialise how we then emit the request in order
> > > to write into the preallocated space, rather than at the tail of the
> > > ringbuffer (which will have been advanced by the addition of new
> > > requests).
> > 
> > No changelog, so assuming *out++ style change is the only one.
> 
> Yeah, it was only stylistic changes, there should have been no
> functional changes. I went with *out++ after confirming Tvrtko's report
> that gcc is now smart enough to emit the same code as out[0..N] using
> *out++.

It's a nice to have the changelog for the reviewer no matter if it was
stylistic or not :P

Anyway, does it somehow get worse if "out" is embedded in the struct
and not passed alongside it?

Regards, Joonas

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 32/38] drm/i915: Defer request emission

2016-09-26 Thread Chris Wilson
On Mon, Sep 26, 2016 at 11:53:05AM +0300, Joonas Lahtinen wrote:
> On ti, 2016-09-20 at 09:30 +0100, Chris Wilson wrote:
> > Move the actual emission of the request (the closing breadcrumb) from
> > i915_add_request() to the submit callback. (It can be moved later when
> > required.) This allows us to defer the allocation of the global_seqno
> > from request construction to actual submission, allowing us to emit the
> > requests out of order (wrt to the order of their construction, they
> > still will only be executed one all of their dependencies are resolved
> > including that all earlier requests on their timeline have been
> > submitted.) We have to specialise how we then emit the request in order
> > to write into the preallocated space, rather than at the tail of the
> > ringbuffer (which will have been advanced by the addition of new
> > requests).
> 
> No changelog, so assuming *out++ style change is the only one.

Yeah, it was only stylistic changes, there should have been no
functional changes. I went with *out++ after confirming Tvrtko's report
that gcc is now smart enough to emit the same code as out[0..N] using
*out++.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 32/38] drm/i915: Defer request emission

2016-09-26 Thread Joonas Lahtinen
On ti, 2016-09-20 at 09:30 +0100, Chris Wilson wrote:
> Move the actual emission of the request (the closing breadcrumb) from
> i915_add_request() to the submit callback. (It can be moved later when
> required.) This allows us to defer the allocation of the global_seqno
> from request construction to actual submission, allowing us to emit the
> requests out of order (wrt to the order of their construction, they
> still will only be executed one all of their dependencies are resolved
> including that all earlier requests on their timeline have been
> submitted.) We have to specialise how we then emit the request in order
> to write into the preallocated space, rather than at the tail of the
> ringbuffer (which will have been advanced by the addition of new
> requests).

No changelog, so assuming *out++ style change is the only one.

Reviewed-by: Joonas Lahtinen 

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 32/38] drm/i915: Defer request emission

2016-09-20 Thread Chris Wilson
Move the actual emission of the request (the closing breadcrumb) from
i915_add_request() to the submit callback. (It can be moved later when
required.) This allows us to defer the allocation of the global_seqno
from request construction to actual submission, allowing us to emit the
requests out of order (wrt to the order of their construction, they
still will only be executed one all of their dependencies are resolved
including that all earlier requests on their timeline have been
submitted.) We have to specialise how we then emit the request in order
to write into the preallocated space, rather than at the tail of the
ringbuffer (which will have been advanced by the addition of new
requests).

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem_request.c |  26 ++---
 drivers/gpu/drm/i915/intel_lrc.c| 120 ---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 168 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  10 +-
 4 files changed, 111 insertions(+), 213 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c 
b/drivers/gpu/drm/i915/i915_gem_request.c
index c6c31aab9e38..c31cb7c4a61f 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -632,9 +632,7 @@ void __i915_add_request(struct drm_i915_gem_request 
*request, bool flush_caches)
struct intel_ring *ring = request->ring;
struct intel_timeline *timeline = request->timeline;
struct drm_i915_gem_request *prev;
-   u32 request_start;
-   u32 reserved_tail;
-   int ret;
+   int err;
 
lockdep_assert_held(>i915->drm.struct_mutex);
trace_i915_gem_request_add(request);
@@ -644,8 +642,6 @@ void __i915_add_request(struct drm_i915_gem_request 
*request, bool flush_caches)
 * should already have been reserved in the ring buffer. Let the ring
 * know that it is time to use that space up.
 */
-   request_start = ring->tail;
-   reserved_tail = request->reserved_space;
request->reserved_space = 0;
 
/*
@@ -656,10 +652,10 @@ void __i915_add_request(struct drm_i915_gem_request 
*request, bool flush_caches)
 * what.
 */
if (flush_caches) {
-   ret = engine->emit_flush(request, EMIT_FLUSH);
+   err = engine->emit_flush(request, EMIT_FLUSH);
 
/* Not allowed to fail! */
-   WARN(ret, "engine->emit_flush() failed: %d!\n", ret);
+   WARN(err, "engine->emit_flush() failed: %d!\n", err);
}
 
/* Record the position of the start of the breadcrumb so that
@@ -667,20 +663,12 @@ void __i915_add_request(struct drm_i915_gem_request 
*request, bool flush_caches)
 * GPU processing the request, we never over-estimate the
 * position of the ring's HEAD.
 */
+   err = intel_ring_begin(request, engine->emit_request_sz);
+   GEM_BUG_ON(err);
request->postfix = ring->tail;
 
-   /* Not allowed to fail! */
-   ret = engine->emit_request(request);
-   WARN(ret, "(%s)->emit_request failed: %d!\n", engine->name, ret);
-
-   /* Sanity check that the reserved size was large enough. */
-   ret = ring->tail - request_start;
-   if (ret < 0)
-   ret += ring->size;
-   WARN_ONCE(ret > reserved_tail,
- "Not enough space reserved (%d bytes) "
- "for adding the request (%d bytes)\n",
- reserved_tail, ret);
+   engine->emit_request(request, request->ring->vaddr + request->postfix);
+   ring->tail += engine->emit_request_sz * sizeof(u32);
 
/* Seal the request and mark it as pending execution. Note that
 * we may inspect this state, without holding any locks, during
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 87e6bf4392fd..ffbb4da6cf95 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -360,7 +360,7 @@ static u64 execlists_update_context(struct 
drm_i915_gem_request *rq)
struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
u32 *reg_state = ce->lrc_reg_state;
 
-   reg_state[CTX_RING_TAIL+1] = intel_ring_offset(rq->ring, rq->tail);
+   reg_state[CTX_RING_TAIL+1] = rq->tail;
 
/* True 32b PPGTT with dynamic page allocation: update PDP
 * registers and point the unallocated PDPs to scratch page.
@@ -594,6 +594,15 @@ static void execlists_submit_request(struct 
drm_i915_gem_request *request)
 
spin_lock_irqsave(>execlist_lock, flags);
 
+   /* We keep the previous context alive until we retire the following
+* request. This ensures that any the context object is still pinned
+* for any residual writes the HW makes into it on the context switch
+* into the next object following the breadcrumb. Otherwise, we may
+* retire the context too early.
+*/
+