Re: [Intel-gfx] [PATCH 32/38] drm/i915: Defer request emission
On Mon, Sep 26, 2016 at 12:06:19PM +0300, Joonas Lahtinen wrote: > On ma, 2016-09-26 at 10:04 +0100, Chris Wilson wrote: > > On Mon, Sep 26, 2016 at 11:53:05AM +0300, Joonas Lahtinen wrote: > > > > > > On ti, 2016-09-20 at 09:30 +0100, Chris Wilson wrote: > > > > > > > > Move the actual emission of the request (the closing breadcrumb) from > > > > i915_add_request() to the submit callback. (It can be moved later when > > > > required.) This allows us to defer the allocation of the global_seqno > > > > from request construction to actual submission, allowing us to emit the > > > > requests out of order (wrt to the order of their construction, they > > > > still will only be executed one all of their dependencies are resolved > > > > including that all earlier requests on their timeline have been > > > > submitted.) We have to specialise how we then emit the request in order > > > > to write into the preallocated space, rather than at the tail of the > > > > ringbuffer (which will have been advanced by the addition of new > > > > requests). > > > > > > No changelog, so assuming *out++ style change is the only one. > > > > Yeah, it was only stylistic changes, there should have been no > > functional changes. I went with *out++ after confirming Tvrtko's report > > that gcc is now smart enough to emit the same code as out[0..N] using > > *out++. > > It's a nice to have the changelog for the reviewer no matter if it was > stylistic or not :P > > Anyway, does it somehow get worse if "out" is embedded in the struct > and not passed alongside it? I'm not a fan. We have the start of the breadcrumb already, and the current position of the emitter is not interesting after the event, only through the function flow. That should be more consistent once we start only refering to a local pointer for request emission rather than intel_ring_emit(). (Who knows maybe we won't always be emitting the requests into the ring in future...) I think storing request->ring_out (or whatever) just leaves a dangling pointer behind. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 32/38] drm/i915: Defer request emission
On ma, 2016-09-26 at 10:04 +0100, Chris Wilson wrote: > On Mon, Sep 26, 2016 at 11:53:05AM +0300, Joonas Lahtinen wrote: > > > > On ti, 2016-09-20 at 09:30 +0100, Chris Wilson wrote: > > > > > > Move the actual emission of the request (the closing breadcrumb) from > > > i915_add_request() to the submit callback. (It can be moved later when > > > required.) This allows us to defer the allocation of the global_seqno > > > from request construction to actual submission, allowing us to emit the > > > requests out of order (wrt to the order of their construction, they > > > still will only be executed one all of their dependencies are resolved > > > including that all earlier requests on their timeline have been > > > submitted.) We have to specialise how we then emit the request in order > > > to write into the preallocated space, rather than at the tail of the > > > ringbuffer (which will have been advanced by the addition of new > > > requests). > > > > No changelog, so assuming *out++ style change is the only one. > > Yeah, it was only stylistic changes, there should have been no > functional changes. I went with *out++ after confirming Tvrtko's report > that gcc is now smart enough to emit the same code as out[0..N] using > *out++. It's a nice to have the changelog for the reviewer no matter if it was stylistic or not :P Anyway, does it somehow get worse if "out" is embedded in the struct and not passed alongside it? Regards, Joonas > -Chris > -- Joonas Lahtinen Open Source Technology Center Intel Corporation ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 32/38] drm/i915: Defer request emission
On Mon, Sep 26, 2016 at 11:53:05AM +0300, Joonas Lahtinen wrote: > On ti, 2016-09-20 at 09:30 +0100, Chris Wilson wrote: > > Move the actual emission of the request (the closing breadcrumb) from > > i915_add_request() to the submit callback. (It can be moved later when > > required.) This allows us to defer the allocation of the global_seqno > > from request construction to actual submission, allowing us to emit the > > requests out of order (wrt to the order of their construction, they > > still will only be executed one all of their dependencies are resolved > > including that all earlier requests on their timeline have been > > submitted.) We have to specialise how we then emit the request in order > > to write into the preallocated space, rather than at the tail of the > > ringbuffer (which will have been advanced by the addition of new > > requests). > > No changelog, so assuming *out++ style change is the only one. Yeah, it was only stylistic changes, there should have been no functional changes. I went with *out++ after confirming Tvrtko's report that gcc is now smart enough to emit the same code as out[0..N] using *out++. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 32/38] drm/i915: Defer request emission
On ti, 2016-09-20 at 09:30 +0100, Chris Wilson wrote: > Move the actual emission of the request (the closing breadcrumb) from > i915_add_request() to the submit callback. (It can be moved later when > required.) This allows us to defer the allocation of the global_seqno > from request construction to actual submission, allowing us to emit the > requests out of order (wrt to the order of their construction, they > still will only be executed one all of their dependencies are resolved > including that all earlier requests on their timeline have been > submitted.) We have to specialise how we then emit the request in order > to write into the preallocated space, rather than at the tail of the > ringbuffer (which will have been advanced by the addition of new > requests). No changelog, so assuming *out++ style change is the only one. Reviewed-by: Joonas LahtinenRegards, Joonas -- Joonas Lahtinen Open Source Technology Center Intel Corporation ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 32/38] drm/i915: Defer request emission
Move the actual emission of the request (the closing breadcrumb) from i915_add_request() to the submit callback. (It can be moved later when required.) This allows us to defer the allocation of the global_seqno from request construction to actual submission, allowing us to emit the requests out of order (wrt to the order of their construction, they still will only be executed one all of their dependencies are resolved including that all earlier requests on their timeline have been submitted.) We have to specialise how we then emit the request in order to write into the preallocated space, rather than at the tail of the ringbuffer (which will have been advanced by the addition of new requests). Signed-off-by: Chris Wilson--- drivers/gpu/drm/i915/i915_gem_request.c | 26 ++--- drivers/gpu/drm/i915/intel_lrc.c| 120 --- drivers/gpu/drm/i915/intel_ringbuffer.c | 168 +++- drivers/gpu/drm/i915/intel_ringbuffer.h | 10 +- 4 files changed, 111 insertions(+), 213 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c index c6c31aab9e38..c31cb7c4a61f 100644 --- a/drivers/gpu/drm/i915/i915_gem_request.c +++ b/drivers/gpu/drm/i915/i915_gem_request.c @@ -632,9 +632,7 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches) struct intel_ring *ring = request->ring; struct intel_timeline *timeline = request->timeline; struct drm_i915_gem_request *prev; - u32 request_start; - u32 reserved_tail; - int ret; + int err; lockdep_assert_held(>i915->drm.struct_mutex); trace_i915_gem_request_add(request); @@ -644,8 +642,6 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches) * should already have been reserved in the ring buffer. Let the ring * know that it is time to use that space up. */ - request_start = ring->tail; - reserved_tail = request->reserved_space; request->reserved_space = 0; /* @@ -656,10 +652,10 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches) * what. */ if (flush_caches) { - ret = engine->emit_flush(request, EMIT_FLUSH); + err = engine->emit_flush(request, EMIT_FLUSH); /* Not allowed to fail! */ - WARN(ret, "engine->emit_flush() failed: %d!\n", ret); + WARN(err, "engine->emit_flush() failed: %d!\n", err); } /* Record the position of the start of the breadcrumb so that @@ -667,20 +663,12 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches) * GPU processing the request, we never over-estimate the * position of the ring's HEAD. */ + err = intel_ring_begin(request, engine->emit_request_sz); + GEM_BUG_ON(err); request->postfix = ring->tail; - /* Not allowed to fail! */ - ret = engine->emit_request(request); - WARN(ret, "(%s)->emit_request failed: %d!\n", engine->name, ret); - - /* Sanity check that the reserved size was large enough. */ - ret = ring->tail - request_start; - if (ret < 0) - ret += ring->size; - WARN_ONCE(ret > reserved_tail, - "Not enough space reserved (%d bytes) " - "for adding the request (%d bytes)\n", - reserved_tail, ret); + engine->emit_request(request, request->ring->vaddr + request->postfix); + ring->tail += engine->emit_request_sz * sizeof(u32); /* Seal the request and mark it as pending execution. Note that * we may inspect this state, without holding any locks, during diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 87e6bf4392fd..ffbb4da6cf95 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -360,7 +360,7 @@ static u64 execlists_update_context(struct drm_i915_gem_request *rq) struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt; u32 *reg_state = ce->lrc_reg_state; - reg_state[CTX_RING_TAIL+1] = intel_ring_offset(rq->ring, rq->tail); + reg_state[CTX_RING_TAIL+1] = rq->tail; /* True 32b PPGTT with dynamic page allocation: update PDP * registers and point the unallocated PDPs to scratch page. @@ -594,6 +594,15 @@ static void execlists_submit_request(struct drm_i915_gem_request *request) spin_lock_irqsave(>execlist_lock, flags); + /* We keep the previous context alive until we retire the following +* request. This ensures that any the context object is still pinned +* for any residual writes the HW makes into it on the context switch +* into the next object following the breadcrumb. Otherwise, we may +* retire the context too early. +*/ +