Re: [Intel-gfx] [PATCH 13/27] drm/i915/execlists: Pack the count into the low bits of the port.request

2017-04-28 Thread Chris Wilson
On Fri, Apr 28, 2017 at 01:02:25PM +0100, Tvrtko Ursulin wrote:
> 
> On 27/04/2017 15:37, Chris Wilson wrote:
> >On Thu, Apr 20, 2017 at 03:58:19PM +0100, Tvrtko Ursulin wrote:
> >>>static void record_context(struct drm_i915_error_context *e,
> >>>diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c 
> >>>b/drivers/gpu/drm/i915/i915_guc_submission.c
> >>>index 1642fff9cf13..370373c97b81 100644
> >>>--- a/drivers/gpu/drm/i915/i915_guc_submission.c
> >>>+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> >>>@@ -658,7 +658,7 @@ static void nested_enable_signaling(struct 
> >>>drm_i915_gem_request *rq)
> >>>static bool i915_guc_dequeue(struct intel_engine_cs *engine)
> >>>{
> >>>   struct execlist_port *port = engine->execlist_port;
> >>>-  struct drm_i915_gem_request *last = port[0].request;
> >>>+  struct drm_i915_gem_request *last = port[0].request_count;
> >>
> >>It's confusing that in this new scheme sometimes we have direct
> >>access to the request and sometimes we have to go through the
> >>port_request macro.
> >>
> >>So maybe we should always use the port_request macro. Hm, could we
> >>invent a new type to help enforce that? Like:
> >>
> >>struct drm_i915_gem_port_request_slot {
> >>struct drm_i915_gem_request *req_count;
> >>};
> >>
> >>And then execlist port would contain these and helpers would need to
> >>be functions?
> >>
> >>I've also noticed some GVT/GuC patches which sounded like they are
> >>adding the same single submission constraints so maybe now is the
> >>time to unify the dequeue? (Haven't looked at those patches deeper
> >>than the subject line so might be wrong.)
> >>
> >>Not sure 100% of all the above, would need to sketch it. What are
> >>your thoughts?
> >
> >I foresee a use for the count in guc as well, so conversion is ok with
> >me.
> 
> Conversion to a wrapper structure as I proposed or keeping it as you
> have it?
> 
> >>>diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
> >>>b/drivers/gpu/drm/i915/intel_ringbuffer.h
> >>>index d25b88467e5e..39b733e5cfd3 100644
> >>>--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> >>>+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> >>>@@ -377,8 +377,12 @@ struct intel_engine_cs {
> >>>   /* Execlists */
> >>>   struct tasklet_struct irq_tasklet;
> >>>   struct execlist_port {
> >>>-  struct drm_i915_gem_request *request;
> >>>-  unsigned int count;
> >>>+  struct drm_i915_gem_request *request_count;
> >>
> >>Would req(uest)_slot maybe be better?
> >
> >It's definitely a count (of how many times this request has been
> >submitted), and I like long verbose names when I don't want them to be
> >used directly. So expect guc to be tidied.
> 
> It is a pointer and a count. My point was that request_count sounds
> too much like a count of how many times something has been done or
> happened to the request.
> 
> Request_slot was my attempt to make it obvious in the name itself
> there is more to it. And the wrapper struct was another step
> further, plus the idea was it would make sure you always need to
> access this field via the accessor, since I think going sometimes
> directly and sometimes via the wrapper is too fragile.

I read slot as port[slot], whereas I am using it as a count of how many
times I have done something with this request/context.

> Anyway, my big issue is I am not sure if we are in agreement or not.
> Do you agree going with the wrapper structure makes sense or not?

I'm using port_request() in guc; see the version in #prescheduler.

What I haven't come up with is a good plan for assignment, which
is still using port->request_count = port_pack(), but now that is limited
to just the port_assign() functions.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 13/27] drm/i915/execlists: Pack the count into the low bits of the port.request

2017-04-28 Thread Tvrtko Ursulin


On 27/04/2017 15:37, Chris Wilson wrote:

On Thu, Apr 20, 2017 at 03:58:19PM +0100, Tvrtko Ursulin wrote:

static void record_context(struct drm_i915_error_context *e,
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c 
b/drivers/gpu/drm/i915/i915_guc_submission.c
index 1642fff9cf13..370373c97b81 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -658,7 +658,7 @@ static void nested_enable_signaling(struct 
drm_i915_gem_request *rq)
static bool i915_guc_dequeue(struct intel_engine_cs *engine)
{
struct execlist_port *port = engine->execlist_port;
-   struct drm_i915_gem_request *last = port[0].request;
+   struct drm_i915_gem_request *last = port[0].request_count;


It's confusing that in this new scheme sometimes we have direct
access to the request and sometimes we have to go through the
port_request macro.

So maybe we should always use the port_request macro. Hm, could we
invent a new type to help enforce that? Like:

struct drm_i915_gem_port_request_slot {
struct drm_i915_gem_request *req_count;
};

And then execlist port would contain these and helpers would need to
be functions?

I've also noticed some GVT/GuC patches which sounded like they are
adding the same single submission constraints so maybe now is the
time to unify the dequeue? (Haven't looked at those patches deeper
than the subject line so might be wrong.)

Not sure 100% of all the above, would need to sketch it. What are
your thoughts?


I foresee a use for the count in guc as well, so conversion is ok with
me.


Conversion to a wrapper structure as I proposed or keeping it as you 
have it?



diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d25b88467e5e..39b733e5cfd3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -377,8 +377,12 @@ struct intel_engine_cs {
/* Execlists */
struct tasklet_struct irq_tasklet;
struct execlist_port {
-   struct drm_i915_gem_request *request;
-   unsigned int count;
+   struct drm_i915_gem_request *request_count;


Would req(uest)_slot maybe be better?


It's definitely a count (of how many times this request has been
submitted), and I like long verbose names when I don't want them to be
used directly. So expect guc to be tidied.


It is a pointer and a count. My point was that request_count sounds too
much like a count of how many times something has been done or happened
to the request.


Request_slot was my attempt to make it obvious in the name itself there
is more to it. And the wrapper struct was another step further, plus the
idea was it would make sure you always need to access this field via the
accessor, since I think going sometimes directly and sometimes via the
wrapper is too fragile.


Anyway, my big issue is I am not sure if we are in agreement or not. Do 
you agree going with the wrapper structure makes sense or not?


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH 13/27] drm/i915/execlists: Pack the count into the low bits of the port.request

2017-04-27 Thread Chris Wilson
On Thu, Apr 20, 2017 at 03:58:19PM +0100, Tvrtko Ursulin wrote:
> > static void record_context(struct drm_i915_error_context *e,
> >diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c 
> >b/drivers/gpu/drm/i915/i915_guc_submission.c
> >index 1642fff9cf13..370373c97b81 100644
> >--- a/drivers/gpu/drm/i915/i915_guc_submission.c
> >+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> >@@ -658,7 +658,7 @@ static void nested_enable_signaling(struct 
> >drm_i915_gem_request *rq)
> > static bool i915_guc_dequeue(struct intel_engine_cs *engine)
> > {
> > struct execlist_port *port = engine->execlist_port;
> >-struct drm_i915_gem_request *last = port[0].request;
> >+struct drm_i915_gem_request *last = port[0].request_count;
> 
> It's confusing that in this new scheme sometimes we have direct
> access to the request and sometimes we have to go through the
> port_request macro.
> 
> So maybe we should always use the port_request macro. Hm, could we
> invent a new type to help enforce that? Like:
> 
> struct drm_i915_gem_port_request_slot {
>   struct drm_i915_gem_request *req_count;
> };
> 
> And then execlist port would contain these and helpers would need to
> be functions?
> 
> I've also noticed some GVT/GuC patches which sounded like they are
> adding the same single submission constraints so maybe now is the
> time to unify the dequeue? (Haven't looked at those patches deeper
> than the subject line so might be wrong.)
> 
> Not sure 100% of all the above, would need to sketch it. What are
> your thoughts?

I foresee a use for the count in guc as well, so conversion is ok with
me.

> >diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
> >b/drivers/gpu/drm/i915/intel_lrc.c
> >index 7df278fe492e..69299fbab4f9 100644
> >--- a/drivers/gpu/drm/i915/intel_lrc.c
> >+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >@@ -342,39 +342,32 @@ static u64 execlists_update_context(struct 
> >drm_i915_gem_request *rq)
> >
> > static void execlists_submit_ports(struct intel_engine_cs *engine)
> > {
> >-struct drm_i915_private *dev_priv = engine->i915;
> > struct execlist_port *port = engine->execlist_port;
> > u32 __iomem *elsp =
> >-dev_priv->regs + i915_mmio_reg_offset(RING_ELSP(engine));
> >-u64 desc[2];
> >-
> >-GEM_BUG_ON(port[0].count > 1);
> >-if (!port[0].count)
> >-execlists_context_status_change(port[0].request,
> >-INTEL_CONTEXT_SCHEDULE_IN);
> >-desc[0] = execlists_update_context(port[0].request);
> >-GEM_DEBUG_EXEC(port[0].context_id = upper_32_bits(desc[0]));
> >-port[0].count++;
> >-
> >-if (port[1].request) {
> >-GEM_BUG_ON(port[1].count);
> >-execlists_context_status_change(port[1].request,
> >-INTEL_CONTEXT_SCHEDULE_IN);
> >-desc[1] = execlists_update_context(port[1].request);
> >-GEM_DEBUG_EXEC(port[1].context_id = upper_32_bits(desc[1]));
> >-port[1].count = 1;
> >-} else {
> >-desc[1] = 0;
> >-}
> >-GEM_BUG_ON(desc[0] == desc[1]);
> >-
> >-/* You must always write both descriptors in the order below. */
> >-writel(upper_32_bits(desc[1]), elsp);
> >-writel(lower_32_bits(desc[1]), elsp);
> >+engine->i915->regs + i915_mmio_reg_offset(RING_ELSP(engine));
> >+unsigned int n;
> >+
> >+for (n = ARRAY_SIZE(engine->execlist_port); n--; ) {
> 
> We could also add for_each_req_port or something, to iterate and
> unpack either req only or the count as well?

for_each_port_reverse? We're looking at very special cases here!

I'm not sure and I'm playing with different structures.

> >diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
> >b/drivers/gpu/drm/i915/intel_ringbuffer.h
> >index d25b88467e5e..39b733e5cfd3 100644
> >--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> >+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> >@@ -377,8 +377,12 @@ struct intel_engine_cs {
> > /* Execlists */
> > struct tasklet_struct irq_tasklet;
> > struct execlist_port {
> >-struct drm_i915_gem_request *request;
> >-unsigned int count;
> >+struct drm_i915_gem_request *request_count;
> 
> Would req(uest)_slot maybe be better?

It's definitely a count (of how many times this request has been
submitted), and I like long verbose names when I don't want them to be
used directly. So expect guc to be tidied.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH 13/27] drm/i915/execlists: Pack the count into the low bits of the port.request

2017-04-20 Thread Tvrtko Ursulin


On 19/04/2017 10:41, Chris Wilson wrote:

add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
function old new   delta
execlists_submit_ports   262 471+209
port_assign.isra   - 136+136
capture 63446359 +15
reset_common_ring438 452 +14
execlists_submit_request 228 238 +10
gen8_init_common_ring334 341  +7
intel_engine_is_idle 106 105  -1
i915_engine_info23142290 -24
__i915_gem_set_wedged_BKL485 411 -74
intel_lrc_irq_handler   17891604-185
execlists_update_context 294   --294

The most important change there is the improvement to
intel_lrc_irq_handler and execlists_submit_ports (a net improvement,
since execlists_update_context is now inlined).

Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_debugfs.c|  32 ---
 drivers/gpu/drm/i915/i915_gem.c|   6 +-
 drivers/gpu/drm/i915/i915_gpu_error.c  |  13 ++-
 drivers/gpu/drm/i915/i915_guc_submission.c |  18 ++--
 drivers/gpu/drm/i915/intel_engine_cs.c |   2 +-
 drivers/gpu/drm/i915/intel_lrc.c   | 133 -
 drivers/gpu/drm/i915/intel_ringbuffer.h|   8 +-
 7 files changed, 120 insertions(+), 92 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 870c470177b5..0b5d7142d8d9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3315,6 +3315,7 @@ static int i915_engine_info(struct seq_file *m, void 
*unused)
if (i915.enable_execlists) {
u32 ptr, read, write;
struct rb_node *rb;
+   unsigned int idx;

seq_printf(m, "\tExeclist status: 0x%08x %08x\n",
   I915_READ(RING_EXECLIST_STATUS_LO(engine)),
@@ -3332,8 +3332,7 @@ static int i915_engine_info(struct seq_file *m, void 
*unused)
if (read > write)
write += GEN8_CSB_ENTRIES;
while (read < write) {
-   unsigned int idx = ++read % GEN8_CSB_ENTRIES;
-
+   idx = ++read % GEN8_CSB_ENTRIES;
seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: 
%d\n",
   idx,
   
I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
@@ -3341,21 +3341,19 @@ static int i915_engine_info(struct seq_file *m, void 
*unused)
}

rcu_read_lock();
-   rq = READ_ONCE(engine->execlist_port[0].request);
-   if (rq) {
-   seq_printf(m, "\t\tELSP[0] count=%d, ",
-  engine->execlist_port[0].count);
-   print_request(m, rq, "rq: ");
-   } else {
-   seq_printf(m, "\t\tELSP[0] idle\n");
-   }
-   rq = READ_ONCE(engine->execlist_port[1].request);
-   if (rq) {
-   seq_printf(m, "\t\tELSP[1] count=%d, ",
-  engine->execlist_port[1].count);
-   print_request(m, rq, "rq: ");
-   } else {
-   seq_printf(m, "\t\tELSP[1] idle\n");
+   for (idx = 0; idx < ARRAY_SIZE(engine->execlist_port); 
idx++) {
+   unsigned int count;
+
+   rq = port_unpack(&engine->execlist_port[idx],
+&count);
+   if (rq) {
+   seq_printf(m, "\t\tELSP[%d] count=%d, ",
+  idx, count);
+   print_request(m, rq, "rq: ");
+   } else {
+   seq_printf(m, "\t\tELSP[%d] idle\n",
+  idx);
+   }
}
rcu_read_unlock();

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2bc72314cdd1..f6df402a5247 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3039,12 +3039,14 @@ static void engine_set_wedged(struct intel_engine_cs 

[Intel-gfx] [PATCH 13/27] drm/i915/execlists: Pack the count into the low bits of the port.request

2017-04-19 Thread Chris Wilson
add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
function old new   delta
execlists_submit_ports   262 471+209
port_assign.isra   - 136+136
capture 63446359 +15
reset_common_ring438 452 +14
execlists_submit_request 228 238 +10
gen8_init_common_ring334 341  +7
intel_engine_is_idle 106 105  -1
i915_engine_info23142290 -24
__i915_gem_set_wedged_BKL485 411 -74
intel_lrc_irq_handler   17891604-185
execlists_update_context 294   --294

The most important change there is the improvement to
intel_lrc_irq_handler and execlists_submit_ports (a net improvement,
since execlists_update_context is now inlined).

Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_debugfs.c|  32 ---
 drivers/gpu/drm/i915/i915_gem.c|   6 +-
 drivers/gpu/drm/i915/i915_gpu_error.c  |  13 ++-
 drivers/gpu/drm/i915/i915_guc_submission.c |  18 ++--
 drivers/gpu/drm/i915/intel_engine_cs.c |   2 +-
 drivers/gpu/drm/i915/intel_lrc.c   | 133 -
 drivers/gpu/drm/i915/intel_ringbuffer.h|   8 +-
 7 files changed, 120 insertions(+), 92 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 870c470177b5..0b5d7142d8d9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3315,6 +3315,7 @@ static int i915_engine_info(struct seq_file *m, void 
*unused)
if (i915.enable_execlists) {
u32 ptr, read, write;
struct rb_node *rb;
+   unsigned int idx;
 
seq_printf(m, "\tExeclist status: 0x%08x %08x\n",
   I915_READ(RING_EXECLIST_STATUS_LO(engine)),
@@ -3332,8 +3332,7 @@ static int i915_engine_info(struct seq_file *m, void 
*unused)
if (read > write)
write += GEN8_CSB_ENTRIES;
while (read < write) {
-   unsigned int idx = ++read % GEN8_CSB_ENTRIES;
-
+   idx = ++read % GEN8_CSB_ENTRIES;
seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, 
context: %d\n",
   idx,
   
I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
@@ -3341,21 +3341,19 @@ static int i915_engine_info(struct seq_file *m, void 
*unused)
}
 
rcu_read_lock();
-   rq = READ_ONCE(engine->execlist_port[0].request);
-   if (rq) {
-   seq_printf(m, "\t\tELSP[0] count=%d, ",
-  engine->execlist_port[0].count);
-   print_request(m, rq, "rq: ");
-   } else {
-   seq_printf(m, "\t\tELSP[0] idle\n");
-   }
-   rq = READ_ONCE(engine->execlist_port[1].request);
-   if (rq) {
-   seq_printf(m, "\t\tELSP[1] count=%d, ",
-  engine->execlist_port[1].count);
-   print_request(m, rq, "rq: ");
-   } else {
-   seq_printf(m, "\t\tELSP[1] idle\n");
+   for (idx = 0; idx < ARRAY_SIZE(engine->execlist_port); 
idx++) {
+   unsigned int count;
+
+   rq = port_unpack(&engine->execlist_port[idx],
+&count);
+   if (rq) {
+   seq_printf(m, "\t\tELSP[%d] count=%d, ",
+  idx, count);
+   print_request(m, rq, "rq: ");
+   } else {
+   seq_printf(m, "\t\tELSP[%d] idle\n",
+  idx);
+   }
}
rcu_read_unlock();
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2bc72314cdd1..f6df402a5247 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3039,12 +3039,14 @@ static void engine_set_wedged(struct intel_engine_cs 
*engine)
 */
 
if