Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator

2017-04-20 Thread Chris Wilson
On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> +static void
> +run_workload(unsigned int id, struct workload *wrk, unsigned int repeat,
> +  enum intel_engine_id (*balance)(struct workload *wrk,
> +  struct w_step *w),
> +  unsigned int flags)
> +{
> + struct timespec t_start, t_end;
> + struct w_step *w;
> + double t;
> + int i, j;
> +
> + clock_gettime(CLOCK_MONOTONIC, _start);
> +
> + srand(t_start.tv_nsec);

Let's supply the seed with the workload specification. And use a
portable prng so we can be sure we can reproduce results from one system
to the next.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator

2017-04-20 Thread Chris Wilson
On Thu, Apr 20, 2017 at 03:34:56PM +0100, Tvrtko Ursulin wrote:
> 
> On 20/04/2017 15:23, Chris Wilson wrote:
> >On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote:
> >>+static void
> >>+alloc_step_batch(struct workload *wrk, struct w_step *w, struct w_step_eb 
> >>*b,
> >>+enum intel_engine_id engine, unsigned int flags)
> >>+{
> >>+   unsigned int bb_i, j = 0;
> >>+
> >>+   b->obj[j].handle = gem_create(fd, 4096);
> >>+   b->obj[j].flags = EXEC_OBJECT_WRITE;
> >>+   j++;
> >>+
> >>+   if (flags & SEQNO) {
> >>+   b->obj[j].handle = wrk->status_page_handle;
> >>+   j++;
> >>+   }
> >>+
> >>+   bb_i = j++;
> >>+   b->bb_sz = get_bb_sz(w->duration.max);
> >>+   b->bb_handle = b->obj[bb_i].handle = gem_create(fd, b->bb_sz);
> >>+   terminate_bb(w, b, engine, flags);
> >>+
> >>+   igt_assert(w->dependency <= 0);
> >>+   if (w->dependency) {
> >>+   int dep_idx = w->idx + w->dependency;
> >>+
> >>+   igt_assert(dep_idx >= 0 && dep_idx < wrk->nr_steps);
> >>+   igt_assert(wrk->steps[dep_idx].type == BATCH);
> >>+
> >>+   b->obj[j].handle = b->obj[bb_i].handle;
> >>+   bb_i = j;
> >>+   b->obj[j - 1].handle = wrk->steps[dep_idx].b[0].obj[0].handle;
> >>+   j++;
> >>+
> >>+   if (wrk->steps[dep_idx].b[1].obj[0].handle) {
> >>+   b->obj[j].handle = b->obj[bb_i].handle;
> >>+   bb_i = j;
> >>+   b->obj[j - 1].handle =
> >>+   wrk->steps[dep_idx].b[1].obj[0].handle;
> >>+   j++;
> >>+   }
> >>+   }
> >>+
> >>+   if (flags & SEQNO) {
> >>+   b->reloc.presumed_offset = -1;
> >
> >So as I understand it, you are caching the execbuf/obj/reloc for the
> >workload and then may reissue later with different seqno on different
> >rings? In which case we have a problem as the kernel will write back the >
> >
> >updated offsets to b->reloc.presumed_offset and b->obj[].offset and in
> >future passes they will match and the seqno write will go into the wrong
> >slot (if it swaps rings).
> >
> >You either want to reset presumed_offset=-1 each time, or better for all
> >concerned write the correct address alongside the seqno (which also
> >enables NORELOC).
> >
> >Delta incoming.
> 
> Only the seqno changes, but each engine has its own eb/obj/reloc. So
> I think there is no problem. Or is there still?

bb_handle is per engine as well. Ugh. No, that seems like you are
self-consistent, you just need to remove the -1 and your code is NORELOC
correct.

I wouldn't go so far as having entirely separate batches for each engine
though :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator

2017-04-20 Thread Tvrtko Ursulin


On 20/04/2017 15:52, Chris Wilson wrote:

On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote:

+   wrk->nr_bb[engine]++;
+
+   if (engine == VCS && balance) {
+   engine = balance(wrk, w);
+   wrk->nr_bb[engine]++;
+   b = >b[engine - VCS1];
+
+   if (flags & SEQNO)
+   update_bb_seqno(b, engine,
+   ++wrk->seqno[engine]);
+   }
+
+   if (w->duration.min != w->duration.max) {
+   unsigned int d = get_duration(>duration);
+   unsigned long offset;
+
+   offset = ALIGN(b->bb_sz - get_bb_sz(d),
+  2 * sizeof(uint32_t));
+   b->eb.batch_start_offset = offset;
+   }
+
+   gem_execbuf(fd, >eb);


Likely double counting wrk->nr_bb. I suggest placing it next to the
gem_execbuf().


Just convenience in balancing mode so that nr(VCS) = nr(VCS1) + 
nr(VCS2). Also from a different angle, if the sum does not hold, that 
means workload had auto-balancing and explicit VCS1/2 batches. It's only 
used to print out the stats at the end of the run.


Regards,

Tvrtko

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator

2017-04-20 Thread Chris Wilson
On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote:
> + wrk->nr_bb[engine]++;
> +
> + if (engine == VCS && balance) {
> + engine = balance(wrk, w);
> + wrk->nr_bb[engine]++;
> + b = >b[engine - VCS1];
> +
> + if (flags & SEQNO)
> + update_bb_seqno(b, engine,
> + ++wrk->seqno[engine]);
> + }
> +
> + if (w->duration.min != w->duration.max) {
> + unsigned int d = get_duration(>duration);
> + unsigned long offset;
> +
> + offset = ALIGN(b->bb_sz - get_bb_sz(d),
> +2 * sizeof(uint32_t));
> + b->eb.batch_start_offset = offset;
> + }
> +
> + gem_execbuf(fd, >eb);

Likely double counting wrk->nr_bb. I suggest placing it next to the
gem_execbuf().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator

2017-04-20 Thread Tvrtko Ursulin


On 20/04/2017 15:33, Chris Wilson wrote:

On Thu, Apr 20, 2017 at 03:23:27PM +0100, Chris Wilson wrote:

You either want to reset presumed_offset=-1 each time, or better for all
concerned write the correct address alongside the seqno (which also
enables NORELOC).

Delta incoming.


See attached.

Next concern is that I have full rings which implies that we are not
waiting on each batch before resubmitting with a new seqno?

If I throw a assert(!busy(batch_bo)) before the *b->mapped_seqno am I
going to be upset?


Yes you would. :) I had a sync (as a move to cpu domain) before seqno 
update in the last version but it disappeared as I was fixing the whole 
area of seqno tracking. So the balancing results in the patch are bogus 
since the seqno can jump to latest ahead of the time...


Regards,

Tvrtko

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator

2017-04-20 Thread Tvrtko Ursulin


On 20/04/2017 15:23, Chris Wilson wrote:

On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote:

+static void
+alloc_step_batch(struct workload *wrk, struct w_step *w, struct w_step_eb *b,
+enum intel_engine_id engine, unsigned int flags)
+{
+   unsigned int bb_i, j = 0;
+
+   b->obj[j].handle = gem_create(fd, 4096);
+   b->obj[j].flags = EXEC_OBJECT_WRITE;
+   j++;
+
+   if (flags & SEQNO) {
+   b->obj[j].handle = wrk->status_page_handle;
+   j++;
+   }
+
+   bb_i = j++;
+   b->bb_sz = get_bb_sz(w->duration.max);
+   b->bb_handle = b->obj[bb_i].handle = gem_create(fd, b->bb_sz);
+   terminate_bb(w, b, engine, flags);
+
+   igt_assert(w->dependency <= 0);
+   if (w->dependency) {
+   int dep_idx = w->idx + w->dependency;
+
+   igt_assert(dep_idx >= 0 && dep_idx < wrk->nr_steps);
+   igt_assert(wrk->steps[dep_idx].type == BATCH);
+
+   b->obj[j].handle = b->obj[bb_i].handle;
+   bb_i = j;
+   b->obj[j - 1].handle = wrk->steps[dep_idx].b[0].obj[0].handle;
+   j++;
+
+   if (wrk->steps[dep_idx].b[1].obj[0].handle) {
+   b->obj[j].handle = b->obj[bb_i].handle;
+   bb_i = j;
+   b->obj[j - 1].handle =
+   wrk->steps[dep_idx].b[1].obj[0].handle;
+   j++;
+   }
+   }
+
+   if (flags & SEQNO) {
+   b->reloc.presumed_offset = -1;


So as I understand it, you are caching the execbuf/obj/reloc for the
workload and then may reissue later with different seqno on different
rings? In which case we have a problem as the kernel will write back the >

>

updated offsets to b->reloc.presumed_offset and b->obj[].offset and in
future passes they will match and the seqno write will go into the wrong
slot (if it swaps rings).

You either want to reset presumed_offset=-1 each time, or better for all
concerned write the correct address alongside the seqno (which also
enables NORELOC).

Delta incoming.


Only the seqno changes, but each engine has its own eb/obj/reloc. So I 
think there is no problem. Or is there still?


Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator

2017-04-20 Thread Chris Wilson
On Thu, Apr 20, 2017 at 03:23:27PM +0100, Chris Wilson wrote:
> You either want to reset presumed_offset=-1 each time, or better for all
> concerned write the correct address alongside the seqno (which also
> enables NORELOC).
> 
> Delta incoming.

See attached.

Next concern is that I have full rings which implies that we are not
waiting on each batch before resubmitting with a new seqno?

If I throw a assert(!busy(batch_bo)) before the *b->mapped_seqno am I
going to be upset?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
>From 5bf424c2719e81f926b74f4136610cbdfd26a4d8 Mon Sep 17 00:00:00 2001
From: Chris Wilson 
Date: Thu, 20 Apr 2017 15:30:07 +0100
Subject: [PATCH] seqno-reloc

---
 benchmarks/gem_wsim.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index adf2d6de..e616335b 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -45,6 +45,8 @@
 #include "drmtest.h"
 #include "intel_io.h"
 
+#define LOCAL_I915_GEM_DOMAIN_WC 0x80
+
 enum intel_engine_id {
 	RCS,
 	BCS,
@@ -86,7 +88,9 @@ struct w_step
 		struct drm_i915_gem_relocation_entry reloc;
 		unsigned long bb_sz;
 		uint32_t bb_handle;
-		uint32_t *mapped_batch, *mapped_seqno;
+		uint32_t *mapped_batch;
+		uint64_t *mapped_address;
+		uint32_t *mapped_seqno;
 		unsigned int mapped_len;
 	} b[2]; /* One for each VCS when load balancing */
 };
@@ -405,7 +409,8 @@ terminate_bb(struct w_step *w, struct w_step_eb *b, enum intel_engine_id engine,
 	mmap_len += cmd_offset - mmap_start;
 
 	gem_set_domain(fd, b->bb_handle,
-		   I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
+		   LOCAL_I915_GEM_DOMAIN_WC,
+		   LOCAL_I915_GEM_DOMAIN_WC);
 
 	ptr = gem_mmap__wc(fd, b->bb_handle, mmap_start, mmap_len, PROT_WRITE);
 	cs = (uint32_t *)((char *)ptr + cmd_offset - mmap_start);
@@ -415,6 +420,7 @@ terminate_bb(struct w_step *w, struct w_step_eb *b, enum intel_engine_id engine,
 		b->reloc.delta = (engine - VCS1) * sizeof(uint32_t);
 
 		*cs++ = MI_STORE_DWORD_IMM;
+		b->mapped_address = (uint64_t *)cs;
 		*cs++ = 0;
 		*cs++ = 0;
 		b->mapped_seqno = cs;
@@ -469,7 +475,6 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, struct w_step_eb *b,
 	}
 
 	if (flags & SEQNO) {
-		b->reloc.presumed_offset = -1;
 		b->reloc.target_handle = 1;
 		b->obj[bb_i].relocs_ptr = to_user_pointer(>reloc);
 		b->obj[bb_i].relocation_count = 1;
@@ -485,8 +490,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, struct w_step_eb *b,
 		engine = VCS1;
 	b->eb.flags = eb_engine_map[engine];
 	b->eb.flags |= I915_EXEC_HANDLE_LUT;
-	if (!(flags & SEQNO))
-		b->eb.flags |= I915_EXEC_NO_RELOC;
+	b->eb.flags |= I915_EXEC_NO_RELOC;
 #ifdef DEBUG
 	printf("%u: %u:%x|%x|%x|%x %10lu flags=%llx bb=%x[%u] ctx[%u]=%u\n",
 		w->idx, b->eb.buffer_count, b->obj[0].handle,
@@ -628,8 +632,9 @@ static void
 update_bb_seqno(struct w_step_eb *b, enum intel_engine_id engine,
 		uint32_t seqno)
 {
-	*b->mapped_seqno = seqno;
 	b->reloc.delta = (engine - VCS1) * sizeof(uint32_t);
+	*b->mapped_address = b->reloc.presumed_offset + b->reloc.delta;
+	*b->mapped_seqno = seqno;
 }
 
 static void
-- 
2.11.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator

2017-04-20 Thread Chris Wilson
On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote:
> +static void
> +alloc_step_batch(struct workload *wrk, struct w_step *w, struct w_step_eb *b,
> +  enum intel_engine_id engine, unsigned int flags)
> +{
> + unsigned int bb_i, j = 0;
> +
> + b->obj[j].handle = gem_create(fd, 4096);
> + b->obj[j].flags = EXEC_OBJECT_WRITE;
> + j++;
> +
> + if (flags & SEQNO) {
> + b->obj[j].handle = wrk->status_page_handle;
> + j++;
> + }
> +
> + bb_i = j++;
> + b->bb_sz = get_bb_sz(w->duration.max);
> + b->bb_handle = b->obj[bb_i].handle = gem_create(fd, b->bb_sz);
> + terminate_bb(w, b, engine, flags);
> +
> + igt_assert(w->dependency <= 0);
> + if (w->dependency) {
> + int dep_idx = w->idx + w->dependency;
> +
> + igt_assert(dep_idx >= 0 && dep_idx < wrk->nr_steps);
> + igt_assert(wrk->steps[dep_idx].type == BATCH);
> +
> + b->obj[j].handle = b->obj[bb_i].handle;
> + bb_i = j;
> + b->obj[j - 1].handle = wrk->steps[dep_idx].b[0].obj[0].handle;
> + j++;
> +
> + if (wrk->steps[dep_idx].b[1].obj[0].handle) {
> + b->obj[j].handle = b->obj[bb_i].handle;
> + bb_i = j;
> + b->obj[j - 1].handle =
> + wrk->steps[dep_idx].b[1].obj[0].handle;
> + j++;
> + }
> + }
> +
> + if (flags & SEQNO) {
> + b->reloc.presumed_offset = -1;

So as I understand it, you are caching the execbuf/obj/reloc for the
workload and then may reissue later with different seqno on different
rings? In which case we have a problem as the kernel will write back the
updated offsets to b->reloc.presumed_offset and b->obj[].offset and in
future passes they will match and the seqno write will go into the wrong
slot (if it swaps rings).

You either want to reset presumed_offset=-1 each time, or better for all
concerned write the correct address alongside the seqno (which also
enables NORELOC).

Delta incoming.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx