Re: [Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker

2015-04-29 Thread Jordan Justen
On 2015-04-27 19:02:38, Kenneth Graunke wrote:
 On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote:
  Tested on Ivybridge, Haswell and Broadwell.
  
  Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
  ---
   src/mesa/drivers/dri/i965/brw_compute.c | 39 
  -
   src/mesa/drivers/dri/i965/brw_defines.h |  1 +
   2 files changed, 39 insertions(+), 1 deletion(-)
  
  diff --git a/src/mesa/drivers/dri/i965/brw_compute.c 
  b/src/mesa/drivers/dri/i965/brw_compute.c
  index baed701..06ef448 100644
  --- a/src/mesa/drivers/dri/i965/brw_compute.c
  +++ b/src/mesa/drivers/dri/i965/brw_compute.c
  @@ -31,12 +31,49 @@
   #include brw_draw.h
   #include brw_state.h
   #include intel_batchbuffer.h
  +#include brw_defines.h
   
   
   static void
   brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups)
   {
  -   _mesa_problem(brw-ctx, TODO: implement brw_emit_gpgpu_walker);
  +   const struct brw_cs_prog_data *prog_data = brw-cs.prog_data;
  +
  +   const unsigned simd_size = prog_data-simd_size;
  +   unsigned group_size = prog_data-local_size[0] *
  +  prog_data-local_size[1] * prog_data-local_size[2];
  +   unsigned thread_width_max =
  +  (group_size + simd_size - 1) / simd_size;
  +
  +   uint32_t right_mask = (1u  simd_size) - 1;
  +   const unsigned right_non_aligned = group_size  (simd_size - 1);
  +   if (right_non_aligned != 0)
  +  right_mask = (simd_size - right_non_aligned);
 
 I think this is equvalent to:
 
 uint32_t right_mask = (1u  (simd_size - (group_size % simd_size))) - 1;
 
 which might be a bit simpler...
 
  +   BEGIN_BATCH(dwords);
  +   OUT_BATCH(GPGPU_WALKER  16 | (dwords - 2));
 
 I was going to suggest splitting this into separate Gen8+ and Gen7
 blocks, but now that I look at the code...these two are slightly
 different indirect handling, and the later one is just a DWord of MBZ,
 so...it's not really that different.  I think what you have is fine :)
 
  +   uint32_t dwords = brw-gen  8 ? 11 : 15;
  +   OUT_BATCH(0);
  +   if (brw-gen = 8) {
  +  OUT_BATCH(0);
  +  OUT_BATCH(0);
  +   }
  +   assert(thread_width_max = brw-max_cs_threads);
  +   OUT_BATCH(((simd_size == 8) ? 0 : 1)  30 |
 
 You might want to write this as ((simd_size / 8) - 1).  That will work
 for SIMD8/16/32.

Good idea, but I think simd_size / 16 will be needed, since we need 2
for SIMD32.

 Topi would probably suggest using SET_FIELD, i.e.
 
 #define BRW_GPGPU_SIMD_SIZE_SHIFT 30
 #define BRW_GPGPU_SIMD_SIZE_MASK  INTEL_MASK(31, 30)
 
 SET_FIELD((simd_size / 8) - 1, BRW_GPGPU_SIMD_SIZE)
 
 It's probably a good idea here too.

Will do.

  + (thread_width_max - 1));
 
 Don't you need to set the thread height/depth maximums as well?
 I'm not really sure how this works.

We flatten the 3-dims out above in group_size, and then
thread_width_max. So, this basically focuses getting it to execute the
correct number of times. When height is not used, we can set
bottom_mask to all 1's, and only use the right_mask.

In terms of GLSL's gl_LocalInvocationID, that is a whole separate
matter. (And a whole separate patch series! :)

  +   OUT_BATCH(0);
 
 It'd be nice to label the 0's, i.e.

Will do.

Thanks for the review!

-Jordan

OUT_BATCH(0); /* Thread Group ID Starting X */
OUT_BATCH(num_groups[0]); /* Thread Group ID X Dimension */
 
 With those changes, the whole series is:
 Reviewed-by: Kenneth Graunke kenn...@whitecape.org
 
 I haven't verified that these execution masks are really what you want.
 You know more about this than I do. :)
 
  +   if (brw-gen = 8)
  +  OUT_BATCH(0);
  +   OUT_BATCH(num_groups[0]);
  +   OUT_BATCH(0);
  +   if (brw-gen = 8)
  +  OUT_BATCH(0);
  +   OUT_BATCH(num_groups[1]);
  +   OUT_BATCH(0);
  +   OUT_BATCH(num_groups[2]);
  +   OUT_BATCH(right_mask);
  +   OUT_BATCH(0x);
  +   ADVANCE_BATCH();
   }
   
   
  diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
  b/src/mesa/drivers/dri/i965/brw_defines.h
  index 36f46af..cd25511 100644
  --- a/src/mesa/drivers/dri/i965/brw_defines.h
  +++ b/src/mesa/drivers/dri/i965/brw_defines.h
  @@ -2451,5 +2451,6 @@ enum brw_wm_barycentric_interp_mode {
   
   #define MEDIA_VFE_STATE 0x7000
   #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002
  +#define GPGPU_WALKER0x7105
   
   #endif
  
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker

2015-04-28 Thread Kenneth Graunke
On Tuesday, April 28, 2015 12:04:50 AM Jordan Justen wrote:
 On 2015-04-27 19:02:38, Kenneth Graunke wrote:
  On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote:
   +   BEGIN_BATCH(dwords);
   +   OUT_BATCH(GPGPU_WALKER  16 | (dwords - 2));
  
  I was going to suggest splitting this into separate Gen8+ and Gen7
  blocks, but now that I look at the code...these two are slightly
  different indirect handling, and the later one is just a DWord of MBZ,
  so...it's not really that different.  I think what you have is fine :)
 
 Hmm. Maybe time to press my luck. :)
 
 In my other 20 patch series
 [PATCH v2 19/20] i965/cs: Upload brw_cs_state
 
 We discussed this somewhat ugly code:
  +   int dw = 0;
  +   desc[dw++] = brw-cs.base.prog_offset;
  +   if (brw-gen = 8)
  +  dw++; /* Kernel Start Pointer High */
  +   dw++;
  +   dw++;
  +   desc[dw++] = stage_state-bind_bo_offset;
 
 It turns out it eventually doesn't look quite so pointless to use the
 dw var. Later, it would look like:
 
 http://cgit.freedesktop.org/~jljusten/mesa/tree/src/mesa/drivers/dri/i965/brw_cs.cpp?h=cs-27#n392
 
 Basically, the structure is pretty similar, but an extra dword appears
 for the high address in gen8.
 
 If it seems cleaner, I wouldn't mind splitting either or both of these
 to be initialized in separate paths based on the gen.
 
 Does the link above change your opinion on the other patch?
 
 Thanks for your time,
 
 -Jordan

I guess that's fine.  In the OUT_BATCH paradigm, we'd do things like:

   if (brw-gen = 8) {
   OUT_RELOC64(...)
   } else {
   OUT_RELOC(...)
   }

which effectively does the same thing.  But here you're writing indirect
state, so you have to track dwords yourself.

How about just changing 'dw++' to something like:

   desc[dw++] = 0; /* MBZ */
   desc[dw++] = 0; /* Kernel Start Pointer High */

I feel like that makes it readily apparent that you're just filling in
DWords in order, and not doing anything fancy.

Thoughts?
--Ken


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker

2015-04-28 Thread Jordan Justen
On 2015-04-27 19:02:38, Kenneth Graunke wrote:
 On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote:
  +   BEGIN_BATCH(dwords);
  +   OUT_BATCH(GPGPU_WALKER  16 | (dwords - 2));
 
 I was going to suggest splitting this into separate Gen8+ and Gen7
 blocks, but now that I look at the code...these two are slightly
 different indirect handling, and the later one is just a DWord of MBZ,
 so...it's not really that different.  I think what you have is fine :)

Hmm. Maybe time to press my luck. :)

In my other 20 patch series
[PATCH v2 19/20] i965/cs: Upload brw_cs_state

We discussed this somewhat ugly code:
 +   int dw = 0;
 +   desc[dw++] = brw-cs.base.prog_offset;
 +   if (brw-gen = 8)
 +  dw++; /* Kernel Start Pointer High */
 +   dw++;
 +   dw++;
 +   desc[dw++] = stage_state-bind_bo_offset;

It turns out it eventually doesn't look quite so pointless to use the
dw var. Later, it would look like:

http://cgit.freedesktop.org/~jljusten/mesa/tree/src/mesa/drivers/dri/i965/brw_cs.cpp?h=cs-27#n392

Basically, the structure is pretty similar, but an extra dword appears
for the high address in gen8.

If it seems cleaner, I wouldn't mind splitting either or both of these
to be initialized in separate paths based on the gen.

Does the link above change your opinion on the other patch?

Thanks for your time,

-Jordan

  +   uint32_t dwords = brw-gen  8 ? 11 : 15;
  +   OUT_BATCH(0);
  +   if (brw-gen = 8) {
  +  OUT_BATCH(0);
  +  OUT_BATCH(0);
  +   }
  +   assert(thread_width_max = brw-max_cs_threads);
  +   OUT_BATCH(((simd_size == 8) ? 0 : 1)  30 |
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker

2015-04-27 Thread Kenneth Graunke
On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote:
 Tested on Ivybridge, Haswell and Broadwell.
 
 Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
 ---
  src/mesa/drivers/dri/i965/brw_compute.c | 39 
 -
  src/mesa/drivers/dri/i965/brw_defines.h |  1 +
  2 files changed, 39 insertions(+), 1 deletion(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_compute.c 
 b/src/mesa/drivers/dri/i965/brw_compute.c
 index baed701..06ef448 100644
 --- a/src/mesa/drivers/dri/i965/brw_compute.c
 +++ b/src/mesa/drivers/dri/i965/brw_compute.c
 @@ -31,12 +31,49 @@
  #include brw_draw.h
  #include brw_state.h
  #include intel_batchbuffer.h
 +#include brw_defines.h
  
  
  static void
  brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups)
  {
 -   _mesa_problem(brw-ctx, TODO: implement brw_emit_gpgpu_walker);
 +   const struct brw_cs_prog_data *prog_data = brw-cs.prog_data;
 +
 +   const unsigned simd_size = prog_data-simd_size;
 +   unsigned group_size = prog_data-local_size[0] *
 +  prog_data-local_size[1] * prog_data-local_size[2];
 +   unsigned thread_width_max =
 +  (group_size + simd_size - 1) / simd_size;
 +
 +   uint32_t right_mask = (1u  simd_size) - 1;
 +   const unsigned right_non_aligned = group_size  (simd_size - 1);
 +   if (right_non_aligned != 0)
 +  right_mask = (simd_size - right_non_aligned);

I think this is equvalent to:

uint32_t right_mask = (1u  (simd_size - (group_size % simd_size))) - 1;

which might be a bit simpler...

 +   BEGIN_BATCH(dwords);
 +   OUT_BATCH(GPGPU_WALKER  16 | (dwords - 2));

I was going to suggest splitting this into separate Gen8+ and Gen7
blocks, but now that I look at the code...these two are slightly
different indirect handling, and the later one is just a DWord of MBZ,
so...it's not really that different.  I think what you have is fine :)

 +   uint32_t dwords = brw-gen  8 ? 11 : 15;
 +   OUT_BATCH(0);
 +   if (brw-gen = 8) {
 +  OUT_BATCH(0);
 +  OUT_BATCH(0);
 +   }
 +   assert(thread_width_max = brw-max_cs_threads);
 +   OUT_BATCH(((simd_size == 8) ? 0 : 1)  30 |

You might want to write this as ((simd_size / 8) - 1).  That will work
for SIMD8/16/32.

Topi would probably suggest using SET_FIELD, i.e.

#define BRW_GPGPU_SIMD_SIZE_SHIFT 30
#define BRW_GPGPU_SIMD_SIZE_MASK  INTEL_MASK(31, 30)

SET_FIELD((simd_size / 8) - 1, BRW_GPGPU_SIMD_SIZE)

It's probably a good idea here too.

 + (thread_width_max - 1));

Don't you need to set the thread height/depth maximums as well?
I'm not really sure how this works.

 +   OUT_BATCH(0);

It'd be nice to label the 0's, i.e.

   OUT_BATCH(0); /* Thread Group ID Starting X */
   OUT_BATCH(num_groups[0]); /* Thread Group ID X Dimension */

With those changes, the whole series is:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org

I haven't verified that these execution masks are really what you want.
You know more about this than I do. :)

 +   if (brw-gen = 8)
 +  OUT_BATCH(0);
 +   OUT_BATCH(num_groups[0]);
 +   OUT_BATCH(0);
 +   if (brw-gen = 8)
 +  OUT_BATCH(0);
 +   OUT_BATCH(num_groups[1]);
 +   OUT_BATCH(0);
 +   OUT_BATCH(num_groups[2]);
 +   OUT_BATCH(right_mask);
 +   OUT_BATCH(0x);
 +   ADVANCE_BATCH();
  }
  
  
 diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
 b/src/mesa/drivers/dri/i965/brw_defines.h
 index 36f46af..cd25511 100644
 --- a/src/mesa/drivers/dri/i965/brw_defines.h
 +++ b/src/mesa/drivers/dri/i965/brw_defines.h
 @@ -2451,5 +2451,6 @@ enum brw_wm_barycentric_interp_mode {
  
  #define MEDIA_VFE_STATE 0x7000
  #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002
 +#define GPGPU_WALKER0x7105
  
  #endif
 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker

2015-04-24 Thread Jordan Justen
Tested on Ivybridge, Haswell and Broadwell.

Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
---
 src/mesa/drivers/dri/i965/brw_compute.c | 39 -
 src/mesa/drivers/dri/i965/brw_defines.h |  1 +
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compute.c 
b/src/mesa/drivers/dri/i965/brw_compute.c
index baed701..06ef448 100644
--- a/src/mesa/drivers/dri/i965/brw_compute.c
+++ b/src/mesa/drivers/dri/i965/brw_compute.c
@@ -31,12 +31,49 @@
 #include brw_draw.h
 #include brw_state.h
 #include intel_batchbuffer.h
+#include brw_defines.h
 
 
 static void
 brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups)
 {
-   _mesa_problem(brw-ctx, TODO: implement brw_emit_gpgpu_walker);
+   const struct brw_cs_prog_data *prog_data = brw-cs.prog_data;
+
+   const unsigned simd_size = prog_data-simd_size;
+   unsigned group_size = prog_data-local_size[0] *
+  prog_data-local_size[1] * prog_data-local_size[2];
+   unsigned thread_width_max =
+  (group_size + simd_size - 1) / simd_size;
+
+   uint32_t right_mask = (1u  simd_size) - 1;
+   const unsigned right_non_aligned = group_size  (simd_size - 1);
+   if (right_non_aligned != 0)
+  right_mask = (simd_size - right_non_aligned);
+
+   uint32_t dwords = brw-gen  8 ? 11 : 15;
+   BEGIN_BATCH(dwords);
+   OUT_BATCH(GPGPU_WALKER  16 | (dwords - 2));
+   OUT_BATCH(0);
+   if (brw-gen = 8) {
+  OUT_BATCH(0);
+  OUT_BATCH(0);
+   }
+   assert(thread_width_max = brw-max_cs_threads);
+   OUT_BATCH(((simd_size == 8) ? 0 : 1)  30 |
+ (thread_width_max - 1));
+   OUT_BATCH(0);
+   if (brw-gen = 8)
+  OUT_BATCH(0);
+   OUT_BATCH(num_groups[0]);
+   OUT_BATCH(0);
+   if (brw-gen = 8)
+  OUT_BATCH(0);
+   OUT_BATCH(num_groups[1]);
+   OUT_BATCH(0);
+   OUT_BATCH(num_groups[2]);
+   OUT_BATCH(right_mask);
+   OUT_BATCH(0x);
+   ADVANCE_BATCH();
 }
 
 
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 36f46af..cd25511 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -2451,5 +2451,6 @@ enum brw_wm_barycentric_interp_mode {
 
 #define MEDIA_VFE_STATE 0x7000
 #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002
+#define GPGPU_WALKER0x7105
 
 #endif
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev