Re: [Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker
On 2015-04-27 19:02:38, Kenneth Graunke wrote: On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote: Tested on Ivybridge, Haswell and Broadwell. Signed-off-by: Jordan Justen jordan.l.jus...@intel.com --- src/mesa/drivers/dri/i965/brw_compute.c | 39 - src/mesa/drivers/dri/i965/brw_defines.h | 1 + 2 files changed, 39 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c index baed701..06ef448 100644 --- a/src/mesa/drivers/dri/i965/brw_compute.c +++ b/src/mesa/drivers/dri/i965/brw_compute.c @@ -31,12 +31,49 @@ #include brw_draw.h #include brw_state.h #include intel_batchbuffer.h +#include brw_defines.h static void brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups) { - _mesa_problem(brw-ctx, TODO: implement brw_emit_gpgpu_walker); + const struct brw_cs_prog_data *prog_data = brw-cs.prog_data; + + const unsigned simd_size = prog_data-simd_size; + unsigned group_size = prog_data-local_size[0] * + prog_data-local_size[1] * prog_data-local_size[2]; + unsigned thread_width_max = + (group_size + simd_size - 1) / simd_size; + + uint32_t right_mask = (1u simd_size) - 1; + const unsigned right_non_aligned = group_size (simd_size - 1); + if (right_non_aligned != 0) + right_mask = (simd_size - right_non_aligned); I think this is equvalent to: uint32_t right_mask = (1u (simd_size - (group_size % simd_size))) - 1; which might be a bit simpler... + BEGIN_BATCH(dwords); + OUT_BATCH(GPGPU_WALKER 16 | (dwords - 2)); I was going to suggest splitting this into separate Gen8+ and Gen7 blocks, but now that I look at the code...these two are slightly different indirect handling, and the later one is just a DWord of MBZ, so...it's not really that different. I think what you have is fine :) + uint32_t dwords = brw-gen 8 ? 11 : 15; + OUT_BATCH(0); + if (brw-gen = 8) { + OUT_BATCH(0); + OUT_BATCH(0); + } + assert(thread_width_max = brw-max_cs_threads); + OUT_BATCH(((simd_size == 8) ? 0 : 1) 30 | You might want to write this as ((simd_size / 8) - 1). That will work for SIMD8/16/32. Good idea, but I think simd_size / 16 will be needed, since we need 2 for SIMD32. Topi would probably suggest using SET_FIELD, i.e. #define BRW_GPGPU_SIMD_SIZE_SHIFT 30 #define BRW_GPGPU_SIMD_SIZE_MASK INTEL_MASK(31, 30) SET_FIELD((simd_size / 8) - 1, BRW_GPGPU_SIMD_SIZE) It's probably a good idea here too. Will do. + (thread_width_max - 1)); Don't you need to set the thread height/depth maximums as well? I'm not really sure how this works. We flatten the 3-dims out above in group_size, and then thread_width_max. So, this basically focuses getting it to execute the correct number of times. When height is not used, we can set bottom_mask to all 1's, and only use the right_mask. In terms of GLSL's gl_LocalInvocationID, that is a whole separate matter. (And a whole separate patch series! :) + OUT_BATCH(0); It'd be nice to label the 0's, i.e. Will do. Thanks for the review! -Jordan OUT_BATCH(0); /* Thread Group ID Starting X */ OUT_BATCH(num_groups[0]); /* Thread Group ID X Dimension */ With those changes, the whole series is: Reviewed-by: Kenneth Graunke kenn...@whitecape.org I haven't verified that these execution masks are really what you want. You know more about this than I do. :) + if (brw-gen = 8) + OUT_BATCH(0); + OUT_BATCH(num_groups[0]); + OUT_BATCH(0); + if (brw-gen = 8) + OUT_BATCH(0); + OUT_BATCH(num_groups[1]); + OUT_BATCH(0); + OUT_BATCH(num_groups[2]); + OUT_BATCH(right_mask); + OUT_BATCH(0x); + ADVANCE_BATCH(); } diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 36f46af..cd25511 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -2451,5 +2451,6 @@ enum brw_wm_barycentric_interp_mode { #define MEDIA_VFE_STATE 0x7000 #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002 +#define GPGPU_WALKER0x7105 #endif ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker
On Tuesday, April 28, 2015 12:04:50 AM Jordan Justen wrote: On 2015-04-27 19:02:38, Kenneth Graunke wrote: On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote: + BEGIN_BATCH(dwords); + OUT_BATCH(GPGPU_WALKER 16 | (dwords - 2)); I was going to suggest splitting this into separate Gen8+ and Gen7 blocks, but now that I look at the code...these two are slightly different indirect handling, and the later one is just a DWord of MBZ, so...it's not really that different. I think what you have is fine :) Hmm. Maybe time to press my luck. :) In my other 20 patch series [PATCH v2 19/20] i965/cs: Upload brw_cs_state We discussed this somewhat ugly code: + int dw = 0; + desc[dw++] = brw-cs.base.prog_offset; + if (brw-gen = 8) + dw++; /* Kernel Start Pointer High */ + dw++; + dw++; + desc[dw++] = stage_state-bind_bo_offset; It turns out it eventually doesn't look quite so pointless to use the dw var. Later, it would look like: http://cgit.freedesktop.org/~jljusten/mesa/tree/src/mesa/drivers/dri/i965/brw_cs.cpp?h=cs-27#n392 Basically, the structure is pretty similar, but an extra dword appears for the high address in gen8. If it seems cleaner, I wouldn't mind splitting either or both of these to be initialized in separate paths based on the gen. Does the link above change your opinion on the other patch? Thanks for your time, -Jordan I guess that's fine. In the OUT_BATCH paradigm, we'd do things like: if (brw-gen = 8) { OUT_RELOC64(...) } else { OUT_RELOC(...) } which effectively does the same thing. But here you're writing indirect state, so you have to track dwords yourself. How about just changing 'dw++' to something like: desc[dw++] = 0; /* MBZ */ desc[dw++] = 0; /* Kernel Start Pointer High */ I feel like that makes it readily apparent that you're just filling in DWords in order, and not doing anything fancy. Thoughts? --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker
On 2015-04-27 19:02:38, Kenneth Graunke wrote: On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote: + BEGIN_BATCH(dwords); + OUT_BATCH(GPGPU_WALKER 16 | (dwords - 2)); I was going to suggest splitting this into separate Gen8+ and Gen7 blocks, but now that I look at the code...these two are slightly different indirect handling, and the later one is just a DWord of MBZ, so...it's not really that different. I think what you have is fine :) Hmm. Maybe time to press my luck. :) In my other 20 patch series [PATCH v2 19/20] i965/cs: Upload brw_cs_state We discussed this somewhat ugly code: + int dw = 0; + desc[dw++] = brw-cs.base.prog_offset; + if (brw-gen = 8) + dw++; /* Kernel Start Pointer High */ + dw++; + dw++; + desc[dw++] = stage_state-bind_bo_offset; It turns out it eventually doesn't look quite so pointless to use the dw var. Later, it would look like: http://cgit.freedesktop.org/~jljusten/mesa/tree/src/mesa/drivers/dri/i965/brw_cs.cpp?h=cs-27#n392 Basically, the structure is pretty similar, but an extra dword appears for the high address in gen8. If it seems cleaner, I wouldn't mind splitting either or both of these to be initialized in separate paths based on the gen. Does the link above change your opinion on the other patch? Thanks for your time, -Jordan + uint32_t dwords = brw-gen 8 ? 11 : 15; + OUT_BATCH(0); + if (brw-gen = 8) { + OUT_BATCH(0); + OUT_BATCH(0); + } + assert(thread_width_max = brw-max_cs_threads); + OUT_BATCH(((simd_size == 8) ? 0 : 1) 30 | ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker
On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote: Tested on Ivybridge, Haswell and Broadwell. Signed-off-by: Jordan Justen jordan.l.jus...@intel.com --- src/mesa/drivers/dri/i965/brw_compute.c | 39 - src/mesa/drivers/dri/i965/brw_defines.h | 1 + 2 files changed, 39 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c index baed701..06ef448 100644 --- a/src/mesa/drivers/dri/i965/brw_compute.c +++ b/src/mesa/drivers/dri/i965/brw_compute.c @@ -31,12 +31,49 @@ #include brw_draw.h #include brw_state.h #include intel_batchbuffer.h +#include brw_defines.h static void brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups) { - _mesa_problem(brw-ctx, TODO: implement brw_emit_gpgpu_walker); + const struct brw_cs_prog_data *prog_data = brw-cs.prog_data; + + const unsigned simd_size = prog_data-simd_size; + unsigned group_size = prog_data-local_size[0] * + prog_data-local_size[1] * prog_data-local_size[2]; + unsigned thread_width_max = + (group_size + simd_size - 1) / simd_size; + + uint32_t right_mask = (1u simd_size) - 1; + const unsigned right_non_aligned = group_size (simd_size - 1); + if (right_non_aligned != 0) + right_mask = (simd_size - right_non_aligned); I think this is equvalent to: uint32_t right_mask = (1u (simd_size - (group_size % simd_size))) - 1; which might be a bit simpler... + BEGIN_BATCH(dwords); + OUT_BATCH(GPGPU_WALKER 16 | (dwords - 2)); I was going to suggest splitting this into separate Gen8+ and Gen7 blocks, but now that I look at the code...these two are slightly different indirect handling, and the later one is just a DWord of MBZ, so...it's not really that different. I think what you have is fine :) + uint32_t dwords = brw-gen 8 ? 11 : 15; + OUT_BATCH(0); + if (brw-gen = 8) { + OUT_BATCH(0); + OUT_BATCH(0); + } + assert(thread_width_max = brw-max_cs_threads); + OUT_BATCH(((simd_size == 8) ? 0 : 1) 30 | You might want to write this as ((simd_size / 8) - 1). That will work for SIMD8/16/32. Topi would probably suggest using SET_FIELD, i.e. #define BRW_GPGPU_SIMD_SIZE_SHIFT 30 #define BRW_GPGPU_SIMD_SIZE_MASK INTEL_MASK(31, 30) SET_FIELD((simd_size / 8) - 1, BRW_GPGPU_SIMD_SIZE) It's probably a good idea here too. + (thread_width_max - 1)); Don't you need to set the thread height/depth maximums as well? I'm not really sure how this works. + OUT_BATCH(0); It'd be nice to label the 0's, i.e. OUT_BATCH(0); /* Thread Group ID Starting X */ OUT_BATCH(num_groups[0]); /* Thread Group ID X Dimension */ With those changes, the whole series is: Reviewed-by: Kenneth Graunke kenn...@whitecape.org I haven't verified that these execution masks are really what you want. You know more about this than I do. :) + if (brw-gen = 8) + OUT_BATCH(0); + OUT_BATCH(num_groups[0]); + OUT_BATCH(0); + if (brw-gen = 8) + OUT_BATCH(0); + OUT_BATCH(num_groups[1]); + OUT_BATCH(0); + OUT_BATCH(num_groups[2]); + OUT_BATCH(right_mask); + OUT_BATCH(0x); + ADVANCE_BATCH(); } diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 36f46af..cd25511 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -2451,5 +2451,6 @@ enum brw_wm_barycentric_interp_mode { #define MEDIA_VFE_STATE 0x7000 #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002 +#define GPGPU_WALKER0x7105 #endif signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker
Tested on Ivybridge, Haswell and Broadwell. Signed-off-by: Jordan Justen jordan.l.jus...@intel.com --- src/mesa/drivers/dri/i965/brw_compute.c | 39 - src/mesa/drivers/dri/i965/brw_defines.h | 1 + 2 files changed, 39 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c index baed701..06ef448 100644 --- a/src/mesa/drivers/dri/i965/brw_compute.c +++ b/src/mesa/drivers/dri/i965/brw_compute.c @@ -31,12 +31,49 @@ #include brw_draw.h #include brw_state.h #include intel_batchbuffer.h +#include brw_defines.h static void brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups) { - _mesa_problem(brw-ctx, TODO: implement brw_emit_gpgpu_walker); + const struct brw_cs_prog_data *prog_data = brw-cs.prog_data; + + const unsigned simd_size = prog_data-simd_size; + unsigned group_size = prog_data-local_size[0] * + prog_data-local_size[1] * prog_data-local_size[2]; + unsigned thread_width_max = + (group_size + simd_size - 1) / simd_size; + + uint32_t right_mask = (1u simd_size) - 1; + const unsigned right_non_aligned = group_size (simd_size - 1); + if (right_non_aligned != 0) + right_mask = (simd_size - right_non_aligned); + + uint32_t dwords = brw-gen 8 ? 11 : 15; + BEGIN_BATCH(dwords); + OUT_BATCH(GPGPU_WALKER 16 | (dwords - 2)); + OUT_BATCH(0); + if (brw-gen = 8) { + OUT_BATCH(0); + OUT_BATCH(0); + } + assert(thread_width_max = brw-max_cs_threads); + OUT_BATCH(((simd_size == 8) ? 0 : 1) 30 | + (thread_width_max - 1)); + OUT_BATCH(0); + if (brw-gen = 8) + OUT_BATCH(0); + OUT_BATCH(num_groups[0]); + OUT_BATCH(0); + if (brw-gen = 8) + OUT_BATCH(0); + OUT_BATCH(num_groups[1]); + OUT_BATCH(0); + OUT_BATCH(num_groups[2]); + OUT_BATCH(right_mask); + OUT_BATCH(0x); + ADVANCE_BATCH(); } diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 36f46af..cd25511 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -2451,5 +2451,6 @@ enum brw_wm_barycentric_interp_mode { #define MEDIA_VFE_STATE 0x7000 #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002 +#define GPGPU_WALKER0x7105 #endif -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev