Re: [Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.

2016-09-16 Thread Jason Ekstrand
On Fri, Sep 16, 2016 at 5:59 PM, Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > On Sep 16, 2016 3:04 PM, "Francisco Jerez" 
> wrote:
> >>
> >> Not intended for upstream.  Should cause a GPU hang if some thread is
> >> executed with a non-contiguous dispatch mask breaking assumptions of
> >> brw_stage_has_packed_dispatch().  Doesn't cause any CTS, DEQP or
> >> Piglit regressions, while replacing brw_stage_has_packed_dispatch()
> >> with a dummy implementation that unconditionally returns true on top
> >> of this patch causes multiple GPU hangs.
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 17 +
> >>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 +
> >>  2 files changed, 38 insertions(+)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> index 042203d..b3eec49 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> @@ -33,6 +33,23 @@ using namespace brw::surface_access;
> >>  void
> >>  fs_visitor::emit_nir_code()
> >>  {
> >> +   if (brw_stage_has_packed_dispatch(stage, prog_data)) {
> >
> > Mind adding "0 &&" and merging this patch so we remain aware of the
> issue,
> > keep it building, and can easily test future hardware.
> >
> I guess that would work -- An alternative that would keep the
> NIR-to-i965 pass tidier would be to assert(devinfo->gen <= 9) in
> brw_stage_has_packed_dispatch() to make sure we don't forget to do a
> full Piglit/DEQP/CTS run with this patch applied when a new generation
> is powered on.  I can also add a comment with a link to this patch.
>

Can we do both?  Maybe something like this instead of an assert:

if (devinfo->gen > 9) {
   static bool warned = false;
   if (!warned) {
  fprintf(stderr, "WARNING: VMask/DMask power-of-two assumptions need
to be verified.  Once verified using emit_fs_mask_pow2_check(), this
warning may be disabled for gen%d", devinfo->gen);
  warned = true;
   }
}

where emit_fs_mask_pow2_check() is a helper that we put the check code
into.  That way it's not an assert-failure that will get instantly removed
but something that will constantly bug the bring-up person until they have
verified the assumption.


> >> +  const fs_builder ubld = bld.exec_all().group(1, 0);
> >> +  const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0);
> >> +  const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ?
> > brw_vmask_reg() :
> >> +   brw_dmask_reg());
> >> +
> >> +  ubld.ADD(tmp, mask, brw_imm_ud(1));
> >> +  ubld.AND(tmp, mask, tmp);
> >> +
> >> +  /* This will loop forever if the dispatch mask doesn't have the
> > expected
> >> +   * form '2^n-1', in which case tmp will be non-zero.
> >> +   */
> >> +  bld.emit(BRW_OPCODE_DO);
> >> +  bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0),
> BRW_CONDITIONAL_NZ);
> >> +  set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE));
> >> +   }
> >> +
> >> /* emit the arrays used for inputs and outputs - load/store
> > intrinsics will
> >>  * be converted to reads/writes of these arrays
> >>  */
> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> >> index ba3bbdf..9f7a1f0 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> >> @@ -35,6 +35,27 @@ namespace brw {
> >>  void
> >>  vec4_visitor::emit_nir_code()
> >>  {
> >> +   if (brw_stage_has_packed_dispatch(stage, _data->base)) {
> >> +  const dst_reg tmp = writemask(dst_reg(this,
> glsl_type::uint_type),
> >> +WRITEMASK_X);
> >> +  const src_reg mask =
> >> +
> >  brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE,
> > BRW_ARF_STATE, 0),
> >> +   0, 4, 1),
> >> +BRW_REGISTER_TYPE_UD),
> >> + BRW_SWIZZLE_);
> >
> > Can we just do vec4_reg(brw_vmask_reg)?
> >
> That didn't seem to work, because both brw_dmask_reg() and
> brw_vmask_reg() return a register region that is equivalent to
> sr0.0. in Align16 addressing mode (maybe a bug of PATCH 1?  It's
> unlikely to matter a lot in practice though), so the Z component where
> the DMask is stored in is not even part of the returned Align16 region.
>

Right.  As I said in another patch, I'm not terribly concerned about vec4.
We've shown it works and we won't be bringing up more vec4.  I'm fine with
just merging the fs version.


> >> +
> >> +  emit(ADD(tmp, mask, brw_imm_ud(1)));
> >> +  emit(AND(tmp, mask, src_reg(tmp)));
> >> +
> >> +  /* This will loop forever if the dispatch mask doesn't have the
> > expected
> >> +   * form '2^n-1', in which case tmp will be non-zero.
> >> +   */
> >> +  

Re: [Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.

2016-09-16 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Sep 16, 2016 3:04 PM, "Francisco Jerez"  wrote:
>>
>> Not intended for upstream.  Should cause a GPU hang if some thread is
>> executed with a non-contiguous dispatch mask breaking assumptions of
>> brw_stage_has_packed_dispatch().  Doesn't cause any CTS, DEQP or
>> Piglit regressions, while replacing brw_stage_has_packed_dispatch()
>> with a dummy implementation that unconditionally returns true on top
>> of this patch causes multiple GPU hangs.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 17 +
>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 +
>>  2 files changed, 38 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index 042203d..b3eec49 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -33,6 +33,23 @@ using namespace brw::surface_access;
>>  void
>>  fs_visitor::emit_nir_code()
>>  {
>> +   if (brw_stage_has_packed_dispatch(stage, prog_data)) {
>
> Mind adding "0 &&" and merging this patch so we remain aware of the issue,
> keep it building, and can easily test future hardware.
>
I guess that would work -- An alternative that would keep the
NIR-to-i965 pass tidier would be to assert(devinfo->gen <= 9) in
brw_stage_has_packed_dispatch() to make sure we don't forget to do a
full Piglit/DEQP/CTS run with this patch applied when a new generation
is powered on.  I can also add a comment with a link to this patch.

>> +  const fs_builder ubld = bld.exec_all().group(1, 0);
>> +  const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0);
>> +  const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ?
> brw_vmask_reg() :
>> +   brw_dmask_reg());
>> +
>> +  ubld.ADD(tmp, mask, brw_imm_ud(1));
>> +  ubld.AND(tmp, mask, tmp);
>> +
>> +  /* This will loop forever if the dispatch mask doesn't have the
> expected
>> +   * form '2^n-1', in which case tmp will be non-zero.
>> +   */
>> +  bld.emit(BRW_OPCODE_DO);
>> +  bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), BRW_CONDITIONAL_NZ);
>> +  set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE));
>> +   }
>> +
>> /* emit the arrays used for inputs and outputs - load/store
> intrinsics will
>>  * be converted to reads/writes of these arrays
>>  */
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> index ba3bbdf..9f7a1f0 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> @@ -35,6 +35,27 @@ namespace brw {
>>  void
>>  vec4_visitor::emit_nir_code()
>>  {
>> +   if (brw_stage_has_packed_dispatch(stage, _data->base)) {
>> +  const dst_reg tmp = writemask(dst_reg(this, glsl_type::uint_type),
>> +WRITEMASK_X);
>> +  const src_reg mask =
>> +
>  brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE,
> BRW_ARF_STATE, 0),
>> +   0, 4, 1),
>> +BRW_REGISTER_TYPE_UD),
>> + BRW_SWIZZLE_);
>
> Can we just do vec4_reg(brw_vmask_reg)?
>
That didn't seem to work, because both brw_dmask_reg() and
brw_vmask_reg() return a register region that is equivalent to
sr0.0. in Align16 addressing mode (maybe a bug of PATCH 1?  It's
unlikely to matter a lot in practice though), so the Z component where
the DMask is stored in is not even part of the returned Align16 region.

>> +
>> +  emit(ADD(tmp, mask, brw_imm_ud(1)));
>> +  emit(AND(tmp, mask, src_reg(tmp)));
>> +
>> +  /* This will loop forever if the dispatch mask doesn't have the
> expected
>> +   * form '2^n-1', in which case tmp will be non-zero.
>> +   */
>> +  emit(BRW_OPCODE_DO);
>> +  emit(CMP(dst_null_ud(), src_reg(tmp), brw_imm_ud(0),
>> +   BRW_CONDITIONAL_NZ));
>> +  emit(BRW_OPCODE_WHILE)->predicate = BRW_PREDICATE_NORMAL;
>> +   }
>> +
>> if (nir->num_uniforms > 0)
>>nir_setup_uniforms();
>>
>> --
>> 2.9.0
>>


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.

2016-09-16 Thread Jason Ekstrand
On Sep 16, 2016 3:04 PM, "Francisco Jerez"  wrote:
>
> Not intended for upstream.  Should cause a GPU hang if some thread is
> executed with a non-contiguous dispatch mask breaking assumptions of
> brw_stage_has_packed_dispatch().  Doesn't cause any CTS, DEQP or
> Piglit regressions, while replacing brw_stage_has_packed_dispatch()
> with a dummy implementation that unconditionally returns true on top
> of this patch causes multiple GPU hangs.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 17 +
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 +
>  2 files changed, 38 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 042203d..b3eec49 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -33,6 +33,23 @@ using namespace brw::surface_access;
>  void
>  fs_visitor::emit_nir_code()
>  {
> +   if (brw_stage_has_packed_dispatch(stage, prog_data)) {

Mind adding "0 &&" and merging this patch so we remain aware of the issue,
keep it building, and can easily test future hardware.

> +  const fs_builder ubld = bld.exec_all().group(1, 0);
> +  const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0);
> +  const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ?
brw_vmask_reg() :
> +   brw_dmask_reg());
> +
> +  ubld.ADD(tmp, mask, brw_imm_ud(1));
> +  ubld.AND(tmp, mask, tmp);
> +
> +  /* This will loop forever if the dispatch mask doesn't have the
expected
> +   * form '2^n-1', in which case tmp will be non-zero.
> +   */
> +  bld.emit(BRW_OPCODE_DO);
> +  bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), BRW_CONDITIONAL_NZ);
> +  set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE));
> +   }
> +
> /* emit the arrays used for inputs and outputs - load/store
intrinsics will
>  * be converted to reads/writes of these arrays
>  */
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index ba3bbdf..9f7a1f0 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -35,6 +35,27 @@ namespace brw {
>  void
>  vec4_visitor::emit_nir_code()
>  {
> +   if (brw_stage_has_packed_dispatch(stage, _data->base)) {
> +  const dst_reg tmp = writemask(dst_reg(this, glsl_type::uint_type),
> +WRITEMASK_X);
> +  const src_reg mask =
> +
 brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE,
BRW_ARF_STATE, 0),
> +   0, 4, 1),
> +BRW_REGISTER_TYPE_UD),
> + BRW_SWIZZLE_);

Can we just do vec4_reg(brw_vmask_reg)?

> +
> +  emit(ADD(tmp, mask, brw_imm_ud(1)));
> +  emit(AND(tmp, mask, src_reg(tmp)));
> +
> +  /* This will loop forever if the dispatch mask doesn't have the
expected
> +   * form '2^n-1', in which case tmp will be non-zero.
> +   */
> +  emit(BRW_OPCODE_DO);
> +  emit(CMP(dst_null_ud(), src_reg(tmp), brw_imm_ud(0),
> +   BRW_CONDITIONAL_NZ));
> +  emit(BRW_OPCODE_WHILE)->predicate = BRW_PREDICATE_NORMAL;
> +   }
> +
> if (nir->num_uniforms > 0)
>nir_setup_uniforms();
>
> --
> 2.9.0
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.

2016-09-16 Thread Francisco Jerez
Not intended for upstream.  Should cause a GPU hang if some thread is
executed with a non-contiguous dispatch mask breaking assumptions of
brw_stage_has_packed_dispatch().  Doesn't cause any CTS, DEQP or
Piglit regressions, while replacing brw_stage_has_packed_dispatch()
with a dummy implementation that unconditionally returns true on top
of this patch causes multiple GPU hangs.
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 17 +
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 +
 2 files changed, 38 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 042203d..b3eec49 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -33,6 +33,23 @@ using namespace brw::surface_access;
 void
 fs_visitor::emit_nir_code()
 {
+   if (brw_stage_has_packed_dispatch(stage, prog_data)) {
+  const fs_builder ubld = bld.exec_all().group(1, 0);
+  const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0);
+  const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() :
+   brw_dmask_reg());
+
+  ubld.ADD(tmp, mask, brw_imm_ud(1));
+  ubld.AND(tmp, mask, tmp);
+
+  /* This will loop forever if the dispatch mask doesn't have the expected
+   * form '2^n-1', in which case tmp will be non-zero.
+   */
+  bld.emit(BRW_OPCODE_DO);
+  bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), BRW_CONDITIONAL_NZ);
+  set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE));
+   }
+
/* emit the arrays used for inputs and outputs - load/store intrinsics will
 * be converted to reads/writes of these arrays
 */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index ba3bbdf..9f7a1f0 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -35,6 +35,27 @@ namespace brw {
 void
 vec4_visitor::emit_nir_code()
 {
+   if (brw_stage_has_packed_dispatch(stage, _data->base)) {
+  const dst_reg tmp = writemask(dst_reg(this, glsl_type::uint_type),
+WRITEMASK_X);
+  const src_reg mask =
+ 
brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE, 
BRW_ARF_STATE, 0),
+   0, 4, 1),
+BRW_REGISTER_TYPE_UD),
+ BRW_SWIZZLE_);
+
+  emit(ADD(tmp, mask, brw_imm_ud(1)));
+  emit(AND(tmp, mask, src_reg(tmp)));
+
+  /* This will loop forever if the dispatch mask doesn't have the expected
+   * form '2^n-1', in which case tmp will be non-zero.
+   */
+  emit(BRW_OPCODE_DO);
+  emit(CMP(dst_null_ud(), src_reg(tmp), brw_imm_ud(0),
+   BRW_CONDITIONAL_NZ));
+  emit(BRW_OPCODE_WHILE)->predicate = BRW_PREDICATE_NORMAL;
+   }
+
if (nir->num_uniforms > 0)
   nir_setup_uniforms();
 
-- 
2.9.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev