Re: [Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.
On Fri, Sep 16, 2016 at 5:59 PM, Francisco Jerezwrote: > Jason Ekstrand writes: > > > On Sep 16, 2016 3:04 PM, "Francisco Jerez" > wrote: > >> > >> Not intended for upstream. Should cause a GPU hang if some thread is > >> executed with a non-contiguous dispatch mask breaking assumptions of > >> brw_stage_has_packed_dispatch(). Doesn't cause any CTS, DEQP or > >> Piglit regressions, while replacing brw_stage_has_packed_dispatch() > >> with a dummy implementation that unconditionally returns true on top > >> of this patch causes multiple GPU hangs. > >> --- > >> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 17 + > >> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 + > >> 2 files changed, 38 insertions(+) > >> > >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > >> index 042203d..b3eec49 100644 > >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > >> @@ -33,6 +33,23 @@ using namespace brw::surface_access; > >> void > >> fs_visitor::emit_nir_code() > >> { > >> + if (brw_stage_has_packed_dispatch(stage, prog_data)) { > > > > Mind adding "0 &&" and merging this patch so we remain aware of the > issue, > > keep it building, and can easily test future hardware. > > > I guess that would work -- An alternative that would keep the > NIR-to-i965 pass tidier would be to assert(devinfo->gen <= 9) in > brw_stage_has_packed_dispatch() to make sure we don't forget to do a > full Piglit/DEQP/CTS run with this patch applied when a new generation > is powered on. I can also add a comment with a link to this patch. > Can we do both? Maybe something like this instead of an assert: if (devinfo->gen > 9) { static bool warned = false; if (!warned) { fprintf(stderr, "WARNING: VMask/DMask power-of-two assumptions need to be verified. Once verified using emit_fs_mask_pow2_check(), this warning may be disabled for gen%d", devinfo->gen); warned = true; } } where emit_fs_mask_pow2_check() is a helper that we put the check code into. That way it's not an assert-failure that will get instantly removed but something that will constantly bug the bring-up person until they have verified the assumption. > >> + const fs_builder ubld = bld.exec_all().group(1, 0); > >> + const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0); > >> + const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ? > > brw_vmask_reg() : > >> + brw_dmask_reg()); > >> + > >> + ubld.ADD(tmp, mask, brw_imm_ud(1)); > >> + ubld.AND(tmp, mask, tmp); > >> + > >> + /* This will loop forever if the dispatch mask doesn't have the > > expected > >> + * form '2^n-1', in which case tmp will be non-zero. > >> + */ > >> + bld.emit(BRW_OPCODE_DO); > >> + bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), > BRW_CONDITIONAL_NZ); > >> + set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE)); > >> + } > >> + > >> /* emit the arrays used for inputs and outputs - load/store > > intrinsics will > >> * be converted to reads/writes of these arrays > >> */ > >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > >> index ba3bbdf..9f7a1f0 100644 > >> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > >> @@ -35,6 +35,27 @@ namespace brw { > >> void > >> vec4_visitor::emit_nir_code() > >> { > >> + if (brw_stage_has_packed_dispatch(stage, _data->base)) { > >> + const dst_reg tmp = writemask(dst_reg(this, > glsl_type::uint_type), > >> +WRITEMASK_X); > >> + const src_reg mask = > >> + > > brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE, > > BRW_ARF_STATE, 0), > >> + 0, 4, 1), > >> +BRW_REGISTER_TYPE_UD), > >> + BRW_SWIZZLE_); > > > > Can we just do vec4_reg(brw_vmask_reg)? > > > That didn't seem to work, because both brw_dmask_reg() and > brw_vmask_reg() return a register region that is equivalent to > sr0.0. in Align16 addressing mode (maybe a bug of PATCH 1? It's > unlikely to matter a lot in practice though), so the Z component where > the DMask is stored in is not even part of the returned Align16 region. > Right. As I said in another patch, I'm not terribly concerned about vec4. We've shown it works and we won't be bringing up more vec4. I'm fine with just merging the fs version. > >> + > >> + emit(ADD(tmp, mask, brw_imm_ud(1))); > >> + emit(AND(tmp, mask, src_reg(tmp))); > >> + > >> + /* This will loop forever if the dispatch mask doesn't have the > > expected > >> + * form '2^n-1', in which case tmp will be non-zero. > >> + */ > >> +
Re: [Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.
Jason Ekstrandwrites: > On Sep 16, 2016 3:04 PM, "Francisco Jerez" wrote: >> >> Not intended for upstream. Should cause a GPU hang if some thread is >> executed with a non-contiguous dispatch mask breaking assumptions of >> brw_stage_has_packed_dispatch(). Doesn't cause any CTS, DEQP or >> Piglit regressions, while replacing brw_stage_has_packed_dispatch() >> with a dummy implementation that unconditionally returns true on top >> of this patch causes multiple GPU hangs. >> --- >> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 17 + >> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 + >> 2 files changed, 38 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> index 042203d..b3eec49 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> @@ -33,6 +33,23 @@ using namespace brw::surface_access; >> void >> fs_visitor::emit_nir_code() >> { >> + if (brw_stage_has_packed_dispatch(stage, prog_data)) { > > Mind adding "0 &&" and merging this patch so we remain aware of the issue, > keep it building, and can easily test future hardware. > I guess that would work -- An alternative that would keep the NIR-to-i965 pass tidier would be to assert(devinfo->gen <= 9) in brw_stage_has_packed_dispatch() to make sure we don't forget to do a full Piglit/DEQP/CTS run with this patch applied when a new generation is powered on. I can also add a comment with a link to this patch. >> + const fs_builder ubld = bld.exec_all().group(1, 0); >> + const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0); >> + const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ? > brw_vmask_reg() : >> + brw_dmask_reg()); >> + >> + ubld.ADD(tmp, mask, brw_imm_ud(1)); >> + ubld.AND(tmp, mask, tmp); >> + >> + /* This will loop forever if the dispatch mask doesn't have the > expected >> + * form '2^n-1', in which case tmp will be non-zero. >> + */ >> + bld.emit(BRW_OPCODE_DO); >> + bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), BRW_CONDITIONAL_NZ); >> + set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE)); >> + } >> + >> /* emit the arrays used for inputs and outputs - load/store > intrinsics will >> * be converted to reads/writes of these arrays >> */ >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> index ba3bbdf..9f7a1f0 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> @@ -35,6 +35,27 @@ namespace brw { >> void >> vec4_visitor::emit_nir_code() >> { >> + if (brw_stage_has_packed_dispatch(stage, _data->base)) { >> + const dst_reg tmp = writemask(dst_reg(this, glsl_type::uint_type), >> +WRITEMASK_X); >> + const src_reg mask = >> + > brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE, > BRW_ARF_STATE, 0), >> + 0, 4, 1), >> +BRW_REGISTER_TYPE_UD), >> + BRW_SWIZZLE_); > > Can we just do vec4_reg(brw_vmask_reg)? > That didn't seem to work, because both brw_dmask_reg() and brw_vmask_reg() return a register region that is equivalent to sr0.0. in Align16 addressing mode (maybe a bug of PATCH 1? It's unlikely to matter a lot in practice though), so the Z component where the DMask is stored in is not even part of the returned Align16 region. >> + >> + emit(ADD(tmp, mask, brw_imm_ud(1))); >> + emit(AND(tmp, mask, src_reg(tmp))); >> + >> + /* This will loop forever if the dispatch mask doesn't have the > expected >> + * form '2^n-1', in which case tmp will be non-zero. >> + */ >> + emit(BRW_OPCODE_DO); >> + emit(CMP(dst_null_ud(), src_reg(tmp), brw_imm_ud(0), >> + BRW_CONDITIONAL_NZ)); >> + emit(BRW_OPCODE_WHILE)->predicate = BRW_PREDICATE_NORMAL; >> + } >> + >> if (nir->num_uniforms > 0) >>nir_setup_uniforms(); >> >> -- >> 2.9.0 >> signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.
On Sep 16, 2016 3:04 PM, "Francisco Jerez"wrote: > > Not intended for upstream. Should cause a GPU hang if some thread is > executed with a non-contiguous dispatch mask breaking assumptions of > brw_stage_has_packed_dispatch(). Doesn't cause any CTS, DEQP or > Piglit regressions, while replacing brw_stage_has_packed_dispatch() > with a dummy implementation that unconditionally returns true on top > of this patch causes multiple GPU hangs. > --- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 17 + > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 + > 2 files changed, 38 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 042203d..b3eec49 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -33,6 +33,23 @@ using namespace brw::surface_access; > void > fs_visitor::emit_nir_code() > { > + if (brw_stage_has_packed_dispatch(stage, prog_data)) { Mind adding "0 &&" and merging this patch so we remain aware of the issue, keep it building, and can easily test future hardware. > + const fs_builder ubld = bld.exec_all().group(1, 0); > + const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0); > + const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() : > + brw_dmask_reg()); > + > + ubld.ADD(tmp, mask, brw_imm_ud(1)); > + ubld.AND(tmp, mask, tmp); > + > + /* This will loop forever if the dispatch mask doesn't have the expected > + * form '2^n-1', in which case tmp will be non-zero. > + */ > + bld.emit(BRW_OPCODE_DO); > + bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), BRW_CONDITIONAL_NZ); > + set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE)); > + } > + > /* emit the arrays used for inputs and outputs - load/store intrinsics will > * be converted to reads/writes of these arrays > */ > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > index ba3bbdf..9f7a1f0 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > @@ -35,6 +35,27 @@ namespace brw { > void > vec4_visitor::emit_nir_code() > { > + if (brw_stage_has_packed_dispatch(stage, _data->base)) { > + const dst_reg tmp = writemask(dst_reg(this, glsl_type::uint_type), > +WRITEMASK_X); > + const src_reg mask = > + brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_STATE, 0), > + 0, 4, 1), > +BRW_REGISTER_TYPE_UD), > + BRW_SWIZZLE_); Can we just do vec4_reg(brw_vmask_reg)? > + > + emit(ADD(tmp, mask, brw_imm_ud(1))); > + emit(AND(tmp, mask, src_reg(tmp))); > + > + /* This will loop forever if the dispatch mask doesn't have the expected > + * form '2^n-1', in which case tmp will be non-zero. > + */ > + emit(BRW_OPCODE_DO); > + emit(CMP(dst_null_ud(), src_reg(tmp), brw_imm_ud(0), > + BRW_CONDITIONAL_NZ)); > + emit(BRW_OPCODE_WHILE)->predicate = BRW_PREDICATE_NORMAL; > + } > + > if (nir->num_uniforms > 0) >nir_setup_uniforms(); > > -- > 2.9.0 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.
Not intended for upstream. Should cause a GPU hang if some thread is executed with a non-contiguous dispatch mask breaking assumptions of brw_stage_has_packed_dispatch(). Doesn't cause any CTS, DEQP or Piglit regressions, while replacing brw_stage_has_packed_dispatch() with a dummy implementation that unconditionally returns true on top of this patch causes multiple GPU hangs. --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 17 + src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 + 2 files changed, 38 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 042203d..b3eec49 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -33,6 +33,23 @@ using namespace brw::surface_access; void fs_visitor::emit_nir_code() { + if (brw_stage_has_packed_dispatch(stage, prog_data)) { + const fs_builder ubld = bld.exec_all().group(1, 0); + const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0); + const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() : + brw_dmask_reg()); + + ubld.ADD(tmp, mask, brw_imm_ud(1)); + ubld.AND(tmp, mask, tmp); + + /* This will loop forever if the dispatch mask doesn't have the expected + * form '2^n-1', in which case tmp will be non-zero. + */ + bld.emit(BRW_OPCODE_DO); + bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), BRW_CONDITIONAL_NZ); + set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE)); + } + /* emit the arrays used for inputs and outputs - load/store intrinsics will * be converted to reads/writes of these arrays */ diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index ba3bbdf..9f7a1f0 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -35,6 +35,27 @@ namespace brw { void vec4_visitor::emit_nir_code() { + if (brw_stage_has_packed_dispatch(stage, _data->base)) { + const dst_reg tmp = writemask(dst_reg(this, glsl_type::uint_type), +WRITEMASK_X); + const src_reg mask = + brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_STATE, 0), + 0, 4, 1), +BRW_REGISTER_TYPE_UD), + BRW_SWIZZLE_); + + emit(ADD(tmp, mask, brw_imm_ud(1))); + emit(AND(tmp, mask, src_reg(tmp))); + + /* This will loop forever if the dispatch mask doesn't have the expected + * form '2^n-1', in which case tmp will be non-zero. + */ + emit(BRW_OPCODE_DO); + emit(CMP(dst_null_ud(), src_reg(tmp), brw_imm_ud(0), + BRW_CONDITIONAL_NZ)); + emit(BRW_OPCODE_WHILE)->predicate = BRW_PREDICATE_NORMAL; + } + if (nir->num_uniforms > 0) nir_setup_uniforms(); -- 2.9.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev