Re: [Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT
On 20/02/17 21:31, Francisco Jerez wrote: > Samuel Iglesias Gonsálvezwrites: > >> On Mon, 2017-02-20 at 08:58 +0100, Samuel Iglesias Gonsálvez wrote: >>> On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote: Samuel Iglesias Gonsálvez writes: > The lowered BSW/BXT indirect move instructions had incorrect > source types, which luckily wasn't causing incorrect assembly to > be > generated due to the bug fixed in the next patch, but would have > confused the remaining back-end IR infrastructure due to the > mismatch > between the IR source types and the emitted machine code. > > v2: > - Improve commit log (Curro) > - Fix read_size (Curro) > - Fix DF uniform array detection in assign_constant_locations() > when > it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT. > > Signed-off-by: Samuel Iglesias Gonsálvez > Cc: "17.0" > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 11 - > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 -- > -- > 2 files changed, 30 insertions(+), 22 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index c348bc7138d..93ab84b5845 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations() > unsigned last = constant_nr + (inst->src[2].ud / 4) > - > 1; > assert(last < uniforms); > > +bool supports_64bit_indirects = > + !devinfo->is_cherryview && !devinfo->is_broxton; > +/* Detect if this is as a result 64-bit MOV > INDIRECT. > In case > + * of BSW/BXT, we substitute each DF MOV INDIRECT by > two 32-bit MOV > + * INDIRECT. > + */ > +bool mov_indirect_64bit = (type_sz(inst- >> src[i].type) > == 8) || > + (!supports_64bit_indirects && inst->dst.type == > BRW_REGISTER_TYPE_UD && > +inst->src[0].type == BRW_REGISTER_TYPE_UD && > inst- >> dst.stride == 2); This seems kind of fragile, I don't think the optimizer gives you any guarantees that the stride of a lowered 64-bit indirect move will remain equal to two, or that the destination stride of an actual 32-bit indirect uniform load will never end up being two as well. That said, because you access these with 32-bit indirect moves, I don't see why you'd need to treat them as 64-bit uniforms here, the usual alignment requirements for 64-bit uniforms no longer apply, so you can treat them as regular 32-bit uniforms AFAICT. Why did you add this hunk? >>> >>> I added it because of this case: if we access to one DF uniform array >>> element with a normal MOV and the rest with MOV INDIRECT, we will >>> mark >>> the former as a live 64bit variable. Then we have the array scattered >>> as part of it is uploaded as a 64-bits uniform and the other as 32- >>> bits. Even if we avoid this by uploading everything together as 32- >>> bits, then the access to that DF could not be aligned to 64-bits. >>> >>> So my idea was to find a way to identify somehow those MOV INDIRECT >>> in >>> BSW to mark all the array as a 64-bit one. >>> >> >> Mmm, maybe I can fix this case without the hack I did. I can add the >> following code after marking all live variables accessed by the >> instructions. >> >> It is very similar to the one to detect live variables but it is fixing >> the case where any MOV INDIRECT in BSW is accessing to an uniform array >> of DF elements where one of these elements is directly accessed by >> another instruction. >> >> What do you think? >> > > Looks somewhat better, but I don't think this is correct if you have > multiple overlapping indirect loads of the same uniform array and only > one of them overlaps with a direct 64-bit load. In that case, I think this is correct. The 2 32-bit MOV INDIRECTs where emitted as a "lowering" of an unsupported 64-bit MOV INDIRECT. They both keep the 'read_size' of the original one, so they both overlap to any other direct 64-bit load to that array like with the original instruction. If none of them overlap to the direct 64-bit access, then I think they can be handled as non-contiguous to the latter without any issue. > Apparently > assign_constant_locations() already has code to attempt to push > indirectly accessed uniform sections contiguously, but the logic seems > broken in multiple ways, even on platforms other than CHV/BXT... The > first is_live_64bit location assignment loop doesn't consider non-64bit > uniforms even if they're marked as contiguous with a 64-bit
Re: [Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT
Samuel Iglesias Gonsálvezwrites: > On Mon, 2017-02-20 at 08:58 +0100, Samuel Iglesias Gonsálvez wrote: >> On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote: >> > Samuel Iglesias Gonsálvez writes: >> > >> > > The lowered BSW/BXT indirect move instructions had incorrect >> > > source types, which luckily wasn't causing incorrect assembly to >> > > be >> > > generated due to the bug fixed in the next patch, but would have >> > > confused the remaining back-end IR infrastructure due to the >> > > mismatch >> > > between the IR source types and the emitted machine code. >> > > >> > > v2: >> > > - Improve commit log (Curro) >> > > - Fix read_size (Curro) >> > > - Fix DF uniform array detection in assign_constant_locations() >> > > when >> > > it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT. >> > > >> > > Signed-off-by: Samuel Iglesias Gonsálvez >> > > Cc: "17.0" >> > > --- >> > > src/mesa/drivers/dri/i965/brw_fs.cpp | 11 - >> > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 -- >> > > -- >> > > 2 files changed, 30 insertions(+), 22 deletions(-) >> > > >> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> > > b/src/mesa/drivers/dri/i965/brw_fs.cpp >> > > index c348bc7138d..93ab84b5845 100644 >> > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> > > @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations() >> > > unsigned last = constant_nr + (inst->src[2].ud / 4) >> > > - >> > > 1; >> > > assert(last < uniforms); >> > > >> > > +bool supports_64bit_indirects = >> > > + !devinfo->is_cherryview && !devinfo->is_broxton; >> > > +/* Detect if this is as a result 64-bit MOV >> > > INDIRECT. >> > > In case >> > > + * of BSW/BXT, we substitute each DF MOV INDIRECT by >> > > two 32-bit MOV >> > > + * INDIRECT. >> > > + */ >> > > +bool mov_indirect_64bit = (type_sz(inst- >> > > >src[i].type) >> > > == 8) || >> > > + (!supports_64bit_indirects && inst->dst.type == >> > > BRW_REGISTER_TYPE_UD && >> > > +inst->src[0].type == BRW_REGISTER_TYPE_UD && >> > > inst- >> > > > dst.stride == 2); >> > >> > This seems kind of fragile, I don't think the optimizer gives you >> > any >> > guarantees that the stride of a lowered 64-bit indirect move will >> > remain >> > equal to two, or that the destination stride of an actual 32-bit >> > indirect uniform load will never end up being two as well. That >> > said, >> > because you access these with 32-bit indirect moves, I don't see >> > why >> > you'd need to treat them as 64-bit uniforms here, the usual >> > alignment >> > requirements for 64-bit uniforms no longer apply, so you can treat >> > them >> > as regular 32-bit uniforms AFAICT. Why did you add this hunk? >> > >> >> I added it because of this case: if we access to one DF uniform array >> element with a normal MOV and the rest with MOV INDIRECT, we will >> mark >> the former as a live 64bit variable. Then we have the array scattered >> as part of it is uploaded as a 64-bits uniform and the other as 32- >> bits. Even if we avoid this by uploading everything together as 32- >> bits, then the access to that DF could not be aligned to 64-bits. >> >> So my idea was to find a way to identify somehow those MOV INDIRECT >> in >> BSW to mark all the array as a 64-bit one. >> > > Mmm, maybe I can fix this case without the hack I did. I can add the > following code after marking all live variables accessed by the > instructions. > > It is very similar to the one to detect live variables but it is fixing > the case where any MOV INDIRECT in BSW is accessing to an uniform array > of DF elements where one of these elements is directly accessed by > another instruction. > > What do you think? > Looks somewhat better, but I don't think this is correct if you have multiple overlapping indirect loads of the same uniform array and only one of them overlaps with a direct 64-bit load. Apparently assign_constant_locations() already has code to attempt to push indirectly accessed uniform sections contiguously, but the logic seems broken in multiple ways, even on platforms other than CHV/BXT... The first is_live_64bit location assignment loop doesn't consider non-64bit uniforms even if they're marked as contiguous with a 64-bit uniform, because the contiguous block tracking is done from within set_push_pull_constant_loc() which may not be called at all if the continue condition evaluates to true at the top of the loop. Also AFAICT this could potentially lead to a very large amount of data being considered as contiguous if you have a contiguous 64-bit block, then random unrelated 32-bit data, and then another contiguous 64-bit block, because the very last element of a MOV_INDIRECT
Re: [Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT
On Mon, 2017-02-20 at 08:58 +0100, Samuel Iglesias Gonsálvez wrote: > On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote: > > Samuel Iglesias Gonsálvezwrites: > > > > > The lowered BSW/BXT indirect move instructions had incorrect > > > source types, which luckily wasn't causing incorrect assembly to > > > be > > > generated due to the bug fixed in the next patch, but would have > > > confused the remaining back-end IR infrastructure due to the > > > mismatch > > > between the IR source types and the emitted machine code. > > > > > > v2: > > > - Improve commit log (Curro) > > > - Fix read_size (Curro) > > > - Fix DF uniform array detection in assign_constant_locations() > > > when > > > it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT. > > > > > > Signed-off-by: Samuel Iglesias Gonsálvez > > > Cc: "17.0" > > > --- > > > src/mesa/drivers/dri/i965/brw_fs.cpp | 11 - > > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 -- > > > -- > > > 2 files changed, 30 insertions(+), 22 deletions(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > > > b/src/mesa/drivers/dri/i965/brw_fs.cpp > > > index c348bc7138d..93ab84b5845 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > > > @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations() > > > unsigned last = constant_nr + (inst->src[2].ud / 4) > > > - > > > 1; > > > assert(last < uniforms); > > > > > > +bool supports_64bit_indirects = > > > + !devinfo->is_cherryview && !devinfo->is_broxton; > > > +/* Detect if this is as a result 64-bit MOV > > > INDIRECT. > > > In case > > > + * of BSW/BXT, we substitute each DF MOV INDIRECT by > > > two 32-bit MOV > > > + * INDIRECT. > > > + */ > > > +bool mov_indirect_64bit = (type_sz(inst- > > > >src[i].type) > > > == 8) || > > > + (!supports_64bit_indirects && inst->dst.type == > > > BRW_REGISTER_TYPE_UD && > > > +inst->src[0].type == BRW_REGISTER_TYPE_UD && > > > inst- > > > > dst.stride == 2); > > > > This seems kind of fragile, I don't think the optimizer gives you > > any > > guarantees that the stride of a lowered 64-bit indirect move will > > remain > > equal to two, or that the destination stride of an actual 32-bit > > indirect uniform load will never end up being two as well. That > > said, > > because you access these with 32-bit indirect moves, I don't see > > why > > you'd need to treat them as 64-bit uniforms here, the usual > > alignment > > requirements for 64-bit uniforms no longer apply, so you can treat > > them > > as regular 32-bit uniforms AFAICT. Why did you add this hunk? > > > > I added it because of this case: if we access to one DF uniform array > element with a normal MOV and the rest with MOV INDIRECT, we will > mark > the former as a live 64bit variable. Then we have the array scattered > as part of it is uploaded as a 64-bits uniform and the other as 32- > bits. Even if we avoid this by uploading everything together as 32- > bits, then the access to that DF could not be aligned to 64-bits. > > So my idea was to find a way to identify somehow those MOV INDIRECT > in > BSW to mark all the array as a 64-bit one. > Mmm, maybe I can fix this case without the hack I did. I can add the following code after marking all live variables accessed by the instructions. It is very similar to the one to detect live variables but it is fixing the case where any MOV INDIRECT in BSW is accessing to an uniform array of DF elements where one of these elements is directly accessed by another instruction. What do you think? Sam + /* Detect if we accessed to an array of DF uniforms in both direct and +* indirect ways on BSW/BXT. If this is the case, then we need +* to take into account in order to push all the elements together. +*/ + if (devinfo->is_cherryview || devinfo->is_broxton) { + foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { + for (int i = 0 ; i < inst->sources; i++) { +if (inst->src[i].file != UNIFORM) + continue; + +int constant_nr = inst->src[i].nr + inst->src[i].offset / 4; + +if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) { + assert(inst->src[2].ud % 4 == 0); + unsigned last = constant_nr + (inst->src[2].ud / 4) - 1; + assert(last < uniforms); + + bool is_df_array = false; + for (unsigned j = constant_nr; j < last; j++) { + if (is_live_64bit[j]) { + is_df_array = true; + break; + } + } + + if (is_df_array) { + /* If part of it is accessed with direct addressing as a
Re: [Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT
On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote: > Samuel Iglesias Gonsálvezwrites: > > > The lowered BSW/BXT indirect move instructions had incorrect > > source types, which luckily wasn't causing incorrect assembly to be > > generated due to the bug fixed in the next patch, but would have > > confused the remaining back-end IR infrastructure due to the > > mismatch > > between the IR source types and the emitted machine code. > > > > v2: > > - Improve commit log (Curro) > > - Fix read_size (Curro) > > - Fix DF uniform array detection in assign_constant_locations() > > when > > it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT. > > > > Signed-off-by: Samuel Iglesias Gonsálvez > > Cc: "17.0" > > --- > > src/mesa/drivers/dri/i965/brw_fs.cpp | 11 - > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 -- > > -- > > 2 files changed, 30 insertions(+), 22 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > > b/src/mesa/drivers/dri/i965/brw_fs.cpp > > index c348bc7138d..93ab84b5845 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > > @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations() > > unsigned last = constant_nr + (inst->src[2].ud / 4) - > > 1; > > assert(last < uniforms); > > > > +bool supports_64bit_indirects = > > + !devinfo->is_cherryview && !devinfo->is_broxton; > > +/* Detect if this is as a result 64-bit MOV INDIRECT. > > In case > > + * of BSW/BXT, we substitute each DF MOV INDIRECT by > > two 32-bit MOV > > + * INDIRECT. > > + */ > > +bool mov_indirect_64bit = (type_sz(inst->src[i].type) > > == 8) || > > + (!supports_64bit_indirects && inst->dst.type == > > BRW_REGISTER_TYPE_UD && > > +inst->src[0].type == BRW_REGISTER_TYPE_UD && inst- > > >dst.stride == 2); > > This seems kind of fragile, I don't think the optimizer gives you any > guarantees that the stride of a lowered 64-bit indirect move will > remain > equal to two, or that the destination stride of an actual 32-bit > indirect uniform load will never end up being two as well. That > said, > because you access these with 32-bit indirect moves, I don't see why > you'd need to treat them as 64-bit uniforms here, the usual alignment > requirements for 64-bit uniforms no longer apply, so you can treat > them > as regular 32-bit uniforms AFAICT. Why did you add this hunk? > I added it because of this case: if we access to one DF uniform array element with a normal MOV and the rest with MOV INDIRECT, we will mark the former as a live 64bit variable. Then we have the array scattered as part of it is uploaded as a 64-bits uniform and the other as 32- bits. Even if we avoid this by uploading everything together as 32- bits, then the access to that DF could not be aligned to 64-bits. So my idea was to find a way to identify somehow those MOV INDIRECT in BSW to mark all the array as a 64-bit one. > Other than that patch looks good to me. > OK, thanks. Sam > > for (unsigned j = constant_nr; j < last; j++) { > > is_live[j] = true; > > contiguous[j] = true; > > - if (type_sz(inst->src[i].type) == 8) { > > + if (mov_indirect_64bit) { > > is_live_64bit[j] = true; > > } > > } > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > index 991c20fad62..4aaf84964e9 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > @@ -3878,31 +3878,30 @@ fs_visitor::nir_emit_intrinsic(const > > fs_builder , nir_intrinsic_instr *instr > > unsigned read_size = instr->const_index[1] - > > (instr->num_components - 1) * type_sz(dest.type); > > > > - fs_reg indirect_chv_high_32bit; > > - bool is_chv_bxt_64bit = > > -(devinfo->is_cherryview || devinfo->is_broxton) && > > -type_sz(dest.type) == 8; > > - if (is_chv_bxt_64bit) { > > -indirect_chv_high_32bit = vgrf(glsl_type::uint_type); > > -/* Calculate indirect address to read high 32 bits */ > > -bld.ADD(indirect_chv_high_32bit, indirect, > > brw_imm_ud(4)); > > - } > > + bool supports_64bit_indirects = > > +!devinfo->is_cherryview && !devinfo->is_broxton; > > > > - for (unsigned j = 0; j < instr->num_components; j++) { > > -if (!is_chv_bxt_64bit) { > > + if (type_sz(dest.type) != 8 || supports_64bit_indirects) > > { > > +for (unsigned j = 0; j < instr->num_components; j++) { > > bld.emit(SHADER_OPCODE_MOV_INDIRECT, > >
Re: [Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT
Samuel Iglesias Gonsálvezwrites: > The lowered BSW/BXT indirect move instructions had incorrect > source types, which luckily wasn't causing incorrect assembly to be > generated due to the bug fixed in the next patch, but would have > confused the remaining back-end IR infrastructure due to the mismatch > between the IR source types and the emitted machine code. > > v2: > - Improve commit log (Curro) > - Fix read_size (Curro) > - Fix DF uniform array detection in assign_constant_locations() when > it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT. > > Signed-off-by: Samuel Iglesias Gonsálvez > Cc: "17.0" > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 11 - > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 > > 2 files changed, 30 insertions(+), 22 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index c348bc7138d..93ab84b5845 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations() > unsigned last = constant_nr + (inst->src[2].ud / 4) - 1; > assert(last < uniforms); > > +bool supports_64bit_indirects = > + !devinfo->is_cherryview && !devinfo->is_broxton; > +/* Detect if this is as a result 64-bit MOV INDIRECT. In case > + * of BSW/BXT, we substitute each DF MOV INDIRECT by two 32-bit > MOV > + * INDIRECT. > + */ > +bool mov_indirect_64bit = (type_sz(inst->src[i].type) == 8) || > + (!supports_64bit_indirects && inst->dst.type == > BRW_REGISTER_TYPE_UD && > +inst->src[0].type == BRW_REGISTER_TYPE_UD && > inst->dst.stride == 2); This seems kind of fragile, I don't think the optimizer gives you any guarantees that the stride of a lowered 64-bit indirect move will remain equal to two, or that the destination stride of an actual 32-bit indirect uniform load will never end up being two as well. That said, because you access these with 32-bit indirect moves, I don't see why you'd need to treat them as 64-bit uniforms here, the usual alignment requirements for 64-bit uniforms no longer apply, so you can treat them as regular 32-bit uniforms AFAICT. Why did you add this hunk? Other than that patch looks good to me. > for (unsigned j = constant_nr; j < last; j++) { > is_live[j] = true; > contiguous[j] = true; > - if (type_sz(inst->src[i].type) == 8) { > + if (mov_indirect_64bit) { >is_live_64bit[j] = true; > } > } > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 991c20fad62..4aaf84964e9 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -3878,31 +3878,30 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , > nir_intrinsic_instr *instr > unsigned read_size = instr->const_index[1] - > (instr->num_components - 1) * type_sz(dest.type); > > - fs_reg indirect_chv_high_32bit; > - bool is_chv_bxt_64bit = > -(devinfo->is_cherryview || devinfo->is_broxton) && > -type_sz(dest.type) == 8; > - if (is_chv_bxt_64bit) { > -indirect_chv_high_32bit = vgrf(glsl_type::uint_type); > -/* Calculate indirect address to read high 32 bits */ > -bld.ADD(indirect_chv_high_32bit, indirect, brw_imm_ud(4)); > - } > + bool supports_64bit_indirects = > +!devinfo->is_cherryview && !devinfo->is_broxton; > > - for (unsigned j = 0; j < instr->num_components; j++) { > -if (!is_chv_bxt_64bit) { > + if (type_sz(dest.type) != 8 || supports_64bit_indirects) { > +for (unsigned j = 0; j < instr->num_components; j++) { > bld.emit(SHADER_OPCODE_MOV_INDIRECT, > offset(dest, bld, j), offset(src, bld, j), > indirect, brw_imm_ud(read_size)); > -} else { > - bld.emit(SHADER_OPCODE_MOV_INDIRECT, > -subscript(offset(dest, bld, j), > BRW_REGISTER_TYPE_UD, 0), > -offset(src, bld, j), > -indirect, brw_imm_ud(read_size)); > - > - bld.emit(SHADER_OPCODE_MOV_INDIRECT, > -subscript(offset(dest, bld, j), > BRW_REGISTER_TYPE_UD, 1), > -offset(src, bld, j), > -indirect_chv_high_32bit, brw_imm_ud(read_size)); > +} > + } else { > +const unsigned num_mov_indirects = > + type_sz(dest.type) / type_sz(BRW_REGISTER_TYPE_UD); > +
[Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT
The lowered BSW/BXT indirect move instructions had incorrect source types, which luckily wasn't causing incorrect assembly to be generated due to the bug fixed in the next patch, but would have confused the remaining back-end IR infrastructure due to the mismatch between the IR source types and the emitted machine code. v2: - Improve commit log (Curro) - Fix read_size (Curro) - Fix DF uniform array detection in assign_constant_locations() when it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT. Signed-off-by: Samuel Iglesias GonsálvezCc: "17.0" --- src/mesa/drivers/dri/i965/brw_fs.cpp | 11 - src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 2 files changed, 30 insertions(+), 22 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index c348bc7138d..93ab84b5845 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations() unsigned last = constant_nr + (inst->src[2].ud / 4) - 1; assert(last < uniforms); +bool supports_64bit_indirects = + !devinfo->is_cherryview && !devinfo->is_broxton; +/* Detect if this is as a result 64-bit MOV INDIRECT. In case + * of BSW/BXT, we substitute each DF MOV INDIRECT by two 32-bit MOV + * INDIRECT. + */ +bool mov_indirect_64bit = (type_sz(inst->src[i].type) == 8) || + (!supports_64bit_indirects && inst->dst.type == BRW_REGISTER_TYPE_UD && +inst->src[0].type == BRW_REGISTER_TYPE_UD && inst->dst.stride == 2); for (unsigned j = constant_nr; j < last; j++) { is_live[j] = true; contiguous[j] = true; - if (type_sz(inst->src[i].type) == 8) { + if (mov_indirect_64bit) { is_live_64bit[j] = true; } } diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 991c20fad62..4aaf84964e9 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -3878,31 +3878,30 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , nir_intrinsic_instr *instr unsigned read_size = instr->const_index[1] - (instr->num_components - 1) * type_sz(dest.type); - fs_reg indirect_chv_high_32bit; - bool is_chv_bxt_64bit = -(devinfo->is_cherryview || devinfo->is_broxton) && -type_sz(dest.type) == 8; - if (is_chv_bxt_64bit) { -indirect_chv_high_32bit = vgrf(glsl_type::uint_type); -/* Calculate indirect address to read high 32 bits */ -bld.ADD(indirect_chv_high_32bit, indirect, brw_imm_ud(4)); - } + bool supports_64bit_indirects = +!devinfo->is_cherryview && !devinfo->is_broxton; - for (unsigned j = 0; j < instr->num_components; j++) { -if (!is_chv_bxt_64bit) { + if (type_sz(dest.type) != 8 || supports_64bit_indirects) { +for (unsigned j = 0; j < instr->num_components; j++) { bld.emit(SHADER_OPCODE_MOV_INDIRECT, offset(dest, bld, j), offset(src, bld, j), indirect, brw_imm_ud(read_size)); -} else { - bld.emit(SHADER_OPCODE_MOV_INDIRECT, -subscript(offset(dest, bld, j), BRW_REGISTER_TYPE_UD, 0), -offset(src, bld, j), -indirect, brw_imm_ud(read_size)); - - bld.emit(SHADER_OPCODE_MOV_INDIRECT, -subscript(offset(dest, bld, j), BRW_REGISTER_TYPE_UD, 1), -offset(src, bld, j), -indirect_chv_high_32bit, brw_imm_ud(read_size)); +} + } else { +const unsigned num_mov_indirects = + type_sz(dest.type) / type_sz(BRW_REGISTER_TYPE_UD); +/* We read a little bit less per MOV INDIRECT, as they are now + * 32-bits ones instead of 64-bit. Fix read_size then. + */ +const unsigned read_size_32bit = read_size - +(num_mov_indirects - 1) * type_sz(BRW_REGISTER_TYPE_UD); +for (unsigned j = 0; j < instr->num_components; j++) { + for (unsigned i = 0; i < num_mov_indirects; i++) { + bld.emit(SHADER_OPCODE_MOV_INDIRECT, + subscript(offset(dest, bld, j), BRW_REGISTER_TYPE_UD, i), + subscript(offset(src, bld, j), BRW_REGISTER_TYPE_UD, i), + indirect, brw_imm_ud(read_size_32bit)); + } } } } -- 2.11.0 ___ mesa-dev