Re: [Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT

2017-02-21 Thread Samuel Iglesias Gonsálvez
On 20/02/17 21:31, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
>> On Mon, 2017-02-20 at 08:58 +0100, Samuel Iglesias Gonsálvez wrote:
>>> On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote:
 Samuel Iglesias Gonsálvez  writes:

> The lowered BSW/BXT indirect move instructions had incorrect
> source types, which luckily wasn't causing incorrect assembly to
> be
> generated due to the bug fixed in the next patch, but would have
> confused the remaining back-end IR infrastructure due to the
> mismatch
> between the IR source types and the emitted machine code.
>
> v2:
> - Improve commit log (Curro)
> - Fix read_size (Curro)
> - Fix DF uniform array detection in assign_constant_locations()
> when
>   it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> Cc: "17.0" 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 11 -
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 --
> --
>  2 files changed, 30 insertions(+), 22 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index c348bc7138d..93ab84b5845 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations()
>  unsigned last = constant_nr + (inst->src[2].ud / 4)
> -
> 1;
>  assert(last < uniforms);
>  
> +bool supports_64bit_indirects =
> +   !devinfo->is_cherryview && !devinfo->is_broxton;
> +/* Detect if this is as a result 64-bit MOV
> INDIRECT.
> In case
> + * of BSW/BXT, we substitute each DF MOV INDIRECT by
> two 32-bit MOV
> + * INDIRECT.
> + */
> +bool mov_indirect_64bit = (type_sz(inst-
>> src[i].type)
> == 8) ||
> +   (!supports_64bit_indirects && inst->dst.type ==
> BRW_REGISTER_TYPE_UD &&
> +inst->src[0].type == BRW_REGISTER_TYPE_UD &&
> inst-
>> dst.stride == 2);

 This seems kind of fragile, I don't think the optimizer gives you
 any
 guarantees that the stride of a lowered 64-bit indirect move will
 remain
 equal to two, or that the destination stride of an actual 32-bit
 indirect uniform load will never end up being two as well.  That
 said,
 because you access these with 32-bit indirect moves, I don't see
 why
 you'd need to treat them as 64-bit uniforms here, the usual
 alignment
 requirements for 64-bit uniforms no longer apply, so you can treat
 them
 as regular 32-bit uniforms AFAICT.  Why did you add this hunk?

>>>
>>> I added it because of this case: if we access to one DF uniform array
>>> element with a normal MOV and the rest with MOV INDIRECT, we will
>>> mark
>>> the former as a live 64bit variable. Then we have the array scattered
>>> as part of it is uploaded as a 64-bits uniform and the other as 32-
>>> bits. Even if we avoid this by uploading everything together as 32-
>>> bits, then the access to that DF could not be aligned to 64-bits.
>>>
>>> So my idea was to find a way to identify somehow those MOV INDIRECT
>>> in
>>> BSW to mark all the array as a 64-bit one.
>>>
>>
>> Mmm, maybe I can fix this case without the hack I did. I can add the
>> following code after marking all live variables accessed by the
>> instructions.
>>
>> It is very similar to the one to detect live variables but it is fixing
>> the case where any MOV INDIRECT in BSW is accessing to an uniform array
>> of DF elements where one of these elements is directly accessed by
>> another instruction.
>>
>> What do you think?
>>
> 
> Looks somewhat better, but I don't think this is correct if you have
> multiple overlapping indirect loads of the same uniform array and only
> one of them overlaps with a direct 64-bit load.

In that case, I think this is correct. The 2 32-bit MOV INDIRECTs where
emitted as a "lowering" of an unsupported 64-bit MOV INDIRECT. They both
keep the 'read_size' of the original one, so they both overlap to any
other direct 64-bit load to that array like with the original
instruction. If none of them overlap to the direct 64-bit access, then I
think they can be handled as non-contiguous to the latter without any issue.

>  Apparently
> assign_constant_locations() already has code to attempt to push
> indirectly accessed uniform sections contiguously, but the logic seems
> broken in multiple ways, even on platforms other than CHV/BXT... The
> first is_live_64bit location assignment loop doesn't consider non-64bit
> uniforms even if they're marked as contiguous with a 64-bit 

Re: [Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT

2017-02-20 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> On Mon, 2017-02-20 at 08:58 +0100, Samuel Iglesias Gonsálvez wrote:
>> On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote:
>> > Samuel Iglesias Gonsálvez  writes:
>> > 
>> > > The lowered BSW/BXT indirect move instructions had incorrect
>> > > source types, which luckily wasn't causing incorrect assembly to
>> > > be
>> > > generated due to the bug fixed in the next patch, but would have
>> > > confused the remaining back-end IR infrastructure due to the
>> > > mismatch
>> > > between the IR source types and the emitted machine code.
>> > > 
>> > > v2:
>> > > - Improve commit log (Curro)
>> > > - Fix read_size (Curro)
>> > > - Fix DF uniform array detection in assign_constant_locations()
>> > > when
>> > >   it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.
>> > > 
>> > > Signed-off-by: Samuel Iglesias Gonsálvez 
>> > > Cc: "17.0" 
>> > > ---
>> > >  src/mesa/drivers/dri/i965/brw_fs.cpp | 11 -
>> > >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 --
>> > > --
>> > >  2 files changed, 30 insertions(+), 22 deletions(-)
>> > > 
>> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > index c348bc7138d..93ab84b5845 100644
>> > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations()
>> > >  unsigned last = constant_nr + (inst->src[2].ud / 4)
>> > > -
>> > > 1;
>> > >  assert(last < uniforms);
>> > >  
>> > > +bool supports_64bit_indirects =
>> > > +   !devinfo->is_cherryview && !devinfo->is_broxton;
>> > > +/* Detect if this is as a result 64-bit MOV
>> > > INDIRECT.
>> > > In case
>> > > + * of BSW/BXT, we substitute each DF MOV INDIRECT by
>> > > two 32-bit MOV
>> > > + * INDIRECT.
>> > > + */
>> > > +bool mov_indirect_64bit = (type_sz(inst-
>> > > >src[i].type)
>> > > == 8) ||
>> > > +   (!supports_64bit_indirects && inst->dst.type ==
>> > > BRW_REGISTER_TYPE_UD &&
>> > > +inst->src[0].type == BRW_REGISTER_TYPE_UD &&
>> > > inst-
>> > > > dst.stride == 2);
>> > 
>> > This seems kind of fragile, I don't think the optimizer gives you
>> > any
>> > guarantees that the stride of a lowered 64-bit indirect move will
>> > remain
>> > equal to two, or that the destination stride of an actual 32-bit
>> > indirect uniform load will never end up being two as well.  That
>> > said,
>> > because you access these with 32-bit indirect moves, I don't see
>> > why
>> > you'd need to treat them as 64-bit uniforms here, the usual
>> > alignment
>> > requirements for 64-bit uniforms no longer apply, so you can treat
>> > them
>> > as regular 32-bit uniforms AFAICT.  Why did you add this hunk?
>> > 
>> 
>> I added it because of this case: if we access to one DF uniform array
>> element with a normal MOV and the rest with MOV INDIRECT, we will
>> mark
>> the former as a live 64bit variable. Then we have the array scattered
>> as part of it is uploaded as a 64-bits uniform and the other as 32-
>> bits. Even if we avoid this by uploading everything together as 32-
>> bits, then the access to that DF could not be aligned to 64-bits.
>> 
>> So my idea was to find a way to identify somehow those MOV INDIRECT
>> in
>> BSW to mark all the array as a 64-bit one.
>> 
>
> Mmm, maybe I can fix this case without the hack I did. I can add the
> following code after marking all live variables accessed by the
> instructions.
>
> It is very similar to the one to detect live variables but it is fixing
> the case where any MOV INDIRECT in BSW is accessing to an uniform array
> of DF elements where one of these elements is directly accessed by
> another instruction.
>
> What do you think?
>

Looks somewhat better, but I don't think this is correct if you have
multiple overlapping indirect loads of the same uniform array and only
one of them overlaps with a direct 64-bit load.  Apparently
assign_constant_locations() already has code to attempt to push
indirectly accessed uniform sections contiguously, but the logic seems
broken in multiple ways, even on platforms other than CHV/BXT...  The
first is_live_64bit location assignment loop doesn't consider non-64bit
uniforms even if they're marked as contiguous with a 64-bit uniform,
because the contiguous block tracking is done from within
set_push_pull_constant_loc() which may not be called at all if the
continue condition evaluates to true at the top of the loop.  Also
AFAICT this could potentially lead to a very large amount of data being
considered as contiguous if you have a contiguous 64-bit block, then
random unrelated 32-bit data, and then another contiguous 64-bit block,
because the very last element of a MOV_INDIRECT 

Re: [Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT

2017-02-20 Thread Samuel Iglesias Gonsálvez
On Mon, 2017-02-20 at 08:58 +0100, Samuel Iglesias Gonsálvez wrote:
> On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote:
> > Samuel Iglesias Gonsálvez  writes:
> > 
> > > The lowered BSW/BXT indirect move instructions had incorrect
> > > source types, which luckily wasn't causing incorrect assembly to
> > > be
> > > generated due to the bug fixed in the next patch, but would have
> > > confused the remaining back-end IR infrastructure due to the
> > > mismatch
> > > between the IR source types and the emitted machine code.
> > > 
> > > v2:
> > > - Improve commit log (Curro)
> > > - Fix read_size (Curro)
> > > - Fix DF uniform array detection in assign_constant_locations()
> > > when
> > >   it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.
> > > 
> > > Signed-off-by: Samuel Iglesias Gonsálvez 
> > > Cc: "17.0" 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_fs.cpp | 11 -
> > >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 --
> > > --
> > >  2 files changed, 30 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > index c348bc7138d..93ab84b5845 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations()
> > >  unsigned last = constant_nr + (inst->src[2].ud / 4)
> > > -
> > > 1;
> > >  assert(last < uniforms);
> > >  
> > > +bool supports_64bit_indirects =
> > > +   !devinfo->is_cherryview && !devinfo->is_broxton;
> > > +/* Detect if this is as a result 64-bit MOV
> > > INDIRECT.
> > > In case
> > > + * of BSW/BXT, we substitute each DF MOV INDIRECT by
> > > two 32-bit MOV
> > > + * INDIRECT.
> > > + */
> > > +bool mov_indirect_64bit = (type_sz(inst-
> > > >src[i].type)
> > > == 8) ||
> > > +   (!supports_64bit_indirects && inst->dst.type ==
> > > BRW_REGISTER_TYPE_UD &&
> > > +inst->src[0].type == BRW_REGISTER_TYPE_UD &&
> > > inst-
> > > > dst.stride == 2);
> > 
> > This seems kind of fragile, I don't think the optimizer gives you
> > any
> > guarantees that the stride of a lowered 64-bit indirect move will
> > remain
> > equal to two, or that the destination stride of an actual 32-bit
> > indirect uniform load will never end up being two as well.  That
> > said,
> > because you access these with 32-bit indirect moves, I don't see
> > why
> > you'd need to treat them as 64-bit uniforms here, the usual
> > alignment
> > requirements for 64-bit uniforms no longer apply, so you can treat
> > them
> > as regular 32-bit uniforms AFAICT.  Why did you add this hunk?
> > 
> 
> I added it because of this case: if we access to one DF uniform array
> element with a normal MOV and the rest with MOV INDIRECT, we will
> mark
> the former as a live 64bit variable. Then we have the array scattered
> as part of it is uploaded as a 64-bits uniform and the other as 32-
> bits. Even if we avoid this by uploading everything together as 32-
> bits, then the access to that DF could not be aligned to 64-bits.
> 
> So my idea was to find a way to identify somehow those MOV INDIRECT
> in
> BSW to mark all the array as a 64-bit one.
> 

Mmm, maybe I can fix this case without the hack I did. I can add the
following code after marking all live variables accessed by the
instructions.

It is very similar to the one to detect live variables but it is fixing
the case where any MOV INDIRECT in BSW is accessing to an uniform array
of DF elements where one of these elements is directly accessed by
another instruction.

What do you think?

Sam

+   /* Detect if we accessed to an array of DF uniforms in both direct
and
+* indirect ways on BSW/BXT. If this is the case, then we need
+* to take into account in order to push all the elements together.
+*/
+   if (devinfo->is_cherryview || devinfo->is_broxton) {
+  foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
+ for (int i = 0 ; i < inst->sources; i++) {
+if (inst->src[i].file != UNIFORM)
+   continue;
+
+int constant_nr = inst->src[i].nr + inst->src[i].offset /
4;
+
+if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0)
{
+   assert(inst->src[2].ud % 4 == 0);
+   unsigned last = constant_nr + (inst->src[2].ud / 4) -
1;
+   assert(last < uniforms);
+
+   bool is_df_array = false;
+   for (unsigned j = constant_nr; j < last; j++) {
+  if (is_live_64bit[j]) {
+ is_df_array = true;
+ break;
+  }
+   }
+
+   if (is_df_array) {
+  /* If part of it is accessed with direct addressing
as a 

Re: [Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT

2017-02-19 Thread Samuel Iglesias Gonsálvez
On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
> > The lowered BSW/BXT indirect move instructions had incorrect
> > source types, which luckily wasn't causing incorrect assembly to be
> > generated due to the bug fixed in the next patch, but would have
> > confused the remaining back-end IR infrastructure due to the
> > mismatch
> > between the IR source types and the emitted machine code.
> > 
> > v2:
> > - Improve commit log (Curro)
> > - Fix read_size (Curro)
> > - Fix DF uniform array detection in assign_constant_locations()
> > when
> >   it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.
> > 
> > Signed-off-by: Samuel Iglesias Gonsálvez 
> > Cc: "17.0" 
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 11 -
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 --
> > --
> >  2 files changed, 30 insertions(+), 22 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index c348bc7138d..93ab84b5845 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations()
> >  unsigned last = constant_nr + (inst->src[2].ud / 4) -
> > 1;
> >  assert(last < uniforms);
> >  
> > +bool supports_64bit_indirects =
> > +   !devinfo->is_cherryview && !devinfo->is_broxton;
> > +/* Detect if this is as a result 64-bit MOV INDIRECT.
> > In case
> > + * of BSW/BXT, we substitute each DF MOV INDIRECT by
> > two 32-bit MOV
> > + * INDIRECT.
> > + */
> > +bool mov_indirect_64bit = (type_sz(inst->src[i].type)
> > == 8) ||
> > +   (!supports_64bit_indirects && inst->dst.type ==
> > BRW_REGISTER_TYPE_UD &&
> > +inst->src[0].type == BRW_REGISTER_TYPE_UD && inst-
> > >dst.stride == 2);
> 
> This seems kind of fragile, I don't think the optimizer gives you any
> guarantees that the stride of a lowered 64-bit indirect move will
> remain
> equal to two, or that the destination stride of an actual 32-bit
> indirect uniform load will never end up being two as well.  That
> said,
> because you access these with 32-bit indirect moves, I don't see why
> you'd need to treat them as 64-bit uniforms here, the usual alignment
> requirements for 64-bit uniforms no longer apply, so you can treat
> them
> as regular 32-bit uniforms AFAICT.  Why did you add this hunk?
> 

I added it because of this case: if we access to one DF uniform array
element with a normal MOV and the rest with MOV INDIRECT, we will mark
the former as a live 64bit variable. Then we have the array scattered
as part of it is uploaded as a 64-bits uniform and the other as 32-
bits. Even if we avoid this by uploading everything together as 32-
bits, then the access to that DF could not be aligned to 64-bits.

So my idea was to find a way to identify somehow those MOV INDIRECT in
BSW to mark all the array as a 64-bit one.

> Other than that patch looks good to me.
> 

OK, thanks.

Sam

> >  for (unsigned j = constant_nr; j < last; j++) {
> > is_live[j] = true;
> > contiguous[j] = true;
> > -   if (type_sz(inst->src[i].type) == 8) {
> > +   if (mov_indirect_64bit) {
> >    is_live_64bit[j] = true;
> > }
> >  }
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > index 991c20fad62..4aaf84964e9 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > @@ -3878,31 +3878,30 @@ fs_visitor::nir_emit_intrinsic(const
> > fs_builder , nir_intrinsic_instr *instr
> >   unsigned read_size = instr->const_index[1] -
> >  (instr->num_components - 1) * type_sz(dest.type);
> >  
> > - fs_reg indirect_chv_high_32bit;
> > - bool is_chv_bxt_64bit =
> > -(devinfo->is_cherryview || devinfo->is_broxton) &&
> > -type_sz(dest.type) == 8;
> > - if (is_chv_bxt_64bit) {
> > -indirect_chv_high_32bit = vgrf(glsl_type::uint_type);
> > -/* Calculate indirect address to read high 32 bits */
> > -bld.ADD(indirect_chv_high_32bit, indirect,
> > brw_imm_ud(4));
> > - }
> > + bool supports_64bit_indirects =
> > +!devinfo->is_cherryview && !devinfo->is_broxton;
> >  
> > - for (unsigned j = 0; j < instr->num_components; j++) {
> > -if (!is_chv_bxt_64bit) {
> > + if (type_sz(dest.type) != 8 || supports_64bit_indirects)
> > {
> > +for (unsigned j = 0; j < instr->num_components; j++) {
> > bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> > 

Re: [Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT

2017-02-18 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> The lowered BSW/BXT indirect move instructions had incorrect
> source types, which luckily wasn't causing incorrect assembly to be
> generated due to the bug fixed in the next patch, but would have
> confused the remaining back-end IR infrastructure due to the mismatch
> between the IR source types and the emitted machine code.
>
> v2:
> - Improve commit log (Curro)
> - Fix read_size (Curro)
> - Fix DF uniform array detection in assign_constant_locations() when
>   it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> Cc: "17.0" 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 11 -
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 
> 
>  2 files changed, 30 insertions(+), 22 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index c348bc7138d..93ab84b5845 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations()
>  unsigned last = constant_nr + (inst->src[2].ud / 4) - 1;
>  assert(last < uniforms);
>  
> +bool supports_64bit_indirects =
> +   !devinfo->is_cherryview && !devinfo->is_broxton;
> +/* Detect if this is as a result 64-bit MOV INDIRECT. In case
> + * of BSW/BXT, we substitute each DF MOV INDIRECT by two 32-bit 
> MOV
> + * INDIRECT.
> + */
> +bool mov_indirect_64bit = (type_sz(inst->src[i].type) == 8) ||
> +   (!supports_64bit_indirects && inst->dst.type == 
> BRW_REGISTER_TYPE_UD &&
> +inst->src[0].type == BRW_REGISTER_TYPE_UD && 
> inst->dst.stride == 2);

This seems kind of fragile, I don't think the optimizer gives you any
guarantees that the stride of a lowered 64-bit indirect move will remain
equal to two, or that the destination stride of an actual 32-bit
indirect uniform load will never end up being two as well.  That said,
because you access these with 32-bit indirect moves, I don't see why
you'd need to treat them as 64-bit uniforms here, the usual alignment
requirements for 64-bit uniforms no longer apply, so you can treat them
as regular 32-bit uniforms AFAICT.  Why did you add this hunk?

Other than that patch looks good to me.

>  for (unsigned j = constant_nr; j < last; j++) {
> is_live[j] = true;
> contiguous[j] = true;
> -   if (type_sz(inst->src[i].type) == 8) {
> +   if (mov_indirect_64bit) {
>is_live_64bit[j] = true;
> }
>  }
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 991c20fad62..4aaf84964e9 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -3878,31 +3878,30 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
> nir_intrinsic_instr *instr
>   unsigned read_size = instr->const_index[1] -
>  (instr->num_components - 1) * type_sz(dest.type);
>  
> - fs_reg indirect_chv_high_32bit;
> - bool is_chv_bxt_64bit =
> -(devinfo->is_cherryview || devinfo->is_broxton) &&
> -type_sz(dest.type) == 8;
> - if (is_chv_bxt_64bit) {
> -indirect_chv_high_32bit = vgrf(glsl_type::uint_type);
> -/* Calculate indirect address to read high 32 bits */
> -bld.ADD(indirect_chv_high_32bit, indirect, brw_imm_ud(4));
> - }
> + bool supports_64bit_indirects =
> +!devinfo->is_cherryview && !devinfo->is_broxton;
>  
> - for (unsigned j = 0; j < instr->num_components; j++) {
> -if (!is_chv_bxt_64bit) {
> + if (type_sz(dest.type) != 8 || supports_64bit_indirects) {
> +for (unsigned j = 0; j < instr->num_components; j++) {
> bld.emit(SHADER_OPCODE_MOV_INDIRECT,
>  offset(dest, bld, j), offset(src, bld, j),
>  indirect, brw_imm_ud(read_size));
> -} else {
> -   bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> -subscript(offset(dest, bld, j), 
> BRW_REGISTER_TYPE_UD, 0),
> -offset(src, bld, j),
> -indirect, brw_imm_ud(read_size));
> -
> -   bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> -subscript(offset(dest, bld, j), 
> BRW_REGISTER_TYPE_UD, 1),
> -offset(src, bld, j),
> -indirect_chv_high_32bit, brw_imm_ud(read_size));
> +}
> + } else {
> +const unsigned num_mov_indirects =
> +   type_sz(dest.type) / type_sz(BRW_REGISTER_TYPE_UD);
> +

[Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT

2017-02-16 Thread Samuel Iglesias Gonsálvez
The lowered BSW/BXT indirect move instructions had incorrect
source types, which luckily wasn't causing incorrect assembly to be
generated due to the bug fixed in the next patch, but would have
confused the remaining back-end IR infrastructure due to the mismatch
between the IR source types and the emitted machine code.

v2:
- Improve commit log (Curro)
- Fix read_size (Curro)
- Fix DF uniform array detection in assign_constant_locations() when
  it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.

Signed-off-by: Samuel Iglesias Gonsálvez 
Cc: "17.0" 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 11 -
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 
 2 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index c348bc7138d..93ab84b5845 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations()
 unsigned last = constant_nr + (inst->src[2].ud / 4) - 1;
 assert(last < uniforms);
 
+bool supports_64bit_indirects =
+   !devinfo->is_cherryview && !devinfo->is_broxton;
+/* Detect if this is as a result 64-bit MOV INDIRECT. In case
+ * of BSW/BXT, we substitute each DF MOV INDIRECT by two 32-bit MOV
+ * INDIRECT.
+ */
+bool mov_indirect_64bit = (type_sz(inst->src[i].type) == 8) ||
+   (!supports_64bit_indirects && inst->dst.type == 
BRW_REGISTER_TYPE_UD &&
+inst->src[0].type == BRW_REGISTER_TYPE_UD && inst->dst.stride 
== 2);
 for (unsigned j = constant_nr; j < last; j++) {
is_live[j] = true;
contiguous[j] = true;
-   if (type_sz(inst->src[i].type) == 8) {
+   if (mov_indirect_64bit) {
   is_live_64bit[j] = true;
}
 }
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 991c20fad62..4aaf84964e9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -3878,31 +3878,30 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  unsigned read_size = instr->const_index[1] -
 (instr->num_components - 1) * type_sz(dest.type);
 
- fs_reg indirect_chv_high_32bit;
- bool is_chv_bxt_64bit =
-(devinfo->is_cherryview || devinfo->is_broxton) &&
-type_sz(dest.type) == 8;
- if (is_chv_bxt_64bit) {
-indirect_chv_high_32bit = vgrf(glsl_type::uint_type);
-/* Calculate indirect address to read high 32 bits */
-bld.ADD(indirect_chv_high_32bit, indirect, brw_imm_ud(4));
- }
+ bool supports_64bit_indirects =
+!devinfo->is_cherryview && !devinfo->is_broxton;
 
- for (unsigned j = 0; j < instr->num_components; j++) {
-if (!is_chv_bxt_64bit) {
+ if (type_sz(dest.type) != 8 || supports_64bit_indirects) {
+for (unsigned j = 0; j < instr->num_components; j++) {
bld.emit(SHADER_OPCODE_MOV_INDIRECT,
 offset(dest, bld, j), offset(src, bld, j),
 indirect, brw_imm_ud(read_size));
-} else {
-   bld.emit(SHADER_OPCODE_MOV_INDIRECT,
-subscript(offset(dest, bld, j), BRW_REGISTER_TYPE_UD, 
0),
-offset(src, bld, j),
-indirect, brw_imm_ud(read_size));
-
-   bld.emit(SHADER_OPCODE_MOV_INDIRECT,
-subscript(offset(dest, bld, j), BRW_REGISTER_TYPE_UD, 
1),
-offset(src, bld, j),
-indirect_chv_high_32bit, brw_imm_ud(read_size));
+}
+ } else {
+const unsigned num_mov_indirects =
+   type_sz(dest.type) / type_sz(BRW_REGISTER_TYPE_UD);
+/* We read a little bit less per MOV INDIRECT, as they are now
+ * 32-bits ones instead of 64-bit. Fix read_size then.
+ */
+const unsigned read_size_32bit = read_size -
+(num_mov_indirects - 1) * type_sz(BRW_REGISTER_TYPE_UD);
+for (unsigned j = 0; j < instr->num_components; j++) {
+   for (unsigned i = 0; i < num_mov_indirects; i++) {
+  bld.emit(SHADER_OPCODE_MOV_INDIRECT,
+   subscript(offset(dest, bld, j), 
BRW_REGISTER_TYPE_UD, i),
+   subscript(offset(src, bld, j), 
BRW_REGISTER_TYPE_UD, i),
+   indirect, brw_imm_ud(read_size_32bit));
+   }
 }
  }
   }
-- 
2.11.0

___
mesa-dev