Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-08-20 Thread Chema Casanova
El 29/07/18 a las 19:47, Chema Casanova escribió:
> El 28/07/18 a las 01:45, Francisco Jerez escribió:
>> Chema Casanova  writes:

[...]

> If we have a partial write/read:
>
> I understood that you my initial patter proposal would only be ok for
> the first GRF of src[i]/dst (reg_offset == 0)
>
> periodic_mask(this->exec_size,   /* count */
>this->src[i].stride * type_sz(this->src[i].type), /*step */
>type_sz(this->src[i].type),   /* bits */
>this->src[i].offset % REG_SIZE);  /* offset */
>
> In the case we manage only reg_offset == 0 we get a huge improvement
> reducing all problems many of the register_pressure we have now on all
> SIMD8 shaders with 8/16bits test cases.
>
> I understood that you didn't agree that for cases where src/destination
> use more than 1 GRF (reg_offset == 1) we can not guarantee that we can
> apply the same internal offset (this->src[i].offset % REG_SIZE) as the
> base register to calculate a patter. So It would be better to return ~0u
> on reads or 0u in writes.
>
>>>
 Yes, but you could easily determine whether the mask is going to be
 invariant with respect to reg_offset (where reg_offset is within bounds)
 and in that case return the periodic_mask() expression above, otherwise
 return 0/~0u depending on whether reg_offset is within bounds.
>>>
>>> Ok, so we are within bounds, we don't have a predicated write, we are
>>> not a send message. Then we have an ALU opcode and we return the
>>> periodic_mask.
>>>
>>
>> Those are all necessary but not sufficient conditions for the
>> periodic_mask() expression above to give you the correct answer for any
>> in-bounds reg_offset > 0, you should check that byte_offset < type_size
>> * stride in addition.
> 
> That's true. Fixed in v5.
> 
> If we don't satisfy the condition then we return 0 on writes and ~0u on
> reads.

Could you have a look at the v5 to check if I can count with your R-b ?

https://patchwork.freedesktop.org/patch/241482/

I suppose you didn't have time to have a look at the other patch of the
series.

"[v2,2/2] intel/fs: Improve liveness range calculation for partial writes"
https://patchwork.freedesktop.org/patch/239839/

Thanks in advance,

Chema

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-29 Thread Chema Casanova
El 28/07/18 a las 01:45, Francisco Jerez escribió:
> Chema Casanova  writes:
> 
>> El 27/07/18 a las 02:44, Francisco Jerez escribió:
>>> Chema Casanova  writes:
>>>
 El 26/07/18 a las 20:02, Francisco Jerez escribió:
> Chema Casanova  writes:
>
>> El 20/07/18 a las 22:10, Francisco Jerez escribió:
>>> Chema Casanova  writes:
>>>
 El 20/07/18 a las 00:34, Francisco Jerez escribió:
> Chema Casanova  writes:
>
>> El 14/07/18 a las 00:14, Francisco Jerez escribió:
>>> Jose Maria Casanova Crespo  writes:
>>>
 For a register source/destination of an instruction the function 
 returns
 the read/write byte pattern of a 32-byte registers as a unsigned 
 int.

 The returned pattern takes into account the exec_size of the 
 instruction,
 the type bitsize, the stride and if the register is source or 
 destination.

 The objective of the functions if to help to know the read/written 
 bytes
 of the instructions to improve the liveness analysis for partial 
 read/writes.

 We manage special cases for 
 SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
 and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the 
 bitsize
 parameter they have a different read pattern.
 ---
  src/intel/compiler/brw_fs.cpp  | 183 
 +
  src/intel/compiler/brw_ir_fs.h |   1 +
  2 files changed, 184 insertions(+)

 diff --git a/src/intel/compiler/brw_fs.cpp 
 b/src/intel/compiler/brw_fs.cpp
 index 2b8363ca362..f3045c4ff6c 100644
 --- a/src/intel/compiler/brw_fs.cpp
 +++ b/src/intel/compiler/brw_fs.cpp
 @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
 this->dst.offset % REG_SIZE != 0);
  }
  
 +/**
 + * Returns a 32-bit uint whose bits represent if the associated 
 register byte
 + * has been read/written by the instruction. The returned pattern 
 takes into
 + * account the exec_size of the instruction, the type bitsize and 
 the register
 + * stride and the register is source or destination for the 
 instruction.
 + *
 + * The objective of this function is to identify which parts of 
 the register
 + * are read or written for operations that don't read/write a 
 full register.
 + * So we can identify in live range variable analysis if a 
 partial write has
 + * completelly defined the part of the register used by a partial 
 read. So we
 + * avoid extending the liveness range because all data read was 
 already
 + * defined although the wasn't completely written.
 + */
 +unsigned
 +fs_inst::register_byte_use_pattern(const fs_reg , boolean 
 is_dst) const
 +{
 +   if (is_dst) {
>>
>>> Please split into two functions (like fs_inst::src_read and
>>> ::src_written) since that would make the call-sites of this method 
>>> more
>>> self-documenting than a boolean parameter.  You should be able to 
>>> share
>>> code by refactoring the common logic into a separate function (see 
>>> below
>>> for some suggestions on how that could be achieved).
>>
>> Sure, it would improve readability and simplifies the logic, I've 
>> chosen
>> dst_write_pattern and src_read_pattern.
>>
>>>
 +  /* We don't know what is written so we return the worts 
 case */
>>>
>>> "worst"
>>
>> Fixed.
>>
 +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
 + return 0;
 +  /* We assume that send destinations are completely written 
 */
 +  if (this->is_send_from_grf())
 + return ~0u;
>>>
>>> Some send-like instructions won't be caught by this condition, you
>>> should check for this->mlen != 0 in addition.
>>
>> Would it be enough to check for (this->mlen > 0) and forget about
>> is_send_from_grf? I am using this approach in v2 I am sending.
>>
>
> I don't think the mlen > 0 condition would catch all cases either...
> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need 
> both
> conditions.  Sucks...

 That is 

Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-27 Thread Francisco Jerez
Chema Casanova  writes:

> El 27/07/18 a las 02:44, Francisco Jerez escribió:
>> Chema Casanova  writes:
>> 
>>> El 26/07/18 a las 20:02, Francisco Jerez escribió:
 Chema Casanova  writes:

> El 20/07/18 a las 22:10, Francisco Jerez escribió:
>> Chema Casanova  writes:
>>
>>> El 20/07/18 a las 00:34, Francisco Jerez escribió:
 Chema Casanova  writes:

> El 14/07/18 a las 00:14, Francisco Jerez escribió:
>> Jose Maria Casanova Crespo  writes:
>>
>>> For a register source/destination of an instruction the function 
>>> returns
>>> the read/write byte pattern of a 32-byte registers as a unsigned 
>>> int.
>>>
>>> The returned pattern takes into account the exec_size of the 
>>> instruction,
>>> the type bitsize, the stride and if the register is source or 
>>> destination.
>>>
>>> The objective of the functions if to help to know the read/written 
>>> bytes
>>> of the instructions to improve the liveness analysis for partial 
>>> read/writes.
>>>
>>> We manage special cases for 
>>> SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
>>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the 
>>> bitsize
>>> parameter they have a different read pattern.
>>> ---
>>>  src/intel/compiler/brw_fs.cpp  | 183 
>>> +
>>>  src/intel/compiler/brw_ir_fs.h |   1 +
>>>  2 files changed, 184 insertions(+)
>>>
>>> diff --git a/src/intel/compiler/brw_fs.cpp 
>>> b/src/intel/compiler/brw_fs.cpp
>>> index 2b8363ca362..f3045c4ff6c 100644
>>> --- a/src/intel/compiler/brw_fs.cpp
>>> +++ b/src/intel/compiler/brw_fs.cpp
>>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
>>> this->dst.offset % REG_SIZE != 0);
>>>  }
>>>  
>>> +/**
>>> + * Returns a 32-bit uint whose bits represent if the associated 
>>> register byte
>>> + * has been read/written by the instruction. The returned pattern 
>>> takes into
>>> + * account the exec_size of the instruction, the type bitsize and 
>>> the register
>>> + * stride and the register is source or destination for the 
>>> instruction.
>>> + *
>>> + * The objective of this function is to identify which parts of 
>>> the register
>>> + * are read or written for operations that don't read/write a full 
>>> register.
>>> + * So we can identify in live range variable analysis if a partial 
>>> write has
>>> + * completelly defined the part of the register used by a partial 
>>> read. So we
>>> + * avoid extending the liveness range because all data read was 
>>> already
>>> + * defined although the wasn't completely written.
>>> + */
>>> +unsigned
>>> +fs_inst::register_byte_use_pattern(const fs_reg , boolean 
>>> is_dst) const
>>> +{
>>> +   if (is_dst) {
>
>> Please split into two functions (like fs_inst::src_read and
>> ::src_written) since that would make the call-sites of this method 
>> more
>> self-documenting than a boolean parameter.  You should be able to 
>> share
>> code by refactoring the common logic into a separate function (see 
>> below
>> for some suggestions on how that could be achieved).
>
> Sure, it would improve readability and simplifies the logic, I've 
> chosen
> dst_write_pattern and src_read_pattern.
>
>>
>>> +  /* We don't know what is written so we return the worts case 
>>> */
>>
>> "worst"
>
> Fixed.
>
>>> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
>>> + return 0;
>>> +  /* We assume that send destinations are completely written */
>>> +  if (this->is_send_from_grf())
>>> + return ~0u;
>>
>> Some send-like instructions won't be caught by this condition, you
>> should check for this->mlen != 0 in addition.
>
> Would it be enough to check for (this->mlen > 0) and forget about
> is_send_from_grf? I am using this approach in v2 I am sending.
>

 I don't think the mlen > 0 condition would catch all cases either...
 E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need both
 conditions.  Sucks...
>>>
>>> That is true, so now we have the:
>>>  (this->is_send_from_grf() || this->mlen != 0)
>>>
>>> +   } else {
>>> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit 

[Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-27 Thread Chema Casanova
El 27/07/18 a las 02:44, Francisco Jerez escribió:
> Chema Casanova  writes:
> 
>> El 26/07/18 a las 20:02, Francisco Jerez escribió:
>>> Chema Casanova  writes:
>>>
 El 20/07/18 a las 22:10, Francisco Jerez escribió:
> Chema Casanova  writes:
>
>> El 20/07/18 a las 00:34, Francisco Jerez escribió:
>>> Chema Casanova  writes:
>>>
 El 14/07/18 a las 00:14, Francisco Jerez escribió:
> Jose Maria Casanova Crespo  writes:
>
>> For a register source/destination of an instruction the function 
>> returns
>> the read/write byte pattern of a 32-byte registers as a unsigned int.
>>
>> The returned pattern takes into account the exec_size of the 
>> instruction,
>> the type bitsize, the stride and if the register is source or 
>> destination.
>>
>> The objective of the functions if to help to know the read/written 
>> bytes
>> of the instructions to improve the liveness analysis for partial 
>> read/writes.
>>
>> We manage special cases for 
>> SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the 
>> bitsize
>> parameter they have a different read pattern.
>> ---
>>  src/intel/compiler/brw_fs.cpp  | 183 
>> +
>>  src/intel/compiler/brw_ir_fs.h |   1 +
>>  2 files changed, 184 insertions(+)
>>
>> diff --git a/src/intel/compiler/brw_fs.cpp 
>> b/src/intel/compiler/brw_fs.cpp
>> index 2b8363ca362..f3045c4ff6c 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
>> this->dst.offset % REG_SIZE != 0);
>>  }
>>  
>> +/**
>> + * Returns a 32-bit uint whose bits represent if the associated 
>> register byte
>> + * has been read/written by the instruction. The returned pattern 
>> takes into
>> + * account the exec_size of the instruction, the type bitsize and 
>> the register
>> + * stride and the register is source or destination for the 
>> instruction.
>> + *
>> + * The objective of this function is to identify which parts of the 
>> register
>> + * are read or written for operations that don't read/write a full 
>> register.
>> + * So we can identify in live range variable analysis if a partial 
>> write has
>> + * completelly defined the part of the register used by a partial 
>> read. So we
>> + * avoid extending the liveness range because all data read was 
>> already
>> + * defined although the wasn't completely written.
>> + */
>> +unsigned
>> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) 
>> const
>> +{
>> +   if (is_dst) {

> Please split into two functions (like fs_inst::src_read and
> ::src_written) since that would make the call-sites of this method 
> more
> self-documenting than a boolean parameter.  You should be able to 
> share
> code by refactoring the common logic into a separate function (see 
> below
> for some suggestions on how that could be achieved).

 Sure, it would improve readability and simplifies the logic, I've 
 chosen
 dst_write_pattern and src_read_pattern.

>
>> +  /* We don't know what is written so we return the worts case 
>> */
>
> "worst"

 Fixed.

>> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
>> + return 0;
>> +  /* We assume that send destinations are completely written */
>> +  if (this->is_send_from_grf())
>> + return ~0u;
>
> Some send-like instructions won't be caught by this condition, you
> should check for this->mlen != 0 in addition.

 Would it be enough to check for (this->mlen > 0) and forget about
 is_send_from_grf? I am using this approach in v2 I am sending.

>>>
>>> I don't think the mlen > 0 condition would catch all cases either...
>>> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need both
>>> conditions.  Sucks...
>>
>> That is true, so now we have the:
>>  (this->is_send_from_grf() || this->mlen != 0)
>>
>> +   } else {
>> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit 
>> aligned
>> +   * so the read pattern depends on the bitsize stored at src[4]
>> +   */
>> +  if (this->opcode == 

Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-26 Thread Francisco Jerez
Chema Casanova  writes:

> El 26/07/18 a las 20:02, Francisco Jerez escribió:
>> Chema Casanova  writes:
>> 
>>> El 20/07/18 a las 22:10, Francisco Jerez escribió:
 Chema Casanova  writes:

> El 20/07/18 a las 00:34, Francisco Jerez escribió:
>> Chema Casanova  writes:
>>
>>> El 14/07/18 a las 00:14, Francisco Jerez escribió:
 Jose Maria Casanova Crespo  writes:

> For a register source/destination of an instruction the function 
> returns
> the read/write byte pattern of a 32-byte registers as a unsigned int.
>
> The returned pattern takes into account the exec_size of the 
> instruction,
> the type bitsize, the stride and if the register is source or 
> destination.
>
> The objective of the functions if to help to know the read/written 
> bytes
> of the instructions to improve the liveness analysis for partial 
> read/writes.
>
> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the 
> bitsize
> parameter they have a different read pattern.
> ---
>  src/intel/compiler/brw_fs.cpp  | 183 
> +
>  src/intel/compiler/brw_ir_fs.h |   1 +
>  2 files changed, 184 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.cpp 
> b/src/intel/compiler/brw_fs.cpp
> index 2b8363ca362..f3045c4ff6c 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
> this->dst.offset % REG_SIZE != 0);
>  }
>  
> +/**
> + * Returns a 32-bit uint whose bits represent if the associated 
> register byte
> + * has been read/written by the instruction. The returned pattern 
> takes into
> + * account the exec_size of the instruction, the type bitsize and 
> the register
> + * stride and the register is source or destination for the 
> instruction.
> + *
> + * The objective of this function is to identify which parts of the 
> register
> + * are read or written for operations that don't read/write a full 
> register.
> + * So we can identify in live range variable analysis if a partial 
> write has
> + * completelly defined the part of the register used by a partial 
> read. So we
> + * avoid extending the liveness range because all data read was 
> already
> + * defined although the wasn't completely written.
> + */
> +unsigned
> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) 
> const
> +{
> +   if (is_dst) {
>>>
 Please split into two functions (like fs_inst::src_read and
 ::src_written) since that would make the call-sites of this method more
 self-documenting than a boolean parameter.  You should be able to share
 code by refactoring the common logic into a separate function (see 
 below
 for some suggestions on how that could be achieved).
>>>
>>> Sure, it would improve readability and simplifies the logic, I've chosen
>>> dst_write_pattern and src_read_pattern.
>>>

> +  /* We don't know what is written so we return the worts case */

 "worst"
>>>
>>> Fixed.
>>>
> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
> + return 0;
> +  /* We assume that send destinations are completely written */
> +  if (this->is_send_from_grf())
> + return ~0u;

 Some send-like instructions won't be caught by this condition, you
 should check for this->mlen != 0 in addition.
>>>
>>> Would it be enough to check for (this->mlen > 0) and forget about
>>> is_send_from_grf? I am using this approach in v2 I am sending.
>>>
>>
>> I don't think the mlen > 0 condition would catch all cases either...
>> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need both
>> conditions.  Sucks...
>
> That is true, so now we have the:
>  (this->is_send_from_grf() || this->mlen != 0)
>
> +   } else {
> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit 
> aligned
> +   * so the read pattern depends on the bitsize stored at src[4]
> +   */
> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL 
> &&
> +  this->src[1].nr == r.nr) {
>>>
 I feel uncomfortable about attempting to guess the source the caller is
 referring to by comparing 

Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-26 Thread Chema Casanova
El 26/07/18 a las 20:02, Francisco Jerez escribió:
> Chema Casanova  writes:
> 
>> El 20/07/18 a las 22:10, Francisco Jerez escribió:
>>> Chema Casanova  writes:
>>>
 El 20/07/18 a las 00:34, Francisco Jerez escribió:
> Chema Casanova  writes:
>
>> El 14/07/18 a las 00:14, Francisco Jerez escribió:
>>> Jose Maria Casanova Crespo  writes:
>>>
 For a register source/destination of an instruction the function 
 returns
 the read/write byte pattern of a 32-byte registers as a unsigned int.

 The returned pattern takes into account the exec_size of the 
 instruction,
 the type bitsize, the stride and if the register is source or 
 destination.

 The objective of the functions if to help to know the read/written 
 bytes
 of the instructions to improve the liveness analysis for partial 
 read/writes.

 We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
 and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
 parameter they have a different read pattern.
 ---
  src/intel/compiler/brw_fs.cpp  | 183 +
  src/intel/compiler/brw_ir_fs.h |   1 +
  2 files changed, 184 insertions(+)

 diff --git a/src/intel/compiler/brw_fs.cpp 
 b/src/intel/compiler/brw_fs.cpp
 index 2b8363ca362..f3045c4ff6c 100644
 --- a/src/intel/compiler/brw_fs.cpp
 +++ b/src/intel/compiler/brw_fs.cpp
 @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
 this->dst.offset % REG_SIZE != 0);
  }
  
 +/**
 + * Returns a 32-bit uint whose bits represent if the associated 
 register byte
 + * has been read/written by the instruction. The returned pattern 
 takes into
 + * account the exec_size of the instruction, the type bitsize and the 
 register
 + * stride and the register is source or destination for the 
 instruction.
 + *
 + * The objective of this function is to identify which parts of the 
 register
 + * are read or written for operations that don't read/write a full 
 register.
 + * So we can identify in live range variable analysis if a partial 
 write has
 + * completelly defined the part of the register used by a partial 
 read. So we
 + * avoid extending the liveness range because all data read was 
 already
 + * defined although the wasn't completely written.
 + */
 +unsigned
 +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) 
 const
 +{
 +   if (is_dst) {
>>
>>> Please split into two functions (like fs_inst::src_read and
>>> ::src_written) since that would make the call-sites of this method more
>>> self-documenting than a boolean parameter.  You should be able to share
>>> code by refactoring the common logic into a separate function (see below
>>> for some suggestions on how that could be achieved).
>>
>> Sure, it would improve readability and simplifies the logic, I've chosen
>> dst_write_pattern and src_read_pattern.
>>
>>>
 +  /* We don't know what is written so we return the worts case */
>>>
>>> "worst"
>>
>> Fixed.
>>
 +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
 + return 0;
 +  /* We assume that send destinations are completely written */
 +  if (this->is_send_from_grf())
 + return ~0u;
>>>
>>> Some send-like instructions won't be caught by this condition, you
>>> should check for this->mlen != 0 in addition.
>>
>> Would it be enough to check for (this->mlen > 0) and forget about
>> is_send_from_grf? I am using this approach in v2 I am sending.
>>
>
> I don't think the mlen > 0 condition would catch all cases either...
> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need both
> conditions.  Sucks...

 That is true, so now we have the:
  (this->is_send_from_grf() || this->mlen != 0)

 +   } else {
 +  /* byte_scattered_write_logical pattern of src[1] is 32-bit 
 aligned
 +   * so the read pattern depends on the bitsize stored at src[4]
 +   */
 +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL 
 &&
 +  this->src[1].nr == r.nr) {
>>
>>> I feel uncomfortable about attempting to guess the source the caller is
>>> referring to by comparing the registers for equality.  E.g.  you could
>>> potentially end up with two sources that compare equal but have
>>> different semantics (e.g. as a result of CSE) which 

Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-26 Thread Francisco Jerez
Chema Casanova  writes:

> El 20/07/18 a las 22:10, Francisco Jerez escribió:
>> Chema Casanova  writes:
>> 
>>> El 20/07/18 a las 00:34, Francisco Jerez escribió:
 Chema Casanova  writes:

> El 14/07/18 a las 00:14, Francisco Jerez escribió:
>> Jose Maria Casanova Crespo  writes:
>>
>>> For a register source/destination of an instruction the function returns
>>> the read/write byte pattern of a 32-byte registers as a unsigned int.
>>>
>>> The returned pattern takes into account the exec_size of the 
>>> instruction,
>>> the type bitsize, the stride and if the register is source or 
>>> destination.
>>>
>>> The objective of the functions if to help to know the read/written bytes
>>> of the instructions to improve the liveness analysis for partial 
>>> read/writes.
>>>
>>> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
>>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
>>> parameter they have a different read pattern.
>>> ---
>>>  src/intel/compiler/brw_fs.cpp  | 183 +
>>>  src/intel/compiler/brw_ir_fs.h |   1 +
>>>  2 files changed, 184 insertions(+)
>>>
>>> diff --git a/src/intel/compiler/brw_fs.cpp 
>>> b/src/intel/compiler/brw_fs.cpp
>>> index 2b8363ca362..f3045c4ff6c 100644
>>> --- a/src/intel/compiler/brw_fs.cpp
>>> +++ b/src/intel/compiler/brw_fs.cpp
>>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
>>> this->dst.offset % REG_SIZE != 0);
>>>  }
>>>  
>>> +/**
>>> + * Returns a 32-bit uint whose bits represent if the associated 
>>> register byte
>>> + * has been read/written by the instruction. The returned pattern 
>>> takes into
>>> + * account the exec_size of the instruction, the type bitsize and the 
>>> register
>>> + * stride and the register is source or destination for the 
>>> instruction.
>>> + *
>>> + * The objective of this function is to identify which parts of the 
>>> register
>>> + * are read or written for operations that don't read/write a full 
>>> register.
>>> + * So we can identify in live range variable analysis if a partial 
>>> write has
>>> + * completelly defined the part of the register used by a partial 
>>> read. So we
>>> + * avoid extending the liveness range because all data read was already
>>> + * defined although the wasn't completely written.
>>> + */
>>> +unsigned
>>> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) 
>>> const
>>> +{
>>> +   if (is_dst) {
>
>> Please split into two functions (like fs_inst::src_read and
>> ::src_written) since that would make the call-sites of this method more
>> self-documenting than a boolean parameter.  You should be able to share
>> code by refactoring the common logic into a separate function (see below
>> for some suggestions on how that could be achieved).
>
> Sure, it would improve readability and simplifies the logic, I've chosen
> dst_write_pattern and src_read_pattern.
>
>>
>>> +  /* We don't know what is written so we return the worts case */
>>
>> "worst"
>
> Fixed.
>
>>> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
>>> + return 0;
>>> +  /* We assume that send destinations are completely written */
>>> +  if (this->is_send_from_grf())
>>> + return ~0u;
>>
>> Some send-like instructions won't be caught by this condition, you
>> should check for this->mlen != 0 in addition.
>
> Would it be enough to check for (this->mlen > 0) and forget about
> is_send_from_grf? I am using this approach in v2 I am sending.
>

 I don't think the mlen > 0 condition would catch all cases either...
 E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need both
 conditions.  Sucks...
>>>
>>> That is true, so now we have the:
>>>  (this->is_send_from_grf() || this->mlen != 0)
>>>
>>> +   } else {
>>> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit 
>>> aligned
>>> +   * so the read pattern depends on the bitsize stored at src[4]
>>> +   */
>>> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
>>> +  this->src[1].nr == r.nr) {
>
>> I feel uncomfortable about attempting to guess the source the caller is
>> referring to by comparing the registers for equality.  E.g.  you could
>> potentially end up with two sources that compare equal but have
>> different semantics (e.g. as a result of CSE) which might cause it to
>> get the wrong answer.  It would probably be better to pass a source
>> index and a byte offset as argument instead of an fs_reg.
>
> I've didn't thought about CSE, I'm now 

Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-23 Thread Chema Casanova
El 20/07/18 a las 22:10, Francisco Jerez escribió:
> Chema Casanova  writes:
> 
>> El 20/07/18 a las 00:34, Francisco Jerez escribió:
>>> Chema Casanova  writes:
>>>
 El 14/07/18 a las 00:14, Francisco Jerez escribió:
> Jose Maria Casanova Crespo  writes:
>
>> For a register source/destination of an instruction the function returns
>> the read/write byte pattern of a 32-byte registers as a unsigned int.
>>
>> The returned pattern takes into account the exec_size of the instruction,
>> the type bitsize, the stride and if the register is source or 
>> destination.
>>
>> The objective of the functions if to help to know the read/written bytes
>> of the instructions to improve the liveness analysis for partial 
>> read/writes.
>>
>> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
>> parameter they have a different read pattern.
>> ---
>>  src/intel/compiler/brw_fs.cpp  | 183 +
>>  src/intel/compiler/brw_ir_fs.h |   1 +
>>  2 files changed, 184 insertions(+)
>>
>> diff --git a/src/intel/compiler/brw_fs.cpp 
>> b/src/intel/compiler/brw_fs.cpp
>> index 2b8363ca362..f3045c4ff6c 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
>> this->dst.offset % REG_SIZE != 0);
>>  }
>>  
>> +/**
>> + * Returns a 32-bit uint whose bits represent if the associated 
>> register byte
>> + * has been read/written by the instruction. The returned pattern takes 
>> into
>> + * account the exec_size of the instruction, the type bitsize and the 
>> register
>> + * stride and the register is source or destination for the instruction.
>> + *
>> + * The objective of this function is to identify which parts of the 
>> register
>> + * are read or written for operations that don't read/write a full 
>> register.
>> + * So we can identify in live range variable analysis if a partial 
>> write has
>> + * completelly defined the part of the register used by a partial read. 
>> So we
>> + * avoid extending the liveness range because all data read was already
>> + * defined although the wasn't completely written.
>> + */
>> +unsigned
>> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) 
>> const
>> +{
>> +   if (is_dst) {

> Please split into two functions (like fs_inst::src_read and
> ::src_written) since that would make the call-sites of this method more
> self-documenting than a boolean parameter.  You should be able to share
> code by refactoring the common logic into a separate function (see below
> for some suggestions on how that could be achieved).

 Sure, it would improve readability and simplifies the logic, I've chosen
 dst_write_pattern and src_read_pattern.

>
>> +  /* We don't know what is written so we return the worts case */
>
> "worst"

 Fixed.

>> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
>> + return 0;
>> +  /* We assume that send destinations are completely written */
>> +  if (this->is_send_from_grf())
>> + return ~0u;
>
> Some send-like instructions won't be caught by this condition, you
> should check for this->mlen != 0 in addition.

 Would it be enough to check for (this->mlen > 0) and forget about
 is_send_from_grf? I am using this approach in v2 I am sending.

>>>
>>> I don't think the mlen > 0 condition would catch all cases either...
>>> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need both
>>> conditions.  Sucks...
>>
>> That is true, so now we have the:
>>  (this->is_send_from_grf() || this->mlen != 0)
>>
>> +   } else {
>> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit 
>> aligned
>> +   * so the read pattern depends on the bitsize stored at src[4]
>> +   */
>> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
>> +  this->src[1].nr == r.nr) {

> I feel uncomfortable about attempting to guess the source the caller is
> referring to by comparing the registers for equality.  E.g.  you could
> potentially end up with two sources that compare equal but have
> different semantics (e.g. as a result of CSE) which might cause it to
> get the wrong answer.  It would probably be better to pass a source
> index and a byte offset as argument instead of an fs_reg.

 I've didn't thought about CSE, I'm now receiving the number of source
 and the reg_offset. I'm using reg_offset instead of byte offsets as it
 simplifies the logic. Now we are using always 

Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-20 Thread Francisco Jerez
Chema Casanova  writes:

> El 20/07/18 a las 00:34, Francisco Jerez escribió:
>> Chema Casanova  writes:
>> 
>>> El 14/07/18 a las 00:14, Francisco Jerez escribió:
 Jose Maria Casanova Crespo  writes:

> For a register source/destination of an instruction the function returns
> the read/write byte pattern of a 32-byte registers as a unsigned int.
>
> The returned pattern takes into account the exec_size of the instruction,
> the type bitsize, the stride and if the register is source or destination.
>
> The objective of the functions if to help to know the read/written bytes
> of the instructions to improve the liveness analysis for partial 
> read/writes.
>
> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
> parameter they have a different read pattern.
> ---
>  src/intel/compiler/brw_fs.cpp  | 183 +
>  src/intel/compiler/brw_ir_fs.h |   1 +
>  2 files changed, 184 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 2b8363ca362..f3045c4ff6c 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
> this->dst.offset % REG_SIZE != 0);
>  }
>  
> +/**
> + * Returns a 32-bit uint whose bits represent if the associated register 
> byte
> + * has been read/written by the instruction. The returned pattern takes 
> into
> + * account the exec_size of the instruction, the type bitsize and the 
> register
> + * stride and the register is source or destination for the instruction.
> + *
> + * The objective of this function is to identify which parts of the 
> register
> + * are read or written for operations that don't read/write a full 
> register.
> + * So we can identify in live range variable analysis if a partial write 
> has
> + * completelly defined the part of the register used by a partial read. 
> So we
> + * avoid extending the liveness range because all data read was already
> + * defined although the wasn't completely written.
> + */
> +unsigned
> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const
> +{
> +   if (is_dst) {
>>>
 Please split into two functions (like fs_inst::src_read and
 ::src_written) since that would make the call-sites of this method more
 self-documenting than a boolean parameter.  You should be able to share
 code by refactoring the common logic into a separate function (see below
 for some suggestions on how that could be achieved).
>>>
>>> Sure, it would improve readability and simplifies the logic, I've chosen
>>> dst_write_pattern and src_read_pattern.
>>>

> +  /* We don't know what is written so we return the worts case */

 "worst"
>>>
>>> Fixed.
>>>
> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
> + return 0;
> +  /* We assume that send destinations are completely written */
> +  if (this->is_send_from_grf())
> + return ~0u;

 Some send-like instructions won't be caught by this condition, you
 should check for this->mlen != 0 in addition.
>>>
>>> Would it be enough to check for (this->mlen > 0) and forget about
>>> is_send_from_grf? I am using this approach in v2 I am sending.
>>>
>> 
>> I don't think the mlen > 0 condition would catch all cases either...
>> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need both
>> conditions.  Sucks...
>
> That is true, so now we have the:
>  (this->is_send_from_grf() || this->mlen != 0)
>
> +   } else {
> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
> +   * so the read pattern depends on the bitsize stored at src[4]
> +   */
> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
> +  this->src[1].nr == r.nr) {
>>>
 I feel uncomfortable about attempting to guess the source the caller is
 referring to by comparing the registers for equality.  E.g.  you could
 potentially end up with two sources that compare equal but have
 different semantics (e.g. as a result of CSE) which might cause it to
 get the wrong answer.  It would probably be better to pass a source
 index and a byte offset as argument instead of an fs_reg.
>>>
>>> I've didn't thought about CSE, I'm now receiving the number of source
>>> and the reg_offset. I'm using reg_offset instead of byte offsets as it
>>> simplifies the logic. Now we are using always the base src register to
>>> do all the calculation
> + switch (this->src[4].ud) {
> + case 32:
> +return ~0u;
> + case 16:
> +

Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-20 Thread Chema Casanova
El 20/07/18 a las 00:34, Francisco Jerez escribió:
> Chema Casanova  writes:
> 
>> El 14/07/18 a las 00:14, Francisco Jerez escribió:
>>> Jose Maria Casanova Crespo  writes:
>>>
 For a register source/destination of an instruction the function returns
 the read/write byte pattern of a 32-byte registers as a unsigned int.

 The returned pattern takes into account the exec_size of the instruction,
 the type bitsize, the stride and if the register is source or destination.

 The objective of the functions if to help to know the read/written bytes
 of the instructions to improve the liveness analysis for partial 
 read/writes.

 We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
 and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
 parameter they have a different read pattern.
 ---
  src/intel/compiler/brw_fs.cpp  | 183 +
  src/intel/compiler/brw_ir_fs.h |   1 +
  2 files changed, 184 insertions(+)

 diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
 index 2b8363ca362..f3045c4ff6c 100644
 --- a/src/intel/compiler/brw_fs.cpp
 +++ b/src/intel/compiler/brw_fs.cpp
 @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
 this->dst.offset % REG_SIZE != 0);
  }
  
 +/**
 + * Returns a 32-bit uint whose bits represent if the associated register 
 byte
 + * has been read/written by the instruction. The returned pattern takes 
 into
 + * account the exec_size of the instruction, the type bitsize and the 
 register
 + * stride and the register is source or destination for the instruction.
 + *
 + * The objective of this function is to identify which parts of the 
 register
 + * are read or written for operations that don't read/write a full 
 register.
 + * So we can identify in live range variable analysis if a partial write 
 has
 + * completelly defined the part of the register used by a partial read. 
 So we
 + * avoid extending the liveness range because all data read was already
 + * defined although the wasn't completely written.
 + */
 +unsigned
 +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const
 +{
 +   if (is_dst) {
>>
>>> Please split into two functions (like fs_inst::src_read and
>>> ::src_written) since that would make the call-sites of this method more
>>> self-documenting than a boolean parameter.  You should be able to share
>>> code by refactoring the common logic into a separate function (see below
>>> for some suggestions on how that could be achieved).
>>
>> Sure, it would improve readability and simplifies the logic, I've chosen
>> dst_write_pattern and src_read_pattern.
>>
>>>
 +  /* We don't know what is written so we return the worts case */
>>>
>>> "worst"
>>
>> Fixed.
>>
 +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
 + return 0;
 +  /* We assume that send destinations are completely written */
 +  if (this->is_send_from_grf())
 + return ~0u;
>>>
>>> Some send-like instructions won't be caught by this condition, you
>>> should check for this->mlen != 0 in addition.
>>
>> Would it be enough to check for (this->mlen > 0) and forget about
>> is_send_from_grf? I am using this approach in v2 I am sending.
>>
> 
> I don't think the mlen > 0 condition would catch all cases either...
> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need both
> conditions.  Sucks...

That is true, so now we have the:
 (this->is_send_from_grf() || this->mlen != 0)

 +   } else {
 +  /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
 +   * so the read pattern depends on the bitsize stored at src[4]
 +   */
 +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
 +  this->src[1].nr == r.nr) {
>>
>>> I feel uncomfortable about attempting to guess the source the caller is
>>> referring to by comparing the registers for equality.  E.g.  you could
>>> potentially end up with two sources that compare equal but have
>>> different semantics (e.g. as a result of CSE) which might cause it to
>>> get the wrong answer.  It would probably be better to pass a source
>>> index and a byte offset as argument instead of an fs_reg.
>>
>> I've didn't thought about CSE, I'm now receiving the number of source
>> and the reg_offset. I'm using reg_offset instead of byte offsets as it
>> simplifies the logic. Now we are using always the base src register to
>> do all the calculation
 + switch (this->src[4].ud) {
 + case 32:
 +return ~0u;
 + case 16:
 +return 0x;
 + case 8:
 +return 0x;
 + default:
 +

Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-19 Thread Francisco Jerez
Chema Casanova  writes:

> El 14/07/18 a las 00:14, Francisco Jerez escribió:
>> Jose Maria Casanova Crespo  writes:
>> 
>>> For a register source/destination of an instruction the function returns
>>> the read/write byte pattern of a 32-byte registers as a unsigned int.
>>>
>>> The returned pattern takes into account the exec_size of the instruction,
>>> the type bitsize, the stride and if the register is source or destination.
>>>
>>> The objective of the functions if to help to know the read/written bytes
>>> of the instructions to improve the liveness analysis for partial 
>>> read/writes.
>>>
>>> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
>>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
>>> parameter they have a different read pattern.
>>> ---
>>>  src/intel/compiler/brw_fs.cpp  | 183 +
>>>  src/intel/compiler/brw_ir_fs.h |   1 +
>>>  2 files changed, 184 insertions(+)
>>>
>>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
>>> index 2b8363ca362..f3045c4ff6c 100644
>>> --- a/src/intel/compiler/brw_fs.cpp
>>> +++ b/src/intel/compiler/brw_fs.cpp
>>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
>>> this->dst.offset % REG_SIZE != 0);
>>>  }
>>>  
>>> +/**
>>> + * Returns a 32-bit uint whose bits represent if the associated register 
>>> byte
>>> + * has been read/written by the instruction. The returned pattern takes 
>>> into
>>> + * account the exec_size of the instruction, the type bitsize and the 
>>> register
>>> + * stride and the register is source or destination for the instruction.
>>> + *
>>> + * The objective of this function is to identify which parts of the 
>>> register
>>> + * are read or written for operations that don't read/write a full 
>>> register.
>>> + * So we can identify in live range variable analysis if a partial write 
>>> has
>>> + * completelly defined the part of the register used by a partial read. So 
>>> we
>>> + * avoid extending the liveness range because all data read was already
>>> + * defined although the wasn't completely written.
>>> + */
>>> +unsigned
>>> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const
>>> +{
>>> +   if (is_dst) {
>
>> Please split into two functions (like fs_inst::src_read and
>> ::src_written) since that would make the call-sites of this method more
>> self-documenting than a boolean parameter.  You should be able to share
>> code by refactoring the common logic into a separate function (see below
>> for some suggestions on how that could be achieved).
>
> Sure, it would improve readability and simplifies the logic, I've chosen
> dst_write_pattern and src_read_pattern.
>
>> 
>>> +  /* We don't know what is written so we return the worts case */
>> 
>> "worst"
>
> Fixed.
>
>>> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
>>> + return 0;
>>> +  /* We assume that send destinations are completely written */
>>> +  if (this->is_send_from_grf())
>>> + return ~0u;
>> 
>> Some send-like instructions won't be caught by this condition, you
>> should check for this->mlen != 0 in addition.
>
> Would it be enough to check for (this->mlen > 0) and forget about
> is_send_from_grf? I am using this approach in v2 I am sending.
>

I don't think the mlen > 0 condition would catch all cases either...
E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need both
conditions.  Sucks...

>>> +   } else {
>>> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
>>> +   * so the read pattern depends on the bitsize stored at src[4]
>>> +   */
>>> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
>>> +  this->src[1].nr == r.nr) {
>
>> I feel uncomfortable about attempting to guess the source the caller is
>> referring to by comparing the registers for equality.  E.g.  you could
>> potentially end up with two sources that compare equal but have
>> different semantics (e.g. as a result of CSE) which might cause it to
>> get the wrong answer.  It would probably be better to pass a source
>> index and a byte offset as argument instead of an fs_reg.
>
> I've didn't thought about CSE, I'm now receiving the number of source
> and the reg_offset. I'm using reg_offset instead of byte offsets as it
> simplifies the logic. Now we are using always the base src register to
> do all the calculation
>>> + switch (this->src[4].ud) {
>>> + case 32:
>>> +return ~0u;
>>> + case 16:
>>> +return 0x;
>>> + case 8:
>>> +return 0x;
>>> + default:
>>> +unreachable("Unsupported bitsize at 
>>> byte_scattered_write_logical");
>>> + }
>> 
>> Replace the above switch statement with a call to "periodic_mask(8, 4,
>> this->src[4].ud / 8)" (see below for the definition).
>
> Ok.
>
>>> +  }
>>> +  /* As for 

[Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-19 Thread Chema Casanova
El 14/07/18 a las 00:14, Francisco Jerez escribió:
> Jose Maria Casanova Crespo  writes:
> 
>> For a register source/destination of an instruction the function returns
>> the read/write byte pattern of a 32-byte registers as a unsigned int.
>>
>> The returned pattern takes into account the exec_size of the instruction,
>> the type bitsize, the stride and if the register is source or destination.
>>
>> The objective of the functions if to help to know the read/written bytes
>> of the instructions to improve the liveness analysis for partial read/writes.
>>
>> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
>> parameter they have a different read pattern.
>> ---
>>  src/intel/compiler/brw_fs.cpp  | 183 +
>>  src/intel/compiler/brw_ir_fs.h |   1 +
>>  2 files changed, 184 insertions(+)
>>
>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
>> index 2b8363ca362..f3045c4ff6c 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
>> this->dst.offset % REG_SIZE != 0);
>>  }
>>  
>> +/**
>> + * Returns a 32-bit uint whose bits represent if the associated register 
>> byte
>> + * has been read/written by the instruction. The returned pattern takes into
>> + * account the exec_size of the instruction, the type bitsize and the 
>> register
>> + * stride and the register is source or destination for the instruction.
>> + *
>> + * The objective of this function is to identify which parts of the register
>> + * are read or written for operations that don't read/write a full register.
>> + * So we can identify in live range variable analysis if a partial write has
>> + * completelly defined the part of the register used by a partial read. So 
>> we
>> + * avoid extending the liveness range because all data read was already
>> + * defined although the wasn't completely written.
>> + */
>> +unsigned
>> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const
>> +{
>> +   if (is_dst) {

> Please split into two functions (like fs_inst::src_read and
> ::src_written) since that would make the call-sites of this method more
> self-documenting than a boolean parameter.  You should be able to share
> code by refactoring the common logic into a separate function (see below
> for some suggestions on how that could be achieved).

Sure, it would improve readability and simplifies the logic, I've chosen
dst_write_pattern and src_read_pattern.

> 
>> +  /* We don't know what is written so we return the worts case */
> 
> "worst"

Fixed.

>> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
>> + return 0;
>> +  /* We assume that send destinations are completely written */
>> +  if (this->is_send_from_grf())
>> + return ~0u;
> 
> Some send-like instructions won't be caught by this condition, you
> should check for this->mlen != 0 in addition.

Would it be enough to check for (this->mlen > 0) and forget about
is_send_from_grf? I am using this approach in v2 I am sending.

>> +   } else {
>> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
>> +   * so the read pattern depends on the bitsize stored at src[4]
>> +   */
>> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
>> +  this->src[1].nr == r.nr) {

> I feel uncomfortable about attempting to guess the source the caller is
> referring to by comparing the registers for equality.  E.g.  you could
> potentially end up with two sources that compare equal but have
> different semantics (e.g. as a result of CSE) which might cause it to
> get the wrong answer.  It would probably be better to pass a source
> index and a byte offset as argument instead of an fs_reg.

I've didn't thought about CSE, I'm now receiving the number of source
and the reg_offset. I'm using reg_offset instead of byte offsets as it
simplifies the logic. Now we are using always the base src register to
do all the calculation
>> + switch (this->src[4].ud) {
>> + case 32:
>> +return ~0u;
>> + case 16:
>> +return 0x;
>> + case 8:
>> +return 0x;
>> + default:
>> +unreachable("Unsupported bitsize at 
>> byte_scattered_write_logical");
>> + }
> 
> Replace the above switch statement with a call to "periodic_mask(8, 4,
> this->src[4].ud / 8)" (see below for the definition).

Ok.

>> +  }
>> +  /* As for byte_scattered_write_logical but we need to take into 
>> account
>> +   * that data written are in the payload offset 32 with SIMD8 and 
>> offset
>> +   * 64 with SIMD16.
>> +   */
>> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE &&
>> +  this->src[0].nr == r.nr) {
>> + fs_reg payload = 

Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-13 Thread Francisco Jerez
Jose Maria Casanova Crespo  writes:

> For a register source/destination of an instruction the function returns
> the read/write byte pattern of a 32-byte registers as a unsigned int.
>
> The returned pattern takes into account the exec_size of the instruction,
> the type bitsize, the stride and if the register is source or destination.
>
> The objective of the functions if to help to know the read/written bytes
> of the instructions to improve the liveness analysis for partial read/writes.
>
> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
> parameter they have a different read pattern.
> ---
>  src/intel/compiler/brw_fs.cpp  | 183 +
>  src/intel/compiler/brw_ir_fs.h |   1 +
>  2 files changed, 184 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 2b8363ca362..f3045c4ff6c 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
> this->dst.offset % REG_SIZE != 0);
>  }
>  
> +/**
> + * Returns a 32-bit uint whose bits represent if the associated register byte
> + * has been read/written by the instruction. The returned pattern takes into
> + * account the exec_size of the instruction, the type bitsize and the 
> register
> + * stride and the register is source or destination for the instruction.
> + *
> + * The objective of this function is to identify which parts of the register
> + * are read or written for operations that don't read/write a full register.
> + * So we can identify in live range variable analysis if a partial write has
> + * completelly defined the part of the register used by a partial read. So we
> + * avoid extending the liveness range because all data read was already
> + * defined although the wasn't completely written.
> + */
> +unsigned
> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const
> +{
> +   if (is_dst) {

Please split into two functions (like fs_inst::src_read and
::src_written) since that would make the call-sites of this method more
self-documenting than a boolean parameter.  You should be able to share
code by refactoring the common logic into a separate function (see below
for some suggestions on how that could be achieved).

> +  /* We don't know what is written so we return the worts case */

"worst"

> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
> + return 0;
> +  /* We assume that send destinations are completelly written */
> +  if (this->is_send_from_grf())
> + return ~0u;

Some send-like instructions won't be caught by this condition, you
should check for this->mlen != 0 in addition.

> +   } else {
> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
> +   * so the read pattern depends on the bitsize stored at src[4]
> +   */
> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
> +  this->src[1].nr == r.nr) {

I feel uncomfortable about attempting to guess the source the caller is
referring to by comparing the registers for equality.  E.g.  you could
potentially end up with two sources that compare equal but have
different semantics (e.g. as a result of CSE) which might cause it to
get the wrong answer.  It would probably be better to pass a source
index and a byte offset as argument instead of an fs_reg.

> + switch (this->src[4].ud) {
> + case 32:
> +return ~0u;
> + case 16:
> +return 0x;
> + case 8:
> +return 0x;
> + default:
> +unreachable("Unsupported bitsize at 
> byte_scattered_write_logical");
> + }

Replace the above switch statement with a call to "periodic_mask(8, 4,
this->src[4].ud / 8)" (see below for the definition).

> +  }
> +  /* As for byte_scattered_write_logical but we need to take into account
> +   * that data written are in the payload offset 32 with SIMD8 and offset
> +   * 64 with SIMD16.
> +   */
> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE &&
> +  this->src[0].nr == r.nr) {
> + fs_reg payload = this->src[0];
> + payload.offset = REG_SIZE * this->exec_size / 8;

byte_offset() is your friend.

> + if (regions_overlap(r, REG_SIZE,
> + payload, REG_SIZE * this->exec_size / 8)) {
> +switch (this->src[2].ud) {
> +case 32:
> +   return ~0u;
> +case 16:
> +   return 0x;
> +case 8:
> +   return 0x;
> +default:
> +   unreachable("Unsupported bitsize at byte_scattered_write");
> +}

Replace the above switch statement with a call to "periodic_mask(8, 4,
this->src[2].ud / 8)".

> + } else {
> + 

[Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-13 Thread Jose Maria Casanova Crespo
For a register source/destination of an instruction the function returns
the read/write byte pattern of a 32-byte registers as a unsigned int.

The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the stride and if the register is source or destination.

The objective of the functions if to help to know the read/written bytes
of the instructions to improve the liveness analysis for partial read/writes.

We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
parameter they have a different read pattern.
---
 src/intel/compiler/brw_fs.cpp  | 183 +
 src/intel/compiler/brw_ir_fs.h |   1 +
 2 files changed, 184 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 2b8363ca362..f3045c4ff6c 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been read/written by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and the register
+ * stride and the register is source or destination for the instruction.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are read or written for operations that don't read/write a full register.
+ * So we can identify in live range variable analysis if a partial write has
+ * completelly defined the part of the register used by a partial read. So we
+ * avoid extending the liveness range because all data read was already
+ * defined although the wasn't completely written.
+ */
+unsigned
+fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) const
+{
+   if (is_dst) {
+  /* We don't know what is written so we return the worts case */
+  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
+ return 0;
+  /* We assume that send destinations are completelly written */
+  if (this->is_send_from_grf())
+ return ~0u;
+   } else {
+  /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
+   * so the read pattern depends on the bitsize stored at src[4]
+   */
+  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
+  this->src[1].nr == r.nr) {
+ switch (this->src[4].ud) {
+ case 32:
+return ~0u;
+ case 16:
+return 0x;
+ case 8:
+return 0x;
+ default:
+unreachable("Unsupported bitsize at byte_scattered_write_logical");
+ }
+  }
+  /* As for byte_scattered_write_logical but we need to take into account
+   * that data written are in the payload offset 32 with SIMD8 and offset
+   * 64 with SIMD16.
+   */
+  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE &&
+  this->src[0].nr == r.nr) {
+ fs_reg payload = this->src[0];
+ payload.offset = REG_SIZE * this->exec_size / 8;
+ if (regions_overlap(r, REG_SIZE,
+ payload, REG_SIZE * this->exec_size / 8)) {
+switch (this->src[2].ud) {
+case 32:
+   return ~0u;
+case 16:
+   return 0x;
+case 8:
+   return 0x;
+default:
+   unreachable("Unsupported bitsize at byte_scattered_write");
+}
+ } else {
+return ~0u;
+ }
+  }
+   }
+
+   /* We define the most conservative value in order to calculate liveness
+* range. If it is a destination nothing is defined and if is a source
+* all the bytes of the register could be read. So for release builds
+* the unreachables would have always safe return value. */
+   unsigned pattern =  is_dst ? 0 : ~0u;
+
+   /* In the general case we calculate the pattern for a specific register
+* on base of the type_size and stride. We calculate the SIMD8 pattern
+* and then we adjust the patter if needed for different exec_sizes
+* and offset
+*/
+   switch (type_sz(r.type)){
+   case 1:
+  switch (r.stride) {
+  case 0:
+ pattern = 0X1;
+ break;
+  case 1:
+ pattern = 0xff;
+ break;
+  case 2:
+ pattern = 0x;
+ break;
+  case 4:
+ pattern = 0x;
+ break;
+  case 8:
+ pattern = 0x01010101;
+ break;
+  default:
+ unreachable("Unknown pattern unsupported 8-bit stride");
+  }
+  break;
+   case 2:
+  switch (r.stride) {
+  case 0:
+ pattern = 0X3;
+ break;
+  case 1:
+ pattern = 0x;
+ break;
+  case 2:
+ pattern = 0x;
+ break;
+