Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-11 Thread Francisco Jerez
Iago Toral  writes:

> On Tue, 2016-05-10 at 16:53 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez  writes:
>> 
>> > From: Iago Toral Quiroga 
>> >
>> > We were not accounting for reg_suboffset in the check for the start
>> > of the region. This meant that would allow copy-propagation even if
>> > the dst wrote to sub_regoffset 4 and our source read from
>> > sub_regoffset 0, which is not correct. This was observed in fp64 code,
>> > since there we use reg_suboffset to select the high 32-bit of a double.
>> >
>> I don't think this paragraph is accurate, copy instructions with
>> non-zero destination subreg offset are currently considered partial
>> writes and should never have been added to the ACP hash table in the
>> first place.
>
> Right, I think I wrote this patch before the one where I fixed
> is_partial_write() to consider any write to subreg_offset > 0 partial. 
>
>> > Also, fs_reg::regs_read() already takes the stride into account, so we
>> > should not multiply its result by the stride again. This was making
>> > copy-propagation fail to copy-propagate cases that would otherwise be
>> > safe to copy-propagate. Again, this was observed in fp64 code, since
>> > there we use stride > 1 often.
>> >
>> > Incidentally, these fixes open up more possibilities for copy propagation
>> > which uncovered new bugs in copy-propagation. The folowing patches address
>> > each of these new issues.
>> 
>> Oh man, that sucks...
>> 
>> > ---
>> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 
>> > +
>> >  1 file changed, 13 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
>> > b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > index 5fae10f..23df877 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
>> > stride,
>> > return true;
>> >  }
>> >  
>> > +static inline bool
>> > +region_match(fs_reg src, unsigned regs_read,
>> > + fs_reg dst, unsigned regs_written)
>> 
>> How about 'region_contained_in(dst, regs_write, src, regs_read)'? (I
>> personally wouldn't mind 'region_match' but
>> 'write_region_contains_read_region' sounds a bit too long for my taste).
>> 
>> > +{
>> > +   return src.reg_offset >= dst.reg_offset &&
>> > +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) 
>> > &&
>> > +  src.subreg_offset >= dst.subreg_offset;
>> 
>> This works under the assumption that src.subreg_offset is strictly less
>> than the reg_offset unit -- Which *should* be the case unless we've
>> messed up that restriction in some place (we have in the past :P).  To
>> be on the safe side you could do something like following, if you like:
>> 
>> |   return (src.reg_offset * REG_SIZE + src.subreg_offset >=
>> |   dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
>> |  src.reg_offset + regs_read <= dst.reg_offset + regs_written;
>
> I understand that even if we discard writes with dst.subreg_offset > 0,
> you still want the subreg_offset check here to be safe exactly in that
> scenario (since we would not need this for the case I originally wrote
> it for).
>

Yeah, it's up to you but I guess it wouldn't hurt to be extra-paranoid
here, and it would probably be sensible to add 'src.file == dst.file &&
src.nr == dst.nr && ...' to the return expresssion in addition.

>> With the above taken into account:
>> 
>> Reviewed-by: Francisco Jerez 
>
> Thanks!
>
>> > +}
>> > +
>> >  bool
>> >  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>> >  {
>> > @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int 
>> > arg, acp_entry *entry)
>> > /* Bail if inst is reading a range that isn't contained in the range
>> >  * that entry is writing.
>> >  */
>> > -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
>> > -   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
>> > -inst->regs_read(arg) * inst->src[arg].stride * 32) >
>> > -   (entry->dst.reg_offset + entry->regs_written) * 32)
>> > +   if (!region_match(inst->src[arg], inst->regs_read(arg),
>> > + entry->dst, entry->regs_written))
>> >return false;
>> >  
>> > /* we can't generally copy-propagate UD negations because we
>> > @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
>> > acp_entry *entry)
>> >/* Bail if inst is reading a range that isn't contained in the range
>> > * that entry is writing.
>> > */
>> > -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
>> > -  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
>> > -   inst->regs_read(i) * inst->src[i].stride * 

Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-11 Thread Francisco Jerez
Iago Toral  writes:

> On Tue, 2016-05-10 at 16:53 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez  writes:
>> 
>> > From: Iago Toral Quiroga 
>> >
>> > We were not accounting for reg_suboffset in the check for the start
>> > of the region. This meant that would allow copy-propagation even if
>> > the dst wrote to sub_regoffset 4 and our source read from
>> > sub_regoffset 0, which is not correct. This was observed in fp64 code,
>> > since there we use reg_suboffset to select the high 32-bit of a double.
>> >
>> I don't think this paragraph is accurate, copy instructions with
>> non-zero destination subreg offset are currently considered partial
>> writes and should never have been added to the ACP hash table in the
>> first place.
>
> Right, I think I wrote this patch before the one where I fixed
> is_partial_write() to consider any write to subreg_offset > 0 partial. 
>
In practice they would be considered partial writes already, but the
reason is somewhat obscure -- Writes with subreg_offset != 0 would
necessarily fall into three categories:
 - Strided writes (which are considered partial explicitly).
 - Contiguous writes which write less than one GRF worth of data (which
   are considered partial explicitly)
 - Contiguous writes which straddle two registers (which have been
   severely limited by the hardware historically).

Even though it's unlikely to have fixed pre-existing bugs, for the sake
of sanity I'm glad that you fixed is_partial_write() to handle the
latter case explicitly (thanks!), especially since the hardware has
slowly been lifting restrictions on the ways you can straddle multiple
registers: On Gen7 the stupid decompression behaviour that simply shifts
the second decompressed portion of the instruction by one GRF would kill
you.  Gen8 is slightly less stupid and shifts the second decompressed
portion by ExecSize/2 components which allows it to handle a subset of
straddled writes (only the cases where the written components are
balanced between the two registers).  AFAIK Gen9 should be the first to
support fully unrestricted unbalanced writes.

>> > Also, fs_reg::regs_read() already takes the stride into account, so we
>> > should not multiply its result by the stride again. This was making
>> > copy-propagation fail to copy-propagate cases that would otherwise be
>> > safe to copy-propagate. Again, this was observed in fp64 code, since
>> > there we use stride > 1 often.
>> >
>> > Incidentally, these fixes open up more possibilities for copy propagation
>> > which uncovered new bugs in copy-propagation. The folowing patches address
>> > each of these new issues.
>> 
>> Oh man, that sucks...
>> 
>> > ---
>> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 
>> > +
>> >  1 file changed, 13 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
>> > b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > index 5fae10f..23df877 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
>> > stride,
>> > return true;
>> >  }
>> >  
>> > +static inline bool
>> > +region_match(fs_reg src, unsigned regs_read,
>> > + fs_reg dst, unsigned regs_written)
>> 
>> How about 'region_contained_in(dst, regs_write, src, regs_read)'? (I
>> personally wouldn't mind 'region_match' but
>> 'write_region_contains_read_region' sounds a bit too long for my taste).
>> 
>> > +{
>> > +   return src.reg_offset >= dst.reg_offset &&
>> > +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) 
>> > &&
>> > +  src.subreg_offset >= dst.subreg_offset;
>> 
>> This works under the assumption that src.subreg_offset is strictly less
>> than the reg_offset unit -- Which *should* be the case unless we've
>> messed up that restriction in some place (we have in the past :P).  To
>> be on the safe side you could do something like following, if you like:
>> 
>> |   return (src.reg_offset * REG_SIZE + src.subreg_offset >=
>> |   dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
>> |  src.reg_offset + regs_read <= dst.reg_offset + regs_written;
>
> I understand that even if we discard writes with dst.subreg_offset > 0,
> you still want the subreg_offset check here to be safe exactly in that
> scenario (since we would not need this for the case I originally wrote
> it for).
>
>> With the above taken into account:
>> 
>> Reviewed-by: Francisco Jerez 
>
> Thanks!
>
>> > +}
>> > +
>> >  bool
>> >  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>> >  {
>> > @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int 
>> > arg, acp_entry *entry)
>> > /* Bail if inst is reading a range that isn't contained 

Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-11 Thread Iago Toral
On Tue, 2016-05-10 at 16:53 -0700, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
> > From: Iago Toral Quiroga 
> >
> > We were not accounting for reg_suboffset in the check for the start
> > of the region. This meant that would allow copy-propagation even if
> > the dst wrote to sub_regoffset 4 and our source read from
> > sub_regoffset 0, which is not correct. This was observed in fp64 code,
> > since there we use reg_suboffset to select the high 32-bit of a double.
> >
> I don't think this paragraph is accurate, copy instructions with
> non-zero destination subreg offset are currently considered partial
> writes and should never have been added to the ACP hash table in the
> first place.

Right, I think I wrote this patch before the one where I fixed
is_partial_write() to consider any write to subreg_offset > 0 partial. 

> > Also, fs_reg::regs_read() already takes the stride into account, so we
> > should not multiply its result by the stride again. This was making
> > copy-propagation fail to copy-propagate cases that would otherwise be
> > safe to copy-propagate. Again, this was observed in fp64 code, since
> > there we use stride > 1 often.
> >
> > Incidentally, these fixes open up more possibilities for copy propagation
> > which uncovered new bugs in copy-propagation. The folowing patches address
> > each of these new issues.
> 
> Oh man, that sucks...
> 
> > ---
> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 
> > +
> >  1 file changed, 13 insertions(+), 8 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
> > b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > index 5fae10f..23df877 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
> > stride,
> > return true;
> >  }
> >  
> > +static inline bool
> > +region_match(fs_reg src, unsigned regs_read,
> > + fs_reg dst, unsigned regs_written)
> 
> How about 'region_contained_in(dst, regs_write, src, regs_read)'? (I
> personally wouldn't mind 'region_match' but
> 'write_region_contains_read_region' sounds a bit too long for my taste).
> 
> > +{
> > +   return src.reg_offset >= dst.reg_offset &&
> > +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) 
> > &&
> > +  src.subreg_offset >= dst.subreg_offset;
> 
> This works under the assumption that src.subreg_offset is strictly less
> than the reg_offset unit -- Which *should* be the case unless we've
> messed up that restriction in some place (we have in the past :P).  To
> be on the safe side you could do something like following, if you like:
> 
> |   return (src.reg_offset * REG_SIZE + src.subreg_offset >=
> |   dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
> |  src.reg_offset + regs_read <= dst.reg_offset + regs_written;

I understand that even if we discard writes with dst.subreg_offset > 0,
you still want the subreg_offset check here to be safe exactly in that
scenario (since we would not need this for the case I originally wrote
it for).

> With the above taken into account:
> 
> Reviewed-by: Francisco Jerez 

Thanks!

> > +}
> > +
> >  bool
> >  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
> >  {
> > @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
> > acp_entry *entry)
> > /* Bail if inst is reading a range that isn't contained in the range
> >  * that entry is writing.
> >  */
> > -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
> > -   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
> > -inst->regs_read(arg) * inst->src[arg].stride * 32) >
> > -   (entry->dst.reg_offset + entry->regs_written) * 32)
> > +   if (!region_match(inst->src[arg], inst->regs_read(arg),
> > + entry->dst, entry->regs_written))
> >return false;
> >  
> > /* we can't generally copy-propagate UD negations because we
> > @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
> > acp_entry *entry)
> >/* Bail if inst is reading a range that isn't contained in the range
> > * that entry is writing.
> > */
> > -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
> > -  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
> > -   inst->regs_read(i) * inst->src[i].stride * 32) >
> > -  (entry->dst.reg_offset + entry->regs_written) * 32)
> > +  if (!region_match(inst->src[i], inst->regs_read(i),
> > +entry->dst, entry->regs_written))
> >   continue;
> >  
> >fs_reg val = entry->src;
> > -- 
> > 2.5.0
> >
> > ___
> > mesa-dev mailing list

Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-10 Thread Francisco Jerez
Francisco Jerez  writes:

> Samuel Iglesias Gonsálvez  writes:
>
>> From: Iago Toral Quiroga 
>>
>> We were not accounting for reg_suboffset in the check for the start
>> of the region. This meant that would allow copy-propagation even if
>> the dst wrote to sub_regoffset 4 and our source read from
>> sub_regoffset 0, which is not correct. This was observed in fp64 code,
>> since there we use reg_suboffset to select the high 32-bit of a double.
>>
> I don't think this paragraph is accurate, copy instructions with
> non-zero destination subreg offset are currently considered partial
> writes and should never have been added to the ACP hash table in the
> first place.
>
>> Also, fs_reg::regs_read() already takes the stride into account, so we
>> should not multiply its result by the stride again. This was making
>> copy-propagation fail to copy-propagate cases that would otherwise be
>> safe to copy-propagate. Again, this was observed in fp64 code, since
>> there we use stride > 1 often.
>>
>> Incidentally, these fixes open up more possibilities for copy propagation
>> which uncovered new bugs in copy-propagation. The folowing patches address
>> each of these new issues.
>
> Oh man, that sucks...
>
>> ---
>>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 
>> +
>>  1 file changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> index 5fae10f..23df877 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
>> stride,
>> return true;
>>  }
>>  
>> +static inline bool
>> +region_match(fs_reg src, unsigned regs_read,
>> + fs_reg dst, unsigned regs_written)
>
Forgot to mention that there's no reason to pass the registers by value
here, please use 'const fs_reg &' instead.

> How about 'region_contained_in(dst, regs_write, src, regs_read)'? (I

Oops, in case it wasn't not clear from my sentence above, I didn't
intend to suggest using different argument names for this function, I
just typoed them -- regs_written sounds fine to me.

> personally wouldn't mind 'region_match' but
> 'write_region_contains_read_region' sounds a bit too long for my taste).
>
>> +{
>> +   return src.reg_offset >= dst.reg_offset &&
>> +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) &&
>> +  src.subreg_offset >= dst.subreg_offset;
>
> This works under the assumption that src.subreg_offset is strictly less
> than the reg_offset unit -- Which *should* be the case unless we've
> messed up that restriction in some place (we have in the past :P).  To
> be on the safe side you could do something like following, if you like:
>
> |   return (src.reg_offset * REG_SIZE + src.subreg_offset >=
> |   dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
> |  src.reg_offset + regs_read <= dst.reg_offset + regs_written;
>
> With the above taken into account:
>
> Reviewed-by: Francisco Jerez 
>
>> +}
>> +
>>  bool
>>  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>>  {
>> @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
>> acp_entry *entry)
>> /* Bail if inst is reading a range that isn't contained in the range
>>  * that entry is writing.
>>  */
>> -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
>> -   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
>> -inst->regs_read(arg) * inst->src[arg].stride * 32) >
>> -   (entry->dst.reg_offset + entry->regs_written) * 32)
>> +   if (!region_match(inst->src[arg], inst->regs_read(arg),
>> + entry->dst, entry->regs_written))
>>return false;
>>  
>> /* we can't generally copy-propagate UD negations because we
>> @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
>> acp_entry *entry)
>>/* Bail if inst is reading a range that isn't contained in the range
>> * that entry is writing.
>> */
>> -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
>> -  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
>> -   inst->regs_read(i) * inst->src[i].stride * 32) >
>> -  (entry->dst.reg_offset + entry->regs_written) * 32)
>> +  if (!region_match(inst->src[i], inst->regs_read(i),
>> +entry->dst, entry->regs_written))
>>   continue;
>>  
>>fs_reg val = entry->src;
>> -- 
>> 2.5.0
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature

Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> We were not accounting for reg_suboffset in the check for the start
> of the region. This meant that would allow copy-propagation even if
> the dst wrote to sub_regoffset 4 and our source read from
> sub_regoffset 0, which is not correct. This was observed in fp64 code,
> since there we use reg_suboffset to select the high 32-bit of a double.
>
I don't think this paragraph is accurate, copy instructions with
non-zero destination subreg offset are currently considered partial
writes and should never have been added to the ACP hash table in the
first place.

> Also, fs_reg::regs_read() already takes the stride into account, so we
> should not multiply its result by the stride again. This was making
> copy-propagation fail to copy-propagate cases that would otherwise be
> safe to copy-propagate. Again, this was observed in fp64 code, since
> there we use stride > 1 often.
>
> Incidentally, these fixes open up more possibilities for copy propagation
> which uncovered new bugs in copy-propagation. The folowing patches address
> each of these new issues.

Oh man, that sucks...

> ---
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 
> +
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index 5fae10f..23df877 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
> stride,
> return true;
>  }
>  
> +static inline bool
> +region_match(fs_reg src, unsigned regs_read,
> + fs_reg dst, unsigned regs_written)

How about 'region_contained_in(dst, regs_write, src, regs_read)'? (I
personally wouldn't mind 'region_match' but
'write_region_contains_read_region' sounds a bit too long for my taste).

> +{
> +   return src.reg_offset >= dst.reg_offset &&
> +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) &&
> +  src.subreg_offset >= dst.subreg_offset;

This works under the assumption that src.subreg_offset is strictly less
than the reg_offset unit -- Which *should* be the case unless we've
messed up that restriction in some place (we have in the past :P).  To
be on the safe side you could do something like following, if you like:

|   return (src.reg_offset * REG_SIZE + src.subreg_offset >=
|   dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
|  src.reg_offset + regs_read <= dst.reg_offset + regs_written;

With the above taken into account:

Reviewed-by: Francisco Jerez 

> +}
> +
>  bool
>  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>  {
> @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
> acp_entry *entry)
> /* Bail if inst is reading a range that isn't contained in the range
>  * that entry is writing.
>  */
> -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
> -   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
> -inst->regs_read(arg) * inst->src[arg].stride * 32) >
> -   (entry->dst.reg_offset + entry->regs_written) * 32)
> +   if (!region_match(inst->src[arg], inst->regs_read(arg),
> + entry->dst, entry->regs_written))
>return false;
>  
> /* we can't generally copy-propagate UD negations because we
> @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
> acp_entry *entry)
>/* Bail if inst is reading a range that isn't contained in the range
> * that entry is writing.
> */
> -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
> -  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
> -   inst->regs_read(i) * inst->src[i].stride * 32) >
> -  (entry->dst.reg_offset + entry->regs_written) * 32)
> +  if (!region_match(inst->src[i], inst->regs_read(i),
> +entry->dst, entry->regs_written))
>   continue;
>  
>fs_reg val = entry->src;
> -- 
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-05 Thread Iago Toral
On Tue, 2016-05-03 at 17:28 -0700, Jordan Justen wrote:
> On 2016-05-03 05:21:55, Samuel Iglesias Gonsálvez wrote:
> > From: Iago Toral Quiroga 
> > 
> > We were not accounting for reg_suboffset in the check for the start
> > of the region. This meant that would allow copy-propagation even if
> > the dst wrote to sub_regoffset 4 and our source read from
> > sub_regoffset 0, which is not correct. This was observed in fp64 code,
> > since there we use reg_suboffset to select the high 32-bit of a double.
> > 
> > Also, fs_reg::regs_read() already takes the stride into account, so we
> > should not multiply its result by the stride again. This was making
> > copy-propagation fail to copy-propagate cases that would otherwise be
> > safe to copy-propagate. Again, this was observed in fp64 code, since
> > there we use stride > 1 often.
> > 
> > Incidentally, these fixes open up more possibilities for copy propagation
> > which uncovered new bugs in copy-propagation. The folowing patches address
> > each of these new issues.
> 
> Should this be moved after those fixes?

Yeah, I think that would be better.

> -Jordan
> 
> > ---
> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 
> > +
> >  1 file changed, 13 insertions(+), 8 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
> > b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > index 5fae10f..23df877 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
> > stride,
> > return true;
> >  }
> >  
> > +static inline bool
> > +region_match(fs_reg src, unsigned regs_read,
> > + fs_reg dst, unsigned regs_written)
> > +{
> > +   return src.reg_offset >= dst.reg_offset &&
> > +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) 
> > &&
> > +  src.subreg_offset >= dst.subreg_offset;
> > +}
> > +
> >  bool
> >  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
> >  {
> > @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
> > acp_entry *entry)
> > /* Bail if inst is reading a range that isn't contained in the range
> >  * that entry is writing.
> >  */
> > -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
> > -   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
> > -inst->regs_read(arg) * inst->src[arg].stride * 32) >
> > -   (entry->dst.reg_offset + entry->regs_written) * 32)
> > +   if (!region_match(inst->src[arg], inst->regs_read(arg),
> > + entry->dst, entry->regs_written))
> >return false;
> >  
> > /* we can't generally copy-propagate UD negations because we
> > @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
> > acp_entry *entry)
> >/* Bail if inst is reading a range that isn't contained in the range
> > * that entry is writing.
> > */
> > -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
> > -  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
> > -   inst->regs_read(i) * inst->src[i].stride * 32) >
> > -  (entry->dst.reg_offset + entry->regs_written) * 32)
> > +  if (!region_match(inst->src[i], inst->regs_read(i),
> > +entry->dst, entry->regs_written))
> >   continue;
> >  
> >fs_reg val = entry->src;
> > -- 
> > 2.5.0
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-04 Thread Iago Toral
On Tue, 2016-05-03 at 17:37 -0700, Jason Ekstrand wrote:
> 
> 
> On Tue, May 3, 2016 at 5:21 AM, Samuel Iglesias Gonsálvez
>  wrote:
> From: Iago Toral Quiroga 
> 
> We were not accounting for reg_suboffset in the check for the
> start
> of the region. This meant that would allow copy-propagation
> even if
> the dst wrote to sub_regoffset 4 and our source read from
> sub_regoffset 0, which is not correct. This was observed in
> fp64 code,
> since there we use reg_suboffset to select the high 32-bit of
> a double.
> 
> 
> I'm not sure if copy-prop can handle the case where you write to
> sub_regoffset != 0 anyway.  Maybe we should just bail on that?

Right, we found a few issues and some of the patches in this series
address the problems we found. Maybe there are still a few other cases
that we missed but my thinking was that we probably wanted to fix the
pass for subreg_offset != 0 anyway and if we actually missed something
we could just fix it up when it shows up as a bug somewhere. That said,
we can of course simplify things and just bail here, we would lose some
optimizations but we might not need some of the other patches in the
series...

> Also, fs_reg::regs_read() already takes the stride into
> account, so we
> should not multiply its result by the stride again. This was
> making
> copy-propagation fail to copy-propagate cases that would
> otherwise be
> safe to copy-propagate. Again, this was observed in fp64 code,
> since
> there we use stride > 1 often.
> 
> Incidentally, these fixes open up more possibilities for copy
> propagation
> which uncovered new bugs in copy-propagation. The folowing
> patches address
> each of these new issues.
> ---
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21
> +
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git
> a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index 5fae10f..23df877 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned
> arg, unsigned stride,
> return true;
>  }
> 
> +static inline bool
> +region_match(fs_reg src, unsigned regs_read,
> + fs_reg dst, unsigned regs_written)
> 
> 
> I don't like the name.  It should say something about containment with
> perhaps a comment to go along with it.  Should we assert that they
> have the same file and number?

/**
 * Check that the register region read by src [src.reg_offset, 
 * src.reg_offset + regs_read] is contained inside the register
 * region written by dst [dst.reg_offset, dst.reg_offset + regs_written]
 * Both src and dst must have the same register number and file.
 */
static inline bool
write_region_contains_read_region(...)

And yes, we should definitely assert on the file and register number.
 
> +{
> +   return src.reg_offset >= dst.reg_offset &&
> +  (src.reg_offset + regs_read) <= (dst.reg_offset +
> regs_written) &&
> +  src.subreg_offset >= dst.subreg_offset;
> +}
> +
>  bool
>  fs_visitor::try_copy_propagate(fs_inst *inst, int arg,
> acp_entry *entry)
>  {
> @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst
> *inst, int arg, acp_entry *entry)
> /* Bail if inst is reading a range that isn't contained in
> the range
>  * that entry is writing.
>  */
> -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
> -   (inst->src[arg].reg_offset * 32 +
> inst->src[arg].subreg_offset +
> -inst->regs_read(arg) * inst->src[arg].stride * 32) >
> -   (entry->dst.reg_offset + entry->regs_written) * 32)
> +   if (!region_match(inst->src[arg], inst->regs_read(arg),
> + entry->dst, entry->regs_written))
>return false;
> 
> /* we can't generally copy-propagate UD negations because
> we
> @@ -554,10 +561,8 @@
> fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry
> *entry)
>/* Bail if inst is reading a range that isn't contained
> in the range
> * that entry is writing.
> */
> -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
> -  (inst->src[i].reg_offset * 32 +
> inst->src[i].subreg_offset +
> -

Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-03 Thread Jason Ekstrand
On Tue, May 3, 2016 at 5:21 AM, Samuel Iglesias Gonsálvez <
sigles...@igalia.com> wrote:

> From: Iago Toral Quiroga 
>
> We were not accounting for reg_suboffset in the check for the start
> of the region. This meant that would allow copy-propagation even if
> the dst wrote to sub_regoffset 4 and our source read from
> sub_regoffset 0, which is not correct. This was observed in fp64 code,
> since there we use reg_suboffset to select the high 32-bit of a double.
>

I'm not sure if copy-prop can handle the case where you write to
sub_regoffset != 0 anyway.  Maybe we should just bail on that?


> Also, fs_reg::regs_read() already takes the stride into account, so we
> should not multiply its result by the stride again. This was making
> copy-propagation fail to copy-propagate cases that would otherwise be
> safe to copy-propagate. Again, this was observed in fp64 code, since
> there we use stride > 1 often.
>
> Incidentally, these fixes open up more possibilities for copy propagation
> which uncovered new bugs in copy-propagation. The folowing patches address
> each of these new issues.
> ---
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21
> +
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index 5fae10f..23df877 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned
> stride,
> return true;
>  }
>
> +static inline bool
> +region_match(fs_reg src, unsigned regs_read,
> + fs_reg dst, unsigned regs_written)
>

I don't like the name.  It should say something about containment with
perhaps a comment to go along with it.  Should we assert that they have the
same file and number?


> +{
> +   return src.reg_offset >= dst.reg_offset &&
> +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written)
> &&
> +  src.subreg_offset >= dst.subreg_offset;
> +}
> +
>  bool
>  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>  {
> @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int
> arg, acp_entry *entry)
> /* Bail if inst is reading a range that isn't contained in the range
>  * that entry is writing.
>  */
> -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
> -   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
> -inst->regs_read(arg) * inst->src[arg].stride * 32) >
> -   (entry->dst.reg_offset + entry->regs_written) * 32)
> +   if (!region_match(inst->src[arg], inst->regs_read(arg),
> + entry->dst, entry->regs_written))
>return false;
>
> /* we can't generally copy-propagate UD negations because we
> @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst,
> acp_entry *entry)
>/* Bail if inst is reading a range that isn't contained in the range
> * that entry is writing.
> */
> -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
> -  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
> -   inst->regs_read(i) * inst->src[i].stride * 32) >
> -  (entry->dst.reg_offset + entry->regs_written) * 32)
> +  if (!region_match(inst->src[i], inst->regs_read(i),
> +entry->dst, entry->regs_written))
>   continue;
>
>fs_reg val = entry->src;
> --
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-03 Thread Jordan Justen
On 2016-05-03 05:21:55, Samuel Iglesias Gonsálvez wrote:
> From: Iago Toral Quiroga 
> 
> We were not accounting for reg_suboffset in the check for the start
> of the region. This meant that would allow copy-propagation even if
> the dst wrote to sub_regoffset 4 and our source read from
> sub_regoffset 0, which is not correct. This was observed in fp64 code,
> since there we use reg_suboffset to select the high 32-bit of a double.
> 
> Also, fs_reg::regs_read() already takes the stride into account, so we
> should not multiply its result by the stride again. This was making
> copy-propagation fail to copy-propagate cases that would otherwise be
> safe to copy-propagate. Again, this was observed in fp64 code, since
> there we use stride > 1 often.
> 
> Incidentally, these fixes open up more possibilities for copy propagation
> which uncovered new bugs in copy-propagation. The folowing patches address
> each of these new issues.

Should this be moved after those fixes?

-Jordan

> ---
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 
> +
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index 5fae10f..23df877 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
> stride,
> return true;
>  }
>  
> +static inline bool
> +region_match(fs_reg src, unsigned regs_read,
> + fs_reg dst, unsigned regs_written)
> +{
> +   return src.reg_offset >= dst.reg_offset &&
> +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) &&
> +  src.subreg_offset >= dst.subreg_offset;
> +}
> +
>  bool
>  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>  {
> @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
> acp_entry *entry)
> /* Bail if inst is reading a range that isn't contained in the range
>  * that entry is writing.
>  */
> -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
> -   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
> -inst->regs_read(arg) * inst->src[arg].stride * 32) >
> -   (entry->dst.reg_offset + entry->regs_written) * 32)
> +   if (!region_match(inst->src[arg], inst->regs_read(arg),
> + entry->dst, entry->regs_written))
>return false;
>  
> /* we can't generally copy-propagate UD negations because we
> @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
> acp_entry *entry)
>/* Bail if inst is reading a range that isn't contained in the range
> * that entry is writing.
> */
> -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
> -  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
> -   inst->regs_read(i) * inst->src[i].stride * 32) >
> -  (entry->dst.reg_offset + entry->regs_written) * 32)
> +  if (!region_match(inst->src[i], inst->regs_read(i),
> +entry->dst, entry->regs_written))
>   continue;
>  
>fs_reg val = entry->src;
> -- 
> 2.5.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-03 Thread Samuel Iglesias Gonsálvez
From: Iago Toral Quiroga 

We were not accounting for reg_suboffset in the check for the start
of the region. This meant that would allow copy-propagation even if
the dst wrote to sub_regoffset 4 and our source read from
sub_regoffset 0, which is not correct. This was observed in fp64 code,
since there we use reg_suboffset to select the high 32-bit of a double.

Also, fs_reg::regs_read() already takes the stride into account, so we
should not multiply its result by the stride again. This was making
copy-propagation fail to copy-propagate cases that would otherwise be
safe to copy-propagate. Again, this was observed in fp64 code, since
there we use stride > 1 often.

Incidentally, these fixes open up more possibilities for copy propagation
which uncovered new bugs in copy-propagation. The folowing patches address
each of these new issues.
---
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 5fae10f..23df877 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
stride,
return true;
 }
 
+static inline bool
+region_match(fs_reg src, unsigned regs_read,
+ fs_reg dst, unsigned regs_written)
+{
+   return src.reg_offset >= dst.reg_offset &&
+  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) &&
+  src.subreg_offset >= dst.subreg_offset;
+}
+
 bool
 fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
 {
@@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
acp_entry *entry)
/* Bail if inst is reading a range that isn't contained in the range
 * that entry is writing.
 */
-   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
-   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
-inst->regs_read(arg) * inst->src[arg].stride * 32) >
-   (entry->dst.reg_offset + entry->regs_written) * 32)
+   if (!region_match(inst->src[arg], inst->regs_read(arg),
+ entry->dst, entry->regs_written))
   return false;
 
/* we can't generally copy-propagate UD negations because we
@@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
acp_entry *entry)
   /* Bail if inst is reading a range that isn't contained in the range
* that entry is writing.
*/
-  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
-  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
-   inst->regs_read(i) * inst->src[i].stride * 32) >
-  (entry->dst.reg_offset + entry->regs_written) * 32)
+  if (!region_match(inst->src[i], inst->regs_read(i),
+entry->dst, entry->regs_written))
  continue;
 
   fs_reg val = entry->src;
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev