Re: [Mesa-dev] [PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.

2019-01-07 Thread Iago Toral
On Mon, 2019-01-07 at 12:00 -0800, Francisco Jerez wrote:
> Iago Toral  writes:
> 
> > On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
> > > Currently the visitor attempts to enforce the regioning
> > > restrictions
> > > that apply to double-precision instructions on CHV/BXT at NIR-to-
> > > i965
> > > translation time.  It is possible though for the copy propagation
> > > pass
> > > to violate this restriction if a strided move is propagated into
> > > one
> > > of the affected instructions.  I've only reproduced this issue on
> > > a
> > > future platform but it could affect CHV/BXT too under the right
> > > conditions.
> > > 
> > > Cc: mesa-sta...@lists.freedesktop.org
> > > ---
> > >  .../compiler/brw_fs_copy_propagation.cpp  | 10 +++
> > >  src/intel/compiler/brw_ir_fs.h| 28
> > > +++
> > >  2 files changed, 38 insertions(+)
> > > 
> > > diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp
> > > b/src/intel/compiler/brw_fs_copy_propagation.cpp
> > > index a8ec1c34630..c23ce1ef426 100644
> > > --- a/src/intel/compiler/brw_fs_copy_propagation.cpp
> > > +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
> > > @@ -315,6 +315,16 @@ can_take_stride(fs_inst *inst, unsigned arg,
> > > unsigned stride,
> > > if (stride > 4)
> > >return false;
> > >  
> > > +   /* Bail if the channels of the source need to be aligned to
> > > the
> > > byte offset
> > > +* of the corresponding channel of the destination, and the
> > > provided stride
> > > +* would break this restriction.
> > > +*/
> > > +   if (has_dst_aligned_region_restriction(devinfo, inst) &&
> > > +   !(type_sz(inst->src[arg].type) * stride ==
> > > +   type_sz(inst->dst.type) * inst->dst.stride ||
> > > + stride == 0))
> > > +  return false;
> > > +
> > > /* 3-source instructions can only be Align16, which restricts
> > > what strides
> > >  * they can take. They can only take a stride of 1 (the usual
> > > case), or 0
> > >  * with a special "repctrl" bit. But the repctrl bit doesn't
> > > work
> > > for
> > > diff --git a/src/intel/compiler/brw_ir_fs.h
> > > b/src/intel/compiler/brw_ir_fs.h
> > > index 07e7224e0f8..95b069a2e02 100644
> > > --- a/src/intel/compiler/brw_ir_fs.h
> > > +++ b/src/intel/compiler/brw_ir_fs.h
> > > @@ -486,4 +486,32 @@ get_exec_type_size(const fs_inst *inst)
> > > return type_sz(get_exec_type(inst));
> > >  }
> > >  
> > > +/**
> > > + * Return whether the following regioning restriction applies to
> > > the
> > > specified
> > > + * instruction.  From the Cherryview PRM Vol 7. "Register Region
> > > + * Restrictions":
> > > + *
> > > + * "When source or destination datatype is 64b or operation is
> > > integer DWord
> > > + *  multiply, regioning in Align1 must follow these rules:
> > > + *
> > > + *  1. Source and Destination horizontal stride must be aligned
> > > to
> > > the same qword.
> > > + *  2. Regioning must ensure Src.Vstride = Src.Width *
> > > Src.Hstride.
> > > + *  3. Source and Destination offset must be the same, except
> > > the
> > > case of
> > > + * scalar source."
> > > + */
> > > +static inline bool
> > > +has_dst_aligned_region_restriction(const gen_device_info
> > > *devinfo,
> > > +   const fs_inst *inst)
> > > +{
> > > +   const brw_reg_type exec_type = get_exec_type(inst);
> > > +   const bool is_int_multiply =
> > > !brw_reg_type_is_floating_point(exec_type) &&
> > > + (inst->opcode == BRW_OPCODE_MUL || inst->opcode ==
> > > BRW_OPCODE_MAD);
> > 
> > Should this be extended to include MAC and MACH too?
> > 
> 
> The documentation is unclear, but it doesn't look like that's the
> case
> according to the simulator, because those instructions don't do more
> than a 16x16 or 32x16 bit integer multiply respectively.

Ok, then it should be fine, thanks for looking into it.

> > > +
> > > +   if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
> > > +   (type_sz(exec_type) == 4 && is_int_multiply))
> > > +  return devinfo->is_cherryview ||
> > > gen_device_info_is_9lp(devinfo);
> > 
> > How about:
> > 
> > if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo)) {
> >...
> > } else {
> >return false;
> > }
> > 
> > since we only really need to do these checks in those platforms it
> > might make a bit more sense to do it this way.
> > 
> 
> Right now the difference is purely cosmetic, but in the future that
> won't work for the platform this was designed for, I can send you
> more
> details off-list.

That's fine, I was just looking at it from the point of view of the
platforms this is required for at present. If writing it like this
makes more sense to add future gens later on let's keep it as you wrote
it.

> > > +   else
> > > +  return false;
> > > +}
> > > +
> > >  #endif

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

Re: [Mesa-dev] [PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.

2019-01-07 Thread Francisco Jerez
Iago Toral  writes:

> On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
>> Currently the visitor attempts to enforce the regioning restrictions
>> that apply to double-precision instructions on CHV/BXT at NIR-to-i965
>> translation time.  It is possible though for the copy propagation
>> pass
>> to violate this restriction if a strided move is propagated into one
>> of the affected instructions.  I've only reproduced this issue on a
>> future platform but it could affect CHV/BXT too under the right
>> conditions.
>> 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>  .../compiler/brw_fs_copy_propagation.cpp  | 10 +++
>>  src/intel/compiler/brw_ir_fs.h| 28
>> +++
>>  2 files changed, 38 insertions(+)
>> 
>> diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp
>> b/src/intel/compiler/brw_fs_copy_propagation.cpp
>> index a8ec1c34630..c23ce1ef426 100644
>> --- a/src/intel/compiler/brw_fs_copy_propagation.cpp
>> +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
>> @@ -315,6 +315,16 @@ can_take_stride(fs_inst *inst, unsigned arg,
>> unsigned stride,
>> if (stride > 4)
>>return false;
>>  
>> +   /* Bail if the channels of the source need to be aligned to the
>> byte offset
>> +* of the corresponding channel of the destination, and the
>> provided stride
>> +* would break this restriction.
>> +*/
>> +   if (has_dst_aligned_region_restriction(devinfo, inst) &&
>> +   !(type_sz(inst->src[arg].type) * stride ==
>> +   type_sz(inst->dst.type) * inst->dst.stride ||
>> + stride == 0))
>> +  return false;
>> +
>> /* 3-source instructions can only be Align16, which restricts
>> what strides
>>  * they can take. They can only take a stride of 1 (the usual
>> case), or 0
>>  * with a special "repctrl" bit. But the repctrl bit doesn't work
>> for
>> diff --git a/src/intel/compiler/brw_ir_fs.h
>> b/src/intel/compiler/brw_ir_fs.h
>> index 07e7224e0f8..95b069a2e02 100644
>> --- a/src/intel/compiler/brw_ir_fs.h
>> +++ b/src/intel/compiler/brw_ir_fs.h
>> @@ -486,4 +486,32 @@ get_exec_type_size(const fs_inst *inst)
>> return type_sz(get_exec_type(inst));
>>  }
>>  
>> +/**
>> + * Return whether the following regioning restriction applies to the
>> specified
>> + * instruction.  From the Cherryview PRM Vol 7. "Register Region
>> + * Restrictions":
>> + *
>> + * "When source or destination datatype is 64b or operation is
>> integer DWord
>> + *  multiply, regioning in Align1 must follow these rules:
>> + *
>> + *  1. Source and Destination horizontal stride must be aligned to
>> the same qword.
>> + *  2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride.
>> + *  3. Source and Destination offset must be the same, except the
>> case of
>> + * scalar source."
>> + */
>> +static inline bool
>> +has_dst_aligned_region_restriction(const gen_device_info *devinfo,
>> +   const fs_inst *inst)
>> +{
>> +   const brw_reg_type exec_type = get_exec_type(inst);
>> +   const bool is_int_multiply =
>> !brw_reg_type_is_floating_point(exec_type) &&
>> + (inst->opcode == BRW_OPCODE_MUL || inst->opcode ==
>> BRW_OPCODE_MAD);
>
> Should this be extended to include MAC and MACH too?
>

The documentation is unclear, but it doesn't look like that's the case
according to the simulator, because those instructions don't do more
than a 16x16 or 32x16 bit integer multiply respectively.

>> +
>> +   if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
>> +   (type_sz(exec_type) == 4 && is_int_multiply))
>> +  return devinfo->is_cherryview ||
>> gen_device_info_is_9lp(devinfo);
>
> How about:
>
> if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo)) {
>...
> } else {
>return false;
> }
>
> since we only really need to do these checks in those platforms it
> might make a bit more sense to do it this way.
>

Right now the difference is purely cosmetic, but in the future that
won't work for the platform this was designed for, I can send you more
details off-list.

>> +   else
>> +  return false;
>> +}
>> +
>>  #endif


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.

2019-01-07 Thread Iago Toral
On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
> Currently the visitor attempts to enforce the regioning restrictions
> that apply to double-precision instructions on CHV/BXT at NIR-to-i965
> translation time.  It is possible though for the copy propagation
> pass
> to violate this restriction if a strided move is propagated into one
> of the affected instructions.  I've only reproduced this issue on a
> future platform but it could affect CHV/BXT too under the right
> conditions.
> 
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  .../compiler/brw_fs_copy_propagation.cpp  | 10 +++
>  src/intel/compiler/brw_ir_fs.h| 28
> +++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp
> b/src/intel/compiler/brw_fs_copy_propagation.cpp
> index a8ec1c34630..c23ce1ef426 100644
> --- a/src/intel/compiler/brw_fs_copy_propagation.cpp
> +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
> @@ -315,6 +315,16 @@ can_take_stride(fs_inst *inst, unsigned arg,
> unsigned stride,
> if (stride > 4)
>return false;
>  
> +   /* Bail if the channels of the source need to be aligned to the
> byte offset
> +* of the corresponding channel of the destination, and the
> provided stride
> +* would break this restriction.
> +*/
> +   if (has_dst_aligned_region_restriction(devinfo, inst) &&
> +   !(type_sz(inst->src[arg].type) * stride ==
> +   type_sz(inst->dst.type) * inst->dst.stride ||
> + stride == 0))
> +  return false;
> +
> /* 3-source instructions can only be Align16, which restricts
> what strides
>  * they can take. They can only take a stride of 1 (the usual
> case), or 0
>  * with a special "repctrl" bit. But the repctrl bit doesn't work
> for
> diff --git a/src/intel/compiler/brw_ir_fs.h
> b/src/intel/compiler/brw_ir_fs.h
> index 07e7224e0f8..95b069a2e02 100644
> --- a/src/intel/compiler/brw_ir_fs.h
> +++ b/src/intel/compiler/brw_ir_fs.h
> @@ -486,4 +486,32 @@ get_exec_type_size(const fs_inst *inst)
> return type_sz(get_exec_type(inst));
>  }
>  
> +/**
> + * Return whether the following regioning restriction applies to the
> specified
> + * instruction.  From the Cherryview PRM Vol 7. "Register Region
> + * Restrictions":
> + *
> + * "When source or destination datatype is 64b or operation is
> integer DWord
> + *  multiply, regioning in Align1 must follow these rules:
> + *
> + *  1. Source and Destination horizontal stride must be aligned to
> the same qword.
> + *  2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride.
> + *  3. Source and Destination offset must be the same, except the
> case of
> + * scalar source."
> + */
> +static inline bool
> +has_dst_aligned_region_restriction(const gen_device_info *devinfo,
> +   const fs_inst *inst)
> +{
> +   const brw_reg_type exec_type = get_exec_type(inst);
> +   const bool is_int_multiply =
> !brw_reg_type_is_floating_point(exec_type) &&
> + (inst->opcode == BRW_OPCODE_MUL || inst->opcode ==
> BRW_OPCODE_MAD);

Should this be extended to include MAC and MACH too?

> +
> +   if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
> +   (type_sz(exec_type) == 4 && is_int_multiply))
> +  return devinfo->is_cherryview ||
> gen_device_info_is_9lp(devinfo);

How about:

if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo)) {
   ...
} else {
   return false;
}

since we only really need to do these checks in those platforms it
might make a bit more sense to do it this way.

> +   else
> +  return false;
> +}
> +
>  #endif

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.

2018-12-29 Thread Francisco Jerez
Currently the visitor attempts to enforce the regioning restrictions
that apply to double-precision instructions on CHV/BXT at NIR-to-i965
translation time.  It is possible though for the copy propagation pass
to violate this restriction if a strided move is propagated into one
of the affected instructions.  I've only reproduced this issue on a
future platform but it could affect CHV/BXT too under the right
conditions.

Cc: mesa-sta...@lists.freedesktop.org
---
 .../compiler/brw_fs_copy_propagation.cpp  | 10 +++
 src/intel/compiler/brw_ir_fs.h| 28 +++
 2 files changed, 38 insertions(+)

diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp 
b/src/intel/compiler/brw_fs_copy_propagation.cpp
index a8ec1c34630..c23ce1ef426 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -315,6 +315,16 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
stride,
if (stride > 4)
   return false;
 
+   /* Bail if the channels of the source need to be aligned to the byte offset
+* of the corresponding channel of the destination, and the provided stride
+* would break this restriction.
+*/
+   if (has_dst_aligned_region_restriction(devinfo, inst) &&
+   !(type_sz(inst->src[arg].type) * stride ==
+   type_sz(inst->dst.type) * inst->dst.stride ||
+ stride == 0))
+  return false;
+
/* 3-source instructions can only be Align16, which restricts what strides
 * they can take. They can only take a stride of 1 (the usual case), or 0
 * with a special "repctrl" bit. But the repctrl bit doesn't work for
diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 07e7224e0f8..95b069a2e02 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -486,4 +486,32 @@ get_exec_type_size(const fs_inst *inst)
return type_sz(get_exec_type(inst));
 }
 
+/**
+ * Return whether the following regioning restriction applies to the specified
+ * instruction.  From the Cherryview PRM Vol 7. "Register Region
+ * Restrictions":
+ *
+ * "When source or destination datatype is 64b or operation is integer DWord
+ *  multiply, regioning in Align1 must follow these rules:
+ *
+ *  1. Source and Destination horizontal stride must be aligned to the same 
qword.
+ *  2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride.
+ *  3. Source and Destination offset must be the same, except the case of
+ * scalar source."
+ */
+static inline bool
+has_dst_aligned_region_restriction(const gen_device_info *devinfo,
+   const fs_inst *inst)
+{
+   const brw_reg_type exec_type = get_exec_type(inst);
+   const bool is_int_multiply = !brw_reg_type_is_floating_point(exec_type) &&
+ (inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MAD);
+
+   if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
+   (type_sz(exec_type) == 4 && is_int_multiply))
+  return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
+   else
+  return false;
+}
+
 #endif
-- 
2.19.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev