Re: [Mesa-dev] [PATCH v3 01/42] intel/compiler: handle conversions between int and half-float on atom

2019-01-15 Thread Iago Toral
Hi Curro, thanks for looking into this, I have a few comments inline:

On Tue, 2019-01-15 at 14:34 -0800, Francisco Jerez wrote:
> Iago Toral Quiroga  writes:
> 
> > v2: adapted to work with the new regioning lowering pass
> > 
> > Reviewed-by: Topi Pohjolainen  (v1)
> > ---
> >  src/intel/compiler/brw_ir_fs.h | 33 ++--
> > -
> >  1 file changed, 26 insertions(+), 7 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_ir_fs.h
> > b/src/intel/compiler/brw_ir_fs.h
> > index 3c23fb375e4..ba4d6a95720 100644
> > --- a/src/intel/compiler/brw_ir_fs.h
> > +++ b/src/intel/compiler/brw_ir_fs.h
> > @@ -497,9 +497,10 @@ is_unordered(const fs_inst *inst)
> >  }
> >  
> >  /**
> > - * Return whether the following regioning restriction applies to
> > the specified
> > - * instruction.  From the Cherryview PRM Vol 7. "Register Region
> > - * Restrictions":
> > + * Return whether one of the the following regioning restrictions
> > apply to the
> > + * specified instruction.
> > + *
> > + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
> >   *
> >   * "When source or destination datatype is 64b or operation is
> > integer DWord
> >   *  multiply, regioning in Align1 must follow these rules:
> > @@ -508,6 +509,14 @@ is_unordered(const fs_inst *inst)
> >   *  2. Regioning must ensure Src.Vstride = Src.Width *
> > Src.Hstride.
> >   *  3. Source and Destination offset must be the same, except the
> > case of
> >   * scalar source."
> > + *
> > + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
> > + *
> > + *"Conversion between Integer and HF (Half Float) must be
> > DWord
> > + * aligned and strided by a DWord on the destination."
> > + *
> > + *The same restriction is listed for other hardware platforms,
> > however,
> > + *empirical testing suggests that only atom platforms are
> > affected.
> >   */
> >  static inline bool
> >  has_dst_aligned_region_restriction(const gen_device_info *devinfo,
> > @@ -518,10 +527,20 @@ has_dst_aligned_region_restriction(const
> > gen_device_info *devinfo,
> >   (inst->opcode == BRW_OPCODE_MUL || inst->opcode ==
> > BRW_OPCODE_MAD);
> >  
> > if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
> > -   (type_sz(exec_type) == 4 && is_int_multiply))
> > -  return devinfo->is_cherryview ||
> > gen_device_info_is_9lp(devinfo);
> > -   else
> > -  return false;
> > +   (type_sz(exec_type) == 4 && is_int_multiply)) {
> > +  if (devinfo->is_cherryview ||
> > gen_device_info_is_9lp(devinfo))
> > + return true;
> > +   }
> > +
> > +   const bool dst_type_is_hf = inst->dst.type ==
> > BRW_REGISTER_TYPE_HF;
> > +   const bool exec_type_is_hf = exec_type == BRW_REGISTER_TYPE_HF;
> > +   if ((dst_type_is_hf &&
> > !brw_reg_type_is_floating_point(exec_type)) ||
> > +   (exec_type_is_hf && !brw_reg_type_is_floating_point(inst-
> > >dst.type))) {
> > +  if (devinfo->is_cherryview ||
> > gen_device_info_is_9lp(devinfo))
> > + return true;
> > +   }
> 
> While looking into this closely, I'm seeing substantial divergence
> between the behavior of the simulator, the hardware docs, and the
> restriction this is implementing...  The docs are certainly
> inconsistent
> about how and where this should be handled.
> 
> I'm suspecting that this restriction is more similar in nature to the
> one referred to in the regioning lowering pass as
> "is_narrowing_conversion", rather than the one handled by
> has_dst_aligned_region_restriction().  Probably we don't need to
> change
> this function nor the regioning pass for it to be honored, because
> that
> restriction is already implemented.  I have a feeling that the reason
> for this may be that the 16-bit pipeline lacks the ability to handle
> conversions from or to half-float, so the execution type is
> implicitly
> promoted to the matching (integer or floating-point) 32-bit type
> where
> any HF conversion would be needed.  And on those the usual alignment
> restriction of the destination to a larger execution type
> applies.  From
> the hardware docs for CHV *only*:
> 
> > When single precision and half precision floats are mixed between
> > source operands or between source and destination operand. In such
> > cases, single precision float is the execution datatype.
> 
> This would mean that an "add dst:f, src:hf, src:hf" is really
> computed
> with single precision (!).
> 
> The restriction you're quoting seems to be the following:
> 
> > BDW+
> > 
> > Conversion between Integer and HF (Half Float) must be DWord-
> > aligned
> > and strided by a DWord on the destination.
> > 
> > // Example:
> > add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w
> > // Destination stride must be 2.
> > mov (8) r10.0<2>:w r11.0<8;8,1>:hf
> > // Destination stride must be 2.
> 
> However that restriction is apparently overriden on *most* projects
> except for BDW (where you aren't applying any restriction at all) by
> the
> following:


Re: [Mesa-dev] [PATCH v3 01/42] intel/compiler: handle conversions between int and half-float on atom

2019-01-15 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> v2: adapted to work with the new regioning lowering pass
>
> Reviewed-by: Topi Pohjolainen  (v1)
> ---
>  src/intel/compiler/brw_ir_fs.h | 33 ++---
>  1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
> index 3c23fb375e4..ba4d6a95720 100644
> --- a/src/intel/compiler/brw_ir_fs.h
> +++ b/src/intel/compiler/brw_ir_fs.h
> @@ -497,9 +497,10 @@ is_unordered(const fs_inst *inst)
>  }
>  
>  /**
> - * Return whether the following regioning restriction applies to the 
> specified
> - * instruction.  From the Cherryview PRM Vol 7. "Register Region
> - * Restrictions":
> + * Return whether one of the the following regioning restrictions apply to 
> the
> + * specified instruction.
> + *
> + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
>   *
>   * "When source or destination datatype is 64b or operation is integer DWord
>   *  multiply, regioning in Align1 must follow these rules:
> @@ -508,6 +509,14 @@ is_unordered(const fs_inst *inst)
>   *  2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride.
>   *  3. Source and Destination offset must be the same, except the case of
>   * scalar source."
> + *
> + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
> + *
> + *"Conversion between Integer and HF (Half Float) must be DWord
> + * aligned and strided by a DWord on the destination."
> + *
> + *The same restriction is listed for other hardware platforms, however,
> + *empirical testing suggests that only atom platforms are affected.
>   */
>  static inline bool
>  has_dst_aligned_region_restriction(const gen_device_info *devinfo,
> @@ -518,10 +527,20 @@ has_dst_aligned_region_restriction(const 
> gen_device_info *devinfo,
>   (inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MAD);
>  
> if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
> -   (type_sz(exec_type) == 4 && is_int_multiply))
> -  return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
> -   else
> -  return false;
> +   (type_sz(exec_type) == 4 && is_int_multiply)) {
> +  if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))
> + return true;
> +   }
> +
> +   const bool dst_type_is_hf = inst->dst.type == BRW_REGISTER_TYPE_HF;
> +   const bool exec_type_is_hf = exec_type == BRW_REGISTER_TYPE_HF;
> +   if ((dst_type_is_hf && !brw_reg_type_is_floating_point(exec_type)) ||
> +   (exec_type_is_hf && !brw_reg_type_is_floating_point(inst->dst.type))) 
> {
> +  if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))
> + return true;
> +   }

While looking into this closely, I'm seeing substantial divergence
between the behavior of the simulator, the hardware docs, and the
restriction this is implementing...  The docs are certainly inconsistent
about how and where this should be handled.

I'm suspecting that this restriction is more similar in nature to the
one referred to in the regioning lowering pass as
"is_narrowing_conversion", rather than the one handled by
has_dst_aligned_region_restriction().  Probably we don't need to change
this function nor the regioning pass for it to be honored, because that
restriction is already implemented.  I have a feeling that the reason
for this may be that the 16-bit pipeline lacks the ability to handle
conversions from or to half-float, so the execution type is implicitly
promoted to the matching (integer or floating-point) 32-bit type where
any HF conversion would be needed.  And on those the usual alignment
restriction of the destination to a larger execution type applies.  From
the hardware docs for CHV *only*:

| When single precision and half precision floats are mixed between
| source operands or between source and destination operand. In such
| cases, single precision float is the execution datatype.

This would mean that an "add dst:f, src:hf, src:hf" is really computed
with single precision (!).

The restriction you're quoting seems to be the following:

| BDW+
|
| Conversion between Integer and HF (Half Float) must be DWord-aligned
| and strided by a DWord on the destination.
|
| // Example:
| add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w
| // Destination stride must be 2.
| mov (8) r10.0<2>:w r11.0<8;8,1>:hf
| // Destination stride must be 2.

However that restriction is apparently overriden on *most* projects
except for BDW (where you aren't applying any restriction at all) by the
following:

| Project:  CHV, SKL+
|
| There is a relaxed alignment rule for word destinations. When the
| destination type is word (UW, W, HF), destination data types can be
| aligned to either the lowest word or the second lowest word of the
| execution channel. This means the destination data words can be either
| all in the even word locations or all in the odd word locations.
| 
| // Example:
| add (8)  r10.0<2>:hf