Re: [Mesa-dev] [PATCH 34/59] intel/compiler: fix ddy for half-float in gen8

2018-12-07 Thread Jason Ekstrand
On Tue, Dec 4, 2018 at 1:19 AM Iago Toral Quiroga  wrote:

> We use ALign16 mode for this, since it is more convenient, but the PRM
> for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region
> restrictions', Section '1. Special Restrictions':
>
>"In Align16 mode, the channel selects and channel enables apply to a
> pair of half-floats, because these parameters are defined for DWord
> elements ONLY. This is applicable when both source and destination
> are half-floats."
>
> This means that we cannot select individual HF elements using swizzles
> like we do with 32-bit floats so we can't implement the required
> regioning for this.
>
> Use the gen11 path for this instead, which uses Align1 mode.
>
> The restriction is not present in gen9 of gen10, where the Align16
>

"or gen10"?

Reviewed-by: Jason Ekstrand 


> implementation seems to work just fine.
> ---
>  src/intel/compiler/brw_fs_generator.cpp | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_generator.cpp
> b/src/intel/compiler/brw_fs_generator.cpp
> index d8e4bae17e0..ba7ed07e692 100644
> --- a/src/intel/compiler/brw_fs_generator.cpp
> +++ b/src/intel/compiler/brw_fs_generator.cpp
> @@ -1281,8 +1281,14 @@ fs_generator::generate_ddy(const fs_inst *inst,
> const uint32_t type_size = type_sz(src.type);
>
> if (inst->opcode == FS_OPCODE_DDY_FINE) {
> -  /* produce accurate derivatives */
> -  if (devinfo->gen >= 11) {
> +  /* produce accurate derivatives. We can do this easily in Align16
> +   * but this is not supported in gen11+ and gen8 Align16 swizzles
> +   * for Half-Float operands work in units of 32-bit and always
> +   * select pairs of consecutive half-float elements, so we can't use
> +   * use it for this.
> +   */
> +  if (devinfo->gen >= 11 ||
> +  (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) {
>   src = stride(src, 0, 2, 1);
>   struct brw_reg src_0  = byte_offset(src,  0 * type_size);
>   struct brw_reg src_2  = byte_offset(src,  2 * type_size);
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 34/59] intel/compiler: fix ddy for half-float in gen8

2018-12-07 Thread Iago Toral
On Fri, 2018-12-07 at 15:06 +0200, Pohjolainen, Topi wrote:
> On Tue, Dec 04, 2018 at 08:16:58AM +0100, Iago Toral Quiroga wrote:
> > We use ALign16 mode for this, since it is more convenient, but the
> > PRM
> > for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register
> > region
> > restrictions', Section '1. Special Restrictions':
> > 
> >"In Align16 mode, the channel selects and channel enables apply
> > to a
> > pair of half-floats, because these parameters are defined for
> > DWord
> > elements ONLY. This is applicable when both source and
> > destination
> > are half-floats."
> > 
> > This means that we cannot select individual HF elements using
> > swizzles
> > like we do with 32-bit floats so we can't implement the required
> > regioning for this.
> > 
> > Use the gen11 path for this instead, which uses Align1 mode.
> > 
> > The restriction is not present in gen9 of gen10, where the Align16
> 
>  or?

Right, the issue is exclusive to gen8.

Iago

> > implementation seems to work just fine.
> > ---
> >  src/intel/compiler/brw_fs_generator.cpp | 10 --
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_fs_generator.cpp
> > b/src/intel/compiler/brw_fs_generator.cpp
> > index d8e4bae17e0..ba7ed07e692 100644
> > --- a/src/intel/compiler/brw_fs_generator.cpp
> > +++ b/src/intel/compiler/brw_fs_generator.cpp
> > @@ -1281,8 +1281,14 @@ fs_generator::generate_ddy(const fs_inst
> > *inst,
> > const uint32_t type_size = type_sz(src.type);
> >  
> > if (inst->opcode == FS_OPCODE_DDY_FINE) {
> > -  /* produce accurate derivatives */
> > -  if (devinfo->gen >= 11) {
> > +  /* produce accurate derivatives. We can do this easily in
> > Align16
> > +   * but this is not supported in gen11+ and gen8 Align16
> > swizzles
> > +   * for Half-Float operands work in units of 32-bit and
> > always
> > +   * select pairs of consecutive half-float elements, so we
> > can't use
> > +   * use it for this.
> > +   */
> > +  if (devinfo->gen >= 11 ||
> > +  (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF))
> > {
> >   src = stride(src, 0, 2, 1);
> >   struct brw_reg src_0  = byte_offset(src,  0 * type_size);
> >   struct brw_reg src_2  = byte_offset(src,  2 * type_size);
> > -- 
> > 2.17.1
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 34/59] intel/compiler: fix ddy for half-float in gen8

2018-12-07 Thread Pohjolainen, Topi
On Tue, Dec 04, 2018 at 08:16:58AM +0100, Iago Toral Quiroga wrote:
> We use ALign16 mode for this, since it is more convenient, but the PRM
> for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region
> restrictions', Section '1. Special Restrictions':
> 
>"In Align16 mode, the channel selects and channel enables apply to a
> pair of half-floats, because these parameters are defined for DWord
> elements ONLY. This is applicable when both source and destination
> are half-floats."
> 
> This means that we cannot select individual HF elements using swizzles
> like we do with 32-bit floats so we can't implement the required
> regioning for this.
> 
> Use the gen11 path for this instead, which uses Align1 mode.
> 
> The restriction is not present in gen9 of gen10, where the Align16

 or?

> implementation seems to work just fine.
> ---
>  src/intel/compiler/brw_fs_generator.cpp | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs_generator.cpp 
> b/src/intel/compiler/brw_fs_generator.cpp
> index d8e4bae17e0..ba7ed07e692 100644
> --- a/src/intel/compiler/brw_fs_generator.cpp
> +++ b/src/intel/compiler/brw_fs_generator.cpp
> @@ -1281,8 +1281,14 @@ fs_generator::generate_ddy(const fs_inst *inst,
> const uint32_t type_size = type_sz(src.type);
>  
> if (inst->opcode == FS_OPCODE_DDY_FINE) {
> -  /* produce accurate derivatives */
> -  if (devinfo->gen >= 11) {
> +  /* produce accurate derivatives. We can do this easily in Align16
> +   * but this is not supported in gen11+ and gen8 Align16 swizzles
> +   * for Half-Float operands work in units of 32-bit and always
> +   * select pairs of consecutive half-float elements, so we can't use
> +   * use it for this.
> +   */
> +  if (devinfo->gen >= 11 ||
> +  (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) {
>   src = stride(src, 0, 2, 1);
>   struct brw_reg src_0  = byte_offset(src,  0 * type_size);
>   struct brw_reg src_2  = byte_offset(src,  2 * type_size);
> -- 
> 2.17.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 34/59] intel/compiler: fix ddy for half-float in gen8

2018-12-03 Thread Iago Toral Quiroga
We use ALign16 mode for this, since it is more convenient, but the PRM
for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region
restrictions', Section '1. Special Restrictions':

   "In Align16 mode, the channel selects and channel enables apply to a
pair of half-floats, because these parameters are defined for DWord
elements ONLY. This is applicable when both source and destination
are half-floats."

This means that we cannot select individual HF elements using swizzles
like we do with 32-bit floats so we can't implement the required
regioning for this.

Use the gen11 path for this instead, which uses Align1 mode.

The restriction is not present in gen9 of gen10, where the Align16
implementation seems to work just fine.
---
 src/intel/compiler/brw_fs_generator.cpp | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_generator.cpp 
b/src/intel/compiler/brw_fs_generator.cpp
index d8e4bae17e0..ba7ed07e692 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -1281,8 +1281,14 @@ fs_generator::generate_ddy(const fs_inst *inst,
const uint32_t type_size = type_sz(src.type);
 
if (inst->opcode == FS_OPCODE_DDY_FINE) {
-  /* produce accurate derivatives */
-  if (devinfo->gen >= 11) {
+  /* produce accurate derivatives. We can do this easily in Align16
+   * but this is not supported in gen11+ and gen8 Align16 swizzles
+   * for Half-Float operands work in units of 32-bit and always
+   * select pairs of consecutive half-float elements, so we can't use
+   * use it for this.
+   */
+  if (devinfo->gen >= 11 ||
+  (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) {
  src = stride(src, 0, 2, 1);
  struct brw_reg src_0  = byte_offset(src,  0 * type_size);
  struct brw_reg src_2  = byte_offset(src,  2 * type_size);
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev