Re: [Mesa-dev] [PATCH 34/59] intel/compiler: fix ddy for half-float in gen8
On Tue, Dec 4, 2018 at 1:19 AM Iago Toral Quiroga wrote: > We use ALign16 mode for this, since it is more convenient, but the PRM > for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region > restrictions', Section '1. Special Restrictions': > >"In Align16 mode, the channel selects and channel enables apply to a > pair of half-floats, because these parameters are defined for DWord > elements ONLY. This is applicable when both source and destination > are half-floats." > > This means that we cannot select individual HF elements using swizzles > like we do with 32-bit floats so we can't implement the required > regioning for this. > > Use the gen11 path for this instead, which uses Align1 mode. > > The restriction is not present in gen9 of gen10, where the Align16 > "or gen10"? Reviewed-by: Jason Ekstrand > implementation seems to work just fine. > --- > src/intel/compiler/brw_fs_generator.cpp | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/src/intel/compiler/brw_fs_generator.cpp > b/src/intel/compiler/brw_fs_generator.cpp > index d8e4bae17e0..ba7ed07e692 100644 > --- a/src/intel/compiler/brw_fs_generator.cpp > +++ b/src/intel/compiler/brw_fs_generator.cpp > @@ -1281,8 +1281,14 @@ fs_generator::generate_ddy(const fs_inst *inst, > const uint32_t type_size = type_sz(src.type); > > if (inst->opcode == FS_OPCODE_DDY_FINE) { > - /* produce accurate derivatives */ > - if (devinfo->gen >= 11) { > + /* produce accurate derivatives. We can do this easily in Align16 > + * but this is not supported in gen11+ and gen8 Align16 swizzles > + * for Half-Float operands work in units of 32-bit and always > + * select pairs of consecutive half-float elements, so we can't use > + * use it for this. > + */ > + if (devinfo->gen >= 11 || > + (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) { > src = stride(src, 0, 2, 1); > struct brw_reg src_0 = byte_offset(src, 0 * type_size); > struct brw_reg src_2 = byte_offset(src, 2 * type_size); > -- > 2.17.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 34/59] intel/compiler: fix ddy for half-float in gen8
On Fri, 2018-12-07 at 15:06 +0200, Pohjolainen, Topi wrote: > On Tue, Dec 04, 2018 at 08:16:58AM +0100, Iago Toral Quiroga wrote: > > We use ALign16 mode for this, since it is more convenient, but the > > PRM > > for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register > > region > > restrictions', Section '1. Special Restrictions': > > > >"In Align16 mode, the channel selects and channel enables apply > > to a > > pair of half-floats, because these parameters are defined for > > DWord > > elements ONLY. This is applicable when both source and > > destination > > are half-floats." > > > > This means that we cannot select individual HF elements using > > swizzles > > like we do with 32-bit floats so we can't implement the required > > regioning for this. > > > > Use the gen11 path for this instead, which uses Align1 mode. > > > > The restriction is not present in gen9 of gen10, where the Align16 > > or? Right, the issue is exclusive to gen8. Iago > > implementation seems to work just fine. > > --- > > src/intel/compiler/brw_fs_generator.cpp | 10 -- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/src/intel/compiler/brw_fs_generator.cpp > > b/src/intel/compiler/brw_fs_generator.cpp > > index d8e4bae17e0..ba7ed07e692 100644 > > --- a/src/intel/compiler/brw_fs_generator.cpp > > +++ b/src/intel/compiler/brw_fs_generator.cpp > > @@ -1281,8 +1281,14 @@ fs_generator::generate_ddy(const fs_inst > > *inst, > > const uint32_t type_size = type_sz(src.type); > > > > if (inst->opcode == FS_OPCODE_DDY_FINE) { > > - /* produce accurate derivatives */ > > - if (devinfo->gen >= 11) { > > + /* produce accurate derivatives. We can do this easily in > > Align16 > > + * but this is not supported in gen11+ and gen8 Align16 > > swizzles > > + * for Half-Float operands work in units of 32-bit and > > always > > + * select pairs of consecutive half-float elements, so we > > can't use > > + * use it for this. > > + */ > > + if (devinfo->gen >= 11 || > > + (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) > > { > > src = stride(src, 0, 2, 1); > > struct brw_reg src_0 = byte_offset(src, 0 * type_size); > > struct brw_reg src_2 = byte_offset(src, 2 * type_size); > > -- > > 2.17.1 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 34/59] intel/compiler: fix ddy for half-float in gen8
On Tue, Dec 04, 2018 at 08:16:58AM +0100, Iago Toral Quiroga wrote: > We use ALign16 mode for this, since it is more convenient, but the PRM > for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region > restrictions', Section '1. Special Restrictions': > >"In Align16 mode, the channel selects and channel enables apply to a > pair of half-floats, because these parameters are defined for DWord > elements ONLY. This is applicable when both source and destination > are half-floats." > > This means that we cannot select individual HF elements using swizzles > like we do with 32-bit floats so we can't implement the required > regioning for this. > > Use the gen11 path for this instead, which uses Align1 mode. > > The restriction is not present in gen9 of gen10, where the Align16 or? > implementation seems to work just fine. > --- > src/intel/compiler/brw_fs_generator.cpp | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/src/intel/compiler/brw_fs_generator.cpp > b/src/intel/compiler/brw_fs_generator.cpp > index d8e4bae17e0..ba7ed07e692 100644 > --- a/src/intel/compiler/brw_fs_generator.cpp > +++ b/src/intel/compiler/brw_fs_generator.cpp > @@ -1281,8 +1281,14 @@ fs_generator::generate_ddy(const fs_inst *inst, > const uint32_t type_size = type_sz(src.type); > > if (inst->opcode == FS_OPCODE_DDY_FINE) { > - /* produce accurate derivatives */ > - if (devinfo->gen >= 11) { > + /* produce accurate derivatives. We can do this easily in Align16 > + * but this is not supported in gen11+ and gen8 Align16 swizzles > + * for Half-Float operands work in units of 32-bit and always > + * select pairs of consecutive half-float elements, so we can't use > + * use it for this. > + */ > + if (devinfo->gen >= 11 || > + (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) { > src = stride(src, 0, 2, 1); > struct brw_reg src_0 = byte_offset(src, 0 * type_size); > struct brw_reg src_2 = byte_offset(src, 2 * type_size); > -- > 2.17.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 34/59] intel/compiler: fix ddy for half-float in gen8
We use ALign16 mode for this, since it is more convenient, but the PRM for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region restrictions', Section '1. Special Restrictions': "In Align16 mode, the channel selects and channel enables apply to a pair of half-floats, because these parameters are defined for DWord elements ONLY. This is applicable when both source and destination are half-floats." This means that we cannot select individual HF elements using swizzles like we do with 32-bit floats so we can't implement the required regioning for this. Use the gen11 path for this instead, which uses Align1 mode. The restriction is not present in gen9 of gen10, where the Align16 implementation seems to work just fine. --- src/intel/compiler/brw_fs_generator.cpp | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp index d8e4bae17e0..ba7ed07e692 100644 --- a/src/intel/compiler/brw_fs_generator.cpp +++ b/src/intel/compiler/brw_fs_generator.cpp @@ -1281,8 +1281,14 @@ fs_generator::generate_ddy(const fs_inst *inst, const uint32_t type_size = type_sz(src.type); if (inst->opcode == FS_OPCODE_DDY_FINE) { - /* produce accurate derivatives */ - if (devinfo->gen >= 11) { + /* produce accurate derivatives. We can do this easily in Align16 + * but this is not supported in gen11+ and gen8 Align16 swizzles + * for Half-Float operands work in units of 32-bit and always + * select pairs of consecutive half-float elements, so we can't use + * use it for this. + */ + if (devinfo->gen >= 11 || + (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) { src = stride(src, 0, 2, 1); struct brw_reg src_0 = byte_offset(src, 0 * type_size); struct brw_reg src_2 = byte_offset(src, 2 * type_size); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev