Re: [Mesa-dev] [PATCH 09/22] intel/compiler: implement 16-bit multiply-add

2018-05-22 Thread Iago Toral
On Mon, 2018-05-21 at 13:49 +0300, Eero Tamminen wrote:
> Hi,
> 
> On 21.05.2018 10:42, Iago Toral wrote:
> > On Fri, 2018-05-18 at 12:08 +0300, Eero Tamminen wrote:
> > > On 17.05.2018 14:25, Eero Tamminen wrote:
> > > > On 17.05.2018 11:46, Iago Toral Quiroga wrote:
> > > > > The PRM for MAD states that F, DF and HF are supported,
> > > > > however,
> > > > > then
> > > > > it requires that the instruction includes a 2-bit mask
> > > > > specifying
> > > > > the types of each operand like this:
> > > > 
> > > >   >
> > > > > 00: 32-bit float
> > > > > 01: 32-bit signed integer
> > > > > 10: 32-bit unsigned integer
> > > > > 11: 64-bit float
> > > > 
> > > > Where this was?
> > 
> > This is in the decription of the MAD instruction here (for SKL):
> > 
> > https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc
> > -skl
> > -vol02a-commandreference-instructions.pdf
> > 
> > It guess the contents for this were copy & pasted from previous
> > PRMs
> > that didn't support HF...
> 
> Ouch.  That looks pretty different from what's in here:
> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-s
> kl-vol07-3d_media_gpgpu.pdf
> 
> > > > In
> > > > https://01.org/sites/default/files/documentation/intel-gfx-bspe
> > > > c-os
> > > > rc-chv-bsw-vol07-3d-media-gpgpu-engine.pdf
> 
> I'll ask around who's currently maintaining the 01.org docs, and
> could 
> she/he update the opcode docs. All of them from BSW to KBL seem to
> have 
> the old information, while the Media GPGPU docs have newer info.

Thanks Eero!

> 
> Btw.  Did you have test-cases utilizing mad() instructions, and did
> they work OK with your patchset?

Yes, but they only worked because I was dropping the MAD and open
coding it as MUL+ADD for HF operands.

> (If yes, better test-cases may be required.)

The existing tests catch the problem if I don't attempt to lower the
MAD for HF, so they are good enough for this.

BTW, I implemented the solution using the Src1Type and Src2Type bits
and it seems to work fine in BDW as well.

Iago

> 
> > > > Section "EU Changes by Processor Generation" states:
> > > > -
> > > > These features or behaviors are added for CHV, BSW, continuing
> > > > to
> > > > later generations:
> > > > ...
> > > > In the 3-source instruction format, widen the SrcType and
> > > > DstType
> > > > fields
> > > > and add an encoding for the HF (Half Float) type.
> > > > -
> > > > 
> > > > (I.e. it applies to GEN9+ [1] and on GEN8 for BSW/CHV.)
> > 
> > Actually, I have just verified that the BDW PRMs have the exact
> > same
> > thing, but stating BDW instead of BSW/CHV, so I guess BDW should be
> > supported too.
> 
> Yes, right.
> 
> 
> > > > In section "GEN Instruction Format – 3-src" table, both "Dst
> > > > Type"
> > > > and
> > > > "Src Type" fields are 3 bits, and there's additional 1 bit
> > > > "Src1
> > > > Type"
> > > > and "Src2 Type" fields to differentiate formats for src1 &
> > > > src2.
> > > 
> > >   >
> > > > Then, when looking at "Source or Destination Operand Fields
> > > > (Alphabetically by Short Name)" section:
> > > > ---
> > > > DstType:
> > > > 
> > > > Encoding for three source instructions:
> > > > 000b = :f. Single precision Float (32-bit).
> > > > 001b = :d. Signed Doubleword integer.
> > > > 010b = :ud. Unsigned Doubleword integer.
> > > > 011b = :df. Double precision Float (64-bit).
> > > > 100b = :hf. Half precision Float (16-bit).
> > > > 101b - 111b. Reserved.
> > > > 
> > > > ...
> > > > 
> > > > SrcType:
> > > > 
> > > > Three source instructions use one SrcType field for all source
> > > > operands,
> > > > with a 3-bit encoding that allows fewer data types:
> > > > 
> > > > Encoding for three source instructions:
> > > > 000b = :f. Single precision Float (32-bit).
> > > > 001b = :d. Signed Doubleword integer.
> > > > 010b = :ud. Unsigned Doubleword integer.
> > > > 011b = :df. Double precision Float (64-bit).
> > > > 100b = :hf. Half precision Float (16-bit).
> > > > 101b - 111b. Reserved.
> > > > 
> > > > Three source instructions can use operands with mixed-mode
> > > > precision.
> > > > When SrcType field is set to :f or :hf it defines precision for
> > > > source 0
> > > > only, and fields Src1Type and Src2Type define precision for
> > > > other
> > > > source
> > > > operands:
> > > > 0b = :f. Single precision Float (32-bit).
> > > > 1b = :hf. Half precision Float (16-bit).
> > > > ---
> > > 
> > > Note also the section "Special Requirements for Handling Mixed
> > > Mode
> > > Float Operations".
> 
> Both BDW & BSW specs list also this restriction, which is lifted
> GEN9:
> * Calculations using the HF (Half Float) type do not support
> denormals 
> or gradual underflow.
> 
> 
> > > Btw. Mesa supporting HF with mad() is important because pln()
> > > doesn't support it in HW, but you can implement pln() HF version
> > > with two mad()s.

Re: [Mesa-dev] [PATCH 09/22] intel/compiler: implement 16-bit multiply-add

2018-05-21 Thread Eero Tamminen

Hi,

On 21.05.2018 10:42, Iago Toral wrote:

On Fri, 2018-05-18 at 12:08 +0300, Eero Tamminen wrote:

On 17.05.2018 14:25, Eero Tamminen wrote:

On 17.05.2018 11:46, Iago Toral Quiroga wrote:

The PRM for MAD states that F, DF and HF are supported, however,
then
it requires that the instruction includes a 2-bit mask specifying
the types of each operand like this:


  >

00: 32-bit float
01: 32-bit signed integer
10: 32-bit unsigned integer
11: 64-bit float


Where this was?


This is in the decription of the MAD instruction here (for SKL):

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl
-vol02a-commandreference-instructions.pdf

It guess the contents for this were copy & pasted from previous PRMs
that didn't support HF...


Ouch.  That looks pretty different from what's in here:
https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf

And in BSW one:

In
https://01.org/sites/default/files/documentation/intel-gfx-bspec-os
rc-chv-bsw-vol07-3d-media-gpgpu-engine.pdf


I'll ask around who's currently maintaining the 01.org docs, and could 
she/he update the opcode docs. All of them from BSW to KBL seem to have 
the old information, while the Media GPGPU docs have newer info.



Btw.  Did you have test-cases utilizing mad() instructions, and did
they work OK with your patchset?

(If yes, better test-cases may be required.)



Section "EU Changes by Processor Generation" states:
-
These features or behaviors are added for CHV, BSW, continuing to
later generations:
...
In the 3-source instruction format, widen the SrcType and DstType
fields
and add an encoding for the HF (Half Float) type.
-

(I.e. it applies to GEN9+ [1] and on GEN8 for BSW/CHV.)


Actually, I have just verified that the BDW PRMs have the exact same
thing, but stating BDW instead of BSW/CHV, so I guess BDW should be
supported too.


Yes, right.



In section "GEN Instruction Format – 3-src" table, both "Dst Type"
and
"Src Type" fields are 3 bits, and there's additional 1 bit "Src1
Type"
and "Src2 Type" fields to differentiate formats for src1 & src2.


  >

Then, when looking at "Source or Destination Operand Fields
(Alphabetically by Short Name)" section:
---
DstType:

Encoding for three source instructions:
000b = :f. Single precision Float (32-bit).
001b = :d. Signed Doubleword integer.
010b = :ud. Unsigned Doubleword integer.
011b = :df. Double precision Float (64-bit).
100b = :hf. Half precision Float (16-bit).
101b - 111b. Reserved.

...

SrcType:

Three source instructions use one SrcType field for all source
operands,
with a 3-bit encoding that allows fewer data types:

Encoding for three source instructions:
000b = :f. Single precision Float (32-bit).
001b = :d. Signed Doubleword integer.
010b = :ud. Unsigned Doubleword integer.
011b = :df. Double precision Float (64-bit).
100b = :hf. Half precision Float (16-bit).
101b - 111b. Reserved.

Three source instructions can use operands with mixed-mode
precision.
When SrcType field is set to :f or :hf it defines precision for
source 0
only, and fields Src1Type and Src2Type define precision for other
source
operands:
0b = :f. Single precision Float (32-bit).
1b = :hf. Half precision Float (16-bit).
---


Note also the section "Special Requirements for Handling Mixed Mode
Float Operations".


Both BDW & BSW specs list also this restriction, which is lifted GEN9:
* Calculations using the HF (Half Float) type do not support denormals 
or gradual underflow.




Btw. Mesa supporting HF with mad() is important because pln()
doesn't support it in HW, but you can implement pln() HF version
with two mad()s.



- Eero


[1]: SkyLake & KabyLake PRMs lists same amount of bits, but don't
tell
what they represent (often one needs to look into PRM for the HW
where
these changes were first introduced, which in this case was
Braswell /
Cherryview).



So 16-bit float would not be supported.  The driver also asserts
that
the types involved in ALING16 3-src operations are one of these
(MAD is always emitted as an align16 instruction prior to gen10).
---
   src/intel/compiler/brw_fs_nir.cpp | 9 -
   1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index 91283ab4911..58ddc456bae 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -1525,7 +1525,14 @@ fs_visitor::nir_emit_alu(const fs_builder
&bld,
nir_alu_instr *instr)
 break;
  case nir_op_ffma:
-  inst = bld.MAD(result, op[2], op[1], op[0]);
+  /* 3-src MAD doesn't support 16-bit operands */
+  if (nir_dest_bit_size(instr->dest.dest) >= 32) {
+ inst = bld.MAD(result, op[2], op[1], op[0]);
+  } else {
+ fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_HF);
+ bld.MUL(tmp, op[1], op[0]);
+  

Re: [Mesa-dev] [PATCH 09/22] intel/compiler: implement 16-bit multiply-add

2018-05-17 Thread Eero Tamminen

Hi,

On 17.05.2018 11:46, Iago Toral Quiroga wrote:

The PRM for MAD states that F, DF and HF are supported, however, then
it requires that the instruction includes a 2-bit mask specifying
the types of each operand like this:

>

00: 32-bit float
01: 32-bit signed integer
10: 32-bit unsigned integer
11: 64-bit float


Where this was?


In 
https://01.org/sites/default/files/documentation/intel-gfx-bspec-osrc-chv-bsw-vol07-3d-media-gpgpu-engine.pdf


Section "EU Changes by Processor Generation" states:
-
These features or behaviors are added for CHV, BSW, continuing to later 
generations:

...
In the 3-source instruction format, widen the SrcType and DstType fields 
and add an encoding for the HF (Half Float) type.

-

(I.e. it applies to GEN9+ [1] and on GEN8 for BSW/CHV.)


In section "GEN Instruction Format – 3-src" table, both "Dst Type" and 
"Src Type" fields are 3 bits, and there's additional 1 bit "Src1 Type" 
and "Src2 Type" fields to differentiate formats for src1 & src2.



Then, when looking at "Source or Destination Operand Fields 
(Alphabetically by Short Name)" section:

---
DstType:

Encoding for three source instructions:
000b = :f. Single precision Float (32-bit).
001b = :d. Signed Doubleword integer.
010b = :ud. Unsigned Doubleword integer.
011b = :df. Double precision Float (64-bit).
100b = :hf. Half precision Float (16-bit).
101b - 111b. Reserved.

...

SrcType:

Three source instructions use one SrcType field for all source operands, 
with a 3-bit encoding that allows fewer data types:


Encoding for three source instructions:
000b = :f. Single precision Float (32-bit).
001b = :d. Signed Doubleword integer.
010b = :ud. Unsigned Doubleword integer.
011b = :df. Double precision Float (64-bit).
100b = :hf. Half precision Float (16-bit).
101b - 111b. Reserved.

Three source instructions can use operands with mixed-mode precision. 
When SrcType field is set to :f or :hf it defines precision for source 0 
only, and fields Src1Type and Src2Type define precision for other source 
operands:

0b = :f. Single precision Float (32-bit).
1b = :hf. Half precision Float (16-bit).
---


- Eero

[1]: SkyLake & KabyLake PRMs lists same amount of bits, but don't tell 
what they represent (often one needs to look into PRM for the HW where 
these changes were first introduced, which in this case was Braswell / 
Cherryview).




So 16-bit float would not be supported.  The driver also asserts that
the types involved in ALING16 3-src operations are one of these
(MAD is always emitted as an align16 instruction prior to gen10).
---
  src/intel/compiler/brw_fs_nir.cpp | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 91283ab4911..58ddc456bae 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -1525,7 +1525,14 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
break;
  
 case nir_op_ffma:

-  inst = bld.MAD(result, op[2], op[1], op[0]);
+  /* 3-src MAD doesn't support 16-bit operands */
+  if (nir_dest_bit_size(instr->dest.dest) >= 32) {
+ inst = bld.MAD(result, op[2], op[1], op[0]);
+  } else {
+ fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_HF);
+ bld.MUL(tmp, op[1], op[0]);
+ inst = bld.ADD(result, tmp, op[2]);
+  }
inst->saturate = instr->dest.saturate;
break;
  



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev