Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.

2017-10-04 Thread Matt Arsenault

> On Oct 4, 2017, at 12:50, Marek Olšák  wrote:
> 
> The LLVM backends selects MAD (unfused) for fmuladd, and FMA (fused) for fma.

For f64 and f16 by default it will emit an FMA since mad doesn’t support 
denorms.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.

2017-10-04 Thread Marek Olšák
On Wed, Oct 4, 2017 at 7:35 PM, Ilia Mirkin  wrote:
> Ah OK. So llvm.fmuladd is more like llvm.fmadontcare. Wrong assumption
> on my part.

The LLVM backends selects MAD (unfused) for fmuladd, and FMA (fused) for fma.

Personally I would prefer having separate fmul and fadd in LLVM IR
instead of the intrinsic.

Marek

>
> On Wed, Oct 4, 2017 at 1:00 PM, Connor Abbott  wrote:
>> No. From the LLVM langref:
>>
>> The ‘llvm.fmuladd.*‘ intrinsic functions represent multiply-add
>> expressions that can be fused if the code generator determines that
>> (a) the target instruction set has support for a fused operation, and
>> (b) that the fused operation is more efficient than the equivalent,
>> separate pair of mul and add instructions.
>>
>> The (b) part is especially important -- it says that LLVM can pick and
>> choose which fmuladd intrinsics to turn into FMA instructions, or
>> unfused MULADD instructions, or just a sequence of mul+add. For
>> example, if many instructions call fmuladd with the first two
>> arguments the same, it can break it up into a mul followed by a bunch
>> of adds. That wouldn't be ok under the GLSL precise semantics
>> (assuming the target would've used FMA otherwise, which I think some
>> GCN cards will do).
>>
>> Also, and maybe more importantly, if an app developer explicitly asks
>> for fma() with a precise modifier, it's probably not a great idea to
>> then give them an unfused mul+add -- it's legal, thanks to GLSL's
>> weasel-wording, but probably not what you really want, on HW which
>> actually does have an FMA instruction :)
>>
>> Connor
>>
>>
>> On Wed, Oct 4, 2017 at 11:25 AM, Ilia Mirkin  wrote:
>>> Wouldn't this guarantee that nothing is fused (and thus fine)?
>>> Presumably fmuladd always does mul+add either as 1 or 2 instructions?
>>>
>>> On Wed, Oct 4, 2017 at 10:57 AM, Connor Abbott  wrote:
 If the fma has the exact flag, then we need to use the llvm.fma
 intrinsic. These come from fma() calls with the precise or invariant
 qualifiers in GLSL, where you basically have to fuse everything or
 fuse nothing consistently, and llvm.fmuladd doesn't guarantee that.

 On Tue, Oct 3, 2017 at 10:10 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> For Vulkan SPIR-V the spec states
> fma() Inherited from OpFMul followed by OpFAdd.
>
> Matt says the backend will do the right thing depending on the
> hardware being compiled for, if you use the fmuladd intrinsic.
>
> Using the Mad Max pts test, on high settings at 4K:
> CHP: 55->60
> HGDD: 46->50
> LM: 55->60
> No change on Stronghold.
>
> Thanks to Feral for spending the time to track this down.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/common/ac_nir_to_llvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c 
> b/src/amd/common/ac_nir_to_llvm.c
> index d7b6259..11ba487 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, 
> const nir_alu_instr *instr)
>   result);
> break;
> case nir_op_ffma:
> -   result = emit_intrin_3f_param(>ac, "llvm.fma",
> +   result = emit_intrin_3f_param(>ac, "llvm.fmuladd",
>   ac_to_float_type(>ac, 
> def_type), src[0], src[1], src[2]);
> break;
> case nir_op_ibitfield_extract:
> --
> 2.9.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.

2017-10-04 Thread Ilia Mirkin
Ah OK. So llvm.fmuladd is more like llvm.fmadontcare. Wrong assumption
on my part.

On Wed, Oct 4, 2017 at 1:00 PM, Connor Abbott  wrote:
> No. From the LLVM langref:
>
> The ‘llvm.fmuladd.*‘ intrinsic functions represent multiply-add
> expressions that can be fused if the code generator determines that
> (a) the target instruction set has support for a fused operation, and
> (b) that the fused operation is more efficient than the equivalent,
> separate pair of mul and add instructions.
>
> The (b) part is especially important -- it says that LLVM can pick and
> choose which fmuladd intrinsics to turn into FMA instructions, or
> unfused MULADD instructions, or just a sequence of mul+add. For
> example, if many instructions call fmuladd with the first two
> arguments the same, it can break it up into a mul followed by a bunch
> of adds. That wouldn't be ok under the GLSL precise semantics
> (assuming the target would've used FMA otherwise, which I think some
> GCN cards will do).
>
> Also, and maybe more importantly, if an app developer explicitly asks
> for fma() with a precise modifier, it's probably not a great idea to
> then give them an unfused mul+add -- it's legal, thanks to GLSL's
> weasel-wording, but probably not what you really want, on HW which
> actually does have an FMA instruction :)
>
> Connor
>
>
> On Wed, Oct 4, 2017 at 11:25 AM, Ilia Mirkin  wrote:
>> Wouldn't this guarantee that nothing is fused (and thus fine)?
>> Presumably fmuladd always does mul+add either as 1 or 2 instructions?
>>
>> On Wed, Oct 4, 2017 at 10:57 AM, Connor Abbott  wrote:
>>> If the fma has the exact flag, then we need to use the llvm.fma
>>> intrinsic. These come from fma() calls with the precise or invariant
>>> qualifiers in GLSL, where you basically have to fuse everything or
>>> fuse nothing consistently, and llvm.fmuladd doesn't guarantee that.
>>>
>>> On Tue, Oct 3, 2017 at 10:10 PM, Dave Airlie  wrote:
 From: Dave Airlie 

 For Vulkan SPIR-V the spec states
 fma() Inherited from OpFMul followed by OpFAdd.

 Matt says the backend will do the right thing depending on the
 hardware being compiled for, if you use the fmuladd intrinsic.

 Using the Mad Max pts test, on high settings at 4K:
 CHP: 55->60
 HGDD: 46->50
 LM: 55->60
 No change on Stronghold.

 Thanks to Feral for spending the time to track this down.

 Signed-off-by: Dave Airlie 
 ---
  src/amd/common/ac_nir_to_llvm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/amd/common/ac_nir_to_llvm.c 
 b/src/amd/common/ac_nir_to_llvm.c
 index d7b6259..11ba487 100644
 --- a/src/amd/common/ac_nir_to_llvm.c
 +++ b/src/amd/common/ac_nir_to_llvm.c
 @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, 
 const nir_alu_instr *instr)
   result);
 break;
 case nir_op_ffma:
 -   result = emit_intrin_3f_param(>ac, "llvm.fma",
 +   result = emit_intrin_3f_param(>ac, "llvm.fmuladd",
   ac_to_float_type(>ac, 
 def_type), src[0], src[1], src[2]);
 break;
 case nir_op_ibitfield_extract:
 --
 2.9.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.

2017-10-04 Thread Connor Abbott
No. From the LLVM langref:

The ‘llvm.fmuladd.*‘ intrinsic functions represent multiply-add
expressions that can be fused if the code generator determines that
(a) the target instruction set has support for a fused operation, and
(b) that the fused operation is more efficient than the equivalent,
separate pair of mul and add instructions.

The (b) part is especially important -- it says that LLVM can pick and
choose which fmuladd intrinsics to turn into FMA instructions, or
unfused MULADD instructions, or just a sequence of mul+add. For
example, if many instructions call fmuladd with the first two
arguments the same, it can break it up into a mul followed by a bunch
of adds. That wouldn't be ok under the GLSL precise semantics
(assuming the target would've used FMA otherwise, which I think some
GCN cards will do).

Also, and maybe more importantly, if an app developer explicitly asks
for fma() with a precise modifier, it's probably not a great idea to
then give them an unfused mul+add -- it's legal, thanks to GLSL's
weasel-wording, but probably not what you really want, on HW which
actually does have an FMA instruction :)

Connor


On Wed, Oct 4, 2017 at 11:25 AM, Ilia Mirkin  wrote:
> Wouldn't this guarantee that nothing is fused (and thus fine)?
> Presumably fmuladd always does mul+add either as 1 or 2 instructions?
>
> On Wed, Oct 4, 2017 at 10:57 AM, Connor Abbott  wrote:
>> If the fma has the exact flag, then we need to use the llvm.fma
>> intrinsic. These come from fma() calls with the precise or invariant
>> qualifiers in GLSL, where you basically have to fuse everything or
>> fuse nothing consistently, and llvm.fmuladd doesn't guarantee that.
>>
>> On Tue, Oct 3, 2017 at 10:10 PM, Dave Airlie  wrote:
>>> From: Dave Airlie 
>>>
>>> For Vulkan SPIR-V the spec states
>>> fma() Inherited from OpFMul followed by OpFAdd.
>>>
>>> Matt says the backend will do the right thing depending on the
>>> hardware being compiled for, if you use the fmuladd intrinsic.
>>>
>>> Using the Mad Max pts test, on high settings at 4K:
>>> CHP: 55->60
>>> HGDD: 46->50
>>> LM: 55->60
>>> No change on Stronghold.
>>>
>>> Thanks to Feral for spending the time to track this down.
>>>
>>> Signed-off-by: Dave Airlie 
>>> ---
>>>  src/amd/common/ac_nir_to_llvm.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/src/amd/common/ac_nir_to_llvm.c 
>>> b/src/amd/common/ac_nir_to_llvm.c
>>> index d7b6259..11ba487 100644
>>> --- a/src/amd/common/ac_nir_to_llvm.c
>>> +++ b/src/amd/common/ac_nir_to_llvm.c
>>> @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, 
>>> const nir_alu_instr *instr)
>>>   result);
>>> break;
>>> case nir_op_ffma:
>>> -   result = emit_intrin_3f_param(>ac, "llvm.fma",
>>> +   result = emit_intrin_3f_param(>ac, "llvm.fmuladd",
>>>   ac_to_float_type(>ac, 
>>> def_type), src[0], src[1], src[2]);
>>> break;
>>> case nir_op_ibitfield_extract:
>>> --
>>> 2.9.4
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.

2017-10-04 Thread Ilia Mirkin
Wouldn't this guarantee that nothing is fused (and thus fine)?
Presumably fmuladd always does mul+add either as 1 or 2 instructions?

On Wed, Oct 4, 2017 at 10:57 AM, Connor Abbott  wrote:
> If the fma has the exact flag, then we need to use the llvm.fma
> intrinsic. These come from fma() calls with the precise or invariant
> qualifiers in GLSL, where you basically have to fuse everything or
> fuse nothing consistently, and llvm.fmuladd doesn't guarantee that.
>
> On Tue, Oct 3, 2017 at 10:10 PM, Dave Airlie  wrote:
>> From: Dave Airlie 
>>
>> For Vulkan SPIR-V the spec states
>> fma() Inherited from OpFMul followed by OpFAdd.
>>
>> Matt says the backend will do the right thing depending on the
>> hardware being compiled for, if you use the fmuladd intrinsic.
>>
>> Using the Mad Max pts test, on high settings at 4K:
>> CHP: 55->60
>> HGDD: 46->50
>> LM: 55->60
>> No change on Stronghold.
>>
>> Thanks to Feral for spending the time to track this down.
>>
>> Signed-off-by: Dave Airlie 
>> ---
>>  src/amd/common/ac_nir_to_llvm.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/amd/common/ac_nir_to_llvm.c 
>> b/src/amd/common/ac_nir_to_llvm.c
>> index d7b6259..11ba487 100644
>> --- a/src/amd/common/ac_nir_to_llvm.c
>> +++ b/src/amd/common/ac_nir_to_llvm.c
>> @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, 
>> const nir_alu_instr *instr)
>>   result);
>> break;
>> case nir_op_ffma:
>> -   result = emit_intrin_3f_param(>ac, "llvm.fma",
>> +   result = emit_intrin_3f_param(>ac, "llvm.fmuladd",
>>   ac_to_float_type(>ac, 
>> def_type), src[0], src[1], src[2]);
>> break;
>> case nir_op_ibitfield_extract:
>> --
>> 2.9.4
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.

2017-10-04 Thread Connor Abbott
If the fma has the exact flag, then we need to use the llvm.fma
intrinsic. These come from fma() calls with the precise or invariant
qualifiers in GLSL, where you basically have to fuse everything or
fuse nothing consistently, and llvm.fmuladd doesn't guarantee that.

On Tue, Oct 3, 2017 at 10:10 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> For Vulkan SPIR-V the spec states
> fma() Inherited from OpFMul followed by OpFAdd.
>
> Matt says the backend will do the right thing depending on the
> hardware being compiled for, if you use the fmuladd intrinsic.
>
> Using the Mad Max pts test, on high settings at 4K:
> CHP: 55->60
> HGDD: 46->50
> LM: 55->60
> No change on Stronghold.
>
> Thanks to Feral for spending the time to track this down.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/common/ac_nir_to_llvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index d7b6259..11ba487 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, const 
> nir_alu_instr *instr)
>   result);
> break;
> case nir_op_ffma:
> -   result = emit_intrin_3f_param(>ac, "llvm.fma",
> +   result = emit_intrin_3f_param(>ac, "llvm.fmuladd",
>   ac_to_float_type(>ac, 
> def_type), src[0], src[1], src[2]);
> break;
> case nir_op_ibitfield_extract:
> --
> 2.9.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.

2017-10-03 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On 4 Oct 2017 04:11, "Dave Airlie"  wrote:

From: Dave Airlie 

For Vulkan SPIR-V the spec states
fma() Inherited from OpFMul followed by OpFAdd.

Matt says the backend will do the right thing depending on the
hardware being compiled for, if you use the fmuladd intrinsic.

Using the Mad Max pts test, on high settings at 4K:
CHP: 55->60
HGDD: 46->50
LM: 55->60
No change on Stronghold.

Thanks to Feral for spending the time to track this down.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_
llvm.c
index d7b6259..11ba487 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx,
const nir_alu_instr *instr)
  result);
break;
case nir_op_ffma:
-   result = emit_intrin_3f_param(>ac, "llvm.fma",
+   result = emit_intrin_3f_param(>ac, "llvm.fmuladd",
  ac_to_float_type(>ac,
def_type), src[0], src[1], src[2]);
break;
case nir_op_ibitfield_extract:
--
2.9.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.

2017-10-03 Thread Dave Airlie
From: Dave Airlie 

For Vulkan SPIR-V the spec states
fma() Inherited from OpFMul followed by OpFAdd.

Matt says the backend will do the right thing depending on the
hardware being compiled for, if you use the fmuladd intrinsic.

Using the Mad Max pts test, on high settings at 4K:
CHP: 55->60
HGDD: 46->50
LM: 55->60
No change on Stronghold.

Thanks to Feral for spending the time to track this down.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index d7b6259..11ba487 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, const 
nir_alu_instr *instr)
  result);
break;
case nir_op_ffma:
-   result = emit_intrin_3f_param(>ac, "llvm.fma",
+   result = emit_intrin_3f_param(>ac, "llvm.fmuladd",
  ac_to_float_type(>ac, 
def_type), src[0], src[1], src[2]);
break;
case nir_op_ibitfield_extract:
-- 
2.9.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev