Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.
> On Oct 4, 2017, at 12:50, Marek Olšákwrote: > > The LLVM backends selects MAD (unfused) for fmuladd, and FMA (fused) for fma. For f64 and f16 by default it will emit an FMA since mad doesn’t support denorms.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.
On Wed, Oct 4, 2017 at 7:35 PM, Ilia Mirkinwrote: > Ah OK. So llvm.fmuladd is more like llvm.fmadontcare. Wrong assumption > on my part. The LLVM backends selects MAD (unfused) for fmuladd, and FMA (fused) for fma. Personally I would prefer having separate fmul and fadd in LLVM IR instead of the intrinsic. Marek > > On Wed, Oct 4, 2017 at 1:00 PM, Connor Abbott wrote: >> No. From the LLVM langref: >> >> The ‘llvm.fmuladd.*‘ intrinsic functions represent multiply-add >> expressions that can be fused if the code generator determines that >> (a) the target instruction set has support for a fused operation, and >> (b) that the fused operation is more efficient than the equivalent, >> separate pair of mul and add instructions. >> >> The (b) part is especially important -- it says that LLVM can pick and >> choose which fmuladd intrinsics to turn into FMA instructions, or >> unfused MULADD instructions, or just a sequence of mul+add. For >> example, if many instructions call fmuladd with the first two >> arguments the same, it can break it up into a mul followed by a bunch >> of adds. That wouldn't be ok under the GLSL precise semantics >> (assuming the target would've used FMA otherwise, which I think some >> GCN cards will do). >> >> Also, and maybe more importantly, if an app developer explicitly asks >> for fma() with a precise modifier, it's probably not a great idea to >> then give them an unfused mul+add -- it's legal, thanks to GLSL's >> weasel-wording, but probably not what you really want, on HW which >> actually does have an FMA instruction :) >> >> Connor >> >> >> On Wed, Oct 4, 2017 at 11:25 AM, Ilia Mirkin wrote: >>> Wouldn't this guarantee that nothing is fused (and thus fine)? >>> Presumably fmuladd always does mul+add either as 1 or 2 instructions? >>> >>> On Wed, Oct 4, 2017 at 10:57 AM, Connor Abbott wrote: If the fma has the exact flag, then we need to use the llvm.fma intrinsic. These come from fma() calls with the precise or invariant qualifiers in GLSL, where you basically have to fuse everything or fuse nothing consistently, and llvm.fmuladd doesn't guarantee that. On Tue, Oct 3, 2017 at 10:10 PM, Dave Airlie wrote: > From: Dave Airlie > > For Vulkan SPIR-V the spec states > fma() Inherited from OpFMul followed by OpFAdd. > > Matt says the backend will do the right thing depending on the > hardware being compiled for, if you use the fmuladd intrinsic. > > Using the Mad Max pts test, on high settings at 4K: > CHP: 55->60 > HGDD: 46->50 > LM: 55->60 > No change on Stronghold. > > Thanks to Feral for spending the time to track this down. > > Signed-off-by: Dave Airlie > --- > src/amd/common/ac_nir_to_llvm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/amd/common/ac_nir_to_llvm.c > b/src/amd/common/ac_nir_to_llvm.c > index d7b6259..11ba487 100644 > --- a/src/amd/common/ac_nir_to_llvm.c > +++ b/src/amd/common/ac_nir_to_llvm.c > @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, > const nir_alu_instr *instr) > result); > break; > case nir_op_ffma: > - result = emit_intrin_3f_param(>ac, "llvm.fma", > + result = emit_intrin_3f_param(>ac, "llvm.fmuladd", > ac_to_float_type(>ac, > def_type), src[0], src[1], src[2]); > break; > case nir_op_ibitfield_extract: > -- > 2.9.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.
Ah OK. So llvm.fmuladd is more like llvm.fmadontcare. Wrong assumption on my part. On Wed, Oct 4, 2017 at 1:00 PM, Connor Abbottwrote: > No. From the LLVM langref: > > The ‘llvm.fmuladd.*‘ intrinsic functions represent multiply-add > expressions that can be fused if the code generator determines that > (a) the target instruction set has support for a fused operation, and > (b) that the fused operation is more efficient than the equivalent, > separate pair of mul and add instructions. > > The (b) part is especially important -- it says that LLVM can pick and > choose which fmuladd intrinsics to turn into FMA instructions, or > unfused MULADD instructions, or just a sequence of mul+add. For > example, if many instructions call fmuladd with the first two > arguments the same, it can break it up into a mul followed by a bunch > of adds. That wouldn't be ok under the GLSL precise semantics > (assuming the target would've used FMA otherwise, which I think some > GCN cards will do). > > Also, and maybe more importantly, if an app developer explicitly asks > for fma() with a precise modifier, it's probably not a great idea to > then give them an unfused mul+add -- it's legal, thanks to GLSL's > weasel-wording, but probably not what you really want, on HW which > actually does have an FMA instruction :) > > Connor > > > On Wed, Oct 4, 2017 at 11:25 AM, Ilia Mirkin wrote: >> Wouldn't this guarantee that nothing is fused (and thus fine)? >> Presumably fmuladd always does mul+add either as 1 or 2 instructions? >> >> On Wed, Oct 4, 2017 at 10:57 AM, Connor Abbott wrote: >>> If the fma has the exact flag, then we need to use the llvm.fma >>> intrinsic. These come from fma() calls with the precise or invariant >>> qualifiers in GLSL, where you basically have to fuse everything or >>> fuse nothing consistently, and llvm.fmuladd doesn't guarantee that. >>> >>> On Tue, Oct 3, 2017 at 10:10 PM, Dave Airlie wrote: From: Dave Airlie For Vulkan SPIR-V the spec states fma() Inherited from OpFMul followed by OpFAdd. Matt says the backend will do the right thing depending on the hardware being compiled for, if you use the fmuladd intrinsic. Using the Mad Max pts test, on high settings at 4K: CHP: 55->60 HGDD: 46->50 LM: 55->60 No change on Stronghold. Thanks to Feral for spending the time to track this down. Signed-off-by: Dave Airlie --- src/amd/common/ac_nir_to_llvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index d7b6259..11ba487 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr) result); break; case nir_op_ffma: - result = emit_intrin_3f_param(>ac, "llvm.fma", + result = emit_intrin_3f_param(>ac, "llvm.fmuladd", ac_to_float_type(>ac, def_type), src[0], src[1], src[2]); break; case nir_op_ibitfield_extract: -- 2.9.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>> ___ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.
No. From the LLVM langref: The ‘llvm.fmuladd.*‘ intrinsic functions represent multiply-add expressions that can be fused if the code generator determines that (a) the target instruction set has support for a fused operation, and (b) that the fused operation is more efficient than the equivalent, separate pair of mul and add instructions. The (b) part is especially important -- it says that LLVM can pick and choose which fmuladd intrinsics to turn into FMA instructions, or unfused MULADD instructions, or just a sequence of mul+add. For example, if many instructions call fmuladd with the first two arguments the same, it can break it up into a mul followed by a bunch of adds. That wouldn't be ok under the GLSL precise semantics (assuming the target would've used FMA otherwise, which I think some GCN cards will do). Also, and maybe more importantly, if an app developer explicitly asks for fma() with a precise modifier, it's probably not a great idea to then give them an unfused mul+add -- it's legal, thanks to GLSL's weasel-wording, but probably not what you really want, on HW which actually does have an FMA instruction :) Connor On Wed, Oct 4, 2017 at 11:25 AM, Ilia Mirkinwrote: > Wouldn't this guarantee that nothing is fused (and thus fine)? > Presumably fmuladd always does mul+add either as 1 or 2 instructions? > > On Wed, Oct 4, 2017 at 10:57 AM, Connor Abbott wrote: >> If the fma has the exact flag, then we need to use the llvm.fma >> intrinsic. These come from fma() calls with the precise or invariant >> qualifiers in GLSL, where you basically have to fuse everything or >> fuse nothing consistently, and llvm.fmuladd doesn't guarantee that. >> >> On Tue, Oct 3, 2017 at 10:10 PM, Dave Airlie wrote: >>> From: Dave Airlie >>> >>> For Vulkan SPIR-V the spec states >>> fma() Inherited from OpFMul followed by OpFAdd. >>> >>> Matt says the backend will do the right thing depending on the >>> hardware being compiled for, if you use the fmuladd intrinsic. >>> >>> Using the Mad Max pts test, on high settings at 4K: >>> CHP: 55->60 >>> HGDD: 46->50 >>> LM: 55->60 >>> No change on Stronghold. >>> >>> Thanks to Feral for spending the time to track this down. >>> >>> Signed-off-by: Dave Airlie >>> --- >>> src/amd/common/ac_nir_to_llvm.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/src/amd/common/ac_nir_to_llvm.c >>> b/src/amd/common/ac_nir_to_llvm.c >>> index d7b6259..11ba487 100644 >>> --- a/src/amd/common/ac_nir_to_llvm.c >>> +++ b/src/amd/common/ac_nir_to_llvm.c >>> @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, >>> const nir_alu_instr *instr) >>> result); >>> break; >>> case nir_op_ffma: >>> - result = emit_intrin_3f_param(>ac, "llvm.fma", >>> + result = emit_intrin_3f_param(>ac, "llvm.fmuladd", >>> ac_to_float_type(>ac, >>> def_type), src[0], src[1], src[2]); >>> break; >>> case nir_op_ibitfield_extract: >>> -- >>> 2.9.4 >>> >>> ___ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.
Wouldn't this guarantee that nothing is fused (and thus fine)? Presumably fmuladd always does mul+add either as 1 or 2 instructions? On Wed, Oct 4, 2017 at 10:57 AM, Connor Abbottwrote: > If the fma has the exact flag, then we need to use the llvm.fma > intrinsic. These come from fma() calls with the precise or invariant > qualifiers in GLSL, where you basically have to fuse everything or > fuse nothing consistently, and llvm.fmuladd doesn't guarantee that. > > On Tue, Oct 3, 2017 at 10:10 PM, Dave Airlie wrote: >> From: Dave Airlie >> >> For Vulkan SPIR-V the spec states >> fma() Inherited from OpFMul followed by OpFAdd. >> >> Matt says the backend will do the right thing depending on the >> hardware being compiled for, if you use the fmuladd intrinsic. >> >> Using the Mad Max pts test, on high settings at 4K: >> CHP: 55->60 >> HGDD: 46->50 >> LM: 55->60 >> No change on Stronghold. >> >> Thanks to Feral for spending the time to track this down. >> >> Signed-off-by: Dave Airlie >> --- >> src/amd/common/ac_nir_to_llvm.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/src/amd/common/ac_nir_to_llvm.c >> b/src/amd/common/ac_nir_to_llvm.c >> index d7b6259..11ba487 100644 >> --- a/src/amd/common/ac_nir_to_llvm.c >> +++ b/src/amd/common/ac_nir_to_llvm.c >> @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, >> const nir_alu_instr *instr) >> result); >> break; >> case nir_op_ffma: >> - result = emit_intrin_3f_param(>ac, "llvm.fma", >> + result = emit_intrin_3f_param(>ac, "llvm.fmuladd", >> ac_to_float_type(>ac, >> def_type), src[0], src[1], src[2]); >> break; >> case nir_op_ibitfield_extract: >> -- >> 2.9.4 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.
If the fma has the exact flag, then we need to use the llvm.fma intrinsic. These come from fma() calls with the precise or invariant qualifiers in GLSL, where you basically have to fuse everything or fuse nothing consistently, and llvm.fmuladd doesn't guarantee that. On Tue, Oct 3, 2017 at 10:10 PM, Dave Airliewrote: > From: Dave Airlie > > For Vulkan SPIR-V the spec states > fma() Inherited from OpFMul followed by OpFAdd. > > Matt says the backend will do the right thing depending on the > hardware being compiled for, if you use the fmuladd intrinsic. > > Using the Mad Max pts test, on high settings at 4K: > CHP: 55->60 > HGDD: 46->50 > LM: 55->60 > No change on Stronghold. > > Thanks to Feral for spending the time to track this down. > > Signed-off-by: Dave Airlie > --- > src/amd/common/ac_nir_to_llvm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c > index d7b6259..11ba487 100644 > --- a/src/amd/common/ac_nir_to_llvm.c > +++ b/src/amd/common/ac_nir_to_llvm.c > @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, const > nir_alu_instr *instr) > result); > break; > case nir_op_ffma: > - result = emit_intrin_3f_param(>ac, "llvm.fma", > + result = emit_intrin_3f_param(>ac, "llvm.fmuladd", > ac_to_float_type(>ac, > def_type), src[0], src[1], src[2]); > break; > case nir_op_ibitfield_extract: > -- > 2.9.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.
Reviewed-by: Bas NieuwenhuizenOn 4 Oct 2017 04:11, "Dave Airlie" wrote: From: Dave Airlie For Vulkan SPIR-V the spec states fma() Inherited from OpFMul followed by OpFAdd. Matt says the backend will do the right thing depending on the hardware being compiled for, if you use the fmuladd intrinsic. Using the Mad Max pts test, on high settings at 4K: CHP: 55->60 HGDD: 46->50 LM: 55->60 No change on Stronghold. Thanks to Feral for spending the time to track this down. Signed-off-by: Dave Airlie --- src/amd/common/ac_nir_to_llvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_ llvm.c index d7b6259..11ba487 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr) result); break; case nir_op_ffma: - result = emit_intrin_3f_param(>ac, "llvm.fma", + result = emit_intrin_3f_param(>ac, "llvm.fmuladd", ac_to_float_type(>ac, def_type), src[0], src[1], src[2]); break; case nir_op_ibitfield_extract: -- 2.9.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.
From: Dave AirlieFor Vulkan SPIR-V the spec states fma() Inherited from OpFMul followed by OpFAdd. Matt says the backend will do the right thing depending on the hardware being compiled for, if you use the fmuladd intrinsic. Using the Mad Max pts test, on high settings at 4K: CHP: 55->60 HGDD: 46->50 LM: 55->60 No change on Stronghold. Thanks to Feral for spending the time to track this down. Signed-off-by: Dave Airlie --- src/amd/common/ac_nir_to_llvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index d7b6259..11ba487 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr) result); break; case nir_op_ffma: - result = emit_intrin_3f_param(>ac, "llvm.fma", + result = emit_intrin_3f_param(>ac, "llvm.fmuladd", ac_to_float_type(>ac, def_type), src[0], src[1], src[2]); break; case nir_op_ibitfield_extract: -- 2.9.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev