[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-17 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision.
arsenm added a comment.

a4451d88ee456304c26d552749aea6a7f5154bde 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-17 Thread Cameron McInally via Phabricator via cfe-commits
cameron.mcinally accepted this revision.
cameron.mcinally added a comment.

LGTM too. Would be good if an expert reviewed the target-specific changes, but 
they seem safe enough either way.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-17 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.
This revision is now accepted and ready to land.

I don't know if there were other reviewers who haven't commented on how you 
addressed their concerns, but this looks good to me.

Thanks for taking the time to improve our handling of this!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done.
arsenm added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2757
   // subsequent options conflict then emit warning diagnostic.
+  // TODO: How should this interact with DenormalFP32Math?
   if (HonorINFs && HonorNaNs &&

I think this should just follow along with DenormalFPMath, but I'll put this 
off to the later patch since this one still is slightly awkward trying to avoid 
changing the meaning of the absence of the option


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 238670.
arsenm added a comment.

Forgot clang parts


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Basic/CodeGenOptions.h
  clang/include/clang/Driver/CC1Options.td
  clang/include/clang/Driver/Options.td
  clang/include/clang/Driver/ToolChain.h
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCUDA/flush-denormals.cu
  clang/test/CodeGenCUDA/propagate-metadata.cu
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/denorms-are-zero.cl
  clang/test/CodeGenOpenCL/gfx9-fp32-denorms.cl
  clang/test/Driver/cl-denorms-are-zero.cl
  clang/test/Driver/cuda-flush-denormals-to-zero.cu
  clang/test/Driver/denormal-fp-math.c
  clang/test/Driver/opencl.cl
  llvm/docs/LangRef.rst
  llvm/lib/CodeGen/MachineFunction.cpp
  llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
  llvm/test/CodeGen/NVPTX/fast-math.ll
  llvm/test/CodeGen/NVPTX/math-intrins.ll
  llvm/test/CodeGen/NVPTX/sqrt-approx.ll
  llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll

Index: llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
===
--- llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
+++ llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
@@ -5,11 +5,11 @@
 ; hackery:
 
 ; RUN: cat %s > %t.ftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "true" }' >> %t.ftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "preserve-sign" }' >> %t.ftz
 ; RUN: opt < %t.ftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=FTZ
 
 ; RUN: cat %s > %t.noftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "false" }' >> %t.noftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "ieee" }' >> %t.noftz
 ; RUN: opt < %t.noftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ
 
 ; We handle nvvm intrinsics with ftz variants as follows:
Index: llvm/test/CodeGen/NVPTX/sqrt-approx.ll
===
--- llvm/test/CodeGen/NVPTX/sqrt-approx.ll
+++ llvm/test/CodeGen/NVPTX/sqrt-approx.ll
@@ -146,5 +146,5 @@
 }
 
 attributes #0 = { "unsafe-fp-math" = "true" }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
 attributes #2 = { "reciprocal-estimates" = "rsqrtf:1,rsqrtd:1,sqrtf:1,sqrtd:1" }
Index: llvm/test/CodeGen/NVPTX/math-intrins.ll
===
--- llvm/test/CodeGen/NVPTX/math-intrins.ll
+++ llvm/test/CodeGen/NVPTX/math-intrins.ll
@@ -289,4 +289,4 @@
 }
 
 attributes #0 = { nounwind readnone }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
Index: llvm/test/CodeGen/NVPTX/fast-math.ll
===
--- llvm/test/CodeGen/NVPTX/fast-math.ll
+++ llvm/test/CodeGen/NVPTX/fast-math.ll
@@ -162,4 +162,4 @@
 }
 
 attributes #0 = { "unsafe-fp-math" = "true" }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
Index: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
===
--- llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -15,6 +15,7 @@
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/APSInt.h"
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/FloatingPointMode.h"
 #include "llvm/ADT/None.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/STLExtras.h"
@@ -1709,9 +1710,10 @@
   // intrinsic, we don't have to look up any module metadata, as
   // FtzRequirementTy will be FTZ_Any.)
   if (Action.FtzRequirement != FTZ_Any) {
-bool FtzEnabled =
-II->getFunction()->getFnAttribute("nvptx-f32ftz").getValueAsString() ==
-"true";
+StringRef Attr = II->getFunction()
+ ->getFnAttribute("denormal-fp-math-f32")
+ .getValueAsString();
+bool FtzEnabled = parseDenormalFPAttribute(Attr) != DenormalMode::IEEE;
 
 if (FtzEnabled != (Action.FtzRequirement == FTZ_MustBeOn))
   return nullptr;
Index: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
===
--- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -121,14 +121,10 @@
   if 

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-16 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2311
+  bool TrappingMath = true;
 // overriden by ffp-exception-behavior?
   bool RoundingFPMath = false;

arsenm wrote:
> cameron.mcinally wrote:
> > Last line of comment was not removed.
> > 
> > Also, is it safe to remove `TrappingMathPresent`? Is that part of the 
> > work-in-progress to support `ffp-exception-behavior`?
> I think this is a rebase gone bad. The patch changing the strict math was 
> revered and recommitted and I probably broke this
Looks like this is still wrong. You didn't intend to change either TrappingMath 
flag, did you?



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2657
   // -fno_unsafe_math_optimizations restores default denormal handling
-  DenormalFPMath = "";
+  DenormalFPMath = DefaultDenormalFPMath;
   break;

Shouldn't this also restore DenormalFP32Math to its default value?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-15 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.
Herald added a subscriber: kerbowa.

ping


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-13 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 237665.
arsenm added a comment.

Mention support in langref


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Basic/CodeGenOptions.h
  clang/include/clang/Driver/CC1Options.td
  clang/include/clang/Driver/Options.td
  clang/include/clang/Driver/ToolChain.h
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCUDA/flush-denormals.cu
  clang/test/CodeGenCUDA/propagate-metadata.cu
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/denorms-are-zero.cl
  clang/test/CodeGenOpenCL/gfx9-fp32-denorms.cl
  clang/test/Driver/cl-denorms-are-zero.cl
  clang/test/Driver/cuda-flush-denormals-to-zero.cu
  clang/test/Driver/denormal-fp-math.c
  clang/test/Driver/opencl.cl
  llvm/docs/LangRef.rst
  llvm/lib/CodeGen/MachineFunction.cpp
  llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
  llvm/test/CodeGen/NVPTX/fast-math.ll
  llvm/test/CodeGen/NVPTX/math-intrins.ll
  llvm/test/CodeGen/NVPTX/sqrt-approx.ll
  llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll

Index: llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
===
--- llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
+++ llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
@@ -5,11 +5,11 @@
 ; hackery:
 
 ; RUN: cat %s > %t.ftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "true" }' >> %t.ftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "preserve-sign" }' >> %t.ftz
 ; RUN: opt < %t.ftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=FTZ
 
 ; RUN: cat %s > %t.noftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "false" }' >> %t.noftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "ieee" }' >> %t.noftz
 ; RUN: opt < %t.noftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ
 
 ; We handle nvvm intrinsics with ftz variants as follows:
Index: llvm/test/CodeGen/NVPTX/sqrt-approx.ll
===
--- llvm/test/CodeGen/NVPTX/sqrt-approx.ll
+++ llvm/test/CodeGen/NVPTX/sqrt-approx.ll
@@ -146,5 +146,5 @@
 }
 
 attributes #0 = { "unsafe-fp-math" = "true" }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
 attributes #2 = { "reciprocal-estimates" = "rsqrtf:1,rsqrtd:1,sqrtf:1,sqrtd:1" }
Index: llvm/test/CodeGen/NVPTX/math-intrins.ll
===
--- llvm/test/CodeGen/NVPTX/math-intrins.ll
+++ llvm/test/CodeGen/NVPTX/math-intrins.ll
@@ -289,4 +289,4 @@
 }
 
 attributes #0 = { nounwind readnone }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
Index: llvm/test/CodeGen/NVPTX/fast-math.ll
===
--- llvm/test/CodeGen/NVPTX/fast-math.ll
+++ llvm/test/CodeGen/NVPTX/fast-math.ll
@@ -162,4 +162,4 @@
 }
 
 attributes #0 = { "unsafe-fp-math" = "true" }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
Index: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
===
--- llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -15,6 +15,7 @@
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/APSInt.h"
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/FloatingPointMode.h"
 #include "llvm/ADT/None.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/STLExtras.h"
@@ -1709,9 +1710,10 @@
   // intrinsic, we don't have to look up any module metadata, as
   // FtzRequirementTy will be FTZ_Any.)
   if (Action.FtzRequirement != FTZ_Any) {
-bool FtzEnabled =
-II->getFunction()->getFnAttribute("nvptx-f32ftz").getValueAsString() ==
-"true";
+StringRef Attr = II->getFunction()
+ ->getFnAttribute("denormal-fp-math-f32")
+ .getValueAsString();
+bool FtzEnabled = parseDenormalFPAttribute(Attr) != DenormalMode::IEEE;
 
 if (FtzEnabled != (Action.FtzRequirement == FTZ_MustBeOn))
   return nullptr;
Index: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
===
--- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -121,14 +121,10 @@
   if 

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-10 Thread Cameron McInally via Phabricator via cfe-commits
cameron.mcinally added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

arsenm wrote:
> andrew.w.kaylor wrote:
> > cameron.mcinally wrote:
> > > andrew.w.kaylor wrote:
> > > > arsenm wrote:
> > > > > andrew.w.kaylor wrote:
> > > > > > arsenm wrote:
> > > > > > > andrew.w.kaylor wrote:
> > > > > > > > Can you document which targets do support the option? What 
> > > > > > > > happens if I try to use the option on a target where it is not 
> > > > > > > > supported?
> > > > > > > I'm not sure where to document this, or if/how/where to diagnose 
> > > > > > > it. I don't think the high level LangRef description is the right 
> > > > > > > place to discuss specific target handling.
> > > > > > > 
> > > > > > > Currently it won't error or anything. Code checking the denorm 
> > > > > > > mode will see the f32 specific mode, even if the target in the 
> > > > > > > end isn't really going to respect this.
> > > > > > > 
> > > > > > > One problem is this potentially does require coordination with 
> > > > > > > other toolchain components. For AMDGPU, the compiler can directly 
> > > > > > > tell the driver what FP mode to set on each entry point, but for 
> > > > > > > x86 it requires linking in crtfastmath to set the default mode 
> > > > > > > bits. If another target had a similar runtime environment 
> > > > > > > requirement, I don't think we can be sure the attribute is 
> > > > > > > correct or not.
> > > > > > There is precedent for describing target-specific behavior in 
> > > > > > LangRef. It just doesn't seem useful to say that not all targets 
> > > > > > support the attribute without saying which ones do. We should also 
> > > > > > say what is expected if a target doesn't support the attribute. It 
> > > > > > seems reasonable for the function attribute to be silently ignored.
> > > > > > 
> > > > > > > One problem is this potentially does require coordination with 
> > > > > > > other toolchain components. For AMDGPU, the compiler can directly 
> > > > > > > tell the driver what FP mode to set on each entry point, but for 
> > > > > > > x86 it requires linking in crtfastmath to set the default mode 
> > > > > > > bits.
> > > > > > 
> > > > > > This is a point I'm interested in. I don't like the current 
> > > > > > crtfastmath.o handling. It feels almost accidental when FTZ works 
> > > > > > as expected. My understanding is we link crtfastmath.o if we find 
> > > > > > it but if not everything just goes about its business. The Intel 
> > > > > > compiler injects code into main() to explicitly set the FTZ/DAZ 
> > > > > > control modes. That obviously has problems too, but it's at least 
> > > > > > consistent and predictable. As I understand it, crtfastmath.o sets 
> > > > > > these modes from a static initializer, but I'm not sure anything is 
> > > > > > done to determine the order of that initializer relative to others.
> > > > > > 
> > > > > > How does the compiler identify entry points for AMDGPU? And does it 
> > > > > > emit code to set FTZ based on the function attribute here?
> > > > > The entry points are a specific calling convention. There's no real 
> > > > > concept of main. Each kernel has an associated blob of metadata the 
> > > > > driver uses to set up various config registers on dispatch.
> > > > > 
> > > > > I don't think specially recognizing main in the compiler is 
> > > > > fundamentally different than having it done in a static constructor. 
> > > > > It's still a construct not associated with any particular function or 
> > > > > anything.
> > > > The problem with having it done in a static constructor is that you 
> > > > have no certainty of when it will be done relative to other static 
> > > > constructors. If it's in main you can at least say that it's after all 
> > > > the static constructors (assuming main is your entry point).
> > > Yes and no. The linker should honor static constructor priorities. But, 
> > > yeah, there's no guarantee that this constructor will run before other 
> > > priority 101 constructors.
> > > 
> > > The performance penalty for setting denormal flushing in main could be 
> > > significant (think C++). Also, there's precedent for using static 
> > > constructors, like GCC's crtfastmath.o.
> > Fair enough. I don't necessarily like how icc handles this. I don't have a 
> > problem with how gcc handles it. I just really don't like how LLVM does it. 
> > If we want to take the static constructor approach we should define our 
> > own, not depend on whether or not the GNU object file happens to be around.
> > 
> > Static initialization doesn't help for AMDGPU, and I suppose that's likely 
> > to be the case for any offload execution model. Since this patch is moving 
> > us toward a more consistent implementation I'm wondering if we can define 
> > some general rules 

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-10 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done.
arsenm added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

andrew.w.kaylor wrote:
> cameron.mcinally wrote:
> > andrew.w.kaylor wrote:
> > > arsenm wrote:
> > > > andrew.w.kaylor wrote:
> > > > > arsenm wrote:
> > > > > > andrew.w.kaylor wrote:
> > > > > > > Can you document which targets do support the option? What 
> > > > > > > happens if I try to use the option on a target where it is not 
> > > > > > > supported?
> > > > > > I'm not sure where to document this, or if/how/where to diagnose 
> > > > > > it. I don't think the high level LangRef description is the right 
> > > > > > place to discuss specific target handling.
> > > > > > 
> > > > > > Currently it won't error or anything. Code checking the denorm mode 
> > > > > > will see the f32 specific mode, even if the target in the end isn't 
> > > > > > really going to respect this.
> > > > > > 
> > > > > > One problem is this potentially does require coordination with 
> > > > > > other toolchain components. For AMDGPU, the compiler can directly 
> > > > > > tell the driver what FP mode to set on each entry point, but for 
> > > > > > x86 it requires linking in crtfastmath to set the default mode 
> > > > > > bits. If another target had a similar runtime environment 
> > > > > > requirement, I don't think we can be sure the attribute is correct 
> > > > > > or not.
> > > > > There is precedent for describing target-specific behavior in 
> > > > > LangRef. It just doesn't seem useful to say that not all targets 
> > > > > support the attribute without saying which ones do. We should also 
> > > > > say what is expected if a target doesn't support the attribute. It 
> > > > > seems reasonable for the function attribute to be silently ignored.
> > > > > 
> > > > > > One problem is this potentially does require coordination with 
> > > > > > other toolchain components. For AMDGPU, the compiler can directly 
> > > > > > tell the driver what FP mode to set on each entry point, but for 
> > > > > > x86 it requires linking in crtfastmath to set the default mode bits.
> > > > > 
> > > > > This is a point I'm interested in. I don't like the current 
> > > > > crtfastmath.o handling. It feels almost accidental when FTZ works as 
> > > > > expected. My understanding is we link crtfastmath.o if we find it but 
> > > > > if not everything just goes about its business. The Intel compiler 
> > > > > injects code into main() to explicitly set the FTZ/DAZ control modes. 
> > > > > That obviously has problems too, but it's at least consistent and 
> > > > > predictable. As I understand it, crtfastmath.o sets these modes from 
> > > > > a static initializer, but I'm not sure anything is done to determine 
> > > > > the order of that initializer relative to others.
> > > > > 
> > > > > How does the compiler identify entry points for AMDGPU? And does it 
> > > > > emit code to set FTZ based on the function attribute here?
> > > > The entry points are a specific calling convention. There's no real 
> > > > concept of main. Each kernel has an associated blob of metadata the 
> > > > driver uses to set up various config registers on dispatch.
> > > > 
> > > > I don't think specially recognizing main in the compiler is 
> > > > fundamentally different than having it done in a static constructor. 
> > > > It's still a construct not associated with any particular function or 
> > > > anything.
> > > The problem with having it done in a static constructor is that you have 
> > > no certainty of when it will be done relative to other static 
> > > constructors. If it's in main you can at least say that it's after all 
> > > the static constructors (assuming main is your entry point).
> > Yes and no. The linker should honor static constructor priorities. But, 
> > yeah, there's no guarantee that this constructor will run before other 
> > priority 101 constructors.
> > 
> > The performance penalty for setting denormal flushing in main could be 
> > significant (think C++). Also, there's precedent for using static 
> > constructors, like GCC's crtfastmath.o.
> Fair enough. I don't necessarily like how icc handles this. I don't have a 
> problem with how gcc handles it. I just really don't like how LLVM does it. 
> If we want to take the static constructor approach we should define our own, 
> not depend on whether or not the GNU object file happens to be around.
> 
> Static initialization doesn't help for AMDGPU, and I suppose that's likely to 
> be the case for any offload execution model. Since this patch is moving us 
> toward a more consistent implementation I'm wondering if we can define some 
> general rules for how this is supposed to work. Like when the function 
> attribute will result in injected instructions setting the control flags and 
> when it 

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-10 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

cameron.mcinally wrote:
> andrew.w.kaylor wrote:
> > arsenm wrote:
> > > andrew.w.kaylor wrote:
> > > > arsenm wrote:
> > > > > andrew.w.kaylor wrote:
> > > > > > Can you document which targets do support the option? What happens 
> > > > > > if I try to use the option on a target where it is not supported?
> > > > > I'm not sure where to document this, or if/how/where to diagnose it. 
> > > > > I don't think the high level LangRef description is the right place 
> > > > > to discuss specific target handling.
> > > > > 
> > > > > Currently it won't error or anything. Code checking the denorm mode 
> > > > > will see the f32 specific mode, even if the target in the end isn't 
> > > > > really going to respect this.
> > > > > 
> > > > > One problem is this potentially does require coordination with other 
> > > > > toolchain components. For AMDGPU, the compiler can directly tell the 
> > > > > driver what FP mode to set on each entry point, but for x86 it 
> > > > > requires linking in crtfastmath to set the default mode bits. If 
> > > > > another target had a similar runtime environment requirement, I don't 
> > > > > think we can be sure the attribute is correct or not.
> > > > There is precedent for describing target-specific behavior in LangRef. 
> > > > It just doesn't seem useful to say that not all targets support the 
> > > > attribute without saying which ones do. We should also say what is 
> > > > expected if a target doesn't support the attribute. It seems reasonable 
> > > > for the function attribute to be silently ignored.
> > > > 
> > > > > One problem is this potentially does require coordination with other 
> > > > > toolchain components. For AMDGPU, the compiler can directly tell the 
> > > > > driver what FP mode to set on each entry point, but for x86 it 
> > > > > requires linking in crtfastmath to set the default mode bits.
> > > > 
> > > > This is a point I'm interested in. I don't like the current 
> > > > crtfastmath.o handling. It feels almost accidental when FTZ works as 
> > > > expected. My understanding is we link crtfastmath.o if we find it but 
> > > > if not everything just goes about its business. The Intel compiler 
> > > > injects code into main() to explicitly set the FTZ/DAZ control modes. 
> > > > That obviously has problems too, but it's at least consistent and 
> > > > predictable. As I understand it, crtfastmath.o sets these modes from a 
> > > > static initializer, but I'm not sure anything is done to determine the 
> > > > order of that initializer relative to others.
> > > > 
> > > > How does the compiler identify entry points for AMDGPU? And does it 
> > > > emit code to set FTZ based on the function attribute here?
> > > The entry points are a specific calling convention. There's no real 
> > > concept of main. Each kernel has an associated blob of metadata the 
> > > driver uses to set up various config registers on dispatch.
> > > 
> > > I don't think specially recognizing main in the compiler is fundamentally 
> > > different than having it done in a static constructor. It's still a 
> > > construct not associated with any particular function or anything.
> > The problem with having it done in a static constructor is that you have no 
> > certainty of when it will be done relative to other static constructors. If 
> > it's in main you can at least say that it's after all the static 
> > constructors (assuming main is your entry point).
> Yes and no. The linker should honor static constructor priorities. But, yeah, 
> there's no guarantee that this constructor will run before other priority 101 
> constructors.
> 
> The performance penalty for setting denormal flushing in main could be 
> significant (think C++). Also, there's precedent for using static 
> constructors, like GCC's crtfastmath.o.
Fair enough. I don't necessarily like how icc handles this. I don't have a 
problem with how gcc handles it. I just really don't like how LLVM does it. If 
we want to take the static constructor approach we should define our own, not 
depend on whether or not the GNU object file happens to be around.

Static initialization doesn't help for AMDGPU, and I suppose that's likely to 
be the case for any offload execution model. Since this patch is moving us 
toward a more consistent implementation I'm wondering if we can define some 
general rules for how this is supposed to work. Like when the function 
attribute will result in injected instructions setting the control flags and 
when it won't.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-10 Thread Cameron McInally via Phabricator via cfe-commits
cameron.mcinally added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

andrew.w.kaylor wrote:
> arsenm wrote:
> > andrew.w.kaylor wrote:
> > > arsenm wrote:
> > > > andrew.w.kaylor wrote:
> > > > > Can you document which targets do support the option? What happens if 
> > > > > I try to use the option on a target where it is not supported?
> > > > I'm not sure where to document this, or if/how/where to diagnose it. I 
> > > > don't think the high level LangRef description is the right place to 
> > > > discuss specific target handling.
> > > > 
> > > > Currently it won't error or anything. Code checking the denorm mode 
> > > > will see the f32 specific mode, even if the target in the end isn't 
> > > > really going to respect this.
> > > > 
> > > > One problem is this potentially does require coordination with other 
> > > > toolchain components. For AMDGPU, the compiler can directly tell the 
> > > > driver what FP mode to set on each entry point, but for x86 it requires 
> > > > linking in crtfastmath to set the default mode bits. If another target 
> > > > had a similar runtime environment requirement, I don't think we can be 
> > > > sure the attribute is correct or not.
> > > There is precedent for describing target-specific behavior in LangRef. It 
> > > just doesn't seem useful to say that not all targets support the 
> > > attribute without saying which ones do. We should also say what is 
> > > expected if a target doesn't support the attribute. It seems reasonable 
> > > for the function attribute to be silently ignored.
> > > 
> > > > One problem is this potentially does require coordination with other 
> > > > toolchain components. For AMDGPU, the compiler can directly tell the 
> > > > driver what FP mode to set on each entry point, but for x86 it requires 
> > > > linking in crtfastmath to set the default mode bits.
> > > 
> > > This is a point I'm interested in. I don't like the current crtfastmath.o 
> > > handling. It feels almost accidental when FTZ works as expected. My 
> > > understanding is we link crtfastmath.o if we find it but if not 
> > > everything just goes about its business. The Intel compiler injects code 
> > > into main() to explicitly set the FTZ/DAZ control modes. That obviously 
> > > has problems too, but it's at least consistent and predictable. As I 
> > > understand it, crtfastmath.o sets these modes from a static initializer, 
> > > but I'm not sure anything is done to determine the order of that 
> > > initializer relative to others.
> > > 
> > > How does the compiler identify entry points for AMDGPU? And does it emit 
> > > code to set FTZ based on the function attribute here?
> > The entry points are a specific calling convention. There's no real concept 
> > of main. Each kernel has an associated blob of metadata the driver uses to 
> > set up various config registers on dispatch.
> > 
> > I don't think specially recognizing main in the compiler is fundamentally 
> > different than having it done in a static constructor. It's still a 
> > construct not associated with any particular function or anything.
> The problem with having it done in a static constructor is that you have no 
> certainty of when it will be done relative to other static constructors. If 
> it's in main you can at least say that it's after all the static constructors 
> (assuming main is your entry point).
Yes and no. The linker should honor static constructor priorities. But, yeah, 
there's no guarantee that this constructor will run before other priority 101 
constructors.

The performance penalty for setting denormal flushing in main could be 
significant (think C++). Also, there's precedent for using static constructors, 
like GCC's crtfastmath.o.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-09 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

arsenm wrote:
> andrew.w.kaylor wrote:
> > arsenm wrote:
> > > andrew.w.kaylor wrote:
> > > > Can you document which targets do support the option? What happens if I 
> > > > try to use the option on a target where it is not supported?
> > > I'm not sure where to document this, or if/how/where to diagnose it. I 
> > > don't think the high level LangRef description is the right place to 
> > > discuss specific target handling.
> > > 
> > > Currently it won't error or anything. Code checking the denorm mode will 
> > > see the f32 specific mode, even if the target in the end isn't really 
> > > going to respect this.
> > > 
> > > One problem is this potentially does require coordination with other 
> > > toolchain components. For AMDGPU, the compiler can directly tell the 
> > > driver what FP mode to set on each entry point, but for x86 it requires 
> > > linking in crtfastmath to set the default mode bits. If another target 
> > > had a similar runtime environment requirement, I don't think we can be 
> > > sure the attribute is correct or not.
> > There is precedent for describing target-specific behavior in LangRef. It 
> > just doesn't seem useful to say that not all targets support the attribute 
> > without saying which ones do. We should also say what is expected if a 
> > target doesn't support the attribute. It seems reasonable for the function 
> > attribute to be silently ignored.
> > 
> > > One problem is this potentially does require coordination with other 
> > > toolchain components. For AMDGPU, the compiler can directly tell the 
> > > driver what FP mode to set on each entry point, but for x86 it requires 
> > > linking in crtfastmath to set the default mode bits.
> > 
> > This is a point I'm interested in. I don't like the current crtfastmath.o 
> > handling. It feels almost accidental when FTZ works as expected. My 
> > understanding is we link crtfastmath.o if we find it but if not everything 
> > just goes about its business. The Intel compiler injects code into main() 
> > to explicitly set the FTZ/DAZ control modes. That obviously has problems 
> > too, but it's at least consistent and predictable. As I understand it, 
> > crtfastmath.o sets these modes from a static initializer, but I'm not sure 
> > anything is done to determine the order of that initializer relative to 
> > others.
> > 
> > How does the compiler identify entry points for AMDGPU? And does it emit 
> > code to set FTZ based on the function attribute here?
> The entry points are a specific calling convention. There's no real concept 
> of main. Each kernel has an associated blob of metadata the driver uses to 
> set up various config registers on dispatch.
> 
> I don't think specially recognizing main in the compiler is fundamentally 
> different than having it done in a static constructor. It's still a construct 
> not associated with any particular function or anything.
The problem with having it done in a static constructor is that you have no 
certainty of when it will be done relative to other static constructors. If 
it's in main you can at least say that it's after all the static constructors 
(assuming main is your entry point).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-09 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done.
arsenm added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

andrew.w.kaylor wrote:
> arsenm wrote:
> > andrew.w.kaylor wrote:
> > > Can you document which targets do support the option? What happens if I 
> > > try to use the option on a target where it is not supported?
> > I'm not sure where to document this, or if/how/where to diagnose it. I 
> > don't think the high level LangRef description is the right place to 
> > discuss specific target handling.
> > 
> > Currently it won't error or anything. Code checking the denorm mode will 
> > see the f32 specific mode, even if the target in the end isn't really going 
> > to respect this.
> > 
> > One problem is this potentially does require coordination with other 
> > toolchain components. For AMDGPU, the compiler can directly tell the driver 
> > what FP mode to set on each entry point, but for x86 it requires linking in 
> > crtfastmath to set the default mode bits. If another target had a similar 
> > runtime environment requirement, I don't think we can be sure the attribute 
> > is correct or not.
> There is precedent for describing target-specific behavior in LangRef. It 
> just doesn't seem useful to say that not all targets support the attribute 
> without saying which ones do. We should also say what is expected if a target 
> doesn't support the attribute. It seems reasonable for the function attribute 
> to be silently ignored.
> 
> > One problem is this potentially does require coordination with other 
> > toolchain components. For AMDGPU, the compiler can directly tell the driver 
> > what FP mode to set on each entry point, but for x86 it requires linking in 
> > crtfastmath to set the default mode bits.
> 
> This is a point I'm interested in. I don't like the current crtfastmath.o 
> handling. It feels almost accidental when FTZ works as expected. My 
> understanding is we link crtfastmath.o if we find it but if not everything 
> just goes about its business. The Intel compiler injects code into main() to 
> explicitly set the FTZ/DAZ control modes. That obviously has problems too, 
> but it's at least consistent and predictable. As I understand it, 
> crtfastmath.o sets these modes from a static initializer, but I'm not sure 
> anything is done to determine the order of that initializer relative to 
> others.
> 
> How does the compiler identify entry points for AMDGPU? And does it emit code 
> to set FTZ based on the function attribute here?
The entry points are a specific calling convention. There's no real concept of 
main. Each kernel has an associated blob of metadata the driver uses to set up 
various config registers on dispatch.

I don't think specially recognizing main in the compiler is fundamentally 
different than having it done in a static constructor. It's still a construct 
not associated with any particular function or anything.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-06 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

arsenm wrote:
> andrew.w.kaylor wrote:
> > Can you document which targets do support the option? What happens if I try 
> > to use the option on a target where it is not supported?
> I'm not sure where to document this, or if/how/where to diagnose it. I don't 
> think the high level LangRef description is the right place to discuss 
> specific target handling.
> 
> Currently it won't error or anything. Code checking the denorm mode will see 
> the f32 specific mode, even if the target in the end isn't really going to 
> respect this.
> 
> One problem is this potentially does require coordination with other 
> toolchain components. For AMDGPU, the compiler can directly tell the driver 
> what FP mode to set on each entry point, but for x86 it requires linking in 
> crtfastmath to set the default mode bits. If another target had a similar 
> runtime environment requirement, I don't think we can be sure the attribute 
> is correct or not.
There is precedent for describing target-specific behavior in LangRef. It just 
doesn't seem useful to say that not all targets support the attribute without 
saying which ones do. We should also say what is expected if a target doesn't 
support the attribute. It seems reasonable for the function attribute to be 
silently ignored.

> One problem is this potentially does require coordination with other 
> toolchain components. For AMDGPU, the compiler can directly tell the driver 
> what FP mode to set on each entry point, but for x86 it requires linking in 
> crtfastmath to set the default mode bits.

This is a point I'm interested in. I don't like the current crtfastmath.o 
handling. It feels almost accidental when FTZ works as expected. My 
understanding is we link crtfastmath.o if we find it but if not everything just 
goes about its business. The Intel compiler injects code into main() to 
explicitly set the FTZ/DAZ control modes. That obviously has problems too, but 
it's at least consistent and predictable. As I understand it, crtfastmath.o 
sets these modes from a static initializer, but I'm not sure anything is done 
to determine the order of that initializer relative to others.

How does the compiler identify entry points for AMDGPU? And does it emit code 
to set FTZ based on the function attribute here?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-06 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done.
arsenm added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

andrew.w.kaylor wrote:
> Can you document which targets do support the option? What happens if I try 
> to use the option on a target where it is not supported?
I'm not sure where to document this, or if/how/where to diagnose it. I don't 
think the high level LangRef description is the right place to discuss specific 
target handling.

Currently it won't error or anything. Code checking the denorm mode will see 
the f32 specific mode, even if the target in the end isn't really going to 
respect this.

One problem is this potentially does require coordination with other toolchain 
components. For AMDGPU, the compiler can directly tell the driver what FP mode 
to set on each entry point, but for x86 it requires linking in crtfastmath to 
set the default mode bits. If another target had a similar runtime environment 
requirement, I don't think we can be sure the attribute is correct or not.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-06 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

Can you document which targets do support the option? What happens if I try to 
use the option on a target where it is not supported?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-06 Thread Cameron McInally via Phabricator via cfe-commits
cameron.mcinally added a comment.

Ok, thanks for the clarifications. Looks good to me, but it would be good to 
have experts in OpenCL/Cuda/AMDGPU review the target specific changes.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-06 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D69878#1805804 , @arsenm wrote:

> In D69878#1801508 , 
> @cameron.mcinally wrote:
>
> > This is looking pretty good to me, but I'm ignoring some of the target 
> > specific code that I'm not familiar with.
> >
> > Is `denormal-fp-math` influenced by `-Ofast`? Or are there plans for that? 
> > Seems like `-Ofast` should imply DAZ and FTZ (if supported by target).
>
>
> Yes, through the toolchain handling. I copied the logic for when crtfastmath 
> is linked for the default mode for x86.
>
> > I think we discussed this before, but it's worth repeating. If 
> > `denormal-fp-math` isn't specified, we default to IEEE behavior, right? 
> > When this lands in master, there could be an unexpected performance hit for 
> > targets that aren't paying attention. E.g. I want to use `denormal-fp-math` 
> > to toggle whether a FSUB(-0.0,X) is converted to a FNEG(X) in 
> > SelectionDAGBuilder.
> > 
> > Apologies in advance if this has been discussed recently. I've been 
> > distracted with another project for the passed few months...
>
> Yes, ieee should be the default. The dependent patches start adding the 
> attribute by default for platforms with flushing enabled with fast math


To clarify this patch leaves the default and defers changing that to a later 
patch


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-06 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done.
arsenm added a comment.

In D69878#1801508 , @cameron.mcinally 
wrote:

> This is looking pretty good to me, but I'm ignoring some of the target 
> specific code that I'm not familiar with.
>
> Is `denormal-fp-math` influenced by `-Ofast`? Or are there plans for that? 
> Seems like `-Ofast` should imply DAZ and FTZ (if supported by target).


Yes, through the toolchain handling. I copied the logic for when crtfastmath is 
linked for the default mode for x86.

> I think we discussed this before, but it's worth repeating. If 
> `denormal-fp-math` isn't specified, we default to IEEE behavior, right? When 
> this lands in master, there could be an unexpected performance hit for 
> targets that aren't paying attention. E.g. I want to use `denormal-fp-math` 
> to toggle whether a FSUB(-0.0,X) is converted to a FNEG(X) in 
> SelectionDAGBuilder.
> 
> Apologies in advance if this has been discussed recently. I've been 
> distracted with another project for the passed few months...

Yes, ieee should be the default. The dependent patches start adding the 
attribute by default for platforms with flushing enabled with fast math




Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2311
+  bool TrappingMath = true;
 // overriden by ffp-exception-behavior?
   bool RoundingFPMath = false;

cameron.mcinally wrote:
> Last line of comment was not removed.
> 
> Also, is it safe to remove `TrappingMathPresent`? Is that part of the 
> work-in-progress to support `ffp-exception-behavior`?
I think this is a rebase gone bad. The patch changing the strict math was 
revered and recommitted and I probably broke this


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-02 Thread Cameron McInally via Phabricator via cfe-commits
cameron.mcinally added a comment.

This is looking pretty good to me, but I'm ignoring some of the target specific 
code that I'm not familiar with.

Is `denormal-fp-math` influenced by `-Ofast`? Or are there plans for that? 
Seems like `-Ofast` should imply DAZ and FTZ (if supported by target).

I think we discussed this before, but it's worth repeating. If 
`denormal-fp-math` isn't specified, we default to IEEE behavior, right? When 
this lands in master, there could be an unexpected performance hit for targets 
that aren't paying attention. E.g. I want to use `denormal-fp-math` to toggle 
whether a FSUB(-0.0,X) is converted to a FNEG(X) in SelectionDAGBuilder.

Apologies in advance if this has been discussed recently. I've been distracted 
with another project for the passed few months...




Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2311
+  bool TrappingMath = true;
 // overriden by ffp-exception-behavior?
   bool RoundingFPMath = false;

Last line of comment was not removed.

Also, is it safe to remove `TrappingMathPresent`? Is that part of the 
work-in-progress to support `ffp-exception-behavior`?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-02 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

ping


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-12-10 Thread Anastasia Stulova via Phabricator via cfe-commits
Anastasia added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:1775
-if (getLangOpts().OpenCL)
-  FuncAttrs.addAttribute("denorms-are-zero",
- llvm::toStringRef(CodeGenOpts.FlushDenorm));

arsenm wrote:
> Anastasia wrote:
> > arsenm wrote:
> > > Anastasia wrote:
> > > > so where would `denorms-are-zero` be emitted now (in case some out of 
> > > > tree implementations rely on this)?
> > > Rely on in what sense? Do you have a concrete use of this?
> > Since it has been emitted before in the module potentially some LLVM 
> > implementations could be using that attribute?
> I'm disinclined to leave things around just in case some unknown user might 
> have been using them. We've dropped attributes like this before (I think the 
> less-precise-fp-mad one for disuse). This also isn't needed for correctness, 
> so it should be pretty safe to drop
Ok, fair enough!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-12-10 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 233059.
arsenm added a comment.

Reword langref, fix name in langref


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Basic/CodeGenOptions.h
  clang/include/clang/Driver/CC1Options.td
  clang/include/clang/Driver/Options.td
  clang/include/clang/Driver/ToolChain.h
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCUDA/flush-denormals.cu
  clang/test/CodeGenCUDA/propagate-metadata.cu
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/denorms-are-zero.cl
  clang/test/CodeGenOpenCL/gfx9-fp32-denorms.cl
  clang/test/Driver/cl-denorms-are-zero.cl
  clang/test/Driver/cuda-flush-denormals-to-zero.cu
  clang/test/Driver/denormal-fp-math.c
  clang/test/Driver/opencl.cl
  llvm/docs/LangRef.rst
  llvm/lib/CodeGen/MachineFunction.cpp
  llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
  llvm/test/CodeGen/NVPTX/fast-math.ll
  llvm/test/CodeGen/NVPTX/math-intrins.ll
  llvm/test/CodeGen/NVPTX/sqrt-approx.ll
  llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll

Index: llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
===
--- llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
+++ llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
@@ -5,11 +5,11 @@
 ; hackery:
 
 ; RUN: cat %s > %t.ftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "true" }' >> %t.ftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "preserve-sign" }' >> %t.ftz
 ; RUN: opt < %t.ftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=FTZ
 
 ; RUN: cat %s > %t.noftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "false" }' >> %t.noftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "ieee" }' >> %t.noftz
 ; RUN: opt < %t.noftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ
 
 ; We handle nvvm intrinsics with ftz variants as follows:
Index: llvm/test/CodeGen/NVPTX/sqrt-approx.ll
===
--- llvm/test/CodeGen/NVPTX/sqrt-approx.ll
+++ llvm/test/CodeGen/NVPTX/sqrt-approx.ll
@@ -146,5 +146,5 @@
 }
 
 attributes #0 = { "unsafe-fp-math" = "true" }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
 attributes #2 = { "reciprocal-estimates" = "rsqrtf:1,rsqrtd:1,sqrtf:1,sqrtd:1" }
Index: llvm/test/CodeGen/NVPTX/math-intrins.ll
===
--- llvm/test/CodeGen/NVPTX/math-intrins.ll
+++ llvm/test/CodeGen/NVPTX/math-intrins.ll
@@ -289,4 +289,4 @@
 }
 
 attributes #0 = { nounwind readnone }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
Index: llvm/test/CodeGen/NVPTX/fast-math.ll
===
--- llvm/test/CodeGen/NVPTX/fast-math.ll
+++ llvm/test/CodeGen/NVPTX/fast-math.ll
@@ -162,4 +162,4 @@
 }
 
 attributes #0 = { "unsafe-fp-math" = "true" }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
Index: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
===
--- llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -15,6 +15,7 @@
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/APSInt.h"
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/FloatingPointMode.h"
 #include "llvm/ADT/None.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/STLExtras.h"
@@ -1703,9 +1704,10 @@
   // intrinsic, we don't have to look up any module metadata, as
   // FtzRequirementTy will be FTZ_Any.)
   if (Action.FtzRequirement != FTZ_Any) {
-bool FtzEnabled =
-II->getFunction()->getFnAttribute("nvptx-f32ftz").getValueAsString() ==
-"true";
+StringRef Attr = II->getFunction()
+ ->getFnAttribute("denormal-fp-math-f32")
+ .getValueAsString();
+bool FtzEnabled = parseDenormalFPAttribute(Attr) != DenormalMode::IEEE;
 
 if (FtzEnabled != (Action.FtzRequirement == FTZ_MustBeOn))
   return nullptr;
Index: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
===
--- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -120,14 +120,10 @@
   if 

[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-12-04 Thread Steve Canon via Phabricator via cfe-commits
scanon added inline comments.



Comment at: llvm/docs/LangRef.rst:1834
+``"denormal-fp-math-f32"``
+   Same as ``"denorm-fp-math-f32"``, except for float types. If both
+   are present, this overrides ``"denorm-fp-math"``.

Can you clarify this a little bit? I'd prefer something like "Same as 
``"denorm-fp-math"``, but only controls the behavior of the 32-bit float type.".


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-12-01 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done.
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:1775
-if (getLangOpts().OpenCL)
-  FuncAttrs.addAttribute("denorms-are-zero",
- llvm::toStringRef(CodeGenOpts.FlushDenorm));

Anastasia wrote:
> arsenm wrote:
> > Anastasia wrote:
> > > so where would `denorms-are-zero` be emitted now (in case some out of 
> > > tree implementations rely on this)?
> > Rely on in what sense? Do you have a concrete use of this?
> Since it has been emitted before in the module potentially some LLVM 
> implementations could be using that attribute?
I'm disinclined to leave things around just in case some unknown user might 
have been using them. We've dropped attributes like this before (I think the 
less-precise-fp-mad one for disuse). This also isn't needed for correctness, so 
it should be pretty safe to drop


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-11-26 Thread Anastasia Stulova via Phabricator via cfe-commits
Anastasia added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:1775
-if (getLangOpts().OpenCL)
-  FuncAttrs.addAttribute("denorms-are-zero",
- llvm::toStringRef(CodeGenOpts.FlushDenorm));

arsenm wrote:
> Anastasia wrote:
> > so where would `denorms-are-zero` be emitted now (in case some out of tree 
> > implementations rely on this)?
> Rely on in what sense? Do you have a concrete use of this?
Since it has been emitted before in the module potentially some LLVM 
implementations could be using that attribute?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-11-19 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done.
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:1775
-if (getLangOpts().OpenCL)
-  FuncAttrs.addAttribute("denorms-are-zero",
- llvm::toStringRef(CodeGenOpts.FlushDenorm));

Anastasia wrote:
> so where would `denorms-are-zero` be emitted now (in case some out of tree 
> implementations rely on this)?
Rely on in what sense? Do you have a concrete use of this?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-11-19 Thread Anastasia Stulova via Phabricator via cfe-commits
Anastasia added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:1775
-if (getLangOpts().OpenCL)
-  FuncAttrs.addAttribute("denorms-are-zero",
- llvm::toStringRef(CodeGenOpts.FlushDenorm));

so where would `denorms-are-zero` be emitted now (in case some out of tree 
implementations rely on this)?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-11-08 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> AMDGPU wants a distinct control for f32 flushing from f16/f64, and as far as 
> I can tell the same is true for NVPTX (based on the attribute name).

I may be corrected, but I believe nvptx only supports ftz for f32.

> Double-precision instructions support subnormal inputs and results. 
> Single-precision instructions support subnormal inputs and results by default 
> for sm_20 and subsequent targets, and flush subnormal inputs and results to 
> sign-preserving zero for sm_1x targets. The optional .ftz modifier on 
> single-precision instructions provides backward compatibility with sm_1x 
> targets by flushing subnormal inputs and results to sign-preserving zero 
> regardless of the target architecture.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-11-07 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 228348.
arsenm added a comment.

Fix name in documentation


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Basic/CodeGenOptions.h
  clang/include/clang/Driver/CC1Options.td
  clang/include/clang/Driver/Options.td
  clang/include/clang/Driver/ToolChain.h
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCUDA/flush-denormals.cu
  clang/test/CodeGenCUDA/propagate-metadata.cu
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/denorms-are-zero.cl
  clang/test/CodeGenOpenCL/gfx9-fp32-denorms.cl
  clang/test/Driver/cl-denorms-are-zero.cl
  clang/test/Driver/cuda-flush-denormals-to-zero.cu
  clang/test/Driver/denormal-fp-math.c
  clang/test/Driver/opencl.cl
  llvm/docs/LangRef.rst
  llvm/lib/CodeGen/MachineFunction.cpp
  llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
  llvm/test/CodeGen/NVPTX/fast-math.ll
  llvm/test/CodeGen/NVPTX/math-intrins.ll
  llvm/test/CodeGen/NVPTX/sqrt-approx.ll
  llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll

Index: llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
===
--- llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
+++ llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
@@ -5,11 +5,11 @@
 ; hackery:
 
 ; RUN: cat %s > %t.ftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "true" }' >> %t.ftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "preserve-sign" }' >> %t.ftz
 ; RUN: opt < %t.ftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=FTZ
 
 ; RUN: cat %s > %t.noftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "false" }' >> %t.noftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "ieee" }' >> %t.noftz
 ; RUN: opt < %t.noftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ
 
 ; We handle nvvm intrinsics with ftz variants as follows:
Index: llvm/test/CodeGen/NVPTX/sqrt-approx.ll
===
--- llvm/test/CodeGen/NVPTX/sqrt-approx.ll
+++ llvm/test/CodeGen/NVPTX/sqrt-approx.ll
@@ -146,5 +146,5 @@
 }
 
 attributes #0 = { "unsafe-fp-math" = "true" }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
 attributes #2 = { "reciprocal-estimates" = "rsqrtf:1,rsqrtd:1,sqrtf:1,sqrtd:1" }
Index: llvm/test/CodeGen/NVPTX/math-intrins.ll
===
--- llvm/test/CodeGen/NVPTX/math-intrins.ll
+++ llvm/test/CodeGen/NVPTX/math-intrins.ll
@@ -289,4 +289,4 @@
 }
 
 attributes #0 = { nounwind readnone }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
Index: llvm/test/CodeGen/NVPTX/fast-math.ll
===
--- llvm/test/CodeGen/NVPTX/fast-math.ll
+++ llvm/test/CodeGen/NVPTX/fast-math.ll
@@ -162,4 +162,4 @@
 }
 
 attributes #0 = { "unsafe-fp-math" = "true" }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
Index: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
===
--- llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -15,6 +15,7 @@
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/APSInt.h"
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/FloatingPointMode.h"
 #include "llvm/ADT/None.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/STLExtras.h"
@@ -1703,9 +1704,10 @@
   // intrinsic, we don't have to look up any module metadata, as
   // FtzRequirementTy will be FTZ_Any.)
   if (Action.FtzRequirement != FTZ_Any) {
-bool FtzEnabled =
-II->getFunction()->getFnAttribute("nvptx-f32ftz").getValueAsString() ==
-"true";
+StringRef Attr = II->getFunction()
+ ->getFnAttribute("denormal-fp-math-f32")
+ .getValueAsString();
+bool FtzEnabled = parseDenormalFPAttribute(Attr) != DenormalMode::IEEE;
 
 if (FtzEnabled != (Action.FtzRequirement == FTZ_MustBeOn))
   return nullptr;
Index: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
===
--- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -120,14 +120,10 @@
   if 

[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-11-07 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done.
arsenm added a comment.

In D69878#1736865 , @Anastasia wrote:

> > Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
> >  has no in-tree users. The meaning would also be target dependent, such
> >  as the AMDGPU choice to treat this as only meaning allow flushing of
> >  f32 and not f16 or f64. The naming is also potentially confusing,
> >  since DAZ in other contexts refers to instructions implicitly treating
> >  input denormals as zero, not necessarily flushing output denormals to
> >  zero.
>
> Would the targets supporting OpenCL need to define their own behavior in 
> `getDefaultDenormalModeForType`?


Yes. The future ieee default should be conservatively correct though




Comment at: clang/include/clang/Driver/ToolChain.h:619
+  const llvm::fltSemantics *FPType = nullptr) const {
+// FIXME: This should be IEEE when default handling is fixed.
+return llvm::DenormalMode::Invalid;

Anastasia wrote:
> Can you elaborate what has to be done in order to fix this?
The main problem is the current user assumes non-ieee by default. The main 
blocker is knowing what platforms should default to something different to 
avoid performance regressions. I have the patch almost ready to switch the 
default, it’s just missing toolchain overrides


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-11-07 Thread Anastasia Stulova via Phabricator via cfe-commits
Anastasia added a comment.

> Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
>  has no in-tree users. The meaning would also be target dependent, such
>  as the AMDGPU choice to treat this as only meaning allow flushing of
>  f32 and not f16 or f64. The naming is also potentially confusing,
>  since DAZ in other contexts refers to instructions implicitly treating
>  input denormals as zero, not necessarily flushing output denormals to
>  zero.

Would the targets supporting OpenCL need to define their own behavior in 
`getDefaultDenormalModeForType`?




Comment at: clang/include/clang/Driver/ToolChain.h:619
+  const llvm::fltSemantics *FPType = nullptr) const {
+// FIXME: This should be IEEE when default handling is fixed.
+return llvm::DenormalMode::Invalid;

Can you elaborate what has to be done in order to fix this?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-11-06 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 228167.
arsenm added a comment.

Rename subnormal to denormal. Will defer splitting input and output setting 
into a future patch before switching default behavior


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Basic/CodeGenOptions.h
  clang/include/clang/Driver/CC1Options.td
  clang/include/clang/Driver/Options.td
  clang/include/clang/Driver/ToolChain.h
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCUDA/flush-denormals.cu
  clang/test/CodeGenCUDA/propagate-metadata.cu
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/denorms-are-zero.cl
  clang/test/CodeGenOpenCL/gfx9-fp32-denorms.cl
  clang/test/Driver/cl-denorms-are-zero.cl
  clang/test/Driver/cuda-flush-denormals-to-zero.cu
  clang/test/Driver/denormal-fp-math.c
  clang/test/Driver/opencl.cl
  llvm/docs/LangRef.rst
  llvm/lib/CodeGen/MachineFunction.cpp
  llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
  llvm/test/CodeGen/NVPTX/fast-math.ll
  llvm/test/CodeGen/NVPTX/math-intrins.ll
  llvm/test/CodeGen/NVPTX/sqrt-approx.ll
  llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll

Index: llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
===
--- llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
+++ llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
@@ -5,11 +5,11 @@
 ; hackery:
 
 ; RUN: cat %s > %t.ftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "true" }' >> %t.ftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "preserve-sign" }' >> %t.ftz
 ; RUN: opt < %t.ftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=FTZ
 
 ; RUN: cat %s > %t.noftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "false" }' >> %t.noftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "ieee" }' >> %t.noftz
 ; RUN: opt < %t.noftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ
 
 ; We handle nvvm intrinsics with ftz variants as follows:
Index: llvm/test/CodeGen/NVPTX/sqrt-approx.ll
===
--- llvm/test/CodeGen/NVPTX/sqrt-approx.ll
+++ llvm/test/CodeGen/NVPTX/sqrt-approx.ll
@@ -146,5 +146,5 @@
 }
 
 attributes #0 = { "unsafe-fp-math" = "true" }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
 attributes #2 = { "reciprocal-estimates" = "rsqrtf:1,rsqrtd:1,sqrtf:1,sqrtd:1" }
Index: llvm/test/CodeGen/NVPTX/math-intrins.ll
===
--- llvm/test/CodeGen/NVPTX/math-intrins.ll
+++ llvm/test/CodeGen/NVPTX/math-intrins.ll
@@ -289,4 +289,4 @@
 }
 
 attributes #0 = { nounwind readnone }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
Index: llvm/test/CodeGen/NVPTX/fast-math.ll
===
--- llvm/test/CodeGen/NVPTX/fast-math.ll
+++ llvm/test/CodeGen/NVPTX/fast-math.ll
@@ -162,4 +162,4 @@
 }
 
 attributes #0 = { "unsafe-fp-math" = "true" }
-attributes #1 = { "nvptx-f32ftz" = "true" }
+attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
Index: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
===
--- llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -15,6 +15,7 @@
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/APSInt.h"
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/FloatingPointMode.h"
 #include "llvm/ADT/None.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/STLExtras.h"
@@ -1703,9 +1704,10 @@
   // intrinsic, we don't have to look up any module metadata, as
   // FtzRequirementTy will be FTZ_Any.)
   if (Action.FtzRequirement != FTZ_Any) {
-bool FtzEnabled =
-II->getFunction()->getFnAttribute("nvptx-f32ftz").getValueAsString() ==
-"true";
+StringRef Attr = II->getFunction()
+ ->getFnAttribute("denormal-fp-math-f32")
+ .getValueAsString();
+bool FtzEnabled = parseDenormalFPAttribute(Attr) != DenormalMode::IEEE;
 
 if (FtzEnabled != (Action.FtzRequirement == FTZ_MustBeOn))
   return nullptr;
Index: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
===
--- 

[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-11-06 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done.
arsenm added inline comments.



Comment at: llvm/docs/LangRef.rst:1828-1831
+   be flushed to zero by standard floating point operations. It is not
+   mandated that flushing to zero occurs, but if a subnormal output is
+   flushed to zero, it must respect the sign mode. Not all targets
+   support all modes.

arsenm wrote:
> On second thought I think this may be too permissive. I think based on the 
> use in DAGCombiner, that flushing of outputs is compulsory.
It turns out the fast sqrt usage really cares about input denormals being 
implicitly treated as 0, not the output flushing (i.e. this only needs DAZ, not 
FTZ). I think being permissive on the output is OK, but if implicit input 
flushing is required then it's compulsory and a target is responsible for 
inserting a flush of some kind if the use instruction isn't known to follow 
this mode.

Because of this, I do think it's necessary to treat this as two separate modes. 
I'm thinking to comma separate output-mode,input-mode, and assume 
input-mode=output-mode if the second half isn't specified for compatibility 
with the existing attribute.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-11-06 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done.
arsenm added inline comments.



Comment at: llvm/docs/LangRef.rst:1828-1831
+   be flushed to zero by standard floating point operations. It is not
+   mandated that flushing to zero occurs, but if a subnormal output is
+   flushed to zero, it must respect the sign mode. Not all targets
+   support all modes.

On second thought I think this may be too permissive. I think based on the use 
in DAGCombiner, that flushing of outputs is compulsory.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2019-11-05 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision.
arsenm added reviewers: scanon, spatel, cameron.mcinally, andrew.w.kaylor, tra, 
jlebar, Anastasia, yaxunl.
Herald added subscribers: hiraditya, kristof.beyls, tpr, nhaehnle, wdng, 
jvesely, jholewinski.
Herald added a project: LLVM.
arsenm added parent revisions: D69598: Work on cleaning up denormal mode 
handling, D69583: AMDGPU: Refactor treatment of denormal mode, D69729:  AMDGPU: 
Be explicit about denormal mode in MIR tests, D69547: DAG: Add function context 
to isFMAFasterThanFMulAndFAdd.

Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

  
  - AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
  - NVPTX uses the nvptx-f32ftz attribute
  - ARM directly uses the denormal-fp-math attribute
  - Other targets indirectly use denormal-fp-math in one DAGCombine
  - cl-denorms-are-zero has a corresponding denorms-are-zero attribute


AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.

  

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

  

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
-attributes.

  

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

  

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

  

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.


https://reviews.llvm.org/D69878

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Basic/CodeGenOptions.h
  clang/include/clang/Driver/CC1Options.td
  clang/include/clang/Driver/Options.td
  clang/include/clang/Driver/ToolChain.h
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/AMDGPU.h
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGenCUDA/flush-denormals.cu
  clang/test/CodeGenCUDA/propagate-metadata.cu
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/denorms-are-zero.cl
  clang/test/CodeGenOpenCL/gfx9-fp32-denorms.cl
  clang/test/Driver/cl-denorms-are-zero.cl
  clang/test/Driver/cuda-flush-denormals-to-zero.cu
  clang/test/Driver/denormal-fp-math.c
  clang/test/Driver/opencl.cl
  llvm/docs/LangRef.rst
  llvm/lib/CodeGen/MachineFunction.cpp
  llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
  llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
  llvm/test/CodeGen/NVPTX/fast-math.ll
  llvm/test/CodeGen/NVPTX/math-intrins.ll
  llvm/test/CodeGen/NVPTX/sqrt-approx.ll
  llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll

Index: llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
===
--- llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
+++ llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
@@ -5,11 +5,11 @@
 ; hackery:
 
 ; RUN: cat %s > %t.ftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "true" }' >> %t.ftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "preserve-sign" }' >> %t.ftz
 ; RUN: opt < %t.ftz -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=FTZ
 
 ; RUN: cat %s > %t.noftz
-; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "false" }' >> %t.noftz
+; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "ieee" }' >> %t.noftz
 ; RUN: opt < %t.noftz -instcombine -S | FileCheck %s