[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2023-07-26 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9461-9463
+  bool CorrectSqrt = CGM.getLangOpts().OpenCL
+ ? CGM.getCodeGenOpts().OpenCLCorrectlyRoundedDivSqrt
+ : CGM.getCodeGenOpts().HIPCorrectlyRoundedDivSqrt;

Can we move this into something more proper in LangOpts?



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9467
+  AddGlobal("__oclc_daz_opt", DenormAreZero, /*Size=*/8);
+  AddGlobal("__oclc_finite_only_opt", FiniteOnly || RelaxedMath, /*Size=*/8);
+  AddGlobal("__oclc_unsafe_math_opt", UnsafeMath || RelaxedMath, /*Size=*/8);

I'd hope you don't have to check relaxed math, finite only should suffice



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9476
+llvm::GlobalValue::LinkOnceODRLinkage);
+  AddGlobal("__oclc_ABI_version",
+CGM.getTarget().getTargetOpts().CodeObjectVersion, /*Size=*/32,

This should probably get an __llvm_amdgcn prefix and be renamed


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2023-07-26 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

We should just do this now. clang shouldn't have to dig around on disk to emit 
a constant definition for a constant it already knows, and we have a clear path 
to removing these globals altogether. I have adequate patches to completely 
delete `__oclc_daz_opt` today. `__oclc_finite_only_opt` should be deleteable as 
soon as nofpclass is inferred by default. Deleting 
`__oclc_correctly_rounded_sqrt32` and `__oclc_unsafe_math_opt` require more 
work, but are basically the same thing and require extending the libcall 
optimizer pass.

It will be easier to delete these from the library as they become unnecessary 
if clang stops enforcing these files exists like it does today, and it's easier 
to just stop using them entirely than to delete them one at a time


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-11 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D130096#3850708 , @b-sumner wrote:

>> Different functions providing different behaviors can be handled at link 
>> time like any other function, instead of the same functions providing 
>> different behaviors per translation unit and requires cloning. The current 
>> scheme transfers complexity from the device library build system into the 
>> driver and user binaries
>
> OK, but we are talking about trading a solved problem with a solution working 
> for years for adding a large amount of new work and new maintenance and new 
> bugs.  Does this need to be done now, or at all?

I wouldn't really call it a solved problem when only one of the many users is 
currently linking these libraries correctly


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-11 Thread Brian Sumner via Phabricator via cfe-commits
b-sumner added a comment.

> Different functions providing different behaviors can be handled at link time 
> like any other function, instead of the same functions providing different 
> behaviors per translation unit and requires cloning. The current scheme 
> transfers complexity from the device library build system into the driver and 
> user binaries

OK, but we are talking about trading a solved problem with a solution working 
for years for adding a large amount of new work and new maintenance and new 
bugs.  Does this need to be done now, or at all?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-11 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D130096#3850628 , @arsenm wrote:

> In D130096#3850550 , @b-sumner 
> wrote:
>
>> There's the "small matter" of implementing the new device library functions. 
>>  Why is all that more likeable than two kinds of control constants?
>
> Different functions providing different behaviors can be handled at link time 
> like any other function, instead of the same functions providing different 
> behaviors per translation unit and requires cloning. The current scheme 
> transfers complexity from the device library build system into the driver and 
> user binaries

Another benefit of this would be that linking could be done only once in a 
sound manner rather than requiring an instance of the ROCm device libraries to 
be included for each TU. Although we would probably still need the attribute 
propagation that `-mlink-builtin-bitcode` offers, so it wouldn't be quite as 
easy that throwing the device libs into the `lld` invocation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-11 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D130096#3850550 , @b-sumner wrote:

> There's the "small matter" of implementing the new device library functions.  
> Why is all that more likeable than two kinds of control constants?

Different functions providing different behaviors can be handled at link time 
like any other function, instead of the same functions providing different 
behaviors per translation unit and requires cloning. The current scheme 
transfers complexity from the device library build system into the driver and 
user binaries


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-11 Thread Brian Sumner via Phabricator via cfe-commits
b-sumner added a comment.

In D130096#3850473 , @arsenm wrote:

> In D130096#3850472 , @jhuber6 wrote:
>
>> I don't like the fact that we need to have two different kinds of control 
>> constants, one per-TU and others per-link job. I'm wondering how difficult 
>> it would be to make the fast versions of the math calls use different entry 
>> points. That way we could handle this in the math header wrappers.
>
> That's really how the C linkage model wants you to handle this. I also would 
> like to have FP value tracking optimizations take care of the special cases 
> in the library code

There's the "small matter" of implementing the new device library functions.  
Why is all that more likeable than two kinds of control constants?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-11 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D130096#3850472 , @jhuber6 wrote:

> I don't like the fact that we need to have two different kinds of control 
> constants, one per-TU and others per-link job. I'm wondering how difficult it 
> would be to make the fast versions of the math calls use different entry 
> points. That way we could handle this in the math header wrappers.

That's really how the C linkage model wants you to handle this. I also would 
like to have FP value tracking optimizations take care of the special cases in 
the library code


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-11 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

I don't like the fact that we need to have two different kinds of control 
constants, one per-TU and others per-link job. I'm wondering how difficult it 
would be to make the fast versions of the math calls use different entry 
points. That way we could handle this in the math header wrappers.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-04 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/test/CodeGen/amdgcn-control-constants.c:8
+
+// GFX90A: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 0, align 1
+// GFX90A: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 1, align 1

jhuber6 wrote:
> yaxunl wrote:
> > yaxunl wrote:
> > > need an OpenCL test for -cl-denorms-are-zero
> > still missing this test, and some other tests for -cl-* options as 
> > commented below.
> > 
> > Also, missing a HIP test for -ffast-math
> The cc1 math options tested individually should be enabled by `-ffast-math`.
Since we cannot test -ffast-math directly, can we add a driver test to ensure 
we are not missing any -cc1 options needed by the control variables when 
-ffast-math is specified for the driver? Thanks.

Also, the -cl-* options are -cc1 options. We need to test them.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-03 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/test/CodeGen/amdgcn-control-constants.c:8
+
+// GFX90A: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 0, align 1
+// GFX90A: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 1, align 1

yaxunl wrote:
> yaxunl wrote:
> > need an OpenCL test for -cl-denorms-are-zero
> still missing this test, and some other tests for -cl-* options as commented 
> below.
> 
> Also, missing a HIP test for -ffast-math
The cc1 math options tested individually should be enabled by `-ffast-math`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-03 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/test/CodeGen/amdgcn-control-constants.c:8
+
+// GFX90A: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 0, align 1
+// GFX90A: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 1, align 1

yaxunl wrote:
> need an OpenCL test for -cl-denorms-are-zero
still missing this test, and some other tests for -cl-* options as commented 
below.

Also, missing a HIP test for -ffast-math


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-03 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 464767.
jhuber6 added a comment.

Moving test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

Files:
  clang/lib/CodeGen/CodeGenAction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/CodeGen/TargetInfo.h
  clang/test/CodeGen/amdgcn-control-constants.c
  clang/test/CodeGenCUDA/amdgcn-control-constants.hip

Index: clang/test/CodeGenCUDA/amdgcn-control-constants.hip
===
--- /dev/null
+++ clang/test/CodeGenCUDA/amdgcn-control-constants.hip
@@ -0,0 +1,46 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-globals --include-generated-funcs --global-value-regex "__oclc_daz_opt"
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -fcuda-is-device -target-cpu gfx90a -emit-llvm-bc -o %t.bc -DLIBRARY %s
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -fcuda-is-device -target-cpu gfx90a -mlink-builtin-bitcode %t.bc -S -emit-llvm -o - %s | FileCheck %s
+
+// REQUIRES: amdgpu-registered-target
+
+#include "Inputs/cuda.h"
+
+#ifdef LIBRARY
+
+extern unsigned char __constant__ __oclc_daz_opt;
+
+__device__ int foo(void) {
+  return __oclc_daz_opt ? 1 : 0;
+}
+
+#else
+
+extern __device__ int foo(void);
+
+__device__ void bar(void) {
+  foo();
+}
+
+#endif
+//.
+// CHECK: @__oclc_daz_opt = internal local_unnamed_addr addrspace(4) constant i8 0, align 1
+//.
+// CHECK-LABEL: define {{[^@]+}}@_Z3barv
+// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[CALL:%.*]] = call noundef i32 @_Z3foov() #[[ATTR1:[0-9]+]]
+// CHECK-NEXT:ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@_Z3foov
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// CHECK-NEXT:[[TMP0:%.*]] = load i8, ptr addrspacecast (ptr addrspace(4) @__oclc_daz_opt to ptr), align 1
+// CHECK-NEXT:[[TOBOOL:%.*]] = icmp ne i8 [[TMP0]], 0
+// CHECK-NEXT:[[TMP1:%.*]] = zext i1 [[TOBOOL]] to i64
+// CHECK-NEXT:[[COND:%.*]] = select i1 [[TOBOOL]], i32 1, i32 0
+// CHECK-NEXT:ret i32 [[COND]]
+//
Index: clang/test/CodeGen/amdgcn-control-constants.c
===
--- /dev/null
+++ clang/test/CodeGen/amdgcn-control-constants.c
@@ -0,0 +1,49 @@
+// Check that we generate all the expected default features for the target.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX90A
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX1030
+
+// GFX90A: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 9010
+// GFX90A: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// GFX1030: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX1030: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 10048
+// GFX1030: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// Check that we can override the wavefront features.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -target-feature +wavefrontsize64 \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=WAVEFRONT
+// WAVEFRONT: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable denormalization at zero.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fdenormal-fp-math-f32=preserve-sign,preserve-sign \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=DENORM-AT-ZERO
+// DENORM-AT-ZERO: 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-03 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/test/CodeGen/amdgcn-link-control-constants.c:2-3
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature --check-globals --include-generated-funcs 
--global-value-regex "__oclc_daz_opt"
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-emit-llvm-bc -o %t.bc -DLIBRARY %s
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-mlink-builtin-bitcode %t.bc -S -emit-llvm -o - %s | FileCheck %s
+

JonChesterfield wrote:
> jhuber6 wrote:
> > jhuber6 wrote:
> > > yaxunl wrote:
> > > > This is compiling HIP as host. Please add -fcuda-is-device.
> > > This test should only require that the triple is `amdgcn`. I could 
> > > potentially make the generation of the constants require HIP or 
> > > OpenMPDevice, or OpenCL is enabled if you think that's bad.
> > I can also change it to just `-x c` if the HIP is the problem.
> We probably want these magic constants for C++ code as well, so keying it off 
> the triple (at least triple + that we're using rocm / compute stuff, which I 
> think is adequately indicated by hsa in the triple) is better. And likewise 
> don't want to emit these constants for non-gpu code, e.g. x64 host hip 
> doesn't need the daz_opt constant, which also suggests triple is the right 
> hook.
We don't officially support C on amdgcn but we officially support HIP. I would 
suggest move this to CodeGenCUDA and compile it as HIP, and use HIP syntax.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-03 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D130096#3816149 , @arsenm wrote:

> I'd prefer to avoid spreading special treatment of the device libraries into 
> the backend. The contract is poorly defined and spread around too much as it 
> is






Comment at: clang/test/CodeGen/amdgcn-link-control-constants.c:2-3
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature --check-globals --include-generated-funcs 
--global-value-regex "__oclc_daz_opt"
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-emit-llvm-bc -o %t.bc -DLIBRARY %s
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-mlink-builtin-bitcode %t.bc -S -emit-llvm -o - %s | FileCheck %s
+

jhuber6 wrote:
> jhuber6 wrote:
> > yaxunl wrote:
> > > This is compiling HIP as host. Please add -fcuda-is-device.
> > This test should only require that the triple is `amdgcn`. I could 
> > potentially make the generation of the constants require HIP or 
> > OpenMPDevice, or OpenCL is enabled if you think that's bad.
> I can also change it to just `-x c` if the HIP is the problem.
We probably want these magic constants for C++ code as well, so keying it off 
the triple (at least triple + that we're using rocm / compute stuff, which I 
think is adequately indicated by hsa in the triple) is better. And likewise 
don't want to emit these constants for non-gpu code, e.g. x64 host hip doesn't 
need the daz_opt constant, which also suggests triple is the right hook.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-03 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/test/CodeGen/amdgcn-link-control-constants.c:2-3
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature --check-globals --include-generated-funcs 
--global-value-regex "__oclc_daz_opt"
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-emit-llvm-bc -o %t.bc -DLIBRARY %s
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-mlink-builtin-bitcode %t.bc -S -emit-llvm -o - %s | FileCheck %s
+

jhuber6 wrote:
> yaxunl wrote:
> > This is compiling HIP as host. Please add -fcuda-is-device.
> This test should only require that the triple is `amdgcn`. I could 
> potentially make the generation of the constants require HIP or OpenMPDevice, 
> or OpenCL is enabled if you think that's bad.
I can also change it to just `-x c` if the HIP is the problem.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-03 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/test/CodeGen/amdgcn-link-control-constants.c:2-3
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature --check-globals --include-generated-funcs 
--global-value-regex "__oclc_daz_opt"
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-emit-llvm-bc -o %t.bc -DLIBRARY %s
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-mlink-builtin-bitcode %t.bc -S -emit-llvm -o - %s | FileCheck %s
+

yaxunl wrote:
> This is compiling HIP as host. Please add -fcuda-is-device.
This test should only require that the triple is `amdgcn`. I could potentially 
make the generation of the constants require HIP or OpenMPDevice, or OpenCL is 
enabled if you think that's bad.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-03 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/test/CodeGen/amdgcn-link-control-constants.c:2-3
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature --check-globals --include-generated-funcs 
--global-value-regex "__oclc_daz_opt"
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-emit-llvm-bc -o %t.bc -DLIBRARY %s
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-mlink-builtin-bitcode %t.bc -S -emit-llvm -o - %s | FileCheck %s
+

This is compiling HIP as host. Please add -fcuda-is-device.



Comment at: clang/test/CodeGen/amdgcn-link-control-constants.c:7
+
+extern unsigned char [[clang::address_space(5)]] __oclc_daz_opt;
+

use `__constant__` instead



Comment at: clang/test/CodeGen/amdgcn-link-control-constants.c:9
+
+int foo(void) {
+  return __oclc_daz_opt ? 1 : 0;

add `__device__`



Comment at: clang/test/CodeGen/amdgcn-link-control-constants.c:15
+
+extern int foo(void);
+

add `__device__`



Comment at: clang/test/CodeGen/amdgcn-link-control-constants.c:17
+
+void bar(void) {
+  foo();

add `__device__`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-03 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

ping


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-26 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 463042.
jhuber6 added a comment.

Adding test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

Files:
  clang/lib/CodeGen/CodeGenAction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/CodeGen/TargetInfo.h
  clang/test/CodeGen/amdgcn-control-constants.c
  clang/test/CodeGen/amdgcn-link-control-constants.c

Index: clang/test/CodeGen/amdgcn-link-control-constants.c
===
--- /dev/null
+++ clang/test/CodeGen/amdgcn-link-control-constants.c
@@ -0,0 +1,42 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-globals --include-generated-funcs --global-value-regex "__oclc_daz_opt"
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -emit-llvm-bc -o %t.bc -DLIBRARY %s
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -mlink-builtin-bitcode %t.bc -S -emit-llvm -o - %s | FileCheck %s
+
+#ifdef LIBRARY
+
+extern unsigned char [[clang::address_space(5)]] __oclc_daz_opt;
+
+int foo(void) {
+  return __oclc_daz_opt ? 1 : 0;
+}
+
+#else
+
+extern int foo(void);
+
+void bar(void) {
+  foo();
+}
+
+#endif
+//.
+// CHECK: @__oclc_daz_opt = internal local_unnamed_addr addrspace(4) constant i8 0, align 1
+//.
+// CHECK-LABEL: define {{[^@]+}}@_Z3barv
+// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[CALL:%.*]] = call noundef i32 @_Z3foov()
+// CHECK-NEXT:ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@_Z3foov
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// CHECK-NEXT:[[TMP0:%.*]] = load i8, ptr addrspace(5) addrspacecast (ptr addrspace(4) @__oclc_daz_opt to ptr addrspace(5)), align 1
+// CHECK-NEXT:[[TOBOOL:%.*]] = icmp ne i8 [[TMP0]], 0
+// CHECK-NEXT:[[TMP1:%.*]] = zext i1 [[TOBOOL]] to i64
+// CHECK-NEXT:[[COND:%.*]] = select i1 [[TOBOOL]], i32 1, i32 0
+// CHECK-NEXT:ret i32 [[COND]]
+//
Index: clang/test/CodeGen/amdgcn-control-constants.c
===
--- /dev/null
+++ clang/test/CodeGen/amdgcn-control-constants.c
@@ -0,0 +1,49 @@
+// Check that we generate all the expected default features for the target.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX90A
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX1030
+
+// GFX90A: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 9010
+// GFX90A: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// GFX1030: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX1030: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 10048
+// GFX1030: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// Check that we can override the wavefront features.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -target-feature +wavefrontsize64 \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=WAVEFRONT
+// WAVEFRONT: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable denormalization at zero.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fdenormal-fp-math-f32=preserve-sign,preserve-sign \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=DENORM-AT-ZERO
+// DENORM-AT-ZERO: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable finite 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-26 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/lib/CodeGen/CodeGenAction.cpp:299-308
+  if (!LinkModules.empty() && Gen->CGM().getTriple().isAMDGCN() &&
+  !Gen->CGM().getLangOpts().GPURelocatableDeviceCode) {
+const StringRef GVS[] = {"__oclc_daz_opt", "__oclc_unsafe_math_opt",
+ "__oclc_finite_only_opt",
+ "__oclc_correctly_rounded_sqrt32"};
+for (StringRef Name : GVS) {
+  if (llvm::GlobalVariable *GV = getModule()->getGlobalVariable(Name))

need a test.

Probably let clang generate a bitcode containing a function using these control 
vars, then link the bitcode by -mlink-builtin-bitcode, then check the linkage 
of these control vars.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-26 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D130096#3815529 , @jhuber6 wrote:

> The best solution would be to handle these per-TU variables in the backend. 
> Or maybe even all of these could be placed in the backend where the code 
> paths that currently require a control constant could be a simple attribute 
> that the backend will use to control code emission.

I'd prefer to avoid spreading special treatment of the device libraries into 
the backend. The contract is poorly defined and spread around too much as it is


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-26 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 462948.
jhuber6 added a comment.

Adding an extra check in `CodeGenAction.cpp` that forcibly internalizes these 
if we link in any modules in RDC mode. This is a considerable hack, but should 
solve the problem. It's not a great solution, so let me know if you think that 
this is a leser evil than linking in many bitcode files as we do now.

To reiterate, what this patch offers is.
+ Reduces number of files needed to link in, (no on/off files, only `ocml.bc` 
and `ockl.bc` are needed).
+ Enforces that the architecture constants are the same across the compilation
And I think negatively,

- Requires a hack to internalize some variables to prevent linking problems
- Some extra code in Clang

The best solution would be to handle these per-TU variables in the backend. Or 
maybe even all of these could be placed in the backend where the code paths 
that currently require a control constant could be a simple attribute that the 
backend will use to control code emission.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

Files:
  clang/lib/CodeGen/CodeGenAction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/CodeGen/TargetInfo.h
  clang/test/CodeGen/amdgcn-control-constants.c

Index: clang/test/CodeGen/amdgcn-control-constants.c
===
--- /dev/null
+++ clang/test/CodeGen/amdgcn-control-constants.c
@@ -0,0 +1,49 @@
+// Check that we generate all the expected default features for the target.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX90A
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX1030
+
+// GFX90A: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 9010
+// GFX90A: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// GFX1030: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX1030: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 10048
+// GFX1030: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// Check that we can override the wavefront features.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -target-feature +wavefrontsize64 \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=WAVEFRONT
+// WAVEFRONT: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable denormalization at zero.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fdenormal-fp-math-f32=preserve-sign,preserve-sign \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=DENORM-AT-ZERO
+// DENORM-AT-ZERO: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable finite math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -ffinite-math-only \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=FINITE-MATH
+// FINITE-MATH: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// FINITE-MATH: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+
+// Check that we can enable unsafe math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -menable-unsafe-fp-math \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=UNSAFE-MATH
+// UNSAFE-MATH: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// UNSAFE-MATH: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can disable/enable correctly rounded square roots.
+// RUN: %clang_cc1 -x hip -triple 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-16 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9468
+  // Control constants for math operations.
+  AddGlobal("__oclc_wavefrontsize64", Wavefront64, /*Size=*/8);
+  AddGlobal("__oclc_daz_opt", DenormAreZero, /*Size=*/8);

jhuber6 wrote:
> yaxunl wrote:
> > jhuber6 wrote:
> > > arsenm wrote:
> > > > yaxunl wrote:
> > > > > we need to disable emitting these variables for HIP -fgpu-rdc mode 
> > > > > and OpenCL since they will break per-TU control variable. Other cases 
> > > > > are OK.
> > > > wavefrontsize belongs with the system ones
> > > But the code would still depend on these and they wouldn't be present 
> > > right
> > > wavefrontsize belongs with the system ones
> > 
> > You are right. `__oclc_wavefrontsize64` should always be emitted with 
> > linkonce_odr linkage since they need to be consistent among TU's. Therefore 
> > they should always be emitted.
> > 
> > `__oclc_daz_opt`, `__oclc_finite_only_opt`, `__oclc_unsafe_math_opt`, and 
> > `__oclc_correctly_rounded_sqrt32` can be different per TU, therefore they 
> > should not be emitted for HIP `-fgpu-rdc` and OpenCL.
> I'm still unsure, if we do not emit any of those control variables how will 
> we use the device libraries for those builds.
> I'm still unsure, if we do not emit any of those control variables how will 
> we use the device libraries for those builds.

In those cases, we will use -mlink-builtin-bitcode to get those variables from 
device libs, as we did before. They will have internal linkage after linking, 
therefore are per-TU.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9468
+  // Control constants for math operations.
+  AddGlobal("__oclc_wavefrontsize64", Wavefront64, /*Size=*/8);
+  AddGlobal("__oclc_daz_opt", DenormAreZero, /*Size=*/8);

yaxunl wrote:
> jhuber6 wrote:
> > arsenm wrote:
> > > yaxunl wrote:
> > > > we need to disable emitting these variables for HIP -fgpu-rdc mode and 
> > > > OpenCL since they will break per-TU control variable. Other cases are 
> > > > OK.
> > > wavefrontsize belongs with the system ones
> > But the code would still depend on these and they wouldn't be present right
> > wavefrontsize belongs with the system ones
> 
> You are right. `__oclc_wavefrontsize64` should always be emitted with 
> linkonce_odr linkage since they need to be consistent among TU's. Therefore 
> they should always be emitted.
> 
> `__oclc_daz_opt`, `__oclc_finite_only_opt`, `__oclc_unsafe_math_opt`, and 
> `__oclc_correctly_rounded_sqrt32` can be different per TU, therefore they 
> should not be emitted for HIP `-fgpu-rdc` and OpenCL.
I'm still unsure, if we do not emit any of those control variables how will we 
use the device libraries for those builds.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-16 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9468
+  // Control constants for math operations.
+  AddGlobal("__oclc_wavefrontsize64", Wavefront64, /*Size=*/8);
+  AddGlobal("__oclc_daz_opt", DenormAreZero, /*Size=*/8);

jhuber6 wrote:
> arsenm wrote:
> > yaxunl wrote:
> > > we need to disable emitting these variables for HIP -fgpu-rdc mode and 
> > > OpenCL since they will break per-TU control variable. Other cases are OK.
> > wavefrontsize belongs with the system ones
> But the code would still depend on these and they wouldn't be present right
> wavefrontsize belongs with the system ones

You are right. `__oclc_wavefrontsize64` should always be emitted with 
linkonce_odr linkage since they need to be consistent among TU's. Therefore 
they should always be emitted.

`__oclc_daz_opt`, `__oclc_finite_only_opt`, `__oclc_unsafe_math_opt`, and 
`__oclc_correctly_rounded_sqrt32` can be different per TU, therefore they 
should not be emitted for HIP `-fgpu-rdc` and OpenCL.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9468-9472
+  AddGlobal("__oclc_wavefrontsize64", Wavefront64, /*Size=*/8);
+  AddGlobal("__oclc_daz_opt", DenormAreZero, /*Size=*/8);
+  AddGlobal("__oclc_finite_only_opt", FiniteOnly || RelaxedMath, /*Size=*/8);
+  AddGlobal("__oclc_unsafe_math_opt", UnsafeMath || RelaxedMath, /*Size=*/8);
+  AddGlobal("__oclc_correctly_rounded_sqrt32", CorrectSqrt, /*Size=*/8);

arsenm wrote:
> yaxunl wrote:
> > we need to disable emitting these variables for HIP -fgpu-rdc mode and 
> > OpenCL since they will break per-TU control variable. Other cases are OK.
> wavefrontsize belongs with the system ones
But the code would still depend on these and they wouldn't be present right


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9468
+  // Control constants for math operations.
+  AddGlobal("__oclc_wavefrontsize64", Wavefront64, /*Size=*/8);
+  AddGlobal("__oclc_daz_opt", DenormAreZero, /*Size=*/8);

yaxunl wrote:
> we need to disable emitting these variables for HIP -fgpu-rdc mode and OpenCL 
> since they will break per-TU control variable. Other cases are OK.
wavefrontsize belongs with the system ones


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-16 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9468-9472
+  AddGlobal("__oclc_wavefrontsize64", Wavefront64, /*Size=*/8);
+  AddGlobal("__oclc_daz_opt", DenormAreZero, /*Size=*/8);
+  AddGlobal("__oclc_finite_only_opt", FiniteOnly || RelaxedMath, /*Size=*/8);
+  AddGlobal("__oclc_unsafe_math_opt", UnsafeMath || RelaxedMath, /*Size=*/8);
+  AddGlobal("__oclc_correctly_rounded_sqrt32", CorrectSqrt, /*Size=*/8);

we need to disable emitting these variables for HIP -fgpu-rdc mode and OpenCL 
since they will break per-TU control variable. Other cases are OK.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 460812.
jhuber6 added a comment.

Addressing comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

Files:
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/CodeGen/TargetInfo.h
  clang/test/CodeGen/amdgcn-control-constants.c

Index: clang/test/CodeGen/amdgcn-control-constants.c
===
--- /dev/null
+++ clang/test/CodeGen/amdgcn-control-constants.c
@@ -0,0 +1,49 @@
+// Check that we generate all the expected default features for the target.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX90A
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX1030
+
+// GFX90A: @__oclc_wavefrontsize64 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 9010
+// GFX90A: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// GFX1030: @__oclc_wavefrontsize64 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX1030: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 10048
+// GFX1030: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// Check that we can override the wavefront features.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -target-feature +wavefrontsize64 \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=WAVEFRONT
+// WAVEFRONT: @__oclc_wavefrontsize64 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable denormalization at zero.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fdenormal-fp-math-f32=preserve-sign,preserve-sign \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=DENORM-AT-ZERO
+// DENORM-AT-ZERO: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable finite math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -ffinite-math-only \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=FINITE-MATH
+// FINITE-MATH: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// FINITE-MATH: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+
+// Check that we can enable unsafe math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -menable-unsafe-fp-math \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=UNSAFE-MATH
+// UNSAFE-MATH: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// UNSAFE-MATH: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can disable/enable correctly rounded square roots.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fno-hip-fp32-correctly-rounded-divide-sqrt \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CORRECT-SQRT
+// CORRECT-SQRT: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// RUN: %clang_cc1 -x cl -triple amdgcn-amd-amdhsa -target-cpu gfx90a -cl-fp32-correctly-rounded-divide-sqrt \
+// RUN:   -disable-llvm-optzns -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CL-CORRECT-SQRT
+// CL-CORRECT-SQRT: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
Index: clang/lib/CodeGen/TargetInfo.h
===
--- clang/lib/CodeGen/TargetInfo.h
+++ clang/lib/CodeGen/TargetInfo.h
@@ -76,6 +76,9 @@
   CodeGen::CodeGenModule ,
   const llvm::MapVector ) const {}
 
+  /// Provides a convenient hook to handle extra target-specific globals.
+  virtual void 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 marked 3 inline comments as done.
jhuber6 added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9449-9450
+  !(Features & llvm::AMDGPU::FEATURE_WAVE32) ||
+  llvm::is_contained(CGM.getTarget().getTargetOpts().FeaturesAsWritten,
+ "+wavefrontsize64");
+

arsenm wrote:
> Do we really have to scan through the features too? This seems broken
@yaxunl wanted this so we didn't emit the global if the user manually overrode 
the features via `-Xclang` or similar.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-15 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9449-9450
+  !(Features & llvm::AMDGPU::FEATURE_WAVE32) ||
+  llvm::is_contained(CGM.getTarget().getTargetOpts().FeaturesAsWritten,
+ "+wavefrontsize64");
+

Do we really have to scan through the features too? This seems broken



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9455
+  bool UnsafeMath = CGM.getLangOpts().UnsafeFPMath;
+  bool DenormAtZero = CGM.getCodeGenOpts().FP32DenormalMode ==
+  llvm::DenormalMode::getPreserveSign();

s/DenormAtZero/DenormAreZero/?



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9458
+  bool FiniteOnly =
+  CGM.getLangOpts().NoHonorInfs || CGM.getLangOpts().NoHonorNaNs;
+

or doesn't look right. finite only is no infinities and no nans (not sure why 
the library control merges the two)



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9473-9475
+  AddGlobal("__oclc_ISA_version", Minor + Major * 1000, /*Size=*/32);
+  AddGlobal("__oclc_ABI_version",
+CGM.getTarget().getTargetOpts().CodeObjectVersion, /*Size=*/32);

These probably should use linkonce_odr


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-06 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 458186.
jhuber6 added a comment.

Changing to `linkonce` linkage. According to the LLVM spec this should have the
expected behaviour where a single definition is kept at link-time for each
module. I tested this with a sample `HIP` program and it had the desired
behaviour. I could add a test attempting to show this if needed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

Files:
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/CodeGen/TargetInfo.h
  clang/test/CodeGen/amdgcn-control-constants.c

Index: clang/test/CodeGen/amdgcn-control-constants.c
===
--- /dev/null
+++ clang/test/CodeGen/amdgcn-control-constants.c
@@ -0,0 +1,49 @@
+// Check that we generate all the expected default features for the target.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX90A
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX1030
+
+// GFX90A: @__oclc_wavefrontsize64 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_ISA_version = linkonce hidden local_unnamed_addr addrspace(4) constant i32 9010
+// GFX90A: @__oclc_ABI_version = linkonce hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// GFX1030: @__oclc_wavefrontsize64 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX1030: @__oclc_ISA_version = linkonce hidden local_unnamed_addr addrspace(4) constant i32 10048
+// GFX1030: @__oclc_ABI_version = linkonce hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// Check that we can override the wavefront features.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -target-feature +wavefrontsize64 \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=WAVEFRONT
+// WAVEFRONT: @__oclc_wavefrontsize64 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable denormalization at zero.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fdenormal-fp-math-f32=preserve-sign,preserve-sign \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=DENORM-AT-ZERO
+// DENORM-AT-ZERO: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable finite math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -ffinite-math-only \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=FINITE-MATH
+// FINITE-MATH: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+// FINITE-MATH: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+
+// Check that we can enable unsafe math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -menable-unsafe-fp-math \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=UNSAFE-MATH
+// UNSAFE-MATH: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// UNSAFE-MATH: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can disable/enable correctly rounded square roots.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fno-hip-fp32-correctly-rounded-divide-sqrt \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CORRECT-SQRT
+// CORRECT-SQRT: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
+// RUN: %clang_cc1 -x cl -triple amdgcn-amd-amdhsa -target-cpu gfx90a -cl-fp32-correctly-rounded-divide-sqrt \
+// RUN:   -disable-llvm-optzns -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CL-CORRECT-SQRT
+// CL-CORRECT-SQRT: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
Index: clang/lib/CodeGen/TargetInfo.h
===
--- 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-09-01 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9436
+CGM.getModule(), Type, true,
+llvm::GlobalValue::LinkageTypes::LinkOnceODRLinkage,
+llvm::ConstantInt::get(Type, Value), Name, nullptr,

yaxunl wrote:
> jhuber6 wrote:
> > yaxunl wrote:
> > > jhuber6 wrote:
> > > > yaxunl wrote:
> > > > > yaxunl wrote:
> > > > > > jhuber6 wrote:
> > > > > > > yaxunl wrote:
> > > > > > > > This does not support per-TU control variables. Probably should 
> > > > > > > > use internal linkage.
> > > > > > > The AMDGPU device libraries use `linkone_odr` so I figured it was 
> > > > > > > the most appropriate here. It should mean that we can have 
> > > > > > > multiple identical definitions and they don't clash. There's also 
> > > > > > > no requirement for these to be emitted as symbols AFAIK.
> > > > > > > The AMDGPU device libraries use `linkone_odr` so I figured it was 
> > > > > > > the most appropriate here. It should mean that we can have 
> > > > > > > multiple identical definitions and they don't clash. There's also 
> > > > > > > no requirement for these to be emitted as symbols AFAIK.
> > > > > > 
> > > > > > clang uses  -mlink-builtin-bitcode to link these device libraries 
> > > > > > for HIP and OpenCL. Then the linkage of these variables becomes 
> > > > > > internal linkage. That's why it works for per-TU control.
> > > > > > > The AMDGPU device libraries use `linkone_odr` so I figured it was 
> > > > > > > the most appropriate here. It should mean that we can have 
> > > > > > > multiple identical definitions and they don't clash. There's also 
> > > > > > > no requirement for these to be emitted as symbols AFAIK.
> > > > > > 
> > > > > > clang uses  -mlink-builtin-bitcode to link these device libraries 
> > > > > > for HIP and OpenCL. Then the linkage of these variables becomes 
> > > > > > internal linkage. That's why it works for per-TU control.
> > > > > 
> > > > > You may let HIP and OpenCL use internal linkage and C/C++/OpenMP use 
> > > > > linkonce_odr since only HIP and OpenCL toolchain use 
> > > > > -mlink-builtin-bitcode to link these device libraries 
> > > > I see, `linkonce_odr` implies that these should all have the same value 
> > > > which isn't necessarily true after linking. I'll change it to use 
> > > > private linkage.
> > > > 
> > > > OpenMP right now links everything late which means that we don't allow 
> > > > these to be defined differently per-TU. This may be incorrect given 
> > > > this new method as each TU will have different things set. I can change 
> > > > OpenMP to use the `mlink` method after this patch which may be more 
> > > > strictly correct.
> > > > I see, `linkonce_odr` implies that these should all have the same value 
> > > > which isn't necessarily true after linking. I'll change it to use 
> > > > private linkage.
> > > > 
> > > > OpenMP right now links everything late which means that we don't allow 
> > > > these to be defined differently per-TU. This may be incorrect given 
> > > > this new method as each TU will have different things set. I can change 
> > > > OpenMP to use the `mlink` method after this patch which may be more 
> > > > strictly correct.
> > > 
> > > On second thoughts, the idea for letting clang to emit these control 
> > > variables might not work for HIP and OpenCL. The reason is that to 
> > > support per-TU control variables, these variables need to be internal or 
> > > private linkage, however, that means they cannot be used by other device 
> > > library functions which are expecting non-internal linkage for them. 
> > > Those device library functions will end up using control variables from 
> > > device library bitcode any way.
> > > 
> > > For OpenMP, it may be necessary to emit them as linkonce_odr, otherwise 
> > > device library functions may not find them.
> > > On second thoughts, the idea for letting clang to emit these control 
> > > variables might not work for HIP and OpenCL. The reason is that to 
> > > support per-TU control variables, these variables need to be internal or 
> > > private linkage, however, that means they cannot be used by other device 
> > > library functions which are expecting non-internal linkage for them. 
> > > Those device library functions will end up using control variables from 
> > > device library bitcode any way.
> > 
> > Right now we include each file per-TU using `-mlink-builtin-bitcode` which 
> > converts `linkonce_odr` to `private` linkage. Shouldn't this be equivalent? 
> > It may be possible to make some test showing a user of these constants to 
> > verify they get picked up correctly. If you're worried about these getting 
> > removed we may be able to stash them in `compiler.used`, that shouldn't 
> > impede the necessary constant propagation.
> > 
> > Side note, OpenCL seems to optimize these out without 
> > `-disable-llvm-optzns` while HIP will not. Does OpenCL use some mandatory 
> > 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-31 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9436
+CGM.getModule(), Type, true,
+llvm::GlobalValue::LinkageTypes::LinkOnceODRLinkage,
+llvm::ConstantInt::get(Type, Value), Name, nullptr,

jhuber6 wrote:
> yaxunl wrote:
> > jhuber6 wrote:
> > > yaxunl wrote:
> > > > yaxunl wrote:
> > > > > jhuber6 wrote:
> > > > > > yaxunl wrote:
> > > > > > > This does not support per-TU control variables. Probably should 
> > > > > > > use internal linkage.
> > > > > > The AMDGPU device libraries use `linkone_odr` so I figured it was 
> > > > > > the most appropriate here. It should mean that we can have multiple 
> > > > > > identical definitions and they don't clash. There's also no 
> > > > > > requirement for these to be emitted as symbols AFAIK.
> > > > > > The AMDGPU device libraries use `linkone_odr` so I figured it was 
> > > > > > the most appropriate here. It should mean that we can have multiple 
> > > > > > identical definitions and they don't clash. There's also no 
> > > > > > requirement for these to be emitted as symbols AFAIK.
> > > > > 
> > > > > clang uses  -mlink-builtin-bitcode to link these device libraries for 
> > > > > HIP and OpenCL. Then the linkage of these variables becomes internal 
> > > > > linkage. That's why it works for per-TU control.
> > > > > > The AMDGPU device libraries use `linkone_odr` so I figured it was 
> > > > > > the most appropriate here. It should mean that we can have multiple 
> > > > > > identical definitions and they don't clash. There's also no 
> > > > > > requirement for these to be emitted as symbols AFAIK.
> > > > > 
> > > > > clang uses  -mlink-builtin-bitcode to link these device libraries for 
> > > > > HIP and OpenCL. Then the linkage of these variables becomes internal 
> > > > > linkage. That's why it works for per-TU control.
> > > > 
> > > > You may let HIP and OpenCL use internal linkage and C/C++/OpenMP use 
> > > > linkonce_odr since only HIP and OpenCL toolchain use 
> > > > -mlink-builtin-bitcode to link these device libraries 
> > > I see, `linkonce_odr` implies that these should all have the same value 
> > > which isn't necessarily true after linking. I'll change it to use private 
> > > linkage.
> > > 
> > > OpenMP right now links everything late which means that we don't allow 
> > > these to be defined differently per-TU. This may be incorrect given this 
> > > new method as each TU will have different things set. I can change OpenMP 
> > > to use the `mlink` method after this patch which may be more strictly 
> > > correct.
> > > I see, `linkonce_odr` implies that these should all have the same value 
> > > which isn't necessarily true after linking. I'll change it to use private 
> > > linkage.
> > > 
> > > OpenMP right now links everything late which means that we don't allow 
> > > these to be defined differently per-TU. This may be incorrect given this 
> > > new method as each TU will have different things set. I can change OpenMP 
> > > to use the `mlink` method after this patch which may be more strictly 
> > > correct.
> > 
> > On second thoughts, the idea for letting clang to emit these control 
> > variables might not work for HIP and OpenCL. The reason is that to support 
> > per-TU control variables, these variables need to be internal or private 
> > linkage, however, that means they cannot be used by other device library 
> > functions which are expecting non-internal linkage for them. Those device 
> > library functions will end up using control variables from device library 
> > bitcode any way.
> > 
> > For OpenMP, it may be necessary to emit them as linkonce_odr, otherwise 
> > device library functions may not find them.
> > On second thoughts, the idea for letting clang to emit these control 
> > variables might not work for HIP and OpenCL. The reason is that to support 
> > per-TU control variables, these variables need to be internal or private 
> > linkage, however, that means they cannot be used by other device library 
> > functions which are expecting non-internal linkage for them. Those device 
> > library functions will end up using control variables from device library 
> > bitcode any way.
> 
> Right now we include each file per-TU using `-mlink-builtin-bitcode` which 
> converts `linkonce_odr` to `private` linkage. Shouldn't this be equivalent? 
> It may be possible to make some test showing a user of these constants to 
> verify they get picked up correctly. If you're worried about these getting 
> removed we may be able to stash them in `compiler.used`, that shouldn't 
> impede the necessary constant propagation.
> 
> Side note, OpenCL seems to optimize these out without `-disable-llvm-optzns` 
> while HIP will not. Does OpenCL use some mandatory passes to ensure that 
> these control variables get handled? This method of using control constants 
> in general is somewhat problematic as it hides invalid code behind some 
> 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9436
+CGM.getModule(), Type, true,
+llvm::GlobalValue::LinkageTypes::LinkOnceODRLinkage,
+llvm::ConstantInt::get(Type, Value), Name, nullptr,

yaxunl wrote:
> jhuber6 wrote:
> > yaxunl wrote:
> > > yaxunl wrote:
> > > > jhuber6 wrote:
> > > > > yaxunl wrote:
> > > > > > This does not support per-TU control variables. Probably should use 
> > > > > > internal linkage.
> > > > > The AMDGPU device libraries use `linkone_odr` so I figured it was the 
> > > > > most appropriate here. It should mean that we can have multiple 
> > > > > identical definitions and they don't clash. There's also no 
> > > > > requirement for these to be emitted as symbols AFAIK.
> > > > > The AMDGPU device libraries use `linkone_odr` so I figured it was the 
> > > > > most appropriate here. It should mean that we can have multiple 
> > > > > identical definitions and they don't clash. There's also no 
> > > > > requirement for these to be emitted as symbols AFAIK.
> > > > 
> > > > clang uses  -mlink-builtin-bitcode to link these device libraries for 
> > > > HIP and OpenCL. Then the linkage of these variables becomes internal 
> > > > linkage. That's why it works for per-TU control.
> > > > > The AMDGPU device libraries use `linkone_odr` so I figured it was the 
> > > > > most appropriate here. It should mean that we can have multiple 
> > > > > identical definitions and they don't clash. There's also no 
> > > > > requirement for these to be emitted as symbols AFAIK.
> > > > 
> > > > clang uses  -mlink-builtin-bitcode to link these device libraries for 
> > > > HIP and OpenCL. Then the linkage of these variables becomes internal 
> > > > linkage. That's why it works for per-TU control.
> > > 
> > > You may let HIP and OpenCL use internal linkage and C/C++/OpenMP use 
> > > linkonce_odr since only HIP and OpenCL toolchain use 
> > > -mlink-builtin-bitcode to link these device libraries 
> > I see, `linkonce_odr` implies that these should all have the same value 
> > which isn't necessarily true after linking. I'll change it to use private 
> > linkage.
> > 
> > OpenMP right now links everything late which means that we don't allow 
> > these to be defined differently per-TU. This may be incorrect given this 
> > new method as each TU will have different things set. I can change OpenMP 
> > to use the `mlink` method after this patch which may be more strictly 
> > correct.
> > I see, `linkonce_odr` implies that these should all have the same value 
> > which isn't necessarily true after linking. I'll change it to use private 
> > linkage.
> > 
> > OpenMP right now links everything late which means that we don't allow 
> > these to be defined differently per-TU. This may be incorrect given this 
> > new method as each TU will have different things set. I can change OpenMP 
> > to use the `mlink` method after this patch which may be more strictly 
> > correct.
> 
> On second thoughts, the idea for letting clang to emit these control 
> variables might not work for HIP and OpenCL. The reason is that to support 
> per-TU control variables, these variables need to be internal or private 
> linkage, however, that means they cannot be used by other device library 
> functions which are expecting non-internal linkage for them. Those device 
> library functions will end up using control variables from device library 
> bitcode any way.
> 
> For OpenMP, it may be necessary to emit them as linkonce_odr, otherwise 
> device library functions may not find them.
> On second thoughts, the idea for letting clang to emit these control 
> variables might not work for HIP and OpenCL. The reason is that to support 
> per-TU control variables, these variables need to be internal or private 
> linkage, however, that means they cannot be used by other device library 
> functions which are expecting non-internal linkage for them. Those device 
> library functions will end up using control variables from device library 
> bitcode any way.

Right now we include each file per-TU using `-mlink-builtin-bitcode` which 
converts `linkonce_odr` to `private` linkage. Shouldn't this be equivalent? It 
may be possible to make some test showing a user of these constants to verify 
they get picked up correctly. If you're worried about these getting removed we 
may be able to stash them in `compiler.used`, that shouldn't impede the 
necessary constant propagation.

Side note, OpenCL seems to optimize these out without `-disable-llvm-optzns` 
while HIP will not. Does OpenCL use some mandatory passes to ensure that these 
control variables get handled? This method of using control constants in 
general is somewhat problematic as it hides invalid code behind some mandatory 
CP and DCE passes. For OpenMP right now we just generate one version for each 
architecture, which is wasteful but somewhat easier to work with.
 


Repository:
  rG 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-30 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9436
+CGM.getModule(), Type, true,
+llvm::GlobalValue::LinkageTypes::LinkOnceODRLinkage,
+llvm::ConstantInt::get(Type, Value), Name, nullptr,

jhuber6 wrote:
> yaxunl wrote:
> > yaxunl wrote:
> > > jhuber6 wrote:
> > > > yaxunl wrote:
> > > > > This does not support per-TU control variables. Probably should use 
> > > > > internal linkage.
> > > > The AMDGPU device libraries use `linkone_odr` so I figured it was the 
> > > > most appropriate here. It should mean that we can have multiple 
> > > > identical definitions and they don't clash. There's also no requirement 
> > > > for these to be emitted as symbols AFAIK.
> > > > The AMDGPU device libraries use `linkone_odr` so I figured it was the 
> > > > most appropriate here. It should mean that we can have multiple 
> > > > identical definitions and they don't clash. There's also no requirement 
> > > > for these to be emitted as symbols AFAIK.
> > > 
> > > clang uses  -mlink-builtin-bitcode to link these device libraries for HIP 
> > > and OpenCL. Then the linkage of these variables becomes internal linkage. 
> > > That's why it works for per-TU control.
> > > > The AMDGPU device libraries use `linkone_odr` so I figured it was the 
> > > > most appropriate here. It should mean that we can have multiple 
> > > > identical definitions and they don't clash. There's also no requirement 
> > > > for these to be emitted as symbols AFAIK.
> > > 
> > > clang uses  -mlink-builtin-bitcode to link these device libraries for HIP 
> > > and OpenCL. Then the linkage of these variables becomes internal linkage. 
> > > That's why it works for per-TU control.
> > 
> > You may let HIP and OpenCL use internal linkage and C/C++/OpenMP use 
> > linkonce_odr since only HIP and OpenCL toolchain use -mlink-builtin-bitcode 
> > to link these device libraries 
> I see, `linkonce_odr` implies that these should all have the same value which 
> isn't necessarily true after linking. I'll change it to use private linkage.
> 
> OpenMP right now links everything late which means that we don't allow these 
> to be defined differently per-TU. This may be incorrect given this new method 
> as each TU will have different things set. I can change OpenMP to use the 
> `mlink` method after this patch which may be more strictly correct.
> I see, `linkonce_odr` implies that these should all have the same value which 
> isn't necessarily true after linking. I'll change it to use private linkage.
> 
> OpenMP right now links everything late which means that we don't allow these 
> to be defined differently per-TU. This may be incorrect given this new method 
> as each TU will have different things set. I can change OpenMP to use the 
> `mlink` method after this patch which may be more strictly correct.

On second thoughts, the idea for letting clang to emit these control variables 
might not work for HIP and OpenCL. The reason is that to support per-TU control 
variables, these variables need to be internal or private linkage, however, 
that means they cannot be used by other device library functions which are 
expecting non-internal linkage for them. Those device library functions will 
end up using control variables from device library bitcode any way.

For OpenMP, it may be necessary to emit them as linkonce_odr, otherwise device 
library functions may not find them.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-29 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 456520.
jhuber6 added a comment.

Remove unused code gen option.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

Files:
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/CodeGen/TargetInfo.h
  clang/test/CodeGen/amdgcn-control-constants.c

Index: clang/test/CodeGen/amdgcn-control-constants.c
===
--- /dev/null
+++ clang/test/CodeGen/amdgcn-control-constants.c
@@ -0,0 +1,49 @@
+// Check that we generate all the expected default features for the target.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX90A
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX1030
+
+// GFX90A: @__oclc_wavefrontsize64 = private local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_daz_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_finite_only_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_unsafe_math_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_correctly_rounded_sqrt32 = private local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_ISA_version = private local_unnamed_addr addrspace(4) constant i32 9010
+// GFX90A: @__oclc_ABI_version = private local_unnamed_addr addrspace(4) constant i32 400
+
+// GFX1030: @__oclc_wavefrontsize64 = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_daz_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_finite_only_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_unsafe_math_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_correctly_rounded_sqrt32 = private local_unnamed_addr addrspace(4) constant i8 1
+// GFX1030: @__oclc_ISA_version = private local_unnamed_addr addrspace(4) constant i32 10048
+// GFX1030: @__oclc_ABI_version = private local_unnamed_addr addrspace(4) constant i32 400
+
+// Check that we can override the wavefront features.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -target-feature +wavefrontsize64 \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=WAVEFRONT
+// WAVEFRONT: @__oclc_wavefrontsize64 = private local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable denormalization at zero.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fdenormal-fp-math-f32=preserve-sign,preserve-sign \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=DENORM-AT-ZERO
+// DENORM-AT-ZERO: @__oclc_daz_opt = private local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable finite math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -ffinite-math-only \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=FINITE-MATH
+// FINITE-MATH: @__oclc_finite_only_opt = private local_unnamed_addr addrspace(4) constant i8 1
+// FINITE-MATH: @__oclc_unsafe_math_opt = private local_unnamed_addr addrspace(4) constant i8 0
+
+// Check that we can enable unsafe math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -menable-unsafe-fp-math \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=UNSAFE-MATH
+// UNSAFE-MATH: @__oclc_finite_only_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// UNSAFE-MATH: @__oclc_unsafe_math_opt = private local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can disable/enable correctly rounded square roots.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fno-hip-fp32-correctly-rounded-divide-sqrt \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CORRECT-SQRT
+// CORRECT-SQRT: @__oclc_correctly_rounded_sqrt32 = private local_unnamed_addr addrspace(4) constant i8 0
+// RUN: %clang_cc1 -x cl -triple amdgcn-amd-amdhsa -target-cpu gfx90a -cl-fp32-correctly-rounded-divide-sqrt \
+// RUN:   -disable-llvm-optzns -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CL-CORRECT-SQRT
+// CL-CORRECT-SQRT: @__oclc_correctly_rounded_sqrt32 = private local_unnamed_addr addrspace(4) constant i8 1
Index: clang/lib/CodeGen/TargetInfo.h
===
--- clang/lib/CodeGen/TargetInfo.h
+++ clang/lib/CodeGen/TargetInfo.h
@@ -76,6 +76,9 @@
   CodeGen::CodeGenModule ,
   const llvm::MapVector ) const {}
 
+  /// Provides a convenient hook to handle extra target-specific globals.
+  virtual void emitTargetGlobals(CodeGen::CodeGenModule ) const {}
+
   /// Any further codegen related checks that need to be done on a function call
   /// in a target specific manner.
   virtual void 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-29 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added a comment.

If you want to overwrite them, weak/linkonce will work (no _odr). 
Private/internal will not be overwritten but existing uses will keep the 
private/internal version, IIRC. I assume you want the former.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-29 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 456450.
jhuber6 added a comment.

Changing to private linkage.

For OpenMP we could either make this use `weak_odr` so we have a single
definition surviving until link time for us to use. Or we could change OpenMP to
link in the bitcode libraries per-TU via `-mlink-builtin-bitcode`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/CodeGen/TargetInfo.h
  clang/test/CodeGen/amdgcn-control-constants.c

Index: clang/test/CodeGen/amdgcn-control-constants.c
===
--- /dev/null
+++ clang/test/CodeGen/amdgcn-control-constants.c
@@ -0,0 +1,49 @@
+// Check that we generate all the expected default features for the target.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX90A
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX1030
+
+// GFX90A: @__oclc_wavefrontsize64 = private local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_daz_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_finite_only_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_unsafe_math_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_correctly_rounded_sqrt32 = private local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_ISA_version = private local_unnamed_addr addrspace(4) constant i32 9010
+// GFX90A: @__oclc_ABI_version = private local_unnamed_addr addrspace(4) constant i32 400
+
+// GFX1030: @__oclc_wavefrontsize64 = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_daz_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_finite_only_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_unsafe_math_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_correctly_rounded_sqrt32 = private local_unnamed_addr addrspace(4) constant i8 1
+// GFX1030: @__oclc_ISA_version = private local_unnamed_addr addrspace(4) constant i32 10048
+// GFX1030: @__oclc_ABI_version = private local_unnamed_addr addrspace(4) constant i32 400
+
+// Check that we can override the wavefront features.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -target-feature +wavefrontsize64 \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=WAVEFRONT
+// WAVEFRONT: @__oclc_wavefrontsize64 = private local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable denormalization at zero.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fdenormal-fp-math-f32=preserve-sign,preserve-sign \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=DENORM-AT-ZERO
+// DENORM-AT-ZERO: @__oclc_daz_opt = private local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable finite math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -ffinite-math-only \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=FINITE-MATH
+// FINITE-MATH: @__oclc_finite_only_opt = private local_unnamed_addr addrspace(4) constant i8 1
+// FINITE-MATH: @__oclc_unsafe_math_opt = private local_unnamed_addr addrspace(4) constant i8 0
+
+// Check that we can enable unsafe math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -menable-unsafe-fp-math \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=UNSAFE-MATH
+// UNSAFE-MATH: @__oclc_finite_only_opt = private local_unnamed_addr addrspace(4) constant i8 0
+// UNSAFE-MATH: @__oclc_unsafe_math_opt = private local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can disable/enable correctly rounded square roots.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fno-hip-fp32-correctly-rounded-divide-sqrt \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CORRECT-SQRT
+// CORRECT-SQRT: @__oclc_correctly_rounded_sqrt32 = private local_unnamed_addr addrspace(4) constant i8 0
+// RUN: %clang_cc1 -x cl -triple amdgcn-amd-amdhsa -target-cpu gfx90a -cl-fp32-correctly-rounded-divide-sqrt \
+// RUN:   -disable-llvm-optzns -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CL-CORRECT-SQRT
+// CL-CORRECT-SQRT: @__oclc_correctly_rounded_sqrt32 = private local_unnamed_addr addrspace(4) constant i8 1
Index: clang/lib/CodeGen/TargetInfo.h
===
--- clang/lib/CodeGen/TargetInfo.h
+++ clang/lib/CodeGen/TargetInfo.h
@@ -76,6 +76,9 @@
   CodeGen::CodeGenModule ,
   const llvm::MapVector ) const {}
 
+  /// Provides a 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-29 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9436
+CGM.getModule(), Type, true,
+llvm::GlobalValue::LinkageTypes::LinkOnceODRLinkage,
+llvm::ConstantInt::get(Type, Value), Name, nullptr,

yaxunl wrote:
> yaxunl wrote:
> > jhuber6 wrote:
> > > yaxunl wrote:
> > > > This does not support per-TU control variables. Probably should use 
> > > > internal linkage.
> > > The AMDGPU device libraries use `linkone_odr` so I figured it was the 
> > > most appropriate here. It should mean that we can have multiple identical 
> > > definitions and they don't clash. There's also no requirement for these 
> > > to be emitted as symbols AFAIK.
> > > The AMDGPU device libraries use `linkone_odr` so I figured it was the 
> > > most appropriate here. It should mean that we can have multiple identical 
> > > definitions and they don't clash. There's also no requirement for these 
> > > to be emitted as symbols AFAIK.
> > 
> > clang uses  -mlink-builtin-bitcode to link these device libraries for HIP 
> > and OpenCL. Then the linkage of these variables becomes internal linkage. 
> > That's why it works for per-TU control.
> > > The AMDGPU device libraries use `linkone_odr` so I figured it was the 
> > > most appropriate here. It should mean that we can have multiple identical 
> > > definitions and they don't clash. There's also no requirement for these 
> > > to be emitted as symbols AFAIK.
> > 
> > clang uses  -mlink-builtin-bitcode to link these device libraries for HIP 
> > and OpenCL. Then the linkage of these variables becomes internal linkage. 
> > That's why it works for per-TU control.
> 
> You may let HIP and OpenCL use internal linkage and C/C++/OpenMP use 
> linkonce_odr since only HIP and OpenCL toolchain use -mlink-builtin-bitcode 
> to link these device libraries 
I see, `linkonce_odr` implies that these should all have the same value which 
isn't necessarily true after linking. I'll change it to use private linkage.

OpenMP right now links everything late which means that we don't allow these to 
be defined differently per-TU. This may be incorrect given this new method as 
each TU will have different things set. I can change OpenMP to use the `mlink` 
method after this patch which may be more strictly correct.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-29 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9436
+CGM.getModule(), Type, true,
+llvm::GlobalValue::LinkageTypes::LinkOnceODRLinkage,
+llvm::ConstantInt::get(Type, Value), Name, nullptr,

yaxunl wrote:
> jhuber6 wrote:
> > yaxunl wrote:
> > > This does not support per-TU control variables. Probably should use 
> > > internal linkage.
> > The AMDGPU device libraries use `linkone_odr` so I figured it was the most 
> > appropriate here. It should mean that we can have multiple identical 
> > definitions and they don't clash. There's also no requirement for these to 
> > be emitted as symbols AFAIK.
> > The AMDGPU device libraries use `linkone_odr` so I figured it was the most 
> > appropriate here. It should mean that we can have multiple identical 
> > definitions and they don't clash. There's also no requirement for these to 
> > be emitted as symbols AFAIK.
> 
> clang uses  -mlink-builtin-bitcode to link these device libraries for HIP and 
> OpenCL. Then the linkage of these variables becomes internal linkage. That's 
> why it works for per-TU control.
> > The AMDGPU device libraries use `linkone_odr` so I figured it was the most 
> > appropriate here. It should mean that we can have multiple identical 
> > definitions and they don't clash. There's also no requirement for these to 
> > be emitted as symbols AFAIK.
> 
> clang uses  -mlink-builtin-bitcode to link these device libraries for HIP and 
> OpenCL. Then the linkage of these variables becomes internal linkage. That's 
> why it works for per-TU control.

You may let HIP and OpenCL use internal linkage and C/C++/OpenMP use 
linkonce_odr since only HIP and OpenCL toolchain use -mlink-builtin-bitcode to 
link these device libraries 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-29 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9436
+CGM.getModule(), Type, true,
+llvm::GlobalValue::LinkageTypes::LinkOnceODRLinkage,
+llvm::ConstantInt::get(Type, Value), Name, nullptr,

jhuber6 wrote:
> yaxunl wrote:
> > This does not support per-TU control variables. Probably should use 
> > internal linkage.
> The AMDGPU device libraries use `linkone_odr` so I figured it was the most 
> appropriate here. It should mean that we can have multiple identical 
> definitions and they don't clash. There's also no requirement for these to be 
> emitted as symbols AFAIK.
> The AMDGPU device libraries use `linkone_odr` so I figured it was the most 
> appropriate here. It should mean that we can have multiple identical 
> definitions and they don't clash. There's also no requirement for these to be 
> emitted as symbols AFAIK.

clang uses  -mlink-builtin-bitcode to link these device libraries for HIP and 
OpenCL. Then the linkage of these variables becomes internal linkage. That's 
why it works for per-TU control.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-29 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9436
+CGM.getModule(), Type, true,
+llvm::GlobalValue::LinkageTypes::LinkOnceODRLinkage,
+llvm::ConstantInt::get(Type, Value), Name, nullptr,

yaxunl wrote:
> This does not support per-TU control variables. Probably should use internal 
> linkage.
The AMDGPU device libraries use `linkone_odr` so I figured it was the most 
appropriate here. It should mean that we can have multiple identical 
definitions and they don't clash. There's also no requirement for these to be 
emitted as symbols AFAIK.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-29 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 456441.
jhuber6 added a comment.

Updating. I realized all of the math-related ones are already covered by driver 
options for AMDGPU passing the appropriate fp contract to the frontend. This 
patch gets rid of most of that handling and just uses those directly. Also 
makes it easier to test.

We also check if the `+wavefront64` feature was explicitly turned on as part of 
@yaxunl's suggestion.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/CodeGen/TargetInfo.h
  clang/test/CodeGen/amdgcn-control-constants.c

Index: clang/test/CodeGen/amdgcn-control-constants.c
===
--- /dev/null
+++ clang/test/CodeGen/amdgcn-control-constants.c
@@ -0,0 +1,49 @@
+// Check that we generate all the expected default features for the target.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX90A
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX1030
+
+// GFX90A: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_finite_only_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_unsafe_math_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX90A: @__oclc_correctly_rounded_sqrt32 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX90A: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 9010
+// GFX90A: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// GFX1030: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_finite_only_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_unsafe_math_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// GFX1030: @__oclc_correctly_rounded_sqrt32 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+// GFX1030: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 10048
+// GFX1030: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+
+// Check that we can override the wavefront features.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -target-feature +wavefrontsize64 \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=WAVEFRONT
+// WAVEFRONT: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable denormalization at zero.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fdenormal-fp-math-f32=preserve-sign,preserve-sign \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=DENORM-AT-ZERO
+// DENORM-AT-ZERO: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can enable finite math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -ffinite-math-only \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=FINITE-MATH
+// FINITE-MATH: @__oclc_finite_only_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+// FINITE-MATH: @__oclc_unsafe_math_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+
+// Check that we can enable unsafe math.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -menable-unsafe-fp-math \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=UNSAFE-MATH
+// UNSAFE-MATH: @__oclc_finite_only_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// UNSAFE-MATH: @__oclc_unsafe_math_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
+
+// Check that we can disable correctly rounded square roots.
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fno-hip-fp32-correctly-rounded-divide-sqrt \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CORRECT-SQRT
+// CORRECT-SQRT: @__oclc_correctly_rounded_sqrt32 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
+// RUN: %clang_cc1 -x cl -triple amdgcn-amd-amdhsa -target-cpu gfx90a -cl-fp32-correctly-rounded-divide-sqrt \
+// RUN:   -disable-llvm-optzns -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CL-CORRECT-SQRT
+// CL-CORRECT-SQRT: 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-22 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9436
+CGM.getModule(), Type, true,
+llvm::GlobalValue::LinkageTypes::LinkOnceODRLinkage,
+llvm::ConstantInt::get(Type, Value), Name, nullptr,

This does not support per-TU control variables. Probably should use internal 
linkage.



Comment at: clang/lib/Frontend/CompilerInvocation.cpp:1679-1682
+  if (Args.hasArg(OPT_fgpu_flush_denormals_to_zero))
+Opts.AMDGPUDenormAtZero = true;
+  else if (Args.hasArg(OPT_fno_gpu_flush_denormals_to_zero))
+Opts.AMDGPUDenormAtZero = false;

For OpenCL, it should be determined by options::OPT_cl_denorms_are_zero



Comment at: clang/test/CodeGen/amdgcn-control-constants.c:7
+// RUN: %clang_cc1 -x c -triple amdgcn-amd-amdhsa -target-cpu gfx908 
-funsafe-math-optimizations -S -emit-llvm -o - %s | FileCheck %s 
--check-prefix=UNSAFE-MATH
+
+// GFX90A: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 0, align 1

need a test for -target-cpu gfx1030 -target-feature +wavefrontsize64 and check 
__oclc_wavefrontsize64 to be 1.



Comment at: clang/test/CodeGen/amdgcn-control-constants.c:8
+
+// GFX90A: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 0, align 1
+// GFX90A: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 1, align 1

need an OpenCL test for -cl-denorms-are-zero



Comment at: clang/test/CodeGen/amdgcn-control-constants.c:10
+// GFX90A: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 1, align 1
+// GFX90A: @__oclc_finite_only_opt = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 0, align 1
+// GFX90A: @__oclc_unsafe_math_opt = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 0, align 1

need OpenCL tests for -cl-finite-math-only and -cl-fast-relaxed-math



Comment at: clang/test/CodeGen/amdgcn-control-constants.c:11
+// GFX90A: @__oclc_finite_only_opt = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 0, align 1
+// GFX90A: @__oclc_unsafe_math_opt = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 0, align 1
+// GFX90A: @__oclc_correctly_rounded_sqrt32 = linkonce_odr hidden 
local_unnamed_addr addrspace(4) constant i8 1, align 1

need OpenCL tests for -cl-unsafe-math-optimizations and -cl-fast-relaxed-math



Comment at: clang/test/CodeGen/amdgcn-control-constants.c:12
+// GFX90A: @__oclc_unsafe_math_opt = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i8 0, align 1
+// GFX90A: @__oclc_correctly_rounded_sqrt32 = linkonce_odr hidden 
local_unnamed_addr addrspace(4) constant i8 1, align 1
+// GFX90A: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr 
addrspace(4) constant i32 9010, align 4

need an OpenCL test for -cl-fp32-correctly-rounded-divide-sqrt. If it needs 
CodeGenOpt you may need to re-use the option for HIP.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-08-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 453021.
jhuber6 added a comment.

Adjusting, adding code generation options for the other constants and changing 
to use linkonce ODR linkage.

I attempted to follow Jon's suggestion and group it with the existing code. but 
all the existing handling for this occurs in the driver. So I don't think 
there's a convenient way to drop in this functionality without adding a new 
function as in this patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Basic/CodeGenOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/CodeGen/TargetInfo.h
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGen/amdgcn-control-constants.c

Index: clang/test/CodeGen/amdgcn-control-constants.c
===
--- /dev/null
+++ clang/test/CodeGen/amdgcn-control-constants.c
@@ -0,0 +1,54 @@
+// RUN: %clang_cc1 -x c -triple amdgcn-amd-amdhsa -target-cpu gfx90a -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX90A
+// RUN: %clang_cc1 -x c -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX1030
+// RUN: %clang_cc1 -x c -triple amdgcn-amd-amdhsa -target-cpu gfx908 -ffast-math -S -emit-llvm -o - %s | FileCheck %s --check-prefix=FAST
+// RUN: %clang_cc1 -x c -triple amdgcn-amd-amdhsa -target-cpu gfx908 -ffinite-math-only -S -emit-llvm -o - %s | FileCheck %s --check-prefix=FINITE
+// RUN: %clang_cc1 -x c -triple amdgcn-amd-amdhsa -target-cpu gfx703 -fgpu-flush-denormals-to-zero -S -emit-llvm -o - %s | FileCheck %s --check-prefix=DAZ
+// RUN: %clang_cc1 -x c -triple amdgcn-amd-amdhsa -target-cpu gfx908 -funsafe-math-optimizations -S -emit-llvm -o - %s | FileCheck %s --check-prefix=UNSAFE-MATH
+
+// GFX90A: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// GFX90A: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// GFX90A: @__oclc_finite_only_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// GFX90A: @__oclc_unsafe_math_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// GFX90A: @__oclc_correctly_rounded_sqrt32 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// GFX90A: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 9010, align 4
+// GFX90A: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400, align 4
+
+// GFX1030: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// GFX1030: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// GFX1030: @__oclc_finite_only_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// GFX1030: @__oclc_unsafe_math_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// GFX1030: @__oclc_correctly_rounded_sqrt32 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// GFX1030: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 10048, align 4
+// GFX1030: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400, align 4
+
+// FAST: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// FAST: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// FAST: @__oclc_finite_only_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// FAST: @__oclc_unsafe_math_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// FAST: @__oclc_correctly_rounded_sqrt32 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// FAST: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 9008, align 4
+// FAST: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400, align 4
+
+// FINITE: @__oclc_daz_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// FINITE: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// FINITE: @__oclc_finite_only_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// FINITE: @__oclc_unsafe_math_opt = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// FINITE: @__oclc_correctly_rounded_sqrt32 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// FINITE: @__oclc_ISA_version = linkonce_odr hidden 

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-20 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D130096#3666155 , @yaxunl wrote:

> The current patch does not consider HIP/OpenCL compile options, therefore the 
> value of these variables are not correct for OpenCL/HIP. They need to be 
> overridden by the variables with the same name in device libraries by clang 
> through -mlink-builtin-bitcode.
>
> If the patch check HIP/OpenCL compilation options to set the correct value 
> for these variables, then it does not need weak linkage.

Is we instead add it to `compiler.used` it should be propagated while staying 
alive for the linker https://godbolt.org/z/MG5n1MWWj. The downside is that this 
symbol will not be removed and a symbol to it will live in the binary. The 
symbol will have weak binding, so it won't cause any linker errors. But it's a 
little annoying to have things stick around like that. I'm considering making 
this code generation be controlled by a clang driver flag so we could 
potentially change behavior as needed there.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-20 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9480
+  AddGlobal("__oclc_ISA_version", Minor + Major * 1000, 32);
+  AddGlobal("__oclc_ABI_version", 400, 32);
+}

jhuber6 wrote:
> yaxunl wrote:
> > should be determined by the code object version option.
> Yes I wasn't sure about this one. Could you elaborate where we derive that?
> Yes I wasn't sure about this one. Could you elaborate where we derive that?


CGM.getTarget().getTargetOpts().CodeObjectVersion


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-20 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D130096#3663411 , @arsenm wrote:

> In D130096#3663398 , @jhuber6 wrote:
>
>> In D130096#3663295 , @yaxunl wrote:
>>
>>> There is no constant propagation for globals with weak linage, right? 
>>> Otherwise, it won't work. My concern is that there may be optimization 
>>> passes which do not respect the weak linkage and uses the incorrect default 
>>> value for OpenCL or HIP. Therefore I am not very confident to enable this 
>>> for OpenCL or HIP unless these variables have the correct value based on 
>>> the compilation options.
>>
>> Instead of `weak_odr` we could probably use add this to compiler used 
>> instead if that's an issue.
>
> the libraries get internalized as-is. Why does this need to be weak_odr?

The current patch does not consider HIP/OpenCL compile options, therefore the 
value of these variables are not correct for OpenCL/HIP. They need to be 
overridden by the variables with the same name in device libraries by clang 
through -mlink-builtin-bitcode.

If the patch check HIP/OpenCL compilation options to set the correct value for 
these variables, then it does not need weak linkage.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D130096#3663062 , @JonChesterfield 
wrote:

> A safer bet is to use the current control flow that links in specific bitcode 
> files, but create the global directly instead of linking in the file. That'll 
> give us zero semantic change and a clang that ignores those bitcode files if 
> present.

I think I understand what you're saying better now. We should instead have this 
controlled as a flag via `clang` that the driver will add. This will just tell 
us to trigger some backend utility to emit the same code. I can look into doing 
that, will make it easier to just have the clang driver state that we should 
emit this for HIP / OpenMP unless `nogpulib` is passed for example.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D130096#3663411 , @arsenm wrote:

> In D130096#3663398 , @jhuber6 wrote:
>
>> In D130096#3663295 , @yaxunl wrote:
>>
>>> There is no constant propagation for globals with weak linage, right? 
>>> Otherwise, it won't work. My concern is that there may be optimization 
>>> passes which do not respect the weak linkage and uses the incorrect default 
>>> value for OpenCL or HIP. Therefore I am not very confident to enable this 
>>> for OpenCL or HIP unless these variables have the correct value based on 
>>> the compilation options.
>>
>> Instead of `weak_odr` we could probably use add this to compiler used 
>> instead if that's an issue.
>
> the libraries get internalized as-is. Why does this need to be weak_odr?

It depends where we want to do the linking. For my purposes I'd like to be able 
to link in these libraries at link time. This allows us to link in target 
specific libraries as-needed so we can make generated code more generic until 
linking or the backend. The problem with `linkonce_odr` is that it does not 
need to emit a symbol, so it will usually be optimized out by clang. E.g. the 
following won't work because these generated globals will be optimized out 
completely before we have any library to use them.

  clang amdgpu.c -c -O3
  clang amdgpu.o 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D130096#3663398 , @jhuber6 wrote:

> In D130096#3663295 , @yaxunl wrote:
>
>> There is no constant propagation for globals with weak linage, right? 
>> Otherwise, it won't work. My concern is that there may be optimization 
>> passes which do not respect the weak linkage and uses the incorrect default 
>> value for OpenCL or HIP. Therefore I am not very confident to enable this 
>> for OpenCL or HIP unless these variables have the correct value based on the 
>> compilation options.
>
> Instead of `weak_odr` we could probably use add this to compiler used instead 
> if that's an issue.

the libraries get internalized as-is. Why does this need to be weak_odr?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D130096#3663295 , @yaxunl wrote:

> There is no constant propagation for globals with weak linage, right? 
> Otherwise, it won't work. My concern is that there may be optimization passes 
> which do not respect the weak linkage and uses the incorrect default value 
> for OpenCL or HIP. Therefore I am not very confident to enable this for 
> OpenCL or HIP unless these variables have the correct value based on the 
> compilation options.

Instead of `weak_odr` we could probably use add this to compiler used instead 
if that's an issue.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 marked an inline comment as done.
jhuber6 added a comment.

In D130096#3663295 , @yaxunl wrote:

> There is no constant propagation for globals with weak linage, right? 
> Otherwise, it won't work. My concern is that there may be optimization passes 
> which do not respect the weak linkage and uses the incorrect default value 
> for OpenCL or HIP. Therefore I am not very confident to enable this for 
> OpenCL or HIP unless these variables have the correct value based on the 
> compilation options.

Yes, the problem is that `linkonce_odr` can be removed and as-such isn't usable 
for linking libraries late like we want to. You are correct that `weak_odr` 
normally cannot be propagated as another TU could potentially change it, but if 
we're linking this via LTO like AMDGPU does it should always be internalized in 
the linker. The OpenMP runtime has a similar `weak_odr` variable that gets 
internalized when we do LTO so it should apply here as well. Although my 
assumption is that AMDGPU always feeds bitcode directly to either `lld` or 
`clang-linker-wrapper` without invoking `llc` manually, I may be wrong there.




Comment at: clang/lib/CodeGen/TargetInfo.cpp:9480
+  AddGlobal("__oclc_ISA_version", Minor + Major * 1000, 32);
+  AddGlobal("__oclc_ABI_version", 400, 32);
+}

yaxunl wrote:
> should be determined by the code object version option.
Yes I wasn't sure about this one. Could you elaborate where we derive that?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

There is no constant propagation for globals with weak linage, right? 
Otherwise, it won't work. My concern is that there may be optimization passes 
which do not respect the weak linkage and uses the incorrect default value for 
OpenCL or HIP. Therefore I am not very confident to enable this for OpenCL or 
HIP unless these variables have the correct value based on the compilation 
options.




Comment at: clang/lib/CodeGen/TargetInfo.cpp:9480
+  AddGlobal("__oclc_ISA_version", Minor + Major * 1000, 32);
+  AddGlobal("__oclc_ABI_version", 400, 32);
+}

should be determined by the code object version option.



Comment at: clang/test/CodeGen/amdgcn-occl-constants.c:8
+// CHECK: @__oclc_daz_opt = weak_odr hidden local_unnamed_addr addrspace(4) 
constant i8 0, align 1
+// CHECK: @__oclc_wavefrontsize64 = weak_odr hidden local_unnamed_addr 
addrspace(4) constant i8 1, align 1
+// CHECK: @__oclc_finite_only_opt = weak_odr hidden local_unnamed_addr 
addrspace(4) constant i8 0, align 1

need a check for __oclc_wavefrontsize64=0 for gfx1030


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D130096#3663062 , @JonChesterfield 
wrote:

> A safer bet is to use the current control flow that links in specific bitcode 
> files, but create the global directly instead of linking in the file. That'll 
> give us zero semantic change and a clang that ignores those bitcode files if 
> present.

Do we expect those libraries to be linked per-TU via `-mlink-builtin-bitcode`? 
The usage I see passes them to `lld` directly.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

A safer bet is to use the current control flow that links in specific bitcode 
files, but create the global directly instead of linking in the file. That'll 
give us zero semantic change and a clang that ignores those bitcode files if 
present.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9456
+llvm::ConstantInt::get(Type, Value), Name, nullptr,
+llvm::GlobalValue::ThreadLocalMode::NotThreadLocal, 4);
+GV->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Local);

Should use the address space enum



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9463
+  // TODO: Add flags to toggle these as-needed.
+  bool DenormAtZero = !((Features & llvm::AMDGPU::FEATURE_FAST_FMA_F32) &&
+(Features & llvm::AMDGPU::FEATURE_FAST_DENORMAL_F32));

Typo At


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D130096#3663010 , @JonChesterfield 
wrote:

> Tagging Brian as the code owner of rocm device libs - emitting these in clang 
> would simplify that library.
>
> Currently clang reads these commandline flags and conditionally links in 
> bitcode files to introduce these symbols. There's existing command line flags 
> for controlling which files are linked. I think this patch should probably 
> use the existing flags to choose which values to set and delete the existing 
> handling.
>
> As written I think this is a no op, in that the libraries will currently be 
> linked anyway and override the symbols clang has injected

Yeah, I wasn't sure if I should do some scan to check if we actually need 
these. Basically just check if any function declarations start with `__ocml`. 
But that might untenable in the future as we try to move to a generic math 
library that doesn't eagerly emit target specific declarations in clang.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

I've thought that directly emitting these constants would be better. This will 
also make it so you can't try to continue using llvm-link for these libraries, 
which is a plus since it doesn't have the same necessary attribute propagation 
clang does when linking these


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a reviewer: b-sumner.
JonChesterfield added a comment.

Tagging Brian as the code owner of rocm device libs - emitting these in clang 
would simplify that library.

Currently clang reads these commandline flags and conditionally links in 
bitcode files to introduce these symbols. There's existing command line flags 
for controlling which files are linked. I think this patch should probably use 
the existing flags to choose which values to set and delete the existing 
handling.

As written I think this is a no op, in that the libraries will currently be 
linked anyway and override the symbols clang has injected


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

Let me know if I should move this code somewhere else, or if there are 
problems. One change I made is that the constant is `weak_odr` and `hidden` 
instead of `linkonce_odr` and `protected`. This is so this constant is alive 
until link time, AMDGPU pretty much always uses LTO so these should be 
optimized out when we internalize symbols. I'm assuming we don't need 
`protected` visibility as these shouldn't be read from another executable (e.g. 
the host).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: JonChesterfield, yaxunl, saiislam, arsenm, 
carlo.bertolli, MaskRay, jdoerfert, tianshilei1992.
Herald added subscribers: kosarev, StephenFan, t-tye, tpr, dstuttard, jvesely, 
kzhuravl.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

The AMDGPU library uses several control constants to change code paths
for the math functions and intrinsics. These are normally included using
several individual bitcode libraries at link time. However, this is
problematic because it requires us to know the AMDGPU architecture at
link time which should not be strictly necessary. This patch adds new
code that emits the constants that would normally be included by the
bitcode libraries. This removes around six libraries we would otherwise
need to include and now we can link these libraries in unconditionally
like we do with libdevice.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D130096

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/CodeGen/TargetInfo.h
  clang/test/CodeGen/amdgcn-occl-constants.c

Index: clang/test/CodeGen/amdgcn-occl-constants.c
===
--- /dev/null
+++ clang/test/CodeGen/amdgcn-occl-constants.c
@@ -0,0 +1,21 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx90a -S -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fgpu-fast-relaxed-math \
+// RUN:   -S -emit-llvm -o - %s | FileCheck %s --check-prefix=FAST
+
+void foo() {}
+
+// CHECK: @__oclc_daz_opt = weak_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// CHECK: @__oclc_wavefrontsize64 = weak_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// CHECK: @__oclc_finite_only_opt = weak_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// CHECK: @__oclc_unsafe_math_opt = weak_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// CHECK: @__oclc_correctly_rounded_sqrt32 = weak_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// CHECK: @__oclc_ISA_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 9010, align 4
+// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400, align 4
+
+// FAST: @__oclc_daz_opt = weak_odr hidden local_unnamed_addr addrspace(4) constant i8 0, align 1
+// FAST: @__oclc_wavefrontsize64 = weak_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// FAST: @__oclc_finite_only_opt = weak_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// FAST: @__oclc_unsafe_math_opt = weak_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// FAST: @__oclc_correctly_rounded_sqrt32 = weak_odr hidden local_unnamed_addr addrspace(4) constant i8 1, align 1
+// FAST: @__oclc_ISA_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 9010, align 4
+// FAST: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400, align 4
Index: clang/lib/CodeGen/TargetInfo.h
===
--- clang/lib/CodeGen/TargetInfo.h
+++ clang/lib/CodeGen/TargetInfo.h
@@ -63,6 +63,9 @@
   CodeGen::CodeGenModule ,
   const llvm::MapVector ) const {}
 
+  /// Provides a convenient hook to handle extra target-specific globals.
+  virtual void emitTargetGlobals(CodeGen::CodeGenModule ) const {}
+
   /// Any further codegen related checks that need to be done on a function call
   /// in a target specific manner.
   virtual void checkFunctionCallABI(CodeGenModule , SourceLocation CallLoc,
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -34,6 +34,7 @@
 #include "llvm/IR/IntrinsicsS390.h"
 #include "llvm/IR/Type.h"
 #include "llvm/Support/MathExtras.h"
+#include "llvm/Support/TargetParser.h"
 #include "llvm/Support/raw_ostream.h"
 #include  // std::sort
 
@@ -9307,6 +9308,8 @@
   void setFunctionDeclAttributes(const FunctionDecl *FD, llvm::Function *F,
  CodeGenModule ) const;
 
+  void emitTargetGlobals(CodeGen::CodeGenModule ) const override;
+
   void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
CodeGen::CodeGenModule ) const override;
   unsigned getOpenCLKernelCallingConv() const override;
@@ -9422,6 +9425,61 @@
   }
 }
 
+void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
+CodeGen::CodeGenModule ) const {
+  if (!CGM.getTriple().isAMDGCN())
+return;
+  StringRef CPU = CGM.getTarget().getTargetOpts().CPU;
+  // Check if we have any function declarations of