[PATCH] D141700: AMDGPU: Move enqueued block handling into clang

2023-11-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 558095.
arsenm added a comment.

Drop bitcode auto upgrade handling


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141700/new/

https://reviews.llvm.org/D141700

Files:
  clang/lib/CodeGen/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel-linking.cl
  clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
  llvm/docs/AMDGPUUsage.rst
  llvm/lib/IR/AutoUpgrade.cpp
  llvm/lib/IR/CMakeLists.txt
  llvm/lib/Target/AMDGPU/AMDGPU.h
  llvm/lib/Target/AMDGPU/AMDGPUExportKernelRuntimeHandles.cpp
  llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
  llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h
  llvm/lib/Target/AMDGPU/AMDGPUOpenCLEnqueuedBlockLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
  llvm/lib/Target/AMDGPU/CMakeLists.txt
  llvm/test/CodeGen/AMDGPU/amdgpu-export-kernel-runtime-handles.ll
  llvm/test/CodeGen/AMDGPU/enqueue-kernel.ll
  llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

Index: llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
===
--- llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -37,7 +37,7 @@
 ; GCN-O0-NEXT:Dominator Tree Construction
 ; GCN-O0-NEXT:Basic Alias Analysis (stateless AA impl)
 ; GCN-O0-NEXT:Function Alias Analysis Results
-; GCN-O0-NEXT:Lower OpenCL enqueued blocks
+; GCN-O0-NEXT:Externalize enqueued block runtime handles
 ; GCN-O0-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O0-NEXT:FunctionPass Manager
 ; GCN-O0-NEXT:  Expand Atomic instructions
@@ -178,7 +178,7 @@
 ; GCN-O1-NEXT:Dominator Tree Construction
 ; GCN-O1-NEXT:Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-NEXT:Function Alias Analysis Results
-; GCN-O1-NEXT:Lower OpenCL enqueued blocks
+; GCN-O1-NEXT:Externalize enqueued block runtime handles
 ; GCN-O1-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-NEXT:AMDGPU Attributor
 ; GCN-O1-NEXT:  FunctionPass Manager
@@ -445,7 +445,7 @@
 ; GCN-O1-OPTS-NEXT:Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT:Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-OPTS-NEXT:Function Alias Analysis Results
-; GCN-O1-OPTS-NEXT:Lower OpenCL enqueued blocks
+; GCN-O1-OPTS-NEXT:Externalize enqueued block runtime handles
 ; GCN-O1-OPTS-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-OPTS-NEXT:AMDGPU Attributor
 ; GCN-O1-OPTS-NEXT:  FunctionPass Manager
@@ -736,7 +736,7 @@
 ; GCN-O2-NEXT:Dominator Tree Construction
 ; GCN-O2-NEXT:Basic Alias Analysis (stateless AA impl)
 ; GCN-O2-NEXT:Function Alias Analysis Results
-; GCN-O2-NEXT:Lower OpenCL enqueued blocks
+; GCN-O2-NEXT:Externalize enqueued block runtime handles
 ; GCN-O2-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O2-NEXT:AMDGPU Attributor
 ; GCN-O2-NEXT:  FunctionPass Manager
@@ -1037,7 +1037,7 @@
 ; GCN-O3-NEXT:Dominator Tree Construction
 ; GCN-O3-NEXT:Basic Alias Analysis (stateless AA impl)
 ; GCN-O3-NEXT:Function Alias Analysis Results
-; GCN-O3-NEXT:Lower OpenCL enqueued blocks
+; GCN-O3-NEXT:Externalize enqueued block runtime handles
 ; GCN-O3-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O3-NEXT:AMDGPU Attributor
 ; GCN-O3-NEXT:  FunctionPass Manager
Index: llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
===
--- llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
+++ llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
@@ -14,7 +14,8 @@
 %struct.B = type { ptr addrspace(1) }
 %opencl.clk_event_t = type opaque
 
-@__test_block_invoke_kernel_runtime_handle = external addrspace(1) externally_initialized constant ptr addrspace(1)
+@__test_block_invoke_kernel_runtime_handle = external addrspace(1) externally_initialized constant ptr addrspace(1), section ".amdgpu.kernel.runtime.handle"
+@not.a.handle = external addrspace(1) externally_initialized constant ptr addrspace(1)
 
 ; CHECK:  ---
 ; CHECK-NEXT: amdhsa.kernels:
@@ -1678,7 +1679,7 @@
 ; CHECK:  .name:   __test_block_invoke_kernel
 ; CHECK:  .symbol: __test_block_invoke_kernel.kd
 define amdgpu_kernel void @__test_block_invoke_kernel(
-<{ i32, i32, ptr, ptr addrspace(1), i8 }> %arg) #1
+<{ i32, i32, ptr, ptr addrspace(1), i8 }> %arg) #1 !associated !112
 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !110
 !kernel_arg_base_type !110 !kernel_arg_type_qual !4 {
   ret void
@@ -1734,6 +1735,29 @@
   ret void
 }
 
+; Make sure the device_enqueue_symbol is not reported
+; CHECK: - .args:   []
+; CHECK-NEXT: .group_segment_fixed_size: 0
+; CHECK-NEXT: 

[PATCH] D141700: AMDGPU: Move enqueued block handling into clang

2023-11-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/IR/CMakeLists.txt:84
   Demangle
+  TransformUtils
+

This introduces a circular dependency between LLVMCore and TransformUtils. 
Options are:

1. Move appendToUsed into Module
2. Don't bother with bitcode compatibility for this
3. Avoid depending on llvm.used. I know I tried to do this but it was so long 
ago I don't remember how I ended up on this solution 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141700/new/

https://reviews.llvm.org/D141700

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138507: HIP: Directly use sqrt builtins instead of calling ocml (f32 case)

2023-09-12 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm abandoned this revision.
arsenm added a comment.

reposted D158131 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138507/new/

https://reviews.llvm.org/D138507

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158131: HIP: Directly use f32 sqrt intrinsic

2023-09-12 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision.
arsenm added a comment.

bca125569f33bd6a27c4c54815697966a823254e 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158131/new/

https://reviews.llvm.org/D158131

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156743: clang/OpenCL: Add inline implementations of sqrt in builtin header

2023-09-12 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision.
arsenm added a comment.

15e0fe0b6122e32657b98daf74a1fce028d2e5bf 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156743/new/

https://reviews.llvm.org/D156743

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156743: clang/OpenCL: Add inline implementations of sqrt in builtin header

2023-09-12 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D156743#4644285 , @Anastasia wrote:

> If we think there are no better alternatives and implementation is generic 
> enough for every vendor, LGTM!

You could argue annotating the raw callsite is better but I don't know how to 
implement that


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156743/new/

https://reviews.llvm.org/D156743

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156989: FloatingPointMode: Use -1 for "Dynamic"

2023-09-12 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

ping. This enum should just match FLT_ROUNDS and designing ABI around whatever 
this was doing doesn't really make sense


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156989/new/

https://reviews.llvm.org/D156989

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156743: clang/OpenCL: Add inline implementations of sqrt in builtin header

2023-09-12 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

ping


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156743/new/

https://reviews.llvm.org/D156743

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157911: clang: Add __builtin_exp10* and use new llvm.exp10 intrinsic

2023-09-09 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision.
arsenm added a comment.

6a08cf12d9cbc960159bf40e47078a882ca510ce 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157911/new/

https://reviews.llvm.org/D157911

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157911: clang: Add __builtin_exp10* and use new llvm.exp10 intrinsic

2023-09-09 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 556343.
arsenm added a comment.

Release notes


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157911/new/

https://reviews.llvm.org/D157911

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/Builtins.def
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/constrained-math-builtins.c
  clang/test/CodeGen/math-builtins.c
  clang/test/CodeGenOpenCL/builtins-f16.cl

Index: clang/test/CodeGenOpenCL/builtins-f16.cl
===
--- clang/test/CodeGenOpenCL/builtins-f16.cl
+++ clang/test/CodeGenOpenCL/builtins-f16.cl
@@ -24,6 +24,9 @@
   // CHECK: call half @llvm.exp2.f16(half %h0)
   res = __builtin_exp2f16(h0);
 
+  // CHECK: call half @llvm.exp10.f16(half %h0)
+  res = __builtin_exp10f16(h0);
+
   // CHECK: call half @llvm.floor.f16(half %h0)
   res = __builtin_floorf16(h0);
 
Index: clang/test/CodeGen/math-builtins.c
===
--- clang/test/CodeGen/math-builtins.c
+++ clang/test/CodeGen/math-builtins.c
@@ -318,6 +318,17 @@
 // HAS_ERRNO: declare x86_fp80 @exp2l(x86_fp80 noundef) [[NOT_READNONE]]
 // HAS_ERRNO: declare fp128 @exp2f128(fp128 noundef) [[NOT_READNONE]]
 
+__builtin_exp10(f);   __builtin_exp10f(f);  __builtin_exp10l(f); __builtin_exp10f128(f);
+
+// NO__ERRNO: declare double @llvm.exp10.f64(double) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare float @llvm.exp10.f32(float) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare x86_fp80 @llvm.exp10.f80(x86_fp80) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare fp128 @llvm.exp10.f128(fp128) [[READNONE_INTRINSIC]]
+// HAS_ERRNO: declare double @exp10(double noundef) [[NOT_READNONE]]
+// HAS_ERRNO: declare float @exp10f(float noundef) [[NOT_READNONE]]
+// HAS_ERRNO: declare x86_fp80 @exp10l(x86_fp80 noundef) [[NOT_READNONE]]
+// HAS_ERRNO: declare fp128 @exp10f128(fp128 noundef) [[NOT_READNONE]]
+
 __builtin_expm1(f);  __builtin_expm1f(f); __builtin_expm1l(f); __builtin_expm1f128(f);
 
 // NO__ERRNO: declare double @expm1(double noundef) [[READNONE]]
Index: clang/test/CodeGen/constrained-math-builtins.c
===
--- clang/test/CodeGen/constrained-math-builtins.c
+++ clang/test/CodeGen/constrained-math-builtins.c
@@ -64,6 +64,13 @@
 // CHECK: call x86_fp80 @llvm.experimental.constrained.exp2.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 // CHECK: call fp128 @llvm.experimental.constrained.exp2.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
+  __builtin_exp10(f);   __builtin_exp10f(f);  __builtin_exp10l(f); __builtin_exp10f128(f);
+
+// CHECK: call double @exp10(double noundef %{{.*}})
+// CHECK: call float @exp10f(float noundef %{{.*}})
+// CHECK: call x86_fp80 @exp10l(x86_fp80 noundef %{{.*}})
+// CHECK: call fp128 @exp10f128(fp128 noundef %{{.*}})
+
   __builtin_floor(f);  __builtin_floorf(f); __builtin_floorl(f); __builtin_floorf128(f);
 
 // CHECK: call double @llvm.experimental.constrained.floor.f64(double %{{.*}}, metadata !"fpexcept.strict")
@@ -223,6 +230,11 @@
 // CHECK: declare x86_fp80 @llvm.experimental.constrained.exp2.f80(x86_fp80, metadata, metadata)
 // CHECK: declare fp128 @llvm.experimental.constrained.exp2.f128(fp128, metadata, metadata)
 
+// CHECK: declare double @exp10(double noundef)
+// CHECK: declare float @exp10f(float noundef)
+// CHECK: declare x86_fp80 @exp10l(x86_fp80 noundef)
+// CHECK: declare fp128 @exp10f128(fp128 noundef)
+
 // CHECK: declare double @llvm.experimental.constrained.floor.f64(double, metadata)
 // CHECK: declare float @llvm.experimental.constrained.floor.f32(float, metadata)
 // CHECK: declare x86_fp80 @llvm.experimental.constrained.floor.f80(x86_fp80, metadata)
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -2433,7 +2433,16 @@
   return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
Intrinsic::exp2,
Intrinsic::experimental_constrained_exp2));
-
+case Builtin::BI__builtin_exp10:
+case Builtin::BI__builtin_exp10f:
+case Builtin::BI__builtin_exp10f16:
+case Builtin::BI__builtin_exp10l:
+case Builtin::BI__builtin_exp10f128: {
+  // TODO: strictfp support
+  if (Builder.getIsFPConstrained())
+break;
+  return RValue::get(emitUnaryBuiltin(*this, E, Intrinsic::exp10));
+}
 case Builtin::BIfabs:
 case Builtin::BIfabsf:
 case Builtin::BIfabsl:
Index: clang/include/clang/Basic/Builtins.def
===
--- clang/include/clang/Basic/Builtins.def
+++ clang/include/clang/Basic/Builtins.def
@@ -245,6 +245,11 @@
 BUILTIN(__builtin_exp2f16, "hh"  , "Fne")
 

[PATCH] D159206: [Clang] Propagate target-features if compatible when using mlink-builtin-bitcode

2023-09-05 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:2030-2031
+  bool EnabledForTarget = TEntry->second;
+  if (EnabledForTarget != EnabledForFunc)
+return;
+}

jmmartinez wrote:
> arsenm wrote:
> > Early return breaks the other features
> I did not understand this remark.
> 
> If the features are not compatible, we do not add a "target-features" entry 
> in the new "FuncAttrs". Then, the old "target-features" entry is kept in the 
> Function coming from the builtin.
> 
> If you think it would be better to set the target-features in FuncAttrs to 
> the old value in any case. If that's the case I've added the following code:
> 
> if (EnabledForTarget != EnabledForFunc) {
> FuncAttr.addAttribute(FFeatures);
> return;
> }
You find an incompatible feature and then discontinue processing any further 
features by early exiting. I expect this to act like an append to any features 
already present. The incompatibility is at an individual feature level, not the 
group 



Comment at: clang/lib/CodeGen/CGCall.cpp:2034
+  }
+
+  FuncAttr.addAttribute("target-features", llvm::join(MergedFeatures, ","));

jmmartinez wrote:
> arsenm wrote:
> > Really it would be less bad if the incompatible functions were not imported 
> > rather than the backend pass
> I thought it was possible to have functions with incompatible features in the 
> same module. 
> e.g. one function compiled with some instruction set support, one without, 
> and an ifunc that resolves to one or the other.
> 
> Maybe it's not the case in the context of `-mlink-builtin-bitcode`?
The truth is this system isn't really well considered. We don't have real ifunc 
support and we probably shouldn't be using subtargets for cases with 
incompatible encodings 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159206/new/

https://reviews.llvm.org/D159206

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156743: clang/OpenCL: Add inline implementations of sqrt in builtin header

2023-09-05 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

ping


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156743/new/

https://reviews.llvm.org/D156743

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159206: [Clang] Propagate target-features if compatible when using mlink-builtin-bitcode

2023-08-31 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:2017
+for (StringRef Feature : llvm::split(FFeatures.getValueAsString(), ',')) {
+  bool EnabledForFunc = Feature[0] == '+';
+  StringRef Name = Feature.substr(1);

Do you need to guard against empty string?



Comment at: clang/lib/CodeGen/CGCall.cpp:2018
+  bool EnabledForFunc = Feature[0] == '+';
+  StringRef Name = Feature.substr(1);
+  auto TEntry = TFeatures.find(Name);

consume_front



Comment at: clang/lib/CodeGen/CGCall.cpp:2021
+
+  // if the feature is not set for the target-opts, it must be preserved
+  if (TEntry == TFeatures.end()) {

Capitalize



Comment at: clang/lib/CodeGen/CGCall.cpp:2027
+
+  // if the feature is enabled for one and disabled for the other, they are
+  // not compatible

Capitalize



Comment at: clang/lib/CodeGen/CGCall.cpp:2030-2031
+  bool EnabledForTarget = TEntry->second;
+  if (EnabledForTarget != EnabledForFunc)
+return;
+}

Early return breaks the other features


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159206/new/

https://reviews.llvm.org/D159206

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159206: [Clang] Propagate target-features if compatible when using mlink-builtin-bitcode

2023-08-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGCall.h:398-401
+/// If \p F "target-features" are incompatible with the \p TargetOpts features,
+/// it is correct to drop the function. \return true if \p F is dropped
+bool dropFunctionWithIncompatibleAttributes(llvm::Function ,
+const TargetOptions );

i think this should be done in a separate patch, just propagate + append for 
step 1. There are other edge cases I'm worried about handling with this


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159206/new/

https://reviews.llvm.org/D159206

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156743: clang/OpenCL: Add inline implementations of sqrt in builtin header

2023-08-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

ping

The alternative is to directly put the !fpmath on the sqrt call sites but I 
have no idea how to do that


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156743/new/

https://reviews.llvm.org/D156743

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159206: [Clang] Propagate target-features if compatible when using mlink-builtin-bitcode

2023-08-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:2035
+
+  FuncAttr.addAttribute("target-features", llvm::join(MergedFeatures, ","));
+}

do you need to guard against adding the empty attribute? I don't want to see 
"target-features"=""


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159206/new/

https://reviews.llvm.org/D159206

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159206: [Clang] Propagate target-features if compatible when using mlink-builtin-bitcode

2023-08-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:2034
+  }
+
+  FuncAttr.addAttribute("target-features", llvm::join(MergedFeatures, ","));

Really it would be less bad if the incompatible functions were not imported 
rather than the backend pass



Comment at: clang/test/CodeGen/link-builtin-bitcode.c:17
+int __attribute__((target("extended-image-insts"))) attr_not_in_target(void) { 
return 42; }
+int __attribute__((target("no-gfx9-insts"))) attr_uncompatible(void) { return 
42; }
 int x = 12;

This isn't a real target feature (do we not have a warning on this?)

s/uncompatible/incompatible


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159206/new/

https://reviews.llvm.org/D159206

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158695: [clang] Fix missing contract flag in sqrt intrinsic

2023-08-24 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/test/CodeGen/fp-contract-fast-pragma.cpp:77
+// CHECK: _Z13fp_contract_7f
+// CHECK: tail call contract float @llvm.sqrt.f32(float %a)
+  return __builtin_sqrtf(a);

This isn't demonstrating the strict support, probably need to add the pragma 
fenv_access for the -fexperimental-strict-floating-point  run line


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158695/new/

https://reviews.llvm.org/D158695

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158695: [clang] Fix missing contract flag in sqrt intrinsic

2023-08-24 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:501
   if (CGF.Builder.getIsFPConstrained()) {
 CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
 Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID, 
Src0->getType());

rjmccall wrote:
> Is this existing condition not good enough, and why?
It's only in the strictfp branch for some reason, I don't think both would be 
needed



Comment at: clang/test/CodeGen/fp-contract-fast-pragma.cpp:2
 // RUN: %clang_cc1 -O3 -triple %itanium_abi_triple -emit-llvm -o - %s | 
FileCheck %s
 
 // Is FP_CONTRACT honored in a simple case?

Should also check constrained run line


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158695/new/

https://reviews.llvm.org/D158695

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158695: [clang] Fix missing contract flag in sqrt intrinsic

2023-08-24 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/test/CodeGen/fp-contract-fast-pragma.cpp:11
 #pragma clang fp contract(fast)
-  return a * b + c;
+  return a * b + c + __builtin_sqrtf(a);
 }

Should leave the existing test function alone and add a new one. Also can you 
test some cases with nested different values


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158695/new/

https://reviews.llvm.org/D158695

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-23 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

Codegen parts LGTM, questions with the driver parts




Comment at: clang/lib/Driver/ToolChain.cpp:1368
 if (A->getOption().matches(options::OPT_m_Group)) {
-  if (SameTripleAsHost)
+  // Pass code objection version to device toolchain
+  // to correctly set meta-data in intermediate files.

Typos



Comment at: clang/lib/Driver/ToolChain.cpp:1369
+  // Pass code objection version to device toolchain
+  // to correctly set meta-data in intermediate files.
+  if (SameTripleAsHost ||





Comment at: clang/lib/Driver/ToolChains/Clang.cpp:8649-8650
 
+  // code-object-version=X needs to be passed to clang-linker-wrapper to ensure
+  // that it is used by lld.
+  if (const Arg *A = Args.getLastArg(options::OPT_mcode_object_version_EQ)) {

so device rtl is linked once as a normal library?



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:8653-8654
+CmdArgs.push_back(Args.MakeArgString("-mllvm"));
+CmdArgs.push_back(Args.MakeArgString(
+Twine("--amdhsa-code-object-version=") + A->getValue()));
+  }

Why do you need this? The code object version is supposed to come from a module 
flag. We should be getting rid of the command line argument for it



Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:406-410
+  // pass on -mllvm options to the clang
+  for (const opt::Arg *Arg : Args.filtered(OPT_mllvm)) {
+CmdArgs.push_back("-mllvm");
+CmdArgs.push_back(Arg->getValue());
+  }

Shouldn't need this?



Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:417
 CmdArgs.push_back("-save-temps");
+// CmdArgs.push_back(Args.MakeArgString("--amdhsa-code-object-version=5"));
+  }

Commented out code


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17067
+
+Value *Iscov5 = CGF.Builder.CreateICmpSGE(
+ABIVersion,

Capitalization is weird, IsCOV5?



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17082-17083
+Value *DispatchPtr = EmitAMDGPUDispatchPtr(CGF);
+auto *DispatchGEP =
+CGF.Builder.CreateGEP(CGF.Int8Ty, DispatchPtr, DispatchOffset);
+

CreateConstInBoundsGEP1_64



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17100
+}
+auto *GEP = CGF.Builder.CreateGEP(CGF.Int8Ty, ArgPtr, Offset);
+LD = CGF.Builder.CreateLoad(

CreateConstInBoundsGEP1_64



Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:364
+CodeGen::CodeGenModule ) const {
+  auto AddGlobal = [&](StringRef Name,
+   clang::TargetOptions::CodeObjectVersionKind Value,

Single use lamdba, just make this the function body



Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:381
+GV->setVisibility(llvm::GlobalValue::VisibilityTypes::HiddenVisibility);
+GV->setAlignment(CGM.getDataLayout().getABITypeAlign(Type));
+  };

No real point setting the alignment


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158367: [AMDGPU] Add target feature gds/gws to clang

2023-08-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/TargetParser/TargetParser.cpp:289
   Features["image-insts"] = true;
+  Features["gds"] = true;
+  Features["gws"] = true;

yaxunl wrote:
> arsenm wrote:
> > Gds feature is unused 
> I am thinking to keep it in case we need it for newly added builtins or want 
> to diagnose ops requires gds.
But it's easy to add back in if and when that happens


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158367/new/

https://reviews.llvm.org/D158367

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158367: [AMDGPU] Add target feature gds/gws to clang

2023-08-20 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/TargetParser/TargetParser.cpp:289
   Features["image-insts"] = true;
+  Features["gds"] = true;
+  Features["gws"] = true;

Gds feature is unused 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158367/new/

https://reviews.llvm.org/D158367

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/CodeGen/DesugarVariadics.cpp:145
+for (Function  : llvm::make_early_inc_range(M))
+  if (Apply || canTransformFunctionInIsolation(F))
+Changed |= runOnFunction(F);

I think you need to guard against calls to variadic intrinsics


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/CodeGen/DesugarVariadics.cpp:296
+// Note - same attribute handling as DeadArgumentElimination
+NF->copyAttributesFrom();
+NF->setComdat(F.getComdat());

This might be missing copying the linkage


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/CodeGen/DesugarVariadics.cpp:74-77
+Value *Mask = ConstantInt::get(IntPtrTy, ~(DataAlignMinusOne));
+Value *vaListAligned = Builder.CreateIntToPtr(
+Builder.CreateAnd(Builder.CreatePtrToInt(Incr, IntPtrTy), Mask),
+Incr->getType());

Can you use ptrmask?



Comment at: llvm/lib/CodeGen/DesugarVariadics.cpp:142
+  bool runOnModule(Module ) override {
+bool Apply = ApplicableToAllDefault | ApplyToAllOverride;
+bool Changed = false;

This would be better as a pass parameter. Don't see much point to specifying 
individual functions by name


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/CodeGen/DesugarVariadics.cpp:297
+NF->copyAttributesFrom();
+NF->setComdat(F.getComdat());
+F.getParent()->getFunctionList().insert(F.getIterator(), NF);

Test the comdat? Weird that copyAttributesFrom seems to not cover it



Comment at: llvm/lib/CodeGen/DesugarVariadics.cpp:339
+// This fails to update call instructions, unfortunately
+// It may therefore also fail to update globals
+F.replaceAllUsesWith(NF);

Add a test with a def and decl in a global initializer, and with a constantexpr 
cast in the initializer. I think this should also handle blockaddress


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/CodeGen/DesugarVariadics.cpp:208-209
+
+StructType *VarargsTy = StructType::create(
+Ctx, LocalVarTypes, (Twine(NF->getName()) + ".vararg").str());
+

Should we go for a packed struct forced to align 4? There's no upside to stack 
values with > 4 align


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/CodeGen/DesugarVariadics.cpp:22
+// 5/ Delete the remaining parts of the original functions
+//
+//===--===//

Can you expand on the ABI requirements of this in the comment? If we make this 
be the way the ABI works for AMDGPU, we don't need any conditions. Should there 
be some control for the targets with established ABIs to apply this to internal 
functions?



Comment at: llvm/lib/CodeGen/DesugarVariadics.cpp:38
+
+#include 
+

Don't need cstdio



Comment at: llvm/lib/CodeGen/DesugarVariadics.cpp:102-103
+Value *src = Inst->getSrc();
+Value *ld = Builder.CreateLoad(src->getType(), src, "vacopy");
+Builder.CreateStore(ld, dst);
+Inst->eraseFromParent();

Can you use nontrivial types with this? Might be better to just start with 
memcpy



Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:621
 }
+if (PassName == "desugar-variadics") {
+  PM.addPass(DesugarVariadicsPass());

desugar is a weird name for a lowering pass



Comment at: llvm/test/CodeGen/AMDGPU/llc-pipeline.ll:188
 ; GCN-O1-NEXT:Cycle Info Analysis
+; GCN-O1-NEXT:Desugar Variadics
 ; GCN-O1-NEXT:FunctionPass Manager

This is rather late, I'd think we'd be better off with an earlier run (e.g. as 
part of the initial module passes, along with sanitizers)



Comment at: llvm/test/CodeGen/AMDGPU/unsupported-calls.ll:57-58
 
 ; GCN: in function test_tail_call_bitcast_extern_variadic{{.*}}: unsupported 
required tail call to function extern_variadic
-; R600: in function test_tail_call_bitcast_extern_variadic{{.*}}: unsupported 
call to function extern_variadic
 define i32 @test_tail_call_bitcast_extern_variadic(<4 x float> %arg0, <4 x 
float> %arg1, i32 %arg2) {

Deleted the wrong error?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: libc/config/gpu/entrypoints.txt:84-85
 # stdio.h entrypoints
+libc.src.stdio.snprintf
+libc.src.stdio.vsnprintf
 libc.src.stdio.puts

Split of the libc stuff into a separate patch, the lowering pass should be a 
standalone change


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:40-43
+__device__ void bar(int *out)
+{
+  *out = __builtin_amdgcn_workgroup_size_x();
+}

test all the builtins?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17057
+  Constant *Offset, *OffsetOld;
+  Value *DP, *DP1;
+

Spell out to DispatchPtr?



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:1206-1208
+  getTargetCodeGenInfo().emitTargetGlobals(*this);
+
   getTargetCodeGenInfo().emitTargetMetadata(*this, MangledDeclNames);

These could be one combined hook? this isn't really different from metadata



Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:369-386
+if (CGM.getModule().getNamedGlobal(Name))
+  return;
+
+auto *Type =
+llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), Size);
+auto *GV = new llvm::GlobalVariable(
+CGM.getModule(), Type, true, Linkage,

You moved GetOrCreateLLVMGlobal but don't use it? 

The lamdba is unnecessary for a single local use



Comment at: clang/lib/Driver/ToolChain.cpp:1369
 if (A->getOption().matches(options::OPT_m_Group)) {
-  if (SameTripleAsHost)
+  // pass code objection version to device toolchain
+  // to correctly set meta-data in intermediate files

Capitalize



Comment at: clang/lib/Driver/ToolChain.cpp:1372
+  if (SameTripleAsHost ||
+  A->getOption().matches(options::OPT_mcode_object_version_EQ))
 DAL->append(A);

Don't understand why this is necessary


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll:76
+}
+
+

arsenm wrote:
> arsenm wrote:
> > Needs some indirect variadic call tests
> Also some metadata and signext/zeroext preservation tests
Also a case where the user isn't the call operand


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll:76
+}
+
+

arsenm wrote:
> Needs some indirect variadic call tests
Also some metadata and signext/zeroext preservation tests


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:38
+
+#include 
+

Don't need



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:44-47
+static cl::opt
+ApplyToAllOverride(DEBUG_TYPE "-all", cl::init(false),
+   cl::desc("Expand VA intrinsics in all functions"),
+   cl::Hidden);

Don't understand the point of this



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:151
+  bool canTransformFunctionInIsolation(Function ) {
+if (!F.isVarArg() || F.isDeclaration() || !F.hasLocalLinkage() ||
+F.hasAddressTaken() || F.hasFnAttribute(Attribute::Naked)) {

isDefinitionExact?

If this is just how the ABI is going to lower, I don't see why you need to 
worry about any such restrictions



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:181
+
+  bool isByVal = CB->paramHasAttr(I, Attribute::ByVal);
+  if (isByVal)

Probably should defend against sret and the other ABI attributes



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:209
+StructType *VarargsTy = StructType::create(
+Ctx, LocalVarTypes, (Twine(NF->getName()) + ".vararg").str());
+

How is StructType::create not using Twine



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:215-217
+auto alloced = Builder.Insert(
+new AllocaInst(VarargsTy, DL.getAllocaAddrSpace(), nullptr,
+   std::max(MaxFieldAlign, assumedStructAlignment(DL))),

What's wrong with just Builder.CreateAlloca?



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:217
+new AllocaInst(VarargsTy, DL.getAllocaAddrSpace(), nullptr,
+   std::max(MaxFieldAlign, assumedStructAlignment(DL))),
+"vararg_buffer");

what's wrong with the abi type alignment? why does the stack alignment matter?



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:226
+  Builder.CreateStore(Varargs[i].first,
+  r); // alignment info could be better
+}

alignTo shouldn't be difficult



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:229-230
+
+Args.push_back(Builder.CreatePointerBitCastOrAddrSpaceCast(
+alloced, Type::getInt8PtrTy(Ctx)));
+

just CreateAddrSpaceCast should work?



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:253
+  cast(CB)->getTailCallKind());
+}
+

there are more CallBase types than call and invoke now. Can't you just mutate 
the call operand in place?



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:258
+NewCB->setCallingConv(CB->getCallingConv());
+NewCB->copyMetadata(*CB, {LLVMContext::MD_prof, LLVMContext::MD_dbg});
+

I'd assume all metadata would be preservable if this is just how the ABI works



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:272-276
+if (knownNaturalStackAlignment) {
+  return DL.getStackAlignment();
+} else {
+  return {};
+}

no return after else, 



Comment at: llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -expand-va-intrinsics -expand-va-intrinsics-all=true -S < %s | 
FileCheck %s
+

-passes.

Also "generic" codegen tests are a fiction, use a real triple



Comment at: llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll:48
+  call void @llvm.va_start(ptr nonnull %va)
+  %0 = va_arg ptr %va, i32
+  %1 = va_arg ptr %va, double

Use named values



Comment at: llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll:76
+}
+
+

Needs some indirect variadic call tests


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D145648: [clang][Driver] recognize `-ffp-contract=fast-honor-pragmas`

2023-08-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm accepted this revision.
arsenm added a comment.
This revision is now accepted and ready to land.
Herald added a subscriber: wdng.

LGTM, not recognizing this in the driver is incomplete


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145648/new/

https://reviews.llvm.org/D145648

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D76283: [IRBuilder] Use preferred target type for len argument of memory intrinsic functions

2023-08-17 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/include/llvm/IR/IRBuilder.h:438-446
+  ConstantInt *getIntPtrSize(Value *Ptr, uint64_t Size) {
+assert(BB && "Must have a basic block to retrieve the module!");
+
+Module *M = BB->getParent()->getParent();
+auto *PtrType = Ptr->getType();
+unsigned PtrSize = M->getDataLayout().getPointerSizeInBits(
+PtrType->getPointerAddressSpace());

This change on its own might be ok but in the context of what you are trying to 
solve it is not


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76283/new/

https://reviews.llvm.org/D76283

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D76283: [IRBuilder] Use preferred target type for len argument of memory intrinsic functions

2023-08-17 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm requested changes to this revision.
arsenm added a comment.
This revision now requires changes to proceed.
Herald added a project: All.

I think any size type should be valid for the intrinsic. Legalization should 
have to cast the type to the target libcall if that's how it chooses to 
implement the lowering


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76283/new/

https://reviews.llvm.org/D76283

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157917: clang/HIP: Use abs builtins instead of implementing them

2023-08-15 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision.
arsenm added a comment.

43f314f5e6cebe02ff63d5197c8e5c25204b20d2 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157917/new/

https://reviews.llvm.org/D157917

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157738: [OpenMP] Emit offloading entries for indirect target variables

2023-08-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntime.cpp:1996-1997
+llvm::GlobalValue *GV) {
+  std::optional ActiveAttr =
+  OMPDeclareTargetDeclAttr::getActiveAttr(FD);
+

not a huge fan of std::optional



Comment at: clang/lib/CodeGen/CGOpenMPRuntime.cpp:2017
+/*isConstant=*/true,
+llvm::GlobalValue::ExternalLinkage, GV, 
Name);
+Addr->setVisibility(llvm::GlobalValue::ProtectedVisibility);

target global address space



Comment at: clang/lib/CodeGen/CGOpenMPRuntime.cpp:2022
+  OMPBuilder.OffloadInfoManager.registerDeviceGlobalVarEntryInfo(
+  Name, Addr, CGM.getContext().getTypeSize(CGM.getContext().VoidPtrTy) / 8,
+  llvm::OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect,

isn't there a store size?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157738/new/

https://reviews.llvm.org/D157738

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157917: clang/HIP: Use abs builtins instead of implementing them

2023-08-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision.
arsenm added reviewers: yaxunl, AlexVlx, JonChesterfield, jhuber6, doru1004.
Herald added a project: All.
arsenm requested review of this revision.
Herald added a subscriber: wdng.

InstCombine already put these back together so there's no visible
change in the -O1 test for the header.


https://reviews.llvm.org/D157917

Files:
  clang/lib/Headers/__clang_hip_math.h


Index: clang/lib/Headers/__clang_hip_math.h
===
--- clang/lib/Headers/__clang_hip_math.h
+++ clang/lib/Headers/__clang_hip_math.h
@@ -315,18 +315,15 @@
 #if defined(__cplusplus)
 __DEVICE__
 int abs(int __x) {
-  int __sgn = __x >> (sizeof(int) * CHAR_BIT - 1);
-  return (__x ^ __sgn) - __sgn;
+  return __builtin_abs(__x);
 }
 __DEVICE__
 long labs(long __x) {
-  long __sgn = __x >> (sizeof(long) * CHAR_BIT - 1);
-  return (__x ^ __sgn) - __sgn;
+  return __builtin_labs(__x);
 }
 __DEVICE__
 long long llabs(long long __x) {
-  long long __sgn = __x >> (sizeof(long long) * CHAR_BIT - 1);
-  return (__x ^ __sgn) - __sgn;
+  return __builtin_llabs(__x);
 }
 #endif
 


Index: clang/lib/Headers/__clang_hip_math.h
===
--- clang/lib/Headers/__clang_hip_math.h
+++ clang/lib/Headers/__clang_hip_math.h
@@ -315,18 +315,15 @@
 #if defined(__cplusplus)
 __DEVICE__
 int abs(int __x) {
-  int __sgn = __x >> (sizeof(int) * CHAR_BIT - 1);
-  return (__x ^ __sgn) - __sgn;
+  return __builtin_abs(__x);
 }
 __DEVICE__
 long labs(long __x) {
-  long __sgn = __x >> (sizeof(long) * CHAR_BIT - 1);
-  return (__x ^ __sgn) - __sgn;
+  return __builtin_labs(__x);
 }
 __DEVICE__
 long long llabs(long long __x) {
-  long long __sgn = __x >> (sizeof(long long) * CHAR_BIT - 1);
-  return (__x ^ __sgn) - __sgn;
+  return __builtin_llabs(__x);
 }
 #endif
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157750: Properly handle -fsplit-machine-functions for fatbinary compilation

2023-08-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/CodeGen/TargetPassConfig.cpp:1281-1282
+else
+  WithColor::warning()
+  << "-fsplit-machine-functions is only valid for X86.\n";
   }

shenhan wrote:
> arsenm wrote:
> > You cannot spam warnings here. The other instance of printing here looks 
> > like a new addition and should be removed
> Thanks. Do you suggest moving the warnings to the underlying pass? (Although 
> that means we create passes that only issue warnings.)
Move it to the pass, and use a backend remark, not directly print to the 
console (e.g. DiagnosticInfoUnsupported)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157750/new/

https://reviews.llvm.org/D157750

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157911: clang: Add __builtin_exp10* and use new llvm.exp10 intrinsic

2023-08-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision.
arsenm added reviewers: jcranmer-intel, kpn, sepavloff, andrew.w.kaylor, foad, 
bob80905.
Herald added a subscriber: StephenFan.
Herald added a project: All.
arsenm requested review of this revision.
Herald added a subscriber: wdng.

https://reviews.llvm.org/D157911

Files:
  clang/include/clang/Basic/Builtins.def
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/constrained-math-builtins.c
  clang/test/CodeGen/math-builtins.c
  clang/test/CodeGenOpenCL/builtins-f16.cl

Index: clang/test/CodeGenOpenCL/builtins-f16.cl
===
--- clang/test/CodeGenOpenCL/builtins-f16.cl
+++ clang/test/CodeGenOpenCL/builtins-f16.cl
@@ -24,6 +24,9 @@
   // CHECK: call half @llvm.exp2.f16(half %h0)
   res = __builtin_exp2f16(h0);
 
+  // CHECK: call half @llvm.exp10.f16(half %h0)
+  res = __builtin_exp10f16(h0);
+
   // CHECK: call half @llvm.floor.f16(half %h0)
   res = __builtin_floorf16(h0);
 
Index: clang/test/CodeGen/math-builtins.c
===
--- clang/test/CodeGen/math-builtins.c
+++ clang/test/CodeGen/math-builtins.c
@@ -318,6 +318,17 @@
 // HAS_ERRNO: declare x86_fp80 @exp2l(x86_fp80 noundef) [[NOT_READNONE]]
 // HAS_ERRNO: declare fp128 @exp2f128(fp128 noundef) [[NOT_READNONE]]
 
+__builtin_exp10(f);   __builtin_exp10f(f);  __builtin_exp10l(f); __builtin_exp10f128(f);
+
+// NO__ERRNO: declare double @llvm.exp10.f64(double) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare float @llvm.exp10.f32(float) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare x86_fp80 @llvm.exp10.f80(x86_fp80) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare fp128 @llvm.exp10.f128(fp128) [[READNONE_INTRINSIC]]
+// HAS_ERRNO: declare double @exp10(double noundef) [[NOT_READNONE]]
+// HAS_ERRNO: declare float @exp10f(float noundef) [[NOT_READNONE]]
+// HAS_ERRNO: declare x86_fp80 @exp10l(x86_fp80 noundef) [[NOT_READNONE]]
+// HAS_ERRNO: declare fp128 @exp10f128(fp128 noundef) [[NOT_READNONE]]
+
 __builtin_expm1(f);  __builtin_expm1f(f); __builtin_expm1l(f); __builtin_expm1f128(f);
 
 // NO__ERRNO: declare double @expm1(double noundef) [[READNONE]]
Index: clang/test/CodeGen/constrained-math-builtins.c
===
--- clang/test/CodeGen/constrained-math-builtins.c
+++ clang/test/CodeGen/constrained-math-builtins.c
@@ -64,6 +64,13 @@
 // CHECK: call x86_fp80 @llvm.experimental.constrained.exp2.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 // CHECK: call fp128 @llvm.experimental.constrained.exp2.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
+  __builtin_exp10(f);   __builtin_exp10f(f);  __builtin_exp10l(f); __builtin_exp10f128(f);
+
+// CHECK: call double @exp10(double noundef %{{.*}})
+// CHECK: call float @exp10f(float noundef %{{.*}})
+// CHECK: call x86_fp80 @exp10l(x86_fp80 noundef %{{.*}})
+// CHECK: call fp128 @exp10f128(fp128 noundef %{{.*}})
+
   __builtin_floor(f);  __builtin_floorf(f); __builtin_floorl(f); __builtin_floorf128(f);
 
 // CHECK: call double @llvm.experimental.constrained.floor.f64(double %{{.*}}, metadata !"fpexcept.strict")
@@ -223,6 +230,11 @@
 // CHECK: declare x86_fp80 @llvm.experimental.constrained.exp2.f80(x86_fp80, metadata, metadata)
 // CHECK: declare fp128 @llvm.experimental.constrained.exp2.f128(fp128, metadata, metadata)
 
+// CHECK: declare double @exp10(double noundef)
+// CHECK: declare float @exp10f(float noundef)
+// CHECK: declare x86_fp80 @exp10l(x86_fp80 noundef)
+// CHECK: declare fp128 @exp10f128(fp128 noundef)
+
 // CHECK: declare double @llvm.experimental.constrained.floor.f64(double, metadata)
 // CHECK: declare float @llvm.experimental.constrained.floor.f32(float, metadata)
 // CHECK: declare x86_fp80 @llvm.experimental.constrained.floor.f80(x86_fp80, metadata)
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -2341,7 +2341,16 @@
   return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
Intrinsic::exp2,
Intrinsic::experimental_constrained_exp2));
-
+case Builtin::BI__builtin_exp10:
+case Builtin::BI__builtin_exp10f:
+case Builtin::BI__builtin_exp10f16:
+case Builtin::BI__builtin_exp10l:
+case Builtin::BI__builtin_exp10f128: {
+  // TODO: strictfp support
+  if (Builder.getIsFPConstrained())
+break;
+  return RValue::get(emitUnaryBuiltin(*this, E, Intrinsic::exp10));
+}
 case Builtin::BIfabs:
 case Builtin::BIfabsf:
 case Builtin::BIfabsl:
Index: clang/include/clang/Basic/Builtins.def
===
--- clang/include/clang/Basic/Builtins.def
+++ clang/include/clang/Basic/Builtins.def
@@ 

[PATCH] D156737: clang: Add __builtin_elementwise_sqrt

2023-08-11 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision.
arsenm added a comment.

9e3d9c9eae03910d93e2312e1e0845433c779998 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156737/new/

https://reviews.llvm.org/D156737

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157750: Properly handle -fsplit-machine-functions for fatbinary compilation

2023-08-11 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/CodeGen/TargetPassConfig.cpp:1281-1282
+else
+  WithColor::warning()
+  << "-fsplit-machine-functions is only valid for X86.\n";
   }

You cannot spam warnings here. The other instance of printing here looks like a 
new addition and should be removed


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157750/new/

https://reviews.llvm.org/D157750

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156737: clang: Add __builtin_elementwise_sqrt

2023-08-11 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 549490.
arsenm added a comment.

Release note


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156737/new/

https://reviews.llvm.org/D156737

Files:
  clang/docs/LanguageExtensions.rst
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/Builtins.def
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/Sema/SemaChecking.cpp
  clang/test/CodeGen/builtins-elementwise-math.c
  clang/test/CodeGen/strictfp-elementwise-bulitins.cpp
  clang/test/CodeGenCUDA/correctly-rounded-div.cu
  clang/test/CodeGenOpenCL/fpmath.cl
  clang/test/Sema/builtins-elementwise-math.c
  clang/test/SemaCXX/builtins-elementwise-math.cpp

Index: clang/test/SemaCXX/builtins-elementwise-math.cpp
===
--- clang/test/SemaCXX/builtins-elementwise-math.cpp
+++ clang/test/SemaCXX/builtins-elementwise-math.cpp
@@ -111,6 +111,13 @@
   static_assert(!is_const::value);
 }
 
+void test_builtin_elementwise_sqrt() {
+  const float a = 42.0;
+  float b = 42.3;
+  static_assert(!is_const::value);
+  static_assert(!is_const::value);
+}
+
 void test_builtin_elementwise_log() {
   const float a = 42.0;
   float b = 42.3;
Index: clang/test/Sema/builtins-elementwise-math.c
===
--- clang/test/Sema/builtins-elementwise-math.c
+++ clang/test/Sema/builtins-elementwise-math.c
@@ -601,6 +601,27 @@
   // expected-error@-1 {{1st argument must be a floating point type (was 'unsigned4' (vector of 4 'unsigned int' values))}}
 }
 
+void test_builtin_elementwise_sqrt(int i, float f, double d, float4 v, int3 iv, unsigned u, unsigned4 uv) {
+
+  struct Foo s = __builtin_elementwise_sqrt(f);
+  // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'float'}}
+
+  i = __builtin_elementwise_sqrt();
+  // expected-error@-1 {{too few arguments to function call, expected 1, have 0}}
+
+  i = __builtin_elementwise_sqrt(i);
+  // expected-error@-1 {{1st argument must be a floating point type (was 'int')}}
+
+  i = __builtin_elementwise_sqrt(f, f);
+  // expected-error@-1 {{too many arguments to function call, expected 1, have 2}}
+
+  u = __builtin_elementwise_sqrt(u);
+  // expected-error@-1 {{1st argument must be a floating point type (was 'unsigned int')}}
+
+  uv = __builtin_elementwise_sqrt(uv);
+  // expected-error@-1 {{1st argument must be a floating point type (was 'unsigned4' (vector of 4 'unsigned int' values))}}
+}
+
 void test_builtin_elementwise_trunc(int i, float f, double d, float4 v, int3 iv, unsigned u, unsigned4 uv) {
 
   struct Foo s = __builtin_elementwise_trunc(f);
Index: clang/test/CodeGenOpenCL/fpmath.cl
===
--- clang/test/CodeGenOpenCL/fpmath.cl
+++ clang/test/CodeGenOpenCL/fpmath.cl
@@ -28,6 +28,21 @@
   return __builtin_sqrtf(a);
 }
 
+float elementwise_sqrt_f32(float a) {
+  // CHECK-LABEL: @elementwise_sqrt_f32
+  // NODIVOPT: call float @llvm.sqrt.f32(float %{{.+}}), !fpmath ![[MD_SQRT:[0-9]+]]
+  // DIVOPT: call float @llvm.sqrt.f32(float %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
+float4 elementwise_sqrt_v4f32(float4 a) {
+  // CHECK-LABEL: @elementwise_sqrt_v4f32
+  // NODIVOPT: call <4 x float> @llvm.sqrt.v4f32(<4 x float> %{{.+}}), !fpmath ![[MD_SQRT:[0-9]+]]
+  // DIVOPT: call <4 x float> @llvm.sqrt.v4f32(<4 x float> %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
+
 #if __OPENCL_C_VERSION__ >=120
 void printf(constant char* fmt, ...);
 
@@ -61,6 +76,18 @@
   return __builtin_sqrt(a);
 }
 
+double elementwise_sqrt_f64(double a) {
+  // CHECK-LABEL: @elementwise_sqrt_f64
+  // CHECK: call double @llvm.sqrt.f64(double %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
+double4 elementwise_sqrt_v4f64(double4 a) {
+  // CHECK-LABEL: @elementwise_sqrt_v4f64
+  // CHECK: call <4 x double> @llvm.sqrt.v4f64(<4 x double> %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
 #endif
 
 // NODIVOPT: ![[MD_FDIV]] = !{float 2.50e+00}
Index: clang/test/CodeGenCUDA/correctly-rounded-div.cu
===
--- clang/test/CodeGenCUDA/correctly-rounded-div.cu
+++ clang/test/CodeGenCUDA/correctly-rounded-div.cu
@@ -46,4 +46,18 @@
   return __builtin_sqrt(a);
 }
 
+// COMMON-LABEL: @_Z28test_builtin_elementwise_f32f
+// NCRDIV: call contract float @llvm.sqrt.f32(float %{{.+}}), !fpmath ![[MD:[0-9]+]]
+// CRDIV: call contract float @llvm.sqrt.f32(float %{{.+}}){{$}}
+__device__ float test_builtin_elementwise_f32(float a) {
+  return __builtin_elementwise_sqrt(a);
+}
+
+// COMMON-LABEL: @_Z28test_builtin_elementwise_f64d
+// COMMON: call contract double @llvm.sqrt.f64(double %{{.+}}){{$}}
+// COMMON-NOT: !fpmath
+__device__ double test_builtin_elementwise_f64(double a) {
+  return __builtin_elementwise_sqrt(a);
+}
+
 // NCRSQRT: ![[MD]] = !{float 2.50e+00}
Index: 

[PATCH] D155773: [llvm][MemoryBuiltins] Add alloca support to getInitialValueOfAllocation

2023-08-11 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp:809-811
+  Updater.AddAvailableValue(
+  Alloca.getParent(),
+  getInitialValueOfAllocation(, nullptr, VectorTy));

This is very specifically handling alloca, not any random allocation like 
function


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155773/new/

https://reviews.llvm.org/D155773

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156737: clang: Add __builtin_elementwise_sqrt

2023-08-11 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

ping


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156737/new/

https://reviews.llvm.org/D156737

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157452: [RFC][Clang][Codegen] `std::type_info` needs special care with explicit address spaces

2023-08-10 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:5237-5238
+
+  if (VTy->isPointerTy() &&
+  VTy->getPointerAddressSpace() != IRTy->getPointerAddressSpace()) 
{
+// In the case of targets that use a non-default address space for

you can also just unconditionally call CreateAddrSpaceCast and let it no-op if 
the types match


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157452/new/

https://reviews.llvm.org/D157452

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156816: [Clang] Make generic aliases to OpenCL address spaces

2023-08-09 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm requested changes to this revision.
arsenm added a comment.
This revision now requires changes to proceed.

Probably should just wrap uses in macros for now


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156816/new/

https://reviews.llvm.org/D156816

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157438: [OpenMP] Ensure wrapper headers are included on both host and device

2023-08-08 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1190-1191
 // the resource directory at clang/lib/Headers/llvm_libc_wrappers.
-if (C.getActiveOffloadKinds() == Action::OFK_None) {
+if ((getToolChain().getTriple().isNVPTX() ||
+ getToolChain().getTriple().isAMDGCN()) &&
+C.getActiveOffloadKinds() == Action::OFK_None) {

can we do something better than this NVPTX||AMDGCN checks


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157438/new/

https://reviews.llvm.org/D157438

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-08-08 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/BackendUtil.cpp:1101-1102
+MPM.addPass(StdParAcceleratorCodeSelectionPass());
+}
+else if (LangOpts.HIPStdParInterposeAlloc) {
+  MPM.addPass(StdParAllocationInterpositionPass());

Formatting


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156743: clang/OpenCL: Add inline implementations of sqrt in builtin header

2023-08-07 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/Headers/opencl-c-base.h:832
+
+inline float __ovld __cnfn sqrt(float __x) {
+  return __builtin_elementwise_sqrt(__x);

Anastasia wrote:
> arsenm wrote:
> > svenvh wrote:
> > > Anastasia wrote:
> > > > Is this a generic implementation enough? Would some targets not need to 
> > > > do something different for this built-in?
> > > > 
> > > > Ideally this header is to be kept light so I am a bit worried about 
> > > > adding definitions of the functions here. Otherwise we will end up in 
> > > > the same situation as we one day were with opencl-c.h. So could these 
> > > > be left there instead? It might be good to check with @svenvh if 
> > > > TableGen header has already a way to do this function forwarding or can 
> > > > be extended to do such a thing. Then it would be implementable in the 
> > > > both header mechanisms. I don't know if Sven has some other ideas or 
> > > > opinions...
> > > We did already discuss this a bit on the GitHub issue: 
> > > https://github.com/llvm/llvm-project/issues/64264
> > As I mentioned on the ticket, it's only this one case so I'm not worried 
> > about adding a lot more to the base header. I think we can start by 
> > assuming llvm.sqrt always works correctly, I don't want to add more 
> > complexity to handle this case without a specific reason
> > As I mentioned on the ticket, it's only this one case so I'm not worried 
> > about adding a lot more to the base header.
> 
> This is how things normally start. Someone else might want to continue this 
> approach because it is already there.
> 
> >I think we can start by assuming llvm.sqrt always works correctly, I don't 
> >want to add more complexity to handle this case without a specific reason
> 
> Do you mean it would apply to all implementations? What I am missing here is 
> why it is required to be in the headers? Is this because it needs to be 
> inlined or is it because the compiler must see `__builtin_elementwise_sqrt` 
> with the surrounding code where it is called from?
Yes, I would expect any llvm consumer to correctly lower llvm.sqrt, and such an 
implementation would be correctly rounded and pass conformance with 
-cl-fp32-correctly-rounded-divide-sqrt

The goal is to get !fpmath attached to an llvm.sqrt call. The way this 
currently happens is based on the language, it gets emitted from the various 
__builtin_sqrt* calls


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156743/new/

https://reviews.llvm.org/D156743

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-07 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17143-17145
+  llvm::LoadInst *LD;
+  Constant *Offset, *Offset1;
+  Value *DP, *DP1;

Move down to define and initialize



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17163-17165
+BasicBlock *NewABI = CGF.createBasicBlock("amdgcn.abi.cov5", TheFunction);
+BasicBlock *OldABI = CGF.createBasicBlock("amdgcn.abi.cov4", nullptr);
+BasicBlock *End = CGF.createBasicBlock("amdgcn.abi.end", nullptr);

You could write all of this in terms of selects and avoid introducing all these 
blocks



Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:358
+CodeGen::CodeGenModule ) const {
+  if (!CGM.getTriple().isAMDGCN())
+return;

Don't need this?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156989: FloatingPointMode: Use -1 for "Dynamic"

2023-08-04 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D156989#4558486 , @sepavloff wrote:

> Support of rounding mode in C standard is based on IEEE-754 model, where 
> rounding mode is a global state and affects all FP operations. The case of 
> two rounding modes does not fit this model. So in C/C++ you anyway need to 
> invent special tools for setting this or that rounding mode or reading them. 
> If static rounding mode is not needed, IEEE-754 rounding mode could be 
> represented by `Dynamic` value.

Correct, I'm not planning on creating special "standard looking" tools. I just 
want the value in the target defined range rather than a black box.

> In IR there are more possibilities to represent many rounding modes. Each 
> constrained intrinsic call contains rounding mode and that mode may be 
> different for different FP types. Actually this model can support the general 
> case. For example, rounding mode for one type can be static but for the other 
> type it can be dynamic. There must be intrinsic functions that set/get 
> rounding mode for different types.

The raw target intrinsic to read the mode register works. Once you're in the 
target range you know you have to do something with target operations

> It looks like adding special bultin functions to get/set rounding mode for 
> different types is enough to support rounding in AMDGPU. In any case IEEE-754 
> rounding mode should be honored, which means that `fegetround` and 
> `FLT_ROUNDS` probably should return negative value, and `fesetround` probably 
> should set all rounding modes. The difference between `Dynamic` and -1 does 
> not matter because `Dynamic` can never be an argument of rounding mode type 
> and `Invalid` (-1) is an error indicator and must not be treated as rounding 
> mode.

My interpretation is fesetround of standard values would set all modes to be 
the same. Once you're outside of the range  it would work correctly to consume 
the same target defined values. "Could not determine" is the same as dynamic, 
so this should just use the standard -1 value instead of 7 just to fit in a 
clang bitfield


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156989/new/

https://reviews.llvm.org/D156989

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156928: [Clang][AMDGPU] Fix handling of -mcode-object-version=none arg

2023-08-04 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D156928#4561811 , @JonChesterfield 
wrote:

> What does code objects version= none mean?

Handle any version


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156928/new/

https://reviews.llvm.org/D156928

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-04 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D139730#4561619 , @arsenm wrote:

> In D139730#4561575 , @jhuber6 wrote:
>
>> In D139730#4561573 , @arsenm wrote:
>>
>>> In D139730#4561540 , @jhuber6 
>>> wrote:
>>>
 Could you explain briefly what the approach here is? I'm confused as to 
 what's actually changed and how we're handling this difference. I thought 
 if this was just the definition of some builtin function we could just 
 rely on the backend to figure it out. Why do we need to know the code 
 object version inside the device RTL?
>>>
>>> The build is called in the device rtl, so the device RTL needs to contain 
>>> both implementations. The "backend figuring it out" is dead code elimination
>>
>> Okay, do we expect to re-use this interface anywhere? If it's just for 
>> OpenMP then we should probably copy the approach taken for 
>> `__omp_rtl_debug_kind`, which is a global created on the GPU by 
>> `CGOpenMPRuntimeGPU`'s constructor and does more or less the same thing.
>
> device libs replicates the same scheme using its own copy of an equivalent 
> variable. Trying to merge those two together

Although I guess that doesn't really need the builtin changes?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-04 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D139730#4561575 , @jhuber6 wrote:

> In D139730#4561573 , @arsenm wrote:
>
>> In D139730#4561540 , @jhuber6 
>> wrote:
>>
>>> Could you explain briefly what the approach here is? I'm confused as to 
>>> what's actually changed and how we're handling this difference. I thought 
>>> if this was just the definition of some builtin function we could just rely 
>>> on the backend to figure it out. Why do we need to know the code object 
>>> version inside the device RTL?
>>
>> The build is called in the device rtl, so the device RTL needs to contain 
>> both implementations. The "backend figuring it out" is dead code elimination
>
> Okay, do we expect to re-use this interface anywhere? If it's just for OpenMP 
> then we should probably copy the approach taken for `__omp_rtl_debug_kind`, 
> which is a global created on the GPU by `CGOpenMPRuntimeGPU`'s constructor 
> and does more or less the same thing.

device libs replicates the same scheme using its own copy of an equivalent 
variable. Trying to merge those two together


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-04 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D139730#4561540 , @jhuber6 wrote:

> Could you explain briefly what the approach here is? I'm confused as to 
> what's actually changed and how we're handling this difference. I thought if 
> this was just the definition of some builtin function we could just rely on 
> the backend to figure it out. Why do we need to know the code object version 
> inside the device RTL?

The build is called in the device rtl, so the device RTL needs to contain both 
implementations. The "backend figuring it out" is dead code elimination


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-04 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17146
+Value *ABIVersion;
+if (ABIVersionC) {
+  ABIVersion = CGF.Builder.CreateAlignedLoad(CGF.Int32Ty, ABIVersionC,

this must always pass



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:3037
+  if (getImplicitArgsSize() < utils::COV5_SIZE) {
+DP("Setting fields of ImplicitArgs for COV4\n");
+  } else {

This isn't doing anything?



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h:36
 
-// The implicit arguments of AMDGPU kernels.
-struct AMDGPUImplicitArgsTy {
-  uint64_t OffsetX;
-  uint64_t OffsetY;
-  uint64_t OffsetZ;
-  uint64_t HostcallPtr;
-  uint64_t Unused0;
-  uint64_t Unused1;
-  uint64_t Unused2;
+enum IMPLICITARGS : uint32_t {
+  COV4_SIZE = 56,

This is getting duplicated a few places, should it move to a support header?

I don't love the existing APIs for this, I think a struct definition makes more 
sense


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156989: FloatingPointMode: Use -1 for "Dynamic"

2023-08-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D156989#4558133 , @sepavloff wrote:

> Rounding mode is presented in FPOptions with 3 bits, so there is only 8 
> values available for particular modes. 5 of them, which are specified in 
> IEEE-754, are listed in `RoundingMode`. `Dynamic` (which is -1 in 3-bit 
> numbers) is not a real rounding mode,

But it is a spec'd value as -1 for FLT_ROUNDS

> `RoundingMode::Invalid` is not a mode at all, it is used to represent 
> unspecified value at compile-time and can be eliminated by using things like 
> `std::optional`. In 3 bits it would have the same value as `Dynamic`, but it 
> is not a problem, because `Invalid` never appears in AST and IR.

Right it's just filler here

> Probably `Dynamic` is what you need. It prevents from constant folding and 
> other transformations that rely on particular rounding mode and does not 
> restrict actual rounding modes used in runtime. What  prevents from using 
> this mode for your case?

I can do better by reporting something meaningful, two different modes is not 
unknown. The enum here should just be exactly equal to the FLT_ROUNDS values 
and not pick a random other number, I just need the wrong value for Dynamic to 
get out of the way to avoid creating additional wrappers


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156989/new/

https://reviews.llvm.org/D156989

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156989: FloatingPointMode: Use -1 for "Dynamic"

2023-08-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 546815.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156989/new/

https://reviews.llvm.org/D156989

Files:
  clang/include/clang/Basic/FPOptions.def
  clang/include/clang/Basic/LangOptions.h
  clang/lib/AST/JSONNodeDumper.cpp
  clang/lib/AST/TextNodeDumper.cpp
  clang/lib/Basic/LangOptions.cpp
  llvm/include/llvm/ADT/FloatingPointMode.h

Index: llvm/include/llvm/ADT/FloatingPointMode.h
===
--- llvm/include/llvm/ADT/FloatingPointMode.h
+++ llvm/include/llvm/ADT/FloatingPointMode.h
@@ -35,18 +35,28 @@
 /// rounding mode value, so it does not need to fit the bit fields.
 ///
 enum class RoundingMode : int8_t {
+  // Special values.
+  Invalid = -2, ///< Denotes invalid value.
+  Dynamic = -1, ///< Denotes mode unknown at compile time.
+
   // Rounding mode defined in IEEE-754.
   TowardZero= 0,///< roundTowardZero.
   NearestTiesToEven = 1,///< roundTiesToEven.
   TowardPositive= 2,///< roundTowardPositive.
   TowardNegative= 3,///< roundTowardNegative.
-  NearestTiesToAway = 4,///< roundTiesToAway.
-
-  // Special values.
-  Dynamic = 7,///< Denotes mode unknown at compile time.
-  Invalid = -1///< Denotes invalid value.
+  NearestTiesToAway = 4 ///< roundTiesToAway.
 };
 
+/// Encode a RoundingMode value into a form suitable for a bitfield.
+constexpr int8_t encodeRoundingMode(RoundingMode RM) {
+  return static_cast(RM) + 1;
+}
+
+/// Decode a RoundingMode value from a form suitable for a bitfield.
+constexpr RoundingMode decodeRoundingMode(int8_t Val) {
+  return static_cast(Val - 1);
+}
+
 /// Returns text representation of the given rounding mode.
 inline StringRef spell(RoundingMode RM) {
   switch (RM) {
Index: clang/lib/Basic/LangOptions.cpp
===
--- clang/lib/Basic/LangOptions.cpp
+++ clang/lib/Basic/LangOptions.cpp
@@ -215,7 +215,7 @@
 
 FPOptionsOverride FPOptions::getChangesSlow(const FPOptions ) const {
   FPOptions::storage_type OverrideMask = 0;
-#define OPTION(NAME, TYPE, WIDTH, PREVIOUS)\
+#define OPTION(NAME, TYPE, ENCODE, DECODE, WIDTH, PREVIOUS)\
   if (get##NAME() != Base.get##NAME()) \
 OverrideMask |= NAME##Mask;
 #include "clang/Basic/FPOptions.def"
@@ -223,14 +223,14 @@
 }
 
 LLVM_DUMP_METHOD void FPOptions::dump() {
-#define OPTION(NAME, TYPE, WIDTH, PREVIOUS)\
+#define OPTION(NAME, TYPE, ENCODE, DECODE, WIDTH, PREVIOUS)\
   llvm::errs() << "\n " #NAME " " << get##NAME();
 #include "clang/Basic/FPOptions.def"
   llvm::errs() << "\n";
 }
 
 LLVM_DUMP_METHOD void FPOptionsOverride::dump() {
-#define OPTION(NAME, TYPE, WIDTH, PREVIOUS)\
+#define OPTION(NAME, TYPE, ENCODE, DECODE, WIDTH, PREVIOUS)\
   if (has##NAME##Override())   \
 llvm::errs() << "\n " #NAME " Override is " << get##NAME##Override();
 #include "clang/Basic/FPOptions.def"
Index: clang/lib/AST/TextNodeDumper.cpp
===
--- clang/lib/AST/TextNodeDumper.cpp
+++ clang/lib/AST/TextNodeDumper.cpp
@@ -756,7 +756,7 @@
 }
 
 void TextNodeDumper::printFPOptions(FPOptionsOverride FPO) {
-#define OPTION(NAME, TYPE, WIDTH, PREVIOUS)\
+#define OPTION(NAME, TYPE, ENCODE, DECODE, WIDTH, PREVIOUS)\
   if (FPO.has##NAME##Override())   \
 OS << " " #NAME "=" << FPO.get##NAME##Override();
 #include "clang/Basic/FPOptions.def"
Index: clang/lib/AST/JSONNodeDumper.cpp
===
--- clang/lib/AST/JSONNodeDumper.cpp
+++ clang/lib/AST/JSONNodeDumper.cpp
@@ -1740,7 +1740,7 @@
 
 llvm::json::Object JSONNodeDumper::createFPOptions(FPOptionsOverride FPO) {
   llvm::json::Object Ret;
-#define OPTION(NAME, TYPE, WIDTH, PREVIOUS)\
+#define OPTION(NAME, TYPE, ENCODE, DECODE, WIDTH, PREVIOUS)\
   if (FPO.has##NAME##Override())   \
 Ret.try_emplace(#NAME, static_cast(FPO.get##NAME##Override()));
 #include "clang/Basic/FPOptions.def"
Index: clang/include/clang/Basic/LangOptions.h
===
--- clang/include/clang/Basic/LangOptions.h
+++ clang/include/clang/Basic/LangOptions.h
@@ -670,7 +670,7 @@
   // Define a fake option named "First" so that we have a PREVIOUS even for the
   // real first option.
   static constexpr storage_type FirstShift = 0, FirstWidth = 0;
-#define OPTION(NAME, TYPE, WIDTH, PREVIOUS)\
+#define OPTION(NAME, ENCODE, DECODE, TYPE, WIDTH, 

[PATCH] D156743: clang/OpenCL: Add inline implementations of sqrt in builtin header

2023-08-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/Headers/opencl-c-base.h:832
+
+inline float __ovld __cnfn sqrt(float __x) {
+  return __builtin_elementwise_sqrt(__x);

svenvh wrote:
> Anastasia wrote:
> > Is this a generic implementation enough? Would some targets not need to do 
> > something different for this built-in?
> > 
> > Ideally this header is to be kept light so I am a bit worried about adding 
> > definitions of the functions here. Otherwise we will end up in the same 
> > situation as we one day were with opencl-c.h. So could these be left there 
> > instead? It might be good to check with @svenvh if TableGen header has 
> > already a way to do this function forwarding or can be extended to do such 
> > a thing. Then it would be implementable in the both header mechanisms. I 
> > don't know if Sven has some other ideas or opinions...
> We did already discuss this a bit on the GitHub issue: 
> https://github.com/llvm/llvm-project/issues/64264
As I mentioned on the ticket, it's only this one case so I'm not worried about 
adding a lot more to the base header. I think we can start by assuming 
llvm.sqrt always works correctly, I don't want to add more complexity to handle 
this case without a specific reason


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156743/new/

https://reviews.llvm.org/D156743

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156743: clang/OpenCL: Add inline implementations of sqrt in builtin header

2023-08-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 546814.
arsenm marked an inline comment as done.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156743/new/

https://reviews.llvm.org/D156743

Files:
  clang/lib/Headers/opencl-c-base.h
  clang/lib/Headers/opencl-c.h
  clang/lib/Sema/OpenCLBuiltins.td
  clang/test/CodeGenOpenCL/sqrt-fpmath.cl

Index: clang/test/CodeGenOpenCL/sqrt-fpmath.cl
===
--- /dev/null
+++ clang/test/CodeGenOpenCL/sqrt-fpmath.cl
@@ -0,0 +1,201 @@
+// Test that float variants of sqrt are emitted as available_externally inline
+// definitions that call the sqrt intrinsic with appropriate !fpmath metadata
+// depending on -cl-fp32-correctly-rounded-divide-sqrt
+
+// Test with -fdeclare-opencl-builtins
+// RUN: %clang_cc1 -disable-llvm-passes -triple amdgcn-unknown-unknown -fdeclare-opencl-builtins -finclude-default-header -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,DEFAULT %s
+// RUN: %clang_cc1 -disable-llvm-passes -triple amdgcn-unknown-unknown -fdeclare-opencl-builtins -finclude-default-header -cl-fp32-correctly-rounded-divide-sqrt -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,CORRECTLYROUNDED %s
+
+// RUN: %clang_cc1 -disable-llvm-passes -triple amdgcn-unknown-unknown -fdeclare-opencl-builtins -finclude-default-header -cl-unsafe-math-optimizations -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,DEFAULT-UNSAFE %s
+// RUN: %clang_cc1 -disable-llvm-passes -triple amdgcn-unknown-unknown -fdeclare-opencl-builtins -finclude-default-header -cl-fp32-correctly-rounded-divide-sqrt -cl-unsafe-math-optimizations -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,CORRECTLYROUNDED-UNSAFE %s
+
+// Test without -fdeclare-opencl-builtins
+// RUN: %clang_cc1 -disable-llvm-passes -triple amdgcn-unknown-unknown -finclude-default-header -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,DEFAULT %s
+// RUN: %clang_cc1 -disable-llvm-passes -triple amdgcn-unknown-unknown -finclude-default-header -cl-fp32-correctly-rounded-divide-sqrt -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,CORRECTLYROUNDED %s
+
+// RUN: %clang_cc1 -disable-llvm-passes -triple amdgcn-unknown-unknown -finclude-default-header -cl-unsafe-math-optimizations -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,DEFAULT-UNSAFE %s
+// RUN: %clang_cc1 -disable-llvm-passes -triple amdgcn-unknown-unknown -finclude-default-header -cl-fp32-correctly-rounded-divide-sqrt -cl-unsafe-math-optimizations -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,CORRECTLYROUNDED-UNSAFE %s
+
+#pragma OPENCL EXTENSION cl_khr_fp16 : enable
+
+// CHECK-LABEL: define {{.*}} float @call_sqrt_f32(
+// CHECK: call {{.*}} float @_Z4sqrtf(float noundef %{{.+}}) #{{[0-9]+$}}
+float call_sqrt_f32(float x) {
+  return sqrt(x);
+}
+
+// CHECK-LABEL: define available_externally float @_Z4sqrtf(float noundef %__x)
+// DEFAULT: call float @llvm.sqrt.f32(float %{{.+}}), !fpmath [[$FPMATH:![0-9]+]]{{$}}
+// CORRECTLYROUNDED: call float @llvm.sqrt.f32(float %{{.+}}){{$}}
+
+// DEFAULT-UNSAFE: call reassoc nsz arcp contract afn float @llvm.sqrt.f32(float %{{.+}}), !fpmath [[$FPMATH:![0-9]+]]{{$}}
+// CORRECTLYROUNDED-UNSAFE: call reassoc nsz arcp contract afn float @llvm.sqrt.f32(float %{{.+}}){{$}}
+
+// CHECK-LABEL: define {{.*}} <2 x float> @call_sqrt_v2f32(
+// CHECK: call {{.*}} <2 x float> @_Z4sqrtDv2_f(<2 x float> noundef %{{.*}}) #{{[0-9]+$}}
+float2 call_sqrt_v2f32(float2 x) {
+  return sqrt(x);
+}
+
+// CHECK-LABEL: define available_externally <2 x float> @_Z4sqrtDv2_f(<2 x float> noundef %__x)
+// DEFAULT: call <2 x float> @llvm.sqrt.v2f32(<2 x float> %{{.+}}), !fpmath [[$FPMATH:![0-9]+]]{{$}}
+// CORRECTLYROUNDED: call <2 x float> @llvm.sqrt.v2f32(<2 x float> %{{.+}}){{$}}
+
+// DEFAULT-UNSAFE: call reassoc nsz arcp contract afn <2 x float> @llvm.sqrt.v2f32(<2 x float> %{{.+}}), !fpmath [[$FPMATH:![0-9]+]]{{$}}
+// CORRECTLYROUNDED-UNSAFE: call reassoc nsz arcp contract afn <2 x float> @llvm.sqrt.v2f32(<2 x float> %{{.+}}){{$}}
+
+// CHECK-LABEL: define {{.*}} <3 x float> @call_sqrt_v3f32(
+// CHECK: call {{.*}} <3 x float> @_Z4sqrtDv3_f(<3 x float> noundef %{{.*}}) #{{[0-9]+$}}
+float3 call_sqrt_v3f32(float3 x) {
+  return sqrt(x);
+}
+
+// CHECK-LABEL: define available_externally <3 x float> @_Z4sqrtDv3_f(<3 x float> noundef %__x)
+// DEFAULT: call <3 x float> @llvm.sqrt.v3f32(<3 x float> %{{.+}}), !fpmath [[$FPMATH:![0-9]+]]{{$}}
+// CORRECTLYROUNDED: call <3 x float> @llvm.sqrt.v3f32(<3 x float> %{{.+}}){{$}}
+
+// DEFAULT-UNSAFE: call reassoc nsz arcp contract afn <3 x float> @llvm.sqrt.v3f32(<3 x float> %{{.+}}), !fpmath [[$FPMATH:![0-9]+]]{{$}}
+// CORRECTLYROUNDED-UNSAFE: call reassoc nsz arcp contract afn <3 x float> @llvm.sqrt.v3f32(<3 x float> %{{.+}}){{$}}
+
+
+// CHECK-LABEL: define {{.*}} <4 x float> @call_sqrt_v4f32(
+// CHECK: call {{.*}} <4 x float> @_Z4sqrtDv4_f(<4 x float> noundef %{{.*}}) #{{[0-9]+$}}
+float4 call_sqrt_v4f32(float4 x) {

[PATCH] D156989: FloatingPointMode: Use -1 for "Dynamic"

2023-08-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision.
arsenm added reviewers: sepavloff, rjmccall, kpn, cameron.mcinally, uweigand, 
scanon, jcranmer-intel, foad.
Herald added subscribers: StephenFan, tpr.
Herald added a project: All.
arsenm requested review of this revision.
Herald added a subscriber: wdng.
Herald added a project: LLVM.

FLT_ROUNDS says -1 is used for "the default rounding direction is not
known". The previous 7 was taking away options in the
"implementation-defined behavior" range if you just wanted to extend
the enum.

  

AMDGPU has 2 separately controllable rounding modes that change
different fp types. I want to stick to the standard values in the case
the modes are the same, and use the extended range for cases where the
two are different. Dodging this gap in the enum value required
defining the AMDGPU target specific values in a weird way with strange
conversion code to handle it (see https://reviews.llvm.org/D153257).


https://reviews.llvm.org/D156989

Files:
  clang/include/clang/Basic/FPOptions.def
  clang/include/clang/Basic/LangOptions.h
  clang/lib/AST/JSONNodeDumper.cpp
  clang/lib/AST/TextNodeDumper.cpp
  clang/lib/Basic/LangOptions.cpp
  llvm/include/llvm/ADT/FloatingPointMode.h

Index: llvm/include/llvm/ADT/FloatingPointMode.h
===
--- llvm/include/llvm/ADT/FloatingPointMode.h
+++ llvm/include/llvm/ADT/FloatingPointMode.h
@@ -35,18 +35,30 @@
 /// rounding mode value, so it does not need to fit the bit fields.
 ///
 enum class RoundingMode : int8_t {
+  // Special values.
+  Invalid = -2,
+
+  ///< Denotes mode unknown at compile time.
+  Dynamic = -1,
+
   // Rounding mode defined in IEEE-754.
   TowardZero= 0,///< roundTowardZero.
   NearestTiesToEven = 1,///< roundTiesToEven.
   TowardPositive= 2,///< roundTowardPositive.
   TowardNegative= 3,///< roundTowardNegative.
-  NearestTiesToAway = 4,///< roundTiesToAway.
-
-  // Special values.
-  Dynamic = 7,///< Denotes mode unknown at compile time.
-  Invalid = -1///< Denotes invalid value.
+  NearestTiesToAway = 4 ///< roundTiesToAway.
 };
 
+/// Encode a RoundingMode value into a form suitable for a bitfield.
+constexpr int8_t encodeRoundingMode(RoundingMode RM) {
+  return static_cast(RM) + 1;
+}
+
+/// Decode a RoundingMode value from a form suitable for a bitfield.
+constexpr RoundingMode decodeRoundingMode(int8_t Val) {
+  return static_cast(Val - 1);
+}
+
 /// Returns text representation of the given rounding mode.
 inline StringRef spell(RoundingMode RM) {
   switch (RM) {
Index: clang/lib/Basic/LangOptions.cpp
===
--- clang/lib/Basic/LangOptions.cpp
+++ clang/lib/Basic/LangOptions.cpp
@@ -215,7 +215,7 @@
 
 FPOptionsOverride FPOptions::getChangesSlow(const FPOptions ) const {
   FPOptions::storage_type OverrideMask = 0;
-#define OPTION(NAME, TYPE, WIDTH, PREVIOUS)\
+#define OPTION(NAME, TYPE, ENCODE, DECODE, WIDTH, PREVIOUS)\
   if (get##NAME() != Base.get##NAME()) \
 OverrideMask |= NAME##Mask;
 #include "clang/Basic/FPOptions.def"
@@ -223,14 +223,14 @@
 }
 
 LLVM_DUMP_METHOD void FPOptions::dump() {
-#define OPTION(NAME, TYPE, WIDTH, PREVIOUS)\
+#define OPTION(NAME, TYPE, ENCODE, DECODE, WIDTH, PREVIOUS)\
   llvm::errs() << "\n " #NAME " " << get##NAME();
 #include "clang/Basic/FPOptions.def"
   llvm::errs() << "\n";
 }
 
 LLVM_DUMP_METHOD void FPOptionsOverride::dump() {
-#define OPTION(NAME, TYPE, WIDTH, PREVIOUS)\
+#define OPTION(NAME, TYPE, ENCODE, DECODE, WIDTH, PREVIOUS)\
   if (has##NAME##Override())   \
 llvm::errs() << "\n " #NAME " Override is " << get##NAME##Override();
 #include "clang/Basic/FPOptions.def"
Index: clang/lib/AST/TextNodeDumper.cpp
===
--- clang/lib/AST/TextNodeDumper.cpp
+++ clang/lib/AST/TextNodeDumper.cpp
@@ -756,7 +756,7 @@
 }
 
 void TextNodeDumper::printFPOptions(FPOptionsOverride FPO) {
-#define OPTION(NAME, TYPE, WIDTH, PREVIOUS)\
+#define OPTION(NAME, TYPE, ENCODE, DECODE, WIDTH, PREVIOUS)\
   if (FPO.has##NAME##Override())   \
 OS << " " #NAME "=" << FPO.get##NAME##Override();
 #include "clang/Basic/FPOptions.def"
Index: clang/lib/AST/JSONNodeDumper.cpp
===
--- clang/lib/AST/JSONNodeDumper.cpp
+++ clang/lib/AST/JSONNodeDumper.cpp
@@ -1740,7 +1740,7 @@
 
 llvm::json::Object JSONNodeDumper::createFPOptions(FPOptionsOverride FPO) {
   llvm::json::Object Ret;
-#define OPTION(NAME, TYPE, WIDTH, PREVIOUS)\

[PATCH] D156928: [Clang][AMDGPU] Fix handling of -mcode-object-version=none arg

2023-08-02 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

missing tests




Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1066
+if (!IsCC1As) {
+  std::string CodeObjVerStr = (CodeObjVer ? Twine(CodeObjVer) : 
"none").str();
   CmdArgs.insert(CmdArgs.begin() + 1,

don't need to go through std::string? stick with Twine everywhere?



Comment at: clang/lib/Driver/ToolChains/CommonArgs.cpp:2309
+  auto CovStr = StringRef(CodeObjArg->getValue());
+  if(CovStr.starts_with("none")) return;
+  

missing space after if, also return on separate line. Also why starts with, and 
not ==?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156928/new/

https://reviews.llvm.org/D156928

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156816: [Clang] Make generic aliases to OpenCL address spaces

2023-08-01 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D156816#4551409 , @Anastasia wrote:

> Why not to just use target address space and define it to some macro with 
> desirable spelling?

If you mean the numbered address spaces, that's the broken thing this is 
specifically trying to disallow. We probably shouldn't even allow you to use 
those numbers. The numbered versions are not treated equivalently, and don't 
have the same enforced semantic rules. For example __constant__ disallows 
storing to it, but address_space(4) does. The lack of casting rules is also an 
issue


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156816/new/

https://reviews.llvm.org/D156816

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156743: clang/OpenCL: Add inline implementations of sqrt in builtin header

2023-08-01 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 546164.
arsenm retitled this revision from "[wip] clang/OpenCL: Add inline 
implementations of sqrt in builtin header" to "clang/OpenCL: Add inline 
implementations of sqrt in builtin header".
arsenm edited the summary of this revision.
arsenm added a comment.

Move to base header and fix test. update_cc_test_checks 
--include-generated-funcs does a terrible job here so write manually checks


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156743/new/

https://reviews.llvm.org/D156743

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Builtins.def
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/Headers/opencl-c-base.h
  clang/lib/Headers/opencl-c.h
  clang/lib/Sema/SemaChecking.cpp
  clang/test/CodeGen/builtins-elementwise-math.c
  clang/test/CodeGen/strictfp-elementwise-bulitins.cpp
  clang/test/CodeGenCUDA/correctly-rounded-div.cu
  clang/test/CodeGenOpenCL/fpmath.cl
  clang/test/CodeGenOpenCL/sqrt-fpmath.cl
  clang/test/Sema/builtins-elementwise-math.c
  clang/test/SemaCXX/builtins-elementwise-math.cpp

Index: clang/test/SemaCXX/builtins-elementwise-math.cpp
===
--- clang/test/SemaCXX/builtins-elementwise-math.cpp
+++ clang/test/SemaCXX/builtins-elementwise-math.cpp
@@ -111,6 +111,13 @@
   static_assert(!is_const::value);
 }
 
+void test_builtin_elementwise_sqrt() {
+  const float a = 42.0;
+  float b = 42.3;
+  static_assert(!is_const::value);
+  static_assert(!is_const::value);
+}
+
 void test_builtin_elementwise_log() {
   const float a = 42.0;
   float b = 42.3;
Index: clang/test/Sema/builtins-elementwise-math.c
===
--- clang/test/Sema/builtins-elementwise-math.c
+++ clang/test/Sema/builtins-elementwise-math.c
@@ -601,6 +601,27 @@
   // expected-error@-1 {{1st argument must be a floating point type (was 'unsigned4' (vector of 4 'unsigned int' values))}}
 }
 
+void test_builtin_elementwise_sqrt(int i, float f, double d, float4 v, int3 iv, unsigned u, unsigned4 uv) {
+
+  struct Foo s = __builtin_elementwise_sqrt(f);
+  // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'float'}}
+
+  i = __builtin_elementwise_sqrt();
+  // expected-error@-1 {{too few arguments to function call, expected 1, have 0}}
+
+  i = __builtin_elementwise_sqrt(i);
+  // expected-error@-1 {{1st argument must be a floating point type (was 'int')}}
+
+  i = __builtin_elementwise_sqrt(f, f);
+  // expected-error@-1 {{too many arguments to function call, expected 1, have 2}}
+
+  u = __builtin_elementwise_sqrt(u);
+  // expected-error@-1 {{1st argument must be a floating point type (was 'unsigned int')}}
+
+  uv = __builtin_elementwise_sqrt(uv);
+  // expected-error@-1 {{1st argument must be a floating point type (was 'unsigned4' (vector of 4 'unsigned int' values))}}
+}
+
 void test_builtin_elementwise_trunc(int i, float f, double d, float4 v, int3 iv, unsigned u, unsigned4 uv) {
 
   struct Foo s = __builtin_elementwise_trunc(f);
Index: clang/test/CodeGenOpenCL/fpmath.cl
===
--- clang/test/CodeGenOpenCL/fpmath.cl
+++ clang/test/CodeGenOpenCL/fpmath.cl
@@ -28,6 +28,21 @@
   return __builtin_sqrtf(a);
 }
 
+float elementwise_sqrt_f32(float a) {
+  // CHECK-LABEL: @elementwise_sqrt_f32
+  // NODIVOPT: call float @llvm.sqrt.f32(float %{{.+}}), !fpmath ![[MD_SQRT:[0-9]+]]
+  // DIVOPT: call float @llvm.sqrt.f32(float %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
+float4 elementwise_sqrt_v4f32(float4 a) {
+  // CHECK-LABEL: @elementwise_sqrt_v4f32
+  // NODIVOPT: call <4 x float> @llvm.sqrt.v4f32(<4 x float> %{{.+}}), !fpmath ![[MD_SQRT:[0-9]+]]
+  // DIVOPT: call <4 x float> @llvm.sqrt.v4f32(<4 x float> %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
+
 #if __OPENCL_C_VERSION__ >=120
 void printf(constant char* fmt, ...);
 
@@ -61,6 +76,18 @@
   return __builtin_sqrt(a);
 }
 
+double elementwise_sqrt_f64(double a) {
+  // CHECK-LABEL: @elementwise_sqrt_f64
+  // CHECK: call double @llvm.sqrt.f64(double %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
+double4 elementwise_sqrt_v4f64(double4 a) {
+  // CHECK-LABEL: @elementwise_sqrt_v4f64
+  // CHECK: call <4 x double> @llvm.sqrt.v4f64(<4 x double> %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
 #endif
 
 // NODIVOPT: ![[MD_FDIV]] = !{float 2.50e+00}
Index: clang/test/CodeGenCUDA/correctly-rounded-div.cu
===
--- clang/test/CodeGenCUDA/correctly-rounded-div.cu
+++ clang/test/CodeGenCUDA/correctly-rounded-div.cu
@@ -46,4 +46,18 @@
   return __builtin_sqrt(a);
 }
 
+// COMMON-LABEL: @_Z28test_builtin_elementwise_f32f
+// NCRDIV: call contract float @llvm.sqrt.f32(float %{{.+}}), !fpmath ![[MD:[0-9]+]]
+// CRDIV: call contract float @llvm.sqrt.f32(float %{{.+}}){{$}}
+__device__ float 

[PATCH] D156816: [Clang] Make generic aliases to OpenCL address spaces

2023-08-01 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

I don't really see the point of doing this. These introduce ambiguous 
terminology. The reason you need the attributes is basically for FFI to opencl 
code, so might as well make the specific meaning clearer with the opencl bit


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156816/new/

https://reviews.llvm.org/D156816

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156737: clang: Add __builtin_elementwise_sqrt

2023-07-31 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:2548
+case Builtin::BI__builtin_sqrtf128:
+case Builtin::BI__builtin_elementwise_sqrt: {
   llvm::Value *Call = emitUnaryMaybeConstrainedFPBuiltin(

bob80905 wrote:
> Nit: I think despite this code working, it should be moved to be grouped with 
> the other elementwise builtins at around line 3240.
> Consider BI__builtin_log at around 2436 and the other log builtins, the 
> bultin_elementwise_log builtin is not in that group since it's with the other 
> elementwise builtins. Writing this case here would make an inconsistency with 
> the placement of the other elementwise builtins.
I think it's more important to group with the emission of the metadata

Also don't understand why there is a broken constrained path and a redundant 
set of constrained handling in IRBuilder


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156737/new/

https://reviews.llvm.org/D156737

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156743: [wip] clang/OpenCL: Add inline implementations of sqrt in builtin header

2023-07-31 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision.
arsenm added reviewers: yaxunl, svenvh, Anastasia.
Herald added a subscriber: Naghasan.
Herald added a project: All.
arsenm requested review of this revision.
Herald added a subscriber: wdng.

We want the !fpmath metadata to be attached to the sqrt intrinsic to
make it to the backend lowering.

  

Doesn't work with the default case with -fdeclare-opencl-builtins

  

Fixes #64264


https://reviews.llvm.org/D156743

Files:
  clang/lib/Headers/opencl-c.h
  clang/test/CodeGenOpenCL/sqrt-fpmath.cl

Index: clang/test/CodeGenOpenCL/sqrt-fpmath.cl
===
--- /dev/null
+++ clang/test/CodeGenOpenCL/sqrt-fpmath.cl
@@ -0,0 +1,119 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 2
+// REQUIRES: amdgpu-registered-target
+
+// Test with -fdeclare-opencl-builtins
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -fdeclare-opencl-builtins -finclude-default-header -target-cpu hawaii -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,DEFAULT %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -fdeclare-opencl-builtins -finclude-default-header -cl-fp32-correctly-rounded-divide-sqrt -target-cpu hawaii -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,CORRECTLYROUNDED %s
+
+// Test without -fdeclare-opencl-builtins
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -finclude-default-header -target-cpu hawaii -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,DEFAULT %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -finclude-default-header -cl-fp32-correctly-rounded-divide-sqrt -target-cpu hawaii -S -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,CORRECTLYROUNDED %s
+
+#pragma OPENCL EXTENSION cl_khr_fp16 : enable
+
+// CHECK-LABEL: define dso_local float @call_sqrt_f32
+// CHECK-SAME: (float noundef [[X:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[CALL:%.*]] = tail call float @_Z4sqrtf(float noundef [[X]]) #[[ATTR2:[0-9]+]]
+// CHECK-NEXT:ret float [[CALL]]
+//
+float call_sqrt_f32(float x) {
+  return sqrt(x);
+}
+
+// CHECK-LABEL: define dso_local <2 x float> @call_sqrt_v2f32
+// CHECK-SAME: (<2 x float> noundef [[X:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[CALL:%.*]] = tail call <2 x float> @_Z4sqrtDv2_f(<2 x float> noundef [[X]]) #[[ATTR2]]
+// CHECK-NEXT:ret <2 x float> [[CALL]]
+//
+float2 call_sqrt_v2f32(float2 x) {
+  return sqrt(x);
+}
+
+// CHECK-LABEL: define dso_local <3 x float> @call_sqrt_v3f32
+// CHECK-SAME: (<3 x float> noundef [[X:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[CALL:%.*]] = tail call <3 x float> @_Z4sqrtDv3_f(<3 x float> noundef [[X]]) #[[ATTR2]]
+// CHECK-NEXT:ret <3 x float> [[CALL]]
+//
+float3 call_sqrt_v3f32(float3 x) {
+  return sqrt(x);
+}
+
+// CHECK-LABEL: define dso_local <4 x float> @call_sqrt_v4f32
+// CHECK-SAME: (<4 x float> noundef [[X:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[CALL:%.*]] = tail call <4 x float> @_Z4sqrtDv4_f(<4 x float> noundef [[X]]) #[[ATTR2]]
+// CHECK-NEXT:ret <4 x float> [[CALL]]
+//
+float4 call_sqrt_v4f32(float4 x) {
+  return sqrt(x);
+}
+
+// CHECK-LABEL: define dso_local <8 x float> @call_sqrt_v8f32
+// CHECK-SAME: (<8 x float> noundef [[X:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[CALL:%.*]] = tail call <8 x float> @_Z4sqrtDv8_f(<8 x float> noundef [[X]]) #[[ATTR2]]
+// CHECK-NEXT:ret <8 x float> [[CALL]]
+//
+float8 call_sqrt_v8f32(float8 x) {
+  return sqrt(x);
+}
+
+// CHECK-LABEL: define dso_local <16 x float> @call_sqrt_v16f32
+// CHECK-SAME: (<16 x float> noundef [[X:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[CALL:%.*]] = tail call <16 x float> @_Z4sqrtDv16_f(<16 x float> noundef [[X]]) #[[ATTR2]]
+// CHECK-NEXT:ret <16 x float> [[CALL]]
+//
+float16 call_sqrt_v16f32(float16 x) {
+  return sqrt(x);
+}
+
+// Not for f64
+// CHECK-LABEL: define dso_local double @call_sqrt_f64
+// CHECK-SAME: (double noundef [[X:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[CALL:%.*]] = tail call double @_Z4sqrtd(double noundef [[X]]) #[[ATTR2]]
+// CHECK-NEXT:ret double [[CALL]]
+//
+double call_sqrt_f64(double x) {
+  return sqrt(x);
+}
+
+// Not for f64
+// CHECK-LABEL: define dso_local <2 x double> @call_sqrt_v2f64
+// CHECK-SAME: (<2 x double> noundef [[X:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[CALL:%.*]] = tail call <2 x double> @_Z4sqrtDv2_d(<2 x double> noundef [[X]]) #[[ATTR2]]
+// CHECK-NEXT:ret <2 x double> [[CALL]]
+//
+double2 call_sqrt_v2f64(double2 x) {
+  return sqrt(x);
+}
+
+// Not for f64
+// CHECK-LABEL: define dso_local half @call_sqrt_f16
+// CHECK-SAME: (half noundef [[X:%.*]]) local_unnamed_addr #[[ATTR0]] 

[PATCH] D156737: clang: Add __builtin_elementwise_sqrt

2023-07-31 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision.
arsenm added reviewers: yaxunl, fhahn, bob80905.
Herald added subscribers: StephenFan, Anastasia.
Herald added a project: All.
arsenm requested review of this revision.
Herald added a subscriber: wdng.

This will be used in the opencl builtin headers to provide direct
intrinsic access with proper !fpmath metadata.


https://reviews.llvm.org/D156737

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Builtins.def
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/Sema/SemaChecking.cpp
  clang/test/CodeGen/builtins-elementwise-math.c
  clang/test/CodeGen/strictfp-elementwise-bulitins.cpp
  clang/test/CodeGenCUDA/correctly-rounded-div.cu
  clang/test/CodeGenOpenCL/fpmath.cl
  clang/test/Sema/builtins-elementwise-math.c
  clang/test/SemaCXX/builtins-elementwise-math.cpp

Index: clang/test/SemaCXX/builtins-elementwise-math.cpp
===
--- clang/test/SemaCXX/builtins-elementwise-math.cpp
+++ clang/test/SemaCXX/builtins-elementwise-math.cpp
@@ -111,6 +111,13 @@
   static_assert(!is_const::value);
 }
 
+void test_builtin_elementwise_sqrt() {
+  const float a = 42.0;
+  float b = 42.3;
+  static_assert(!is_const::value);
+  static_assert(!is_const::value);
+}
+
 void test_builtin_elementwise_log() {
   const float a = 42.0;
   float b = 42.3;
Index: clang/test/Sema/builtins-elementwise-math.c
===
--- clang/test/Sema/builtins-elementwise-math.c
+++ clang/test/Sema/builtins-elementwise-math.c
@@ -601,6 +601,27 @@
   // expected-error@-1 {{1st argument must be a floating point type (was 'unsigned4' (vector of 4 'unsigned int' values))}}
 }
 
+void test_builtin_elementwise_sqrt(int i, float f, double d, float4 v, int3 iv, unsigned u, unsigned4 uv) {
+
+  struct Foo s = __builtin_elementwise_sqrt(f);
+  // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'float'}}
+
+  i = __builtin_elementwise_sqrt();
+  // expected-error@-1 {{too few arguments to function call, expected 1, have 0}}
+
+  i = __builtin_elementwise_sqrt(i);
+  // expected-error@-1 {{1st argument must be a floating point type (was 'int')}}
+
+  i = __builtin_elementwise_sqrt(f, f);
+  // expected-error@-1 {{too many arguments to function call, expected 1, have 2}}
+
+  u = __builtin_elementwise_sqrt(u);
+  // expected-error@-1 {{1st argument must be a floating point type (was 'unsigned int')}}
+
+  uv = __builtin_elementwise_sqrt(uv);
+  // expected-error@-1 {{1st argument must be a floating point type (was 'unsigned4' (vector of 4 'unsigned int' values))}}
+}
+
 void test_builtin_elementwise_trunc(int i, float f, double d, float4 v, int3 iv, unsigned u, unsigned4 uv) {
 
   struct Foo s = __builtin_elementwise_trunc(f);
Index: clang/test/CodeGenOpenCL/fpmath.cl
===
--- clang/test/CodeGenOpenCL/fpmath.cl
+++ clang/test/CodeGenOpenCL/fpmath.cl
@@ -28,6 +28,21 @@
   return __builtin_sqrtf(a);
 }
 
+float elementwise_sqrt_f32(float a) {
+  // CHECK-LABEL: @elementwise_sqrt_f32
+  // NODIVOPT: call float @llvm.sqrt.f32(float %{{.+}}), !fpmath ![[MD_SQRT:[0-9]+]]
+  // DIVOPT: call float @llvm.sqrt.f32(float %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
+float4 elementwise_sqrt_v4f32(float4 a) {
+  // CHECK-LABEL: @elementwise_sqrt_v4f32
+  // NODIVOPT: call <4 x float> @llvm.sqrt.v4f32(<4 x float> %{{.+}}), !fpmath ![[MD_SQRT:[0-9]+]]
+  // DIVOPT: call <4 x float> @llvm.sqrt.v4f32(<4 x float> %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
+
 #if __OPENCL_C_VERSION__ >=120
 void printf(constant char* fmt, ...);
 
@@ -61,6 +76,18 @@
   return __builtin_sqrt(a);
 }
 
+double elementwise_sqrt_f64(double a) {
+  // CHECK-LABEL: @elementwise_sqrt_f64
+  // CHECK: call double @llvm.sqrt.f64(double %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
+double4 elementwise_sqrt_v4f64(double4 a) {
+  // CHECK-LABEL: @elementwise_sqrt_v4f64
+  // CHECK: call <4 x double> @llvm.sqrt.v4f64(<4 x double> %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
 #endif
 
 // NODIVOPT: ![[MD_FDIV]] = !{float 2.50e+00}
Index: clang/test/CodeGenCUDA/correctly-rounded-div.cu
===
--- clang/test/CodeGenCUDA/correctly-rounded-div.cu
+++ clang/test/CodeGenCUDA/correctly-rounded-div.cu
@@ -46,4 +46,18 @@
   return __builtin_sqrt(a);
 }
 
+// COMMON-LABEL: @_Z28test_builtin_elementwise_f32f
+// NCRDIV: call contract float @llvm.sqrt.f32(float %{{.+}}), !fpmath ![[MD:[0-9]+]]
+// CRDIV: call contract float @llvm.sqrt.f32(float %{{.+}}){{$}}
+__device__ float test_builtin_elementwise_f32(float a) {
+  return __builtin_elementwise_sqrt(a);
+}
+
+// COMMON-LABEL: @_Z28test_builtin_elementwise_f64d
+// COMMON: call contract double @llvm.sqrt.f64(double %{{.+}}){{$}}
+// COMMON-NOT: !fpmath
+__device__ double 

[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-31 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/test/CodeGen/dynamic-alloca-with-address-space.c:1
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s
+

AlexVlx wrote:
> arsenm wrote:
> > Can you add an opencl 1.2 and 2.0 run line too
> This is not valid 1.2 code, for 2.0 sure.
well also need a 1.2 flavored version then with the __privates in it (or use 
some macro trickery)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-31 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/test/CodeGen/dynamic-alloca-with-address-space.c:1
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s
+

Can you add an opencl 1.2 and 2.0 run line too



Comment at: clang/test/CodeGen/dynamic-alloca-with-address-space.c:12
+// CHECK: store i64 %n, ptr %n.addr.ascast, align 8
+// CHECK: %0 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %1 = alloca i8, i64 %0, align 8, addrspace(5)

Use generated checks


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86154: AMDGPU: Add llvm.amdgcn.{read,readfirst,write}lane2 intrinsics with type overloads

2023-07-28 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm requested changes to this revision.
arsenm added a comment.
This revision now requires changes to proceed.
Herald added subscribers: nlopes, StephenFan.
Herald added a project: All.

Should be obsoleted by D147732 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86154/new/

https://reviews.llvm.org/D86154

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-28 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:3540
+  return RValue::get(Builder.CreateAddrSpaceCast(AI, CGM.Int8PtrTy));
+else
+  return RValue::get(AI);

No return after else



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:3558
+  return RValue::get(Builder.CreateAddrSpaceCast(AI, CGM.Int8PtrTy));
+else
+  return RValue::get(AI);

No return after else


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-28 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D156539#4542836 , @rjmccall wrote:

> We should probably write this code to work properly in case we add a target 
> that makes `__builtin_alloca` return a pointer in the private address space.  
> Could you recover the target AS from the type of the expression instead of 
> assuming `LangAS::Default`?

This should happen for opencl today with amdgpu


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156357: clang: Add elementwise bitreverse builtin

2023-07-28 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/docs/LanguageExtensions.rst:634
  the most negative integer remains 
the most negative integer
- T __builtin_elementwise_fma(T x, T y, T z)  fused multiply add, (x * y) +  z. 
 floating point types
+ T __builtin_elementwise_fma(T x, T y, T z)  fused multiply add, (x * y) +  z. 
   floating point types
  T __builtin_elementwise_ceil(T x)   return the smallest integral 
value greater than or equal to xfloating point types

Unrelated but I noticed a couple of the elementwise builtins are missing from 
this list if you're fixing up the docs for them. Can't remember which off the 
top of my head



Comment at: clang/test/Sema/builtins-elementwise-math.c:272
 
+void test_builtin_elementwise_bitreverse(int i, float f, double d, float4 v, 
int3 iv, unsigned u, unsigned4 uv) {
+

Test the vector of float case?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156357/new/

https://reviews.llvm.org/D156357

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156366: HIP: Use __builtin_sqrt instead of routing through ocml sqrt for f64

2023-07-26 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision.
arsenm added a comment.

63dbe7e808d07bdf25bad85301980bc323b0cd64


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156366/new/

https://reviews.llvm.org/D156366

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2023-07-26 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9461-9463
+  bool CorrectSqrt = CGM.getLangOpts().OpenCL
+ ? CGM.getCodeGenOpts().OpenCLCorrectlyRoundedDivSqrt
+ : CGM.getCodeGenOpts().HIPCorrectlyRoundedDivSqrt;

Can we move this into something more proper in LangOpts?



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9467
+  AddGlobal("__oclc_daz_opt", DenormAreZero, /*Size=*/8);
+  AddGlobal("__oclc_finite_only_opt", FiniteOnly || RelaxedMath, /*Size=*/8);
+  AddGlobal("__oclc_unsafe_math_opt", UnsafeMath || RelaxedMath, /*Size=*/8);

I'd hope you don't have to check relaxed math, finite only should suffice



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9476
+llvm::GlobalValue::LinkOnceODRLinkage);
+  AddGlobal("__oclc_ABI_version",
+CGM.getTarget().getTargetOpts().CodeObjectVersion, /*Size=*/32,

This should probably get an __llvm_amdgcn prefix and be renamed


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2023-07-26 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

We should just do this now. clang shouldn't have to dig around on disk to emit 
a constant definition for a constant it already knows, and we have a clear path 
to removing these globals altogether. I have adequate patches to completely 
delete `__oclc_daz_opt` today. `__oclc_finite_only_opt` should be deleteable as 
soon as nofpclass is inferred by default. Deleting 
`__oclc_correctly_rounded_sqrt32` and `__oclc_unsafe_math_opt` require more 
work, but are basically the same thing and require extending the libcall 
optimizer pass.

It will be easier to delete these from the library as they become unnecessary 
if clang stops enforcing these files exists like it does today, and it's easier 
to just stop using them entirely than to delete them one at a time


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156040: [AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output

2023-07-25 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D156040#4526036 , @JonChesterfield 
wrote:

> What's the use case I'm missing which makes this flag necessary/beneficial?

The metadata is also irrelevant to this patch which is just reporting 
optimization hint information


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156040/new/

https://reviews.llvm.org/D156040

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156040: [AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output

2023-07-25 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm accepted this revision.
arsenm added a comment.
This revision is now accepted and ready to land.

lgtm with nits




Comment at: llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:1298
+  CurrentProgramInfo.DynamicCallStack ? "True" : "False";
+  EmitResourceUsageRemark("UsesDynamicStack", "Uses Dynamic Stack",
+  UsesDynamicStackStr);

Drop "Uses". Could also just inline the bool->string



Comment at: llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll:189
+
+declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)
+ 

No typed pointers


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156040/new/

https://reviews.llvm.org/D156040

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153310: clang: Add elementwise pow builtin

2023-07-24 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm accepted this revision.
arsenm added inline comments.
This revision is now accepted and ready to land.



Comment at: clang/test/CodeGen/strictfp-elementwise-bulitins.cpp:223
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[TMP0:%.*]] = tail call <4 x float> @llvm.pow.v4f32(<4 x 
float> [[A]], <4 x float> [[B]]) #[[ATTR4]]
+// CHECK-NEXT:ret <4 x float> [[TMP0]]

This is broken but I know this is a pre-existing broken with the others


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153310/new/

https://reviews.llvm.org/D153310

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153310: Add codegen for llvm pow builtin

2023-07-24 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

Code looks fine, patch title is a bit confusing. Don't say codegen, and say 
clang: Add elementwise pow builtin?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153310/new/

https://reviews.llvm.org/D153310

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154123: [HIP] Start document HIP support by clang

2023-07-24 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/docs/HIPSupport.rst:65
+
+   clang++ --offload-arch=gfx906 -xhip sample.cpp -o sample
+

scchan wrote:
> missing --hip-link
What does hip-link do? Why is it needed? I never use it and it works


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154123/new/

https://reviews.llvm.org/D154123

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156040: [AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output

2023-07-24 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/test/Frontend/amdgcn-machine-analysis-remarks.cl:13
+// expected-remark@+2 {{LDS Size [bytes/block]: 0}}
+// expected-remark@+1 {{Uses Dynamic Stack: False}}
 __kernel void foo() {

Print right after the scratch size



Comment at: llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:1305
 CurrentProgramInfo.LDSSize);
+  std::string UsesDynamicStackStr =
+  CurrentProgramInfo.DynamicCallStack ? "True" : "False";

don't need std::string for simple literals? StringRef?



Comment at: llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll:168
+}
+
+

Maybe add another that has a static component too


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156040/new/

https://reviews.llvm.org/D156040

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156040: [AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output

2023-07-24 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

In D156040#4526036 , @JonChesterfield 
wrote:

> I don't see how this conveys any information. The compiler writes the stack 
> size to be allocated. If it doesn't know what is sufficient, it's going to 
> request some maximum and hope for the best.

That was the old broken workaround for the old bit that was never actually 
implemented in the runtime. The runtime now does properly respect some field to 
switch to interpreting the reported size as a minimum and then allocates the 
max of that minimum and some API provided size value


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156040/new/

https://reviews.llvm.org/D156040

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D147732: [AMDGPU] Add type mangling for {read, write, readfirst, perm}lane intrinsics

2023-07-24 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp:187
 
+Value *AMDGPULateCodeGenPrepare::buildLegalLaneIntrinsic(
+IRBuilder<> , Intrinsic::ID IID, Value *Data0, Value *Data1, Value 
*Lane0,

jrbyrnes wrote:
> arsenm wrote:
> > You're not relying on this for correctness are you? This is an optimization 
> > pass, you can't lower here. You also shouldn't need to handle this in the 
> > IR, it should codegen normally 
> This is the legalization for non 32bit types -- I don't exactly know why it 
> wasn't handled via the normal codegen / selection process. @nhaehnle , I 
> believe you tried this in https://reviews.llvm.org/D86154 -- do you happen to 
> remember why we do legalization this way? If not, I'll rework the approach.
CodeGenPrepare/LateCodeGenPrepare can't be used for lowering, they're 
optimization passes. Legalization needs to be handled in the codegen


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147732/new/

https://reviews.llvm.org/D147732

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155982: Partially revert "clang/HIP: Remove __llvm_amdgcn_* wrapper hacks"

2023-07-22 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision.
arsenm added a comment.

9b2dfff57a382b757c358b43ee1df7591cb480ee 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155982/new/

https://reviews.llvm.org/D155982

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155982: Partially revert "clang/HIP: Remove __llvm_amdgcn_* wrapper hacks"

2023-07-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 543111.
arsenm marked an inline comment as done.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155982/new/

https://reviews.llvm.org/D155982

Files:
  clang/lib/Headers/__clang_hip_libdevice_declares.h
  clang/test/Headers/__clang_hip_math_deprecated.hip


Index: clang/test/Headers/__clang_hip_math_deprecated.hip
===
--- /dev/null
+++ clang/test/Headers/__clang_hip_math_deprecated.hip
@@ -0,0 +1,29 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -include __clang_hip_runtime_wrapper.h \
+// RUN:   -internal-isystem %S/../../lib/Headers/cuda_wrappers \
+// RUN:   -internal-isystem %S/Inputs/include \
+// RUN:   -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown \
+// RUN:   -target-cpu gfx906 -emit-llvm %s -fcuda-is-device -O1 -o - \
+// RUN:   -D__HIPCC_RTC__ | FileCheck %s
+
+// Test deprecated functions in the header that should be removed eventually
+
+// CHECK-LABEL: @test_rcpf16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract half 0xH3C00, [[X:%.*]]
+// CHECK-NEXT:ret half [[DIV_I]]
+//
+extern "C" __device__ _Float16 test_rcpf16_wrapper(_Float16 x) {
+  return __llvm_amdgcn_rcp_f16(x);
+}
+
+// CHECK-LABEL: @test_rcp2f16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract <2 x half> , [[X:%.*]]
+// CHECK-NEXT:ret <2 x half> [[DIV_I]]
+//
+extern "C" __device__ __2f16 test_rcp2f16_wrapper(__2f16 x) {
+  return __llvm_amdgcn_rcp_2f16(x);
+}
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -10,6 +10,10 @@
 #ifndef __CLANG_HIP_LIBDEVICE_DECLARES_H__
 #define __CLANG_HIP_LIBDEVICE_DECLARES_H__
 
+#if !defined(__HIPCC_RTC__) && __has_include("hip/hip_version.h")
+#include "hip/hip_version.h"
+#endif // __has_include("hip/hip_version.h")
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -312,6 +316,29 @@
 __device__ __attribute__((pure)) __2f16 __ocml_log_2f16(__2f16);
 __device__ __attribute__((pure)) __2f16 __ocml_log10_2f16(__2f16);
 __device__ __attribute__((pure)) __2f16 __ocml_log2_2f16(__2f16);
+
+#if HIP_VERSION_MAJOR * 100 + HIP_VERSION_MINOR >= 560 || 1
+#define __DEPRECATED_SINCE_HIP_560(X) __attribute__((deprecated(X)))
+#else
+#define __DEPRECATED_SINCE_HIP_560(X)
+#endif
+
+// Deprecated, should be removed when rocm releases using it are no longer
+// relevant.
+__DEPRECATED_SINCE_HIP_560("use ((_Float16)1.0) / ")
+__device__ inline _Float16 __llvm_amdgcn_rcp_f16(_Float16 x) {
+  return ((_Float16)1.0f) / x;
+}
+
+__DEPRECATED_SINCE_HIP_560("use ((__2f16)1.0) / ")
+__device__ inline __2f16
+__llvm_amdgcn_rcp_2f16(__2f16 __x)
+{
+  return ((__2f16)1.0f) / __x;
+}
+
+#undef __DEPRECATED_SINCE_HIP_560
+
 __device__ __attribute__((const)) __2f16 __ocml_rint_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_rsqrt_2f16(__2f16);
 __device__ __2f16 __ocml_sin_2f16(__2f16);


Index: clang/test/Headers/__clang_hip_math_deprecated.hip
===
--- /dev/null
+++ clang/test/Headers/__clang_hip_math_deprecated.hip
@@ -0,0 +1,29 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -include __clang_hip_runtime_wrapper.h \
+// RUN:   -internal-isystem %S/../../lib/Headers/cuda_wrappers \
+// RUN:   -internal-isystem %S/Inputs/include \
+// RUN:   -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown \
+// RUN:   -target-cpu gfx906 -emit-llvm %s -fcuda-is-device -O1 -o - \
+// RUN:   -D__HIPCC_RTC__ | FileCheck %s
+
+// Test deprecated functions in the header that should be removed eventually
+
+// CHECK-LABEL: @test_rcpf16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract half 0xH3C00, [[X:%.*]]
+// CHECK-NEXT:ret half [[DIV_I]]
+//
+extern "C" __device__ _Float16 test_rcpf16_wrapper(_Float16 x) {
+  return __llvm_amdgcn_rcp_f16(x);
+}
+
+// CHECK-LABEL: @test_rcp2f16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract <2 x half> , [[X:%.*]]
+// CHECK-NEXT:ret <2 x half> [[DIV_I]]
+//
+extern "C" __device__ __2f16 test_rcp2f16_wrapper(__2f16 x) {
+  return __llvm_amdgcn_rcp_2f16(x);
+}
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -10,6 +10,10 @@
 #ifndef __CLANG_HIP_LIBDEVICE_DECLARES_H__
 #define __CLANG_HIP_LIBDEVICE_DECLARES_H__
 
+#if !defined(__HIPCC_RTC__) && __has_include("hip/hip_version.h")
+#include 

[PATCH] D155982: Partially revert "clang/HIP: Remove __llvm_amdgcn_* wrapper hacks"

2023-07-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 543025.
arsenm added a comment.

Add versioned deprecated macro


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155982/new/

https://reviews.llvm.org/D155982

Files:
  clang/lib/Headers/__clang_hip_libdevice_declares.h
  clang/test/Headers/__clang_hip_math_deprecated.hip


Index: clang/test/Headers/__clang_hip_math_deprecated.hip
===
--- /dev/null
+++ clang/test/Headers/__clang_hip_math_deprecated.hip
@@ -0,0 +1,29 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -include __clang_hip_runtime_wrapper.h \
+// RUN:   -internal-isystem %S/../../lib/Headers/cuda_wrappers \
+// RUN:   -internal-isystem %S/Inputs/include \
+// RUN:   -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown \
+// RUN:   -target-cpu gfx906 -emit-llvm %s -fcuda-is-device -O1 -o - \
+// RUN:   -D__HIPCC_RTC__ | FileCheck %s
+
+// Test deprecated functions in the header that should be removed eventually
+
+// CHECK-LABEL: @test_rcpf16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract half 0xH3C00, [[X:%.*]]
+// CHECK-NEXT:ret half [[DIV_I]]
+//
+extern "C" __device__ _Float16 test_rcpf16_wrapper(_Float16 x) {
+  return __llvm_amdgcn_rcp_f16(x);
+}
+
+// CHECK-LABEL: @test_rcp2f16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract <2 x half> , [[X:%.*]]
+// CHECK-NEXT:ret <2 x half> [[DIV_I]]
+//
+extern "C" __device__ __2f16 test_rcp2f16_wrapper(__2f16 x) {
+  return __llvm_amdgcn_rcp_2f16(x);
+}
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -10,6 +10,10 @@
 #ifndef __CLANG_HIP_LIBDEVICE_DECLARES_H__
 #define __CLANG_HIP_LIBDEVICE_DECLARES_H__
 
+#if __has_include("hip/hip_version.h")
+#include "hip/hip_version.h"
+#endif // __has_include("hip/hip_version.h")
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -312,6 +316,29 @@
 __device__ __attribute__((pure)) __2f16 __ocml_log_2f16(__2f16);
 __device__ __attribute__((pure)) __2f16 __ocml_log10_2f16(__2f16);
 __device__ __attribute__((pure)) __2f16 __ocml_log2_2f16(__2f16);
+
+#if HIP_VERSION_MAJOR * 100 + HIP_VERSION_MINOR >= 560
+#define __DEPRECATED_SINCE_HIP_560 __attribute__((deprecated))
+#else
+#define __DEPRECATED_SINCE_HIP_560
+#endif
+
+// Deprecated, should be removed when rocm releases using it are no longer
+// relevant.
+__DEPRECATED_SINCE_HIP_560
+__device__ inline _Float16 __llvm_amdgcn_rcp_f16(_Float16 x) {
+  return ((_Float16)1.0f) / x;
+}
+
+__DEPRECATED_SINCE_HIP_560
+__device__ inline __2f16
+__llvm_amdgcn_rcp_2f16(__2f16 __x)
+{
+  return ((__2f16)1.0f) / __x;
+}
+
+#undef __DEPRECATED_SINCE_HIP_560
+
 __device__ __attribute__((const)) __2f16 __ocml_rint_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_rsqrt_2f16(__2f16);
 __device__ __2f16 __ocml_sin_2f16(__2f16);


Index: clang/test/Headers/__clang_hip_math_deprecated.hip
===
--- /dev/null
+++ clang/test/Headers/__clang_hip_math_deprecated.hip
@@ -0,0 +1,29 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -include __clang_hip_runtime_wrapper.h \
+// RUN:   -internal-isystem %S/../../lib/Headers/cuda_wrappers \
+// RUN:   -internal-isystem %S/Inputs/include \
+// RUN:   -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown \
+// RUN:   -target-cpu gfx906 -emit-llvm %s -fcuda-is-device -O1 -o - \
+// RUN:   -D__HIPCC_RTC__ | FileCheck %s
+
+// Test deprecated functions in the header that should be removed eventually
+
+// CHECK-LABEL: @test_rcpf16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract half 0xH3C00, [[X:%.*]]
+// CHECK-NEXT:ret half [[DIV_I]]
+//
+extern "C" __device__ _Float16 test_rcpf16_wrapper(_Float16 x) {
+  return __llvm_amdgcn_rcp_f16(x);
+}
+
+// CHECK-LABEL: @test_rcp2f16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract <2 x half> , [[X:%.*]]
+// CHECK-NEXT:ret <2 x half> [[DIV_I]]
+//
+extern "C" __device__ __2f16 test_rcp2f16_wrapper(__2f16 x) {
+  return __llvm_amdgcn_rcp_2f16(x);
+}
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -10,6 +10,10 @@
 #ifndef __CLANG_HIP_LIBDEVICE_DECLARES_H__
 #define __CLANG_HIP_LIBDEVICE_DECLARES_H__
 
+#if __has_include("hip/hip_version.h")
+#include "hip/hip_version.h"
+#endif // __has_include("hip/hip_version.h")
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ 

[PATCH] D85917: [MSP430] Fix passing C structs and unions as function arguments

2023-07-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.
Herald added a project: All.

Is there a reason this never landed?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85917/new/

https://reviews.llvm.org/D85917

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155982: Partially revert "clang/HIP: Remove __llvm_amdgcn_* wrapper hacks"

2023-07-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/Headers/__clang_hip_libdevice_declares.h:319
+// relevant.
+__device__ inline _Float16 __llvm_amdgcn_rcp_f16(_Float16 x) {
+  return ((_Float16)1.0f) / x;

arsenm wrote:
> yaxunl wrote:
> > Can we add the deprecated attribute to urge people not to use them?
> I initially added those, but thought it was a bit aggressive to put in always 
> included headers. Is there an established practice for deprecating builtin 
> header functions?
Is there a hip header version macro I could guard this on?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155982/new/

https://reviews.llvm.org/D155982

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155982: Partially revert "clang/HIP: Remove __llvm_amdgcn_* wrapper hacks"

2023-07-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/lib/Headers/__clang_hip_libdevice_declares.h:319
+// relevant.
+__device__ inline _Float16 __llvm_amdgcn_rcp_f16(_Float16 x) {
+  return ((_Float16)1.0f) / x;

yaxunl wrote:
> Can we add the deprecated attribute to urge people not to use them?
I initially added those, but thought it was a bit aggressive to put in always 
included headers. Is there an established practice for deprecating builtin 
header functions?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155982/new/

https://reviews.llvm.org/D155982

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155982: Partially revert "clang/HIP: Remove __llvm_amdgcn_* wrapper hacks"

2023-07-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision.
arsenm added a reviewer: yaxunl.
Herald added a project: All.
arsenm requested review of this revision.
Herald added a subscriber: wdng.

Revert part of f407a7399575a6821940973c54754d42e72dd9ce 
.

  

Some of the HIP headers were using the f16 rcp inline, such that it
breaks compiling code against non-top-of-tree headers. Need to wait
for a few HIP releases to expire to fully remove these.

  

Fixes #63981


https://reviews.llvm.org/D155982

Files:
  clang/lib/Headers/__clang_hip_libdevice_declares.h
  clang/test/Headers/__clang_hip_math_deprecated.hip


Index: clang/test/Headers/__clang_hip_math_deprecated.hip
===
--- /dev/null
+++ clang/test/Headers/__clang_hip_math_deprecated.hip
@@ -0,0 +1,29 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -include __clang_hip_runtime_wrapper.h \
+// RUN:   -internal-isystem %S/../../lib/Headers/cuda_wrappers \
+// RUN:   -internal-isystem %S/Inputs/include \
+// RUN:   -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown \
+// RUN:   -target-cpu gfx906 -emit-llvm %s -fcuda-is-device -O1 -o - \
+// RUN:   -D__HIPCC_RTC__ | FileCheck %s
+
+// Test deprecated functions in the header that should be removed eventually
+
+// CHECK-LABEL: @test_rcpf16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract half 0xH3C00, [[X:%.*]]
+// CHECK-NEXT:ret half [[DIV_I]]
+//
+extern "C" __device__ _Float16 test_rcpf16_wrapper(_Float16 x) {
+  return __llvm_amdgcn_rcp_f16(x);
+}
+
+// CHECK-LABEL: @test_rcp2f16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract <2 x half> , [[X:%.*]]
+// CHECK-NEXT:ret <2 x half> [[DIV_I]]
+//
+extern "C" __device__ __2f16 test_rcp2f16_wrapper(__2f16 x) {
+  return __llvm_amdgcn_rcp_2f16(x);
+}
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -279,6 +279,7 @@
 __device__ __attribute__((pure)) _Float16 __ocml_log_f16(_Float16);
 __device__ __attribute__((pure)) _Float16 __ocml_log10_f16(_Float16);
 __device__ __attribute__((pure)) _Float16 __ocml_log2_f16(_Float16);
+
 __device__ __attribute__((const)) _Float16 __ocml_rint_f16(_Float16);
 __device__ __attribute__((const)) _Float16 __ocml_rsqrt_f16(_Float16);
 __device__ _Float16 __ocml_sin_f16(_Float16);
@@ -312,6 +313,18 @@
 __device__ __attribute__((pure)) __2f16 __ocml_log_2f16(__2f16);
 __device__ __attribute__((pure)) __2f16 __ocml_log10_2f16(__2f16);
 __device__ __attribute__((pure)) __2f16 __ocml_log2_2f16(__2f16);
+
+// Deprecated, should be removed when rocm releases using it are no longer
+// relevant.
+__device__ inline _Float16 __llvm_amdgcn_rcp_f16(_Float16 x) {
+  return ((_Float16)1.0f) / x;
+}
+
+__device__ inline __2f16
+__llvm_amdgcn_rcp_2f16(__2f16 __x)
+{
+  return ((__2f16)1.0f) / __x;
+}
 __device__ __attribute__((const)) __2f16 __ocml_rint_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_rsqrt_2f16(__2f16);
 __device__ __2f16 __ocml_sin_2f16(__2f16);


Index: clang/test/Headers/__clang_hip_math_deprecated.hip
===
--- /dev/null
+++ clang/test/Headers/__clang_hip_math_deprecated.hip
@@ -0,0 +1,29 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -include __clang_hip_runtime_wrapper.h \
+// RUN:   -internal-isystem %S/../../lib/Headers/cuda_wrappers \
+// RUN:   -internal-isystem %S/Inputs/include \
+// RUN:   -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown \
+// RUN:   -target-cpu gfx906 -emit-llvm %s -fcuda-is-device -O1 -o - \
+// RUN:   -D__HIPCC_RTC__ | FileCheck %s
+
+// Test deprecated functions in the header that should be removed eventually
+
+// CHECK-LABEL: @test_rcpf16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract half 0xH3C00, [[X:%.*]]
+// CHECK-NEXT:ret half [[DIV_I]]
+//
+extern "C" __device__ _Float16 test_rcpf16_wrapper(_Float16 x) {
+  return __llvm_amdgcn_rcp_f16(x);
+}
+
+// CHECK-LABEL: @test_rcp2f16_wrapper(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DIV_I:%.*]] = fdiv contract <2 x half> , [[X:%.*]]
+// CHECK-NEXT:ret <2 x half> [[DIV_I]]
+//
+extern "C" __device__ __2f16 test_rcp2f16_wrapper(__2f16 x) {
+  return __llvm_amdgcn_rcp_2f16(x);
+}
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -279,6 +279,7 @@
 __device__ 

[PATCH] D155191: clang/HIP: Directly use f32 exp and log builtins

2023-07-20 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision.
arsenm added a comment.

5f1d3834a2bc3b77e126a278a0e7f00bce5576fc 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155191/new/

https://reviews.llvm.org/D155191

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153310: Add codegen for llvm pow builtin

2023-07-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

Add a test for the strictfp case (there's an existing strictfp test for all the 
elementwise builtins)




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:3241
+unsigned Opc = llvm::Intrinsic::pow;
+Value* Result = Builder.CreateBinaryIntrinsic(Opc, Op0, Op1, nullptr, 
"elt.pow");
+return RValue::get(Result);

emitBinaryBuiltin?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153310/new/

https://reviews.llvm.org/D153310

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154495: clang: Attach !fpmath metadata to __builtin_sqrt based on language flags

2023-07-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision.
arsenm added a comment.

bac2a075408377a8aa41f6626b17bb3e471221f3 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154495/new/

https://reviews.llvm.org/D154495

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   8   9   10   >