[PATCH] D159118: [libc] Implement the 'clock()' function on the GPU

2023-08-29 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I think something needs to notice when clock_t is a different thing on the host 
and on the GPU for the offloading languages and complain, probably fatally.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159118/new/

https://reviews.llvm.org/D159118

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Name candidates

- expand
- lower
- desugar
- transform

Lowering probably makes the most sense for the abi level apply to all 
functions, I like desugar to cover rewriting a subset of the graph




Comment at: libc/config/gpu/entrypoints.txt:84-85
 # stdio.h entrypoints
+libc.src.stdio.snprintf
+libc.src.stdio.vsnprintf
 libc.src.stdio.puts

arsenm wrote:
> Split of the libc stuff into a separate patch, the lowering pass should be a 
> standalone change
There's quite a lot of wip in this patch. Your feedback is welcome, but you 
could also wait for a cleaned up version with your name on the reviewer list if 
you prefer. The libc plumbing is keeping track of which parts are currently 
passing for collaboration with jhuber6



Comment at: llvm/test/CodeGen/AMDGPU/unsupported-calls.ll:57-58
 
 ; GCN: in function test_tail_call_bitcast_extern_variadic{{.*}}: unsupported 
required tail call to function extern_variadic
-; R600: in function test_tail_call_bitcast_extern_variadic{{.*}}: unsupported 
call to function extern_variadic
 define i32 @test_tail_call_bitcast_extern_variadic(<4 x float> %arg0, <4 x 
float> %arg1, i32 %arg2) {

arsenm wrote:
> Deleted the wrong error?
this is curious - the test is passing as written, maybe a quirk of running it 
with not llc


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield removed subscribers: kristof.beyls, wangpc, jdoerfert.
JonChesterfield added inline comments.



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:215-217
+auto alloced = Builder.Insert(
+new AllocaInst(VarargsTy, DL.getAllocaAddrSpace(), nullptr,
+   std::max(MaxFieldAlign, assumedStructAlignment(DL))),

arsenm wrote:
> What's wrong with just Builder.CreateAlloca?
I want to mark the callee parameter with alignment metadata so the pointer 
alignment can constant fold (todo, said folding isn't currently working very 
well).

The struct contains whatever the caller was passing which we don't know much 
about. If we call CreateAlloca, it gets whatever alignment suffices for those 
fields.

If those two don't line up nicely, you get bad times. I'm using the stack 
alignment from the target string as a heuristic for a reasonable alignment to 
use for things on the stack. For amdgpu that's 4, which means passing a double 
gets an dynamic realignment which doesn't optimise out, but otherwise it seems 
reasonable.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield updated this revision to Diff 551616.
JonChesterfield added a comment.

- Rename ExpandVAIntrinsics to DesugarVariadics


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

Files:
  clang/lib/CodeGen/Targets/AMDGPU.cpp
  libc/config/gpu/entrypoints.txt
  libc/test/src/__support/CMakeLists.txt
  libc/test/src/stdio/CMakeLists.txt
  llvm/include/llvm/CodeGen/DesugarVariadics.h
  llvm/include/llvm/CodeGen/MachinePassRegistry.def
  llvm/include/llvm/CodeGen/Passes.h
  llvm/include/llvm/InitializePasses.h
  llvm/lib/CodeGen/CMakeLists.txt
  llvm/lib/CodeGen/CodeGen.cpp
  llvm/lib/CodeGen/DesugarVariadics.cpp
  llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/CodeGen/AMDGPU/unsupported-calls.ll
  llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll
  llvm/tools/opt/opt.cpp
  llvm/utils/gn/secondary/llvm/lib/CodeGen/BUILD.gn

Index: llvm/utils/gn/secondary/llvm/lib/CodeGen/BUILD.gn
===
--- llvm/utils/gn/secondary/llvm/lib/CodeGen/BUILD.gn
+++ llvm/utils/gn/secondary/llvm/lib/CodeGen/BUILD.gn
@@ -72,6 +72,7 @@
 "ExpandMemCmp.cpp",
 "ExpandPostRAPseudos.cpp",
 "ExpandReductions.cpp",
+"DesugarVariadics.cpp",
 "ExpandVectorPredication.cpp",
 "FEntryInserter.cpp",
 "FaultMaps.cpp",
Index: llvm/tools/opt/opt.cpp
===
--- llvm/tools/opt/opt.cpp
+++ llvm/tools/opt/opt.cpp
@@ -365,6 +365,7 @@
   "global-merge",
   "pre-isel-intrinsic-lowering",
   "expand-reductions",
+  "desugar-variadics",
   "indirectbr-expand",
   "generic-to-nvvm",
   "expandmemcmp",
@@ -456,6 +457,7 @@
   initializeWriteBitcodePassPass(Registry);
   initializeReplaceWithVeclibLegacyPass(Registry);
   initializeJMCInstrumenterPass(Registry);
+  initializeDesugarVariadicsPass(Registry);
 
   SmallVector PluginList;
   PassPlugins.setCallback([&](const std::string ) {
Index: llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll
===
--- /dev/null
+++ llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll
@@ -0,0 +1,85 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -desugar-variadics -desugar-variadics-all=true -S < %s | FileCheck %s
+
+target datalayout = "e-m:e-S8"
+
+; CHECK: %variadic_to_call_middle.vararg = type { i32, [4 x i8], double, i32 }
+
+define dso_local void @variadic_to_call_middle(double %d, ...) {
+; CHECK-LABEL: @variadic_to_call_middle(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[VA:%.*]] = alloca ptr, align 8
+; CHECK-NEXT:call void @llvm.lifetime.start.p0(i64 8, ptr nonnull [[VA]])
+; CHECK-NEXT:store ptr [[VARARGS:%.*]], ptr [[VA]], align 8
+; CHECK-NEXT:[[ARGLIST_CURRENT:%.*]] = load ptr, ptr [[VA]], align 8
+; CHECK-NEXT:[[TMP0:%.*]] = getelementptr inbounds i8, ptr [[ARGLIST_CURRENT]], i32 3
+; CHECK-NEXT:[[TMP1:%.*]] = ptrtoint ptr [[TMP0]] to i64
+; CHECK-NEXT:[[TMP2:%.*]] = and i64 [[TMP1]], -4
+; CHECK-NEXT:[[TMP3:%.*]] = inttoptr i64 [[TMP2]] to ptr
+; CHECK-NEXT:[[TMP4:%.*]] = load i32, ptr [[TMP3]], align 4
+; CHECK-NEXT:[[ARGLIST_NEXT:%.*]] = getelementptr inbounds i32, ptr [[TMP3]], i64 1
+; CHECK-NEXT:store ptr [[ARGLIST_NEXT]], ptr [[VA]], align 8
+; CHECK-NEXT:[[ARGLIST_CURRENT1:%.*]] = load ptr, ptr [[VA]], align 8
+; CHECK-NEXT:[[TMP5:%.*]] = getelementptr inbounds i8, ptr [[ARGLIST_CURRENT1]], i32 7
+; CHECK-NEXT:[[TMP6:%.*]] = ptrtoint ptr [[TMP5]] to i64
+; CHECK-NEXT:[[TMP7:%.*]] = and i64 [[TMP6]], -8
+; CHECK-NEXT:[[TMP8:%.*]] = inttoptr i64 [[TMP7]] to ptr
+; CHECK-NEXT:[[TMP9:%.*]] = load double, ptr [[TMP8]], align 8
+; CHECK-NEXT:[[ARGLIST_NEXT2:%.*]] = getelementptr inbounds double, ptr [[TMP8]], i64 1
+; CHECK-NEXT:store ptr [[ARGLIST_NEXT2]], ptr [[VA]], align 8
+; CHECK-NEXT:[[ARGLIST_CURRENT3:%.*]] = load ptr, ptr [[VA]], align 8
+; CHECK-NEXT:[[TMP10:%.*]] = getelementptr inbounds i8, ptr [[ARGLIST_CURRENT3]], i32 3
+; CHECK-NEXT:[[TMP11:%.*]] = ptrtoint ptr [[TMP10]] to i64
+; CHECK-NEXT:[[TMP12:%.*]] = and i64 [[TMP11]], -4
+; CHECK-NEXT:[[TMP13:%.*]] = inttoptr i64 [[TMP12]] to ptr
+; CHECK-NEXT:[[TMP14:%.*]] = load i32, ptr [[TMP13]], align 4
+; CHECK-NEXT:[[ARGLIST_NEXT4:%.*]] = getelementptr inbounds i32, ptr [[TMP13]], i64 1
+; CHECK-NEXT:store ptr [[ARGLIST_NEXT4]], ptr [[VA]], align 8
+; CHECK-NEXT:call void @_Z3erri(i32 [[TMP4]])
+; CHECK-NEXT:call void @_Z3errd(double [[TMP9]])
+; CHECK-NEXT:call void @_Z3erri(i32 [[TMP14]])
+; CHECK-NEXT:call void @llvm.lifetime.end.p0(i64 8, ptr nonnull [[VA]])
+; CHECK-NEXT:ret void
+;
+entry:
+  %va = alloca ptr, align 8
+  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %va)
+  

[PATCH] D158246: [amdgpu] WIP variadics

2023-08-18 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

@arsenm suggested va_start/va_end should be addrspace aware, similar idea to 
https://reviews.llvm.org/D15. Should be less invasive for the va intrinsics 
as they're not used so widely.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-17 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: llvm/lib/CodeGen/ExpandVAIntrinsics.cpp:159
+// conventions. Escape analysis on va_list values.
+return false;
+  }

Cut this from the WIP patch as the draft is noisy. Ran some tests through x64 
with a patched clang locally, expanding functions that pass that precondition 
and leaving others unchanged.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-17 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

The lowering pass is broadly right, missing a few edge cases.




Comment at: libc/config/gpu/entrypoints.txt:88
+libc.src.stdio.vsnprintf
 libc.src.stdio.puts
 libc.src.stdio.fopen

^these try to build, but fail. I haven't managed to debug the libc cmake to 
work out what I'm missing



Comment at: libc/test/src/__support/CMakeLists.txt:62
 
-# The GPU does not support varargs currently.
-if(NOT LIBC_TARGET_ARCHITECTURE_IS_GPU)
-  add_libc_test(
-arg_list_test
-SUITE
-  libc-support-tests
-SRCS
-  arg_list_test.cpp
-DEPENDS
-  libc.src.__support.arg_list
-  )
-endif()
+add_libc_test(
+  arg_list_test

this one builds and passes


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158246/new/

https://reviews.llvm.org/D158246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158246: [amdgpu] WIP variadics

2023-08-17 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield created this revision.
Herald added subscribers: libc-commits, foad, kerbowa, hiraditya, tpr, 
dstuttard, yaxunl, jvesely, kzhuravl, arsenm.
Herald added projects: libc-project, All.
JonChesterfield requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, wdng.
Herald added projects: clang, LLVM.

Won't ship as is, varargs and testing to go. Posting to collaborate with 
@jhuber6 on libc testing


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D158246

Files:
  clang/lib/CodeGen/Targets/AMDGPU.cpp
  libc/config/gpu/entrypoints.txt
  libc/test/src/__support/CMakeLists.txt
  llvm/include/llvm/CodeGen/ExpandVAIntrinsics.h
  llvm/include/llvm/CodeGen/MachinePassRegistry.def
  llvm/include/llvm/CodeGen/Passes.h
  llvm/include/llvm/InitializePasses.h
  llvm/lib/CodeGen/CMakeLists.txt
  llvm/lib/CodeGen/CodeGen.cpp
  llvm/lib/CodeGen/ExpandVAIntrinsics.cpp
  llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/CodeGen/AMDGPU/unsupported-calls.ll
  llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll
  llvm/tools/opt/opt.cpp
  llvm/utils/gn/secondary/llvm/lib/CodeGen/BUILD.gn

Index: llvm/utils/gn/secondary/llvm/lib/CodeGen/BUILD.gn
===
--- llvm/utils/gn/secondary/llvm/lib/CodeGen/BUILD.gn
+++ llvm/utils/gn/secondary/llvm/lib/CodeGen/BUILD.gn
@@ -72,6 +72,7 @@
 "ExpandMemCmp.cpp",
 "ExpandPostRAPseudos.cpp",
 "ExpandReductions.cpp",
+"ExpandVAIntrinsics.cpp",
 "ExpandVectorPredication.cpp",
 "FEntryInserter.cpp",
 "FaultMaps.cpp",
Index: llvm/tools/opt/opt.cpp
===
--- llvm/tools/opt/opt.cpp
+++ llvm/tools/opt/opt.cpp
@@ -365,6 +365,7 @@
   "global-merge",
   "pre-isel-intrinsic-lowering",
   "expand-reductions",
+  "expand-va-intrinsics",
   "indirectbr-expand",
   "generic-to-nvvm",
   "expandmemcmp",
@@ -456,6 +457,7 @@
   initializeWriteBitcodePassPass(Registry);
   initializeReplaceWithVeclibLegacyPass(Registry);
   initializeJMCInstrumenterPass(Registry);
+  initializeExpandVAIntrinsicsPass(Registry);
 
   SmallVector PluginList;
   PassPlugins.setCallback([&](const std::string ) {
Index: llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll
===
--- /dev/null
+++ llvm/test/CodeGen/Generic/expand-variadic-intrinsics.ll
@@ -0,0 +1,85 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -expand-va-intrinsics -expand-va-intrinsics-all=true -S < %s | FileCheck %s
+
+target datalayout = "e-m:e-S8"
+
+; CHECK: %variadic_to_call_middle.vararg = type { i32, [4 x i8], double, i32 }
+
+define dso_local void @variadic_to_call_middle(double %d, ...) {
+; CHECK-LABEL: @variadic_to_call_middle(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[VA:%.*]] = alloca ptr, align 8
+; CHECK-NEXT:call void @llvm.lifetime.start.p0(i64 8, ptr nonnull [[VA]])
+; CHECK-NEXT:store ptr [[VARARGS:%.*]], ptr [[VA]], align 8
+; CHECK-NEXT:[[ARGLIST_CURRENT:%.*]] = load ptr, ptr [[VA]], align 8
+; CHECK-NEXT:[[TMP0:%.*]] = getelementptr inbounds i8, ptr [[ARGLIST_CURRENT]], i32 3
+; CHECK-NEXT:[[TMP1:%.*]] = ptrtoint ptr [[TMP0]] to i64
+; CHECK-NEXT:[[TMP2:%.*]] = and i64 [[TMP1]], -4
+; CHECK-NEXT:[[TMP3:%.*]] = inttoptr i64 [[TMP2]] to ptr
+; CHECK-NEXT:[[TMP4:%.*]] = load i32, ptr [[TMP3]], align 4
+; CHECK-NEXT:[[ARGLIST_NEXT:%.*]] = getelementptr inbounds i32, ptr [[TMP3]], i64 1
+; CHECK-NEXT:store ptr [[ARGLIST_NEXT]], ptr [[VA]], align 8
+; CHECK-NEXT:[[ARGLIST_CURRENT1:%.*]] = load ptr, ptr [[VA]], align 8
+; CHECK-NEXT:[[TMP5:%.*]] = getelementptr inbounds i8, ptr [[ARGLIST_CURRENT1]], i32 7
+; CHECK-NEXT:[[TMP6:%.*]] = ptrtoint ptr [[TMP5]] to i64
+; CHECK-NEXT:[[TMP7:%.*]] = and i64 [[TMP6]], -8
+; CHECK-NEXT:[[TMP8:%.*]] = inttoptr i64 [[TMP7]] to ptr
+; CHECK-NEXT:[[TMP9:%.*]] = load double, ptr [[TMP8]], align 8
+; CHECK-NEXT:[[ARGLIST_NEXT2:%.*]] = getelementptr inbounds double, ptr [[TMP8]], i64 1
+; CHECK-NEXT:store ptr [[ARGLIST_NEXT2]], ptr [[VA]], align 8
+; CHECK-NEXT:[[ARGLIST_CURRENT3:%.*]] = load ptr, ptr [[VA]], align 8
+; CHECK-NEXT:[[TMP10:%.*]] = getelementptr inbounds i8, ptr [[ARGLIST_CURRENT3]], i32 3
+; CHECK-NEXT:[[TMP11:%.*]] = ptrtoint ptr [[TMP10]] to i64
+; CHECK-NEXT:[[TMP12:%.*]] = and i64 [[TMP11]], -4
+; CHECK-NEXT:[[TMP13:%.*]] = inttoptr i64 [[TMP12]] to ptr
+; CHECK-NEXT:[[TMP14:%.*]] = load i32, ptr [[TMP13]], align 4
+; CHECK-NEXT:[[ARGLIST_NEXT4:%.*]] = getelementptr inbounds i32, ptr [[TMP13]], i64 1
+; CHECK-NEXT:store ptr [[ARGLIST_NEXT4]], ptr [[VA]], align 8
+; CHECK-NEXT:call void @_Z3erri(i32 [[TMP4]])
+; CHECK-NEXT:call void @_Z3errd(double [[TMP9]])
+; CHECK-NEXT:call void 

[PATCH] D157738: [OpenMP] Emit offloading entries for indirect target variables

2023-08-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Ok, I'm really sure this needs to be reflected in the type system.  for 
some target foo gets to be larger than 8 bytes and we use a different calling 
convention for it. Otherwise however we carve this the type erasure is going to 
make unrelated calls acquire the dynamic search overhead.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157738/new/

https://reviews.llvm.org/D157738

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157738: [OpenMP] Emit offloading entries for indirect target variables

2023-08-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

If calling an indirect function pointer on the GPU requires a table lookup 
(keyed by host function addresses, which I didn't think we knew at GPU compile 
time), and we cannot distinguish indirect function pointers from function 
pointers, then this feature must send _every_ indirect call on the GPU through 
the table search in case it hits in the table and then branch on the value if 
it doesn't.

So if we have a few indirect openmp functions and then we call a function which 
makes indirect calls of a function pointer which happened to be unrelated to 
this feature, it's going to search that table anyway. Say qsort on a 
non-inlined comparison function.

This feature, as I understand it so far, therefore induces a global slowdown on 
every call to an unknown function. Slow function calls are a bad thing.

What am I missing?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157738/new/

https://reviews.llvm.org/D157738

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157738: [OpenMP] Emit offloading entries for indirect target variables

2023-08-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

> calling device functions via their associated host pointer

What does this mean? Defining a function foo such that the host and each 
individual target each have their own machine code for it, such that  can 
be copied over to the target and then invoked to mean call the function on the 
local target with the same name?

If so, calling through the pointer  on the GPU doing a logarithmic search 
through a table to choose a function address to branch to sounds like something 
that will codegen into very slow code. Does it do that search on every call?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157738/new/

https://reviews.llvm.org/D157738

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155986: [clang][AMDGPU]: Don't use byval for struct arguments in function ABI

2023-08-11 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Thanks! Happy to see function calls getting cheaper


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155986/new/

https://reviews.llvm.org/D155986

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156928: [Clang][AMDGPU] Fix handling of -mcode-object-version=none arg

2023-08-04 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D156928#4562412 , @yaxunl wrote:

> Some control variables are per-module. Clang cannot emit control variables 
> that have different values for different modules. Intrinsics should work 
> since it can take an argument as its value.

Sure it can. In the most limited case, we could exactly replace the variables 
imported from source IR in the device libs based on command line variables with 
the same globals with the same flags. That would be NFC except it would no 
longer matter if the devicertl source was present.

There's just also other better things to be done as well, like not burning ABI 
decisions in the front end, or not using magic variables at all. But it's 
definitely not necessary to write constants in IR files that encode the same 
information as the command line flag. That's a whole indirect no-op that 
doesn't need to be there.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156928/new/

https://reviews.llvm.org/D156928

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156928: [Clang][AMDGPU] Fix handling of -mcode-object-version=none arg

2023-08-04 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Or, the front end could define those objects directly, without importing IR 
files that define the objects with the content clang used to choose the object 
file. E.g. instead of the argument daz=off (spelled differently) finding a file 
called daz.off.ll that defines variable called daz with a value 0, that 
argument could define that variable. I think @jhuber6 has a partial patch 
trying to do that.

If we were more ambitious, we could use intrinsics that are folded reliably at 
O0 instead of magic variables that hopefully get constant folded. That would 
kill a bunch of O0 bugs.

In general though, splicing magic variables in the front end seems unlikely to 
be performance critical relative to splicing them in at the start of the 
backend.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156928/new/

https://reviews.llvm.org/D156928

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156928: [Clang][AMDGPU] Fix handling of -mcode-object-version=none arg

2023-08-04 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D156928#4561849 , @arsenm wrote:

> In D156928#4561811 , 
> @JonChesterfield wrote:
>
>> What does code objects version= none mean?
>
> Handle any version

So... That should be the default, right? Emit IR that the back end specialises. 
Or, ideally, the only behaviour as far as the front end is concerned.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156928/new/

https://reviews.llvm.org/D156928

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156928: [Clang][AMDGPU] Fix handling of -mcode-object-version=none arg

2023-08-04 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

What does code objects version= none mean?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156928/new/

https://reviews.llvm.org/D156928

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155191: clang/HIP: Directly use f32 exp and log builtins

2023-07-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.

Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155191/new/

https://reviews.llvm.org/D155191

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D142569: [OpenMP] Introduce kernel environment

2023-07-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

How does a function find the corresponding kernel environment at runtime?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142569/new/

https://reviews.llvm.org/D142569

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154850: [libc] Remove GPU string functions incompatible with C++

2023-07-10 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Is this a const thing? If so I'd naively hope we can declare both in C++ mode 
and alias them, but I can believe that is detectably broken.

How does glibc manage their hack?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154850/new/

https://reviews.llvm.org/D154850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-07-10 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

The test detection is an awkward compromise between people who want to run the 
GPU tests and people who don't, and reflects diverse hardware in use and 
variation on whether cuda / hsa are installed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-07-10 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D153725#4484966 , @arsenm wrote:

> In D153725#4484754 , 
> @JonChesterfield wrote:
>
>> - if you open the driver too many times at once it fails to open, so running 
>> a parallel build that uses this tool doesn't work on fast machines
>
> Why would this happen? Seems like a bug to fix?

It definitely annoys me. The argument is you can't usefully run some large N 
number of programs at the same time anyway and the driver failing to open is a 
rate limit. The problem is there are things we could usefully do, like this 
query, without needing to run a kernel as well. The net effect is we don't run 
tests widely in parallel because they fail if we do, for this and possibly 
other reasons.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-07-10 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D153725#4484747 , @arsenm wrote:

> In D153725#4484711 , 
> @JonChesterfield wrote:
>
>> The right thing to do on Linux for this is to query the driver directly. 
>> That is, the kernel should populate some string under /sys that we read. 
>> That isn't yet implemented.
>
> It should definitely not do that. That's what this redundant thing does 
> 
>  . The kernel doesn't know the names of these devices. The kernel knows 
> different names that map to PCI ids that are not the same as the gfx numbers. 
> The compiler should not be responsible for maintaining yet another name 
> mapping table and should go through a real API

There's a lot of pcie to gfx906 style tables lying around already. There used 
to be one in roct, last time I looked people wanted to move that to somewhere 
else. I don't really want to copy/paste it.

The problem with using the proper API via HSA or similar is twofold:

- we use this tool to enable tests, which means HSA has to exist before 
building clang or the tests don't run and HSA now requires clang to build
- if you open the driver too many times at once it fails to open, so running a 
parallel build that uses this tool doesn't work on fast machines


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-07-10 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:50
+#else
+  return printGPUsByHSA();
+#endif

yaxunl wrote:
> jhuber6 wrote:
> > arsenm wrote:
> > > The HIP path should work on linux too. I generally think we should build 
> > > as much code as possible on all hosts, so how about
> > > ```
> > > #ifndef _WIN32
> > >   if (tryHSA())
> > > return 0;
> > > #endif
> > > 
> > > tryHIP()
> > > ```
> > > 
> > > 
> > > 
> > That'd be fine, I'm in favor of sticking to HSA since it's a smaller 
> > runtime that's more reasonable to build standalone without the whole ROCm 
> > stack.
> done
I can't think of a case on linux where HIP would work and HSA would not, given 
that HIP calls into HSA to do the same query. So I think this fallback path 
only contributes to line noise when HSA doesn't load, 

```
./bin/amdgpu-arch 
Failed to 'dlopen' libhsa-runtime64.so
Failed to load libamdhip64.so: libamdhip64.so: cannot open shared object file: 
No such file or directory
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-07-10 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

The right thing to do on Linux for this is to query the driver directly. That 
is, the kernel should populate some string under /sys that we read. That isn't 
yet implemented. Does windows happen to have that functionality available?

(landed here while trying to work out why tests aren't running because we now 
print errors about failing to load libamdhip64.so when hsa fails)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154036: [libc] Add support for creating wrapper headers for offloading in clang

2023-07-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

OK, let's go with this. It's a fairly alarming mess localised quite closely to 
the language that requires the complexity, minimal damage to libc itself.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154036/new/

https://reviews.llvm.org/D154036

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153909: [AMDGPU] Always pass `-mcpu` to the `lld` linker

2023-06-27 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I thought lld did the right thing if you pass it raw IR, without specifying an 
arch, but on reflection it might just be silently miscompiling things. The test 
doesn't seem to involve a specific type of input file, does clang foo.ll -o 
a.out pass a mcpu flag along?

It's a bit weird that the architecture doesn't seem to be embedded in the IR 
file (e.g. you have to pass it to llc) but that's out of scope here


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153909/new/

https://reviews.llvm.org/D153909

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153568: [ClangPackager] Add an option to extract inputs to an archive

2023-06-23 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

LGTM. As a meta level comment, we have far too many of these binary munging 
sorts of tools.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153568/new/

https://reviews.llvm.org/D153568

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153578: [Clang] Disable `libc` headers for offloading languages

2023-06-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Thanks, unblocked :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153578/new/

https://reviews.llvm.org/D153578

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D152867: OpenMP: Add a new test for constantexpr evaluation of math headers

2023-06-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152867/new/

https://reviews.llvm.org/D152867

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D152829: clang: Add start of header test for __clang_hip_libdevice_declares

2023-06-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

That seems like something we should fix. Visibility/dso really should be the 
same default. I don't know what that fp thing is


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152829/new/

https://reviews.llvm.org/D152829

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D152442: [LinkerWrapper] Support linking vendor bitcode late

2023-06-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

This LG, the complicated stuff is in the previous patch. Thanks


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152442/new/

https://reviews.llvm.org/D152442

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D152391: [Clang] Allow bitcode linking when the input is LLVM-IR

2023-06-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:2001
 
-void CodeGenModule::getDefaultFunctionAttributes(StringRef Name,
- bool HasOptnone,
- bool AttrOnCallSite,
- llvm::AttrBuilder ) 
{
-  getTrivialDefaultFunctionAttributes(Name, HasOptnone, AttrOnCallSite,
-  FuncAttrs);
-  if (!AttrOnCallSite) {
-// If we're just getting the default, get the default values for mergeable
-// attributes.
-addMergableDefaultFunctionAttributes(CodeGenOpts, FuncAttrs);
-  }
-}
+/// Adds attributes to \p F according to our \p CodeGenOpts and \p LangOpts, as
+/// though we had emitted it ourselves. We remove any attributes on F that

I'm used to this sort of copy-some-args-and-not-others showing up in bug 
reports. Could this patch be re-ordered to make it apparent what functional 
changes are happening relative to the current head?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152391/new/

https://reviews.llvm.org/D152391

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138504: clang/HIP: Remove __llvm_amdgcn_* wrapper hacks

2023-06-08 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Yep, thanks. Anything under __llvm is fair game to change and there's a find& 
replace fix for anything that decided to call these directly.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138504/new/

https://reviews.llvm.org/D138504

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D152391: [Clang] Allow bitcode linking when the input is LLVM-IR

2023-06-07 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

That's cleaner than I expected, thanks. Might be reasonable to factor out the 
method as an initial NFC then insert the call to it along with the new test 
case as the functional change.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152391/new/

https://reviews.llvm.org/D152391

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D102107: [OpenMP] Codegen aggregate for outlined function captures

2023-06-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I'm trying to pick up the context for this and D95976 
. Superficially it looks like lowering 
variadic functions in the compiler could be used to simplify quite a lot of 
this, @jdoerfert there's a comment from some time ago which suggests that this 
code path was originally a workaround for lack of variadics.

I'm currently debugging an IR pass that eliminates variadic calls in the hope 
of using that all the time on amdgpu. I think it could be adapted to patch 
these calls on the fly for nvptx as well if we added it to the openmp codegen 
pipeline, need to see whether the function pointer interacts well with the 
recent specialisation pass.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102107/new/

https://reviews.llvm.org/D102107

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D151087: [Clang] Permit address space casts with 'reinterpret_cast' in C++

2023-05-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I like the current behaviour of bypass semantic checking and emit IR with the 
same number on it as my primary use case for feeding freestanding C++ to clang 
is as a convenient way to emit specific IR and I'm not that bothered about 
clang detecting mistakes along the way.

Funny that you mention addrspace(5) global as that's a pretty good 
approximation to what we'd like for errno. The semantics that would have been 
convenient is put it at the bottom of the stack in every kernel that can reach 
the variable and wire accesses to it through some base-of-stack built-in, which 
is presently not a thing but could be implemented.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151087/new/

https://reviews.llvm.org/D151087

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D149028: [Clang] Always pass `-fconvergent-functions` for GPU targets

2023-04-23 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I think this is sensible. Passing fno-convergent-functions presumably changes 
the default?

I wonder if we should adopt this and then remove the checks for each of the GPU 
programming models


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149028/new/

https://reviews.llvm.org/D149028

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D149019: [Clang] Accept and forward `-fconvergent-functions` in the driver

2023-04-23 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149019/new/

https://reviews.llvm.org/D149019

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D149019: [Clang] Accept and forward `-fconvergent-functions` in the driver

2023-04-23 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Can't reasonably see the semantic change between all the whitespace reformat, 
please split those two. E.g. use git-clang-format to only fix formatting in the 
part you're changing, or commit a clang-format only change first and then 
rebase this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149019/new/

https://reviews.llvm.org/D149019

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D147666: [OPENMP] Adds /lib to rpath to avoid need to set LD_LIBRARY_PATH to find plugins.

2023-04-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Sorry Greg, we can't have this back in trunk. It's now value add for the ROCm 
toolchain, and some llvm releases had this feature, but no longer.

The consensus in the OpenMP weekly meeting was to require users to hack around 
with linker flags or environment variables to get a working program. This makes 
Fedora's packaging scripts happy.

I do not like this but was comprehensively outvoted.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147666/new/

https://reviews.llvm.org/D147666

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146973: [Clang] Implicitly include LLVM libc headers for the GPU

2023-03-27 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Can we default to freestanding on, or just document that freestanding is a good 
idea, instead of hacking with include behaviour directly?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146973/new/

https://reviews.llvm.org/D146973

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D145941: [Clang] Always use -zdefs when linking AMDGPU images

2023-03-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a reviewer: AMDGPU.
JonChesterfield added a comment.

Adding the amdgpu reviewer group. This change might be contentious - we 
currently treat various broken code as calls to trap on the grounds that those 
functions might not be executed. By analogy, functions that call undefined 
functions but are not themselves called might be considered not a problem.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145941/new/

https://reviews.llvm.org/D145941

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D118493: Set rpath on openmp executables

2023-03-09 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

That is great news, phabricator's list of branches has mislead me. Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118493/new/

https://reviews.llvm.org/D118493

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D118493: Set rpath on openmp executables

2023-03-09 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Duplicating a comment from the commit thread so it's easier for me to find 
later.

You've applied this to the release branches going back as far as 14. It's a 
user facing breaking change. As in people who have a working openmp toolchain 
and update to the point release which looks like a semantic versioning thing 
indicating minor bugfixes will experience immediate cessation of function. I 
consider that very poor user experience and do not agree with the scope of 
breakage.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118493/new/

https://reviews.llvm.org/D118493

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D118493: Set rpath on openmp executables

2023-03-08 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Reporting after another round of discussion. I don't have much support from the 
llvm openmp working group that we should maintain the current divergence from 
libc++ and the like so I'm backing down. Let's revert this and regress for all 
users who don't install globally.

The intent is to document in the FAQ things users can do themselves to replace 
the functionality here, where possibly one enumerated option is to pass 
--fopenmp-implicit-rpath (maybe renamed) though my preference would be to 
delete this code path from the compiler entirely if it's going to be a 
minimally-tested, non-default path (as it'll break and we won't notice).

@MaskRay please revert this patch at your leisure. I'm reluctantly consenting 
to the capability regression but not volunteering to fix any CI fallout from 
removing it.

@ronlieb I think we should keep this functionality in rocm so that rocm built 
binaries don't start going looking for llvm libs under /usr instead and fall 
over


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118493/new/

https://reviews.llvm.org/D118493

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D142174: [OpenMP] Don't set rpath for system paths

2023-03-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

I'm happy with this but agree that "what might be a system path?" is a tricky 
heuristic. What we want is to exclude the places that the application will 
search anyway, but that's OS dependent. I'm mostly worried about toolchains 
installed under /opt or under $HOME so this is good for me. It's also easy to 
pull later if we reconsider.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142174/new/

https://reviews.llvm.org/D142174

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143306: [Driver] Default to -fno-openmp-implicit-rpath

2023-02-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

If fedora has a specific set of paths it rejects in DT_RUNPATH

- and libomptarget is on one of those by default
- and user executables find it there
- and we can detect we're building on fedora

then we can disable the rpath for fedora installs to system root, while keeping 
openmp binaries working when the toolchain is installed elsewhere.

Are those constraints satisfiable?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143306/new/

https://reviews.llvm.org/D143306

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144505: [Clang] Add options in LTO mode when cross compiling for AMDGPU

2023-02-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

NVM the above, all the other call sites to addLTOOptions have that test in them 
so it must be fine


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144505/new/

https://reviews.llvm.org/D144505

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144505: [Clang] Add options in LTO mode when cross compiling for AMDGPU

2023-02-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:545
+addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+  C.getDriver().getLTOMode() == LTOK_Thin);
   CmdArgs.push_back("-shared");

What's the test against OK_Thin for? ThinLTO is a thing but I don't know if it 
exists (/works) on amdgpu, is this that?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144505/new/

https://reviews.llvm.org/D144505

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143306: [Driver] Default to -fno-openmp-implicit-rpath

2023-02-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield requested changes to this revision.
JonChesterfield added a comment.
This revision now requires changes to proceed.

Marking this as "no" because as far as I can tell it'll stop anyone running 
openmp built from source which constitutes a severe regression and I am 
struggling to find information on what Fedora are doing here. This link 
https://fedoraproject.org/wiki/Changes/Broken_RPATH_will_fail_rpmbuild suggests 
changing clang to not set rpath when it would be aiming at a "system 
directory", which is unfortunately platform specific magic strings, would be 
fine. That is, maybe Fedora is OK with setting RPATH as long as it isn't set to 
/usr/lib64 or possibly other unspecified strings.

The use case I want to preserve is people running clang -fopenmp from a local 
install and without setting environment variables. That means the binary needs 
to find the shared libraries from that local install and not unrelated files 
with the same name that happen to be under /usr somewhere.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143306/new/

https://reviews.llvm.org/D143306

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143306: [Driver] Default to -fno-openmp-implicit-rpath

2023-02-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/include/clang/Driver/Options.td:4218-4223
 defm openmp_implicit_rpath: BoolFOption<"openmp-implicit-rpath",
   LangOpts<"OpenMP">,
-  DefaultTrue,
+  DefaultFalse,
   PosFlag,
   NegFlag,
   BothFlags<[NoArgumentUnused]>>;

MaskRay wrote:
> yaxunl wrote:
> > I am wondering whether this option can be aliased to `--offload-add-rpath`, 
> > which seems to have the same purpose. (https://reviews.llvm.org/D136854)
> It can be aliased to `--[no-]offload-add-rpath`, but I think it probably 
> makes sense to drop this option completely. This patch as-is is useful for 
> backporting into 16.x.
Folding the flags together seems good to me, I'd have probably used 
offload-add-rpath in the initial implementation if I spotted it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143306/new/

https://reviews.llvm.org/D143306

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143306: [Driver] Default to -fno-openmp-implicit-rpath

2023-02-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

So what is this configuration file? Joseph found a Gentoo blog post 
https://blogs.gentoo.org/mgorny/2022/10/07/clang-in-gentoo-now-sets-default-runtimes-via-config-file/
 and I don't have a clang.cfg file in my install dir


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143306/new/

https://reviews.llvm.org/D143306

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143306: [Driver] Default to -fno-openmp-implicit-rpath

2023-02-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I don't know how this configuration file works. Running clang from the build 
directory is useful when developing, but to avoid user facing regression we'd 
need it to run from the install directory. The use case is someone builds trunk 
or a release on a HPC machine and expects it to run without hacking with the 
environment variables, which is especially important as module systems managing 
the machine are likely to be hacking with the environment variables.

It's not obvious to me how that helps fedora. Maybe changing rpath is 
forbidden, but global installs put the shared libraries somewhere that is 
detectable anyway, and the config file handling has some special case to not 
set rpath on binaries if it would point to system anyway? That logic would 
probably work for openmp too - if the library we want to find is the one on the 
system path, great - the problem is when the library we want to find is not on 
the system path. However if this machinery is already implemented as something 
that reads a text configuration file and we can use that instead, great.

I would rather we not apply this patch as-is to trunk, and further not apply it 
to the previous release, because it'll break anyone running clang -fopenmp who 
doesn't have clang globally installed. If we can add some config file change to 
this patch which keeps things working then sure, back-porting is fine.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143306/new/

https://reviews.llvm.org/D143306

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143306: [Driver] Default to -fno-openmp-implicit-rpath

2023-02-21 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I don't mind hugely what mechanism is used but would really like clang++ 
-fopenmp foo.cpp to build a program that runs. How can we preserve that 'works' 
feature without setting rpath on the binary?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143306/new/

https://reviews.llvm.org/D143306

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143306: [Driver] Default to -fno-openmp-implicit-rpath

2023-02-04 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added reviewers: tianshilei1992, ye-luo, RaviNarayanaswamy.
JonChesterfield added a comment.

Added some people I can remember from the original discussion.

The effect of this patch will be that clang -fopenmp foo.cpp will build an 
executable that doesn't know where its runtime libraries are. If it finds the 
right ones anyway that'll be fine. If it finds ones from some unrelated 
toolchain, that application usually won't work.

This is likely to be more painful for people with multiple toolchains installed.

How do the other projects deal with this hazard?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143306/new/

https://reviews.llvm.org/D143306

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D118493: Set rpath on openmp executables

2023-01-24 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

That works if you have one toolchain installed globally or you are happy to 
cobble together a working system using environment variables. If you have 
multiple toolchains, they can't all sit in the global directory. If you don't 
have root, you can't install them there.

Previously we required people to set LD_LIBRARY_PATH to use openmp at all. 
That's an inherently poor UX and interacted especially poorly with module setup 
systems found in HPC.

We could make this opt-in, at the cost of new users seeing that openmp just 
doesn't work out of the box and other ones having to modify build scripts. In 
exchange we get - consistency with other tools that fail to work without global 
installation or extra user burden.

In my opinion dynamically linking language runtimes is a bad default. It wins 
us an awful lot of failure modes. I've also argued against being able to swap 
out individual components via environment variables as way too error prone for 
a user facing interface. This is what the community consensus went for. The 
combination of dynamically substitutable and requiring LD_LIBRARY_PATH was 
especially sharp.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118493/new/

https://reviews.llvm.org/D118493

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D142138: [OpenMP] Clean up AMD handling for `-fopenmp-targets=amdgcn` arch inference

2023-01-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a subscriber: arsenm.
JonChesterfield added a comment.

@arsenm as above, mcpu != march important? llc takes a different one to clang 
iirc


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142138/new/

https://reviews.llvm.org/D142138

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D142133: [LinkerWrapper] Use `clang` to perform the device linking

2023-01-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.
Herald added a subscriber: ormris.

That's much cleaner, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142133/new/

https://reviews.llvm.org/D142133

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141859: [amdgpu-arch] Dynamically load the HSA runtime if not found during the build

2023-01-16 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Yes, ok. Not thrilled about the copy from openmp but we can fix that as 
soon as we agree on a subdir to put code shared between llvm/clang/openmp and 
that has been tricky to achieve consensus on. Feature is good, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141859/new/

https://reviews.llvm.org/D141859

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141440: [OpenMP] Adjust phases for AMDGPU offloading for OpenMP in save-temps mode

2023-01-10 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

If i'm reading this right, the change means we emit the same save-temps files 
as hip with the same naming convention. That sounds great, makes life easier 
for backend devs looking at either. I don't know why rdc is a factor - it's not 
obvious to me that it means anything on amdgpu - but if that's preserving hip's 
current handling it's fine by me.

Thanks!




Comment at: clang/lib/Driver/Driver.cpp:4411
 ++TCAndArch;
   }
 }

Why is rdc involved here?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141440/new/

https://reviews.llvm.org/D141440

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141105: [OpenMP] Add support for '--offload-arch=native' to OpenMP offloading

2023-01-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Possible naming hazard here. march=native means target the local processor 
architecture, zen2 or whatever, and we have the host CPU as an offloading 
target already. So what I'd expect this to do is host offloading with the 
openmp runtime compiled for the local variant of x86 or aarch64, not for it to 
have a guess at a GPU target.

What you think of offload-arch=GPU for pick a plausible GPU? That distinguishes 
it from other things we might want to offload to. Open question whether it 
should create a vgpu instance if it can't detect a physical card.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141105/new/

https://reviews.llvm.org/D141105

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140226: [NVPTX] Introduce attribute to mark kernels without a language mode

2022-12-18 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Calling convention is the right model here. Kernels are functions with a 
different calling convention to the 'normal' functions in a very literal sense. 
The calling convention modelling in clang is different to attribute handling 
and changing nvptx to it is probably invasive, though it seems to me it could 
be done incrementally.

I wouldn't suggest adding a nvptx_kernel calling convention to clang though, 
rather we could repurpose the amdgpu one to be gpu_kernel. Possibly spelled 
nvptx_kernel for the user but represented within clang as gpu_kernel.

Related, I think there's a spirv or opencl kernel representation in llvm for 
amdgpu, I would be interested in collapsing those and the openmp or hip 
annotation to a single thing if possible.

That's all medium term cleanup ideas, current patch looks good to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140226/new/

https://reviews.llvm.org/D140226

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

If we do magic header including, we should check for the freestanding argument 
and not include them with that set. I would prefer we not include cuda headers 
into C++ source that isn't being compiled as cuda, and also not link in misc 
cuda library files.

Anyone who has decided to compile raw c or c++ to ptx is doing something off 
the beaten path, I don't think we should assume they want implicit behaviour 
from other programming models thrown in.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137875: clang/AMDGPU: Drop volatile from builtin_atomic* pointers

2022-12-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137875/new/

https://reviews.llvm.org/D137875

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2022-12-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I am reluctant to add the dependency edge to rocm device libs to openmp's GPU 
runtime.

We currently require that library for libm, which I'm also not thrilled about, 
but at least you can currently build and run openmp programs (that don't use 
libm, like much of our tests) without it.

My problem with device libs is that it usually doesn't build with trunk. It 
follows a rolling dev model tied to rocm clang and when upstream does something 
that takes a long time to apply, device libs doesn't build until rocm catches 
up. I've literally never managed to compile any branch of device libs with 
trunk clang without modifying the source, generally to delete directories that 
don't look necessary for libm.

Further, selecting an ABI based on runtime code found in a library which is 
hopefully constant folded is a weird layering choice. The compiler knows what 
ABI it is emitting code for, and that's how it picks files from device libs to 
effect that choice, but it would make far more sense to me for the compiler 
back end to set this stuff up itself.

Also, if we handle ABI in the back end, then we don't get the inevitable 
problem of rocm device libs and trunk clang having totally different ideas of 
what the ABI is as they drift in and out of sync.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138141: [amdgpu] Reimplement LDS lowering

2022-12-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield updated this revision to Diff 480484.
JonChesterfield added a comment.
Herald added subscribers: cfe-commits, libc-commits, libcxx-commits, 
openmp-commits, lldb-commits, Sanitizers, hanchung, kadircet, jsetoain, 
Moerafaat, zero9178, pcwang-thead, anlunx, steakhal, mtrofin, Enna1, 
bzcheeseman, mravishankar, mattd, gchakrabarti, ThomasRaoux, pmatos, asb, 
jhenderson, yota9, ayermolo, awarzynski, sdasgup3, asavonic, carlosgalvezp, 
jeroen.dobbelaere, abrachet, wenzhicui, wrengr, armkevincheng, ormris, sjarus, 
eric-k256, mstorsjo, ChuanqiXu, cota, teijeong, frasercrmck, rdzhabarov, 
ecnelises, tatianashp, wenlei, cishida, mehdi_amini, okura, jdoerfert, 
msifontes, sstefan1, jurahul, kuter, cmtice, Kayjukh, vkmr, grosul1, martong, 
Joonsoo, stephenneuendorffer, phosek, liufengdb, aartbik, mgester, 
arpith-jacob, csigg, nicolasvasilache, antiagainst, shauheen, rriddle, 
luismarques, apazos, sameer.abuasal, pengfei, s.egerton, dmgreen, Jim, 
asbirlea, miyuki, jocewei, PkmX, arphaman, george.burgess.iv, the_o, 
brucehoult, MartinMosbeck, rogfer01, steven_wu, atanasyan, edward-jones, 
zzheng, MaskRay, jrtc27, gbedwell, niosHD, sabuasal, simoncook, haicheng, 
johnrusso, rbar, fedor.sergeev, kbarton, aheejin, jgravelle-google, 
krytarowski, arichardson, sbc100, nemanjai, sdardis, emaste, dylanmckay, 
jyknight, dschuff, jholewinski.
Herald added a reviewer: bollu.
Herald added a reviewer: JDevlieghere.
Herald added a reviewer: andreadb.
Herald added a reviewer: shafik.
Herald added a reviewer: jdoerfert.
Herald added a reviewer: jhenderson.
Herald added a reviewer: jdoerfert.
Herald added a reviewer: sstefan1.
Herald added a reviewer: antiagainst.
Herald added a reviewer: nicolasvasilache.
Herald added a reviewer: herhut.
Herald added a reviewer: rriddle.
Herald added a reviewer: aartbik.
Herald added a reviewer: MaskRay.
Herald added a reviewer: sscalpone.
Herald added a reviewer: ftynse.
Herald added a reviewer: aartbik.
Herald added a reviewer: aartbik.
Herald added a reviewer: awarzynski.
Herald added a reviewer: mravishankar.
Herald added a reviewer: bondhugula.
Herald added a reviewer: rafauler.
Herald added a reviewer: Amir.
Herald added a reviewer: maksfb.
Herald added a reviewer: paulkirth.
Herald added a reviewer: NoQ.
Herald added a reviewer: ributzka.
Herald added a reviewer: dcaballe.
Herald added a reviewer: njames93.
Herald added projects: clang, Sanitizers, LLDB, libc++, OpenMP, libc-project, 
libc++abi, MLIR, lld-macho, clang-tools-extra, Flang.
Herald added a reviewer: libc++.
Herald added a reviewer: libc++abi.
Herald added a reviewer: lld-macho.
This revision now requires review to proceed.

- remove obsolete comment


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138141/new/

https://reviews.llvm.org/D138141

Files:
  bolt/include/bolt/Core/BinaryBasicBlock.h
  bolt/include/bolt/Core/BinaryFunction.h
  bolt/include/bolt/Core/BinarySection.h
  bolt/lib/Core/BinaryEmitter.cpp
  bolt/lib/Core/BinarySection.cpp
  bolt/lib/Profile/DataAggregator.cpp
  bolt/lib/Profile/YAMLProfileWriter.cpp
  clang-tools-extra/clang-tidy/modernize/UseNullptrCheck.cpp
  clang-tools-extra/clang-tidy/utils/RenamerClangTidyCheck.cpp
  clang-tools-extra/clangd/Headers.cpp
  clang-tools-extra/clangd/Headers.h
  clang-tools-extra/clangd/TidyFastChecks.inc
  clang-tools-extra/clangd/TidyFastChecks.py
  clang-tools-extra/clangd/index/CanonicalIncludes.cpp
  clang-tools-extra/clangd/support/ThreadsafeFS.h
  clang-tools-extra/clangd/tool/Check.cpp
  clang-tools-extra/clangd/unittests/HeadersTests.cpp
  clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h
  clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h
  clang-tools-extra/include-cleaner/lib/Analysis.cpp
  clang-tools-extra/include-cleaner/lib/AnalysisInternal.h
  clang-tools-extra/include-cleaner/lib/HTMLReport.cpp
  clang-tools-extra/include-cleaner/lib/Record.cpp
  clang-tools-extra/include-cleaner/tool/IncludeCleaner.cpp
  clang-tools-extra/include-cleaner/unittests/RecordTest.cpp
  clang-tools-extra/test/clang-tidy/checkers/modernize/use-nullptr-cxx20.cpp
  clang/cmake/modules/ClangConfig.cmake.in
  clang/docs/ReleaseNotes.rst
  clang/docs/tools/clang-formatted-files.txt
  clang/include/clang/AST/Decl.h
  clang/include/clang/Analysis/Analyses/PostOrderCFGView.h
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/DirectoryEntry.h
  clang/include/clang/Basic/FileEntry.h
  clang/include/clang/Sema/Sema.h
  clang/include/clang/Sema/Template.h
  clang/include/clang/Serialization/ASTBitCodes.h
  clang/include/clang/StaticAnalyzer/Checkers/LocalCheckers.h
  clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
  clang/include/clang/Tooling/Inclusions/HeaderAnalysis.h
  clang/lib/AST/ASTContext.cpp
  clang/lib/AST/ASTImporter.cpp
  clang/lib/AST/DeclCXX.cpp
  clang/lib/AST/ExprConstant.cpp
  

[PATCH] D138392: clang/HIP: Fix broken implementations of __make_mantissa* functions

2022-11-28 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Patch is obviously fine. Given these are internal functions, perhaps we should 
annotate them with the non-null attribute and delete the early exit.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138392/new/

https://reviews.llvm.org/D138392

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138862: [Clang] Do not set offload kind in a freestanding build

2022-11-28 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Does this change the annotation on kernels compiled from C or C++ with 
-ffreestanding? If so, probably want a test showing the change. I have a vague 
idea that they pick up a default of 'opencl' at present, where 'none' is 
probably better. Or perhaps this is only a linker wrapper thing and doesn't 
make it as far as the HSA metadata.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138862/new/

https://reviews.llvm.org/D138862

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138391: clang/HIP: Add new header test for math IR gen

2022-11-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Good call adding this test, thanks.

I think some of the skipping is glibc quirk related and some was to avoid 
clobbering the user's namespace for openmp. The math header defines various 
functions at global scope which can collide with user symbols. In practice that 
may not be a big deal.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138391/new/

https://reviews.llvm.org/D138391

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138473: clang/HIP: Inline frexp/frexpf implementations

2022-11-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Noisy diff! Getting rid of the openmp special case handling here is nice.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138473/new/

https://reviews.llvm.org/D138473

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137524: clang/AMDGPU: Emit atomicrmw for atomic_inc/dec builtins

2022-11-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I don't recognize atomicrmw udec_wrap and can't find it in 
https://llvm.org/docs/LangRef.html#atomicrmw-instruction. I do vaguely recall 
the semantics of these builtins (well, the instructions they target) being 
surprising, Do you know where the uinc_wrao etc were introduced?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137524/new/

https://reviews.llvm.org/D137524

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-03 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D130096#3816149 , @arsenm wrote:

> I'd prefer to avoid spreading special treatment of the device libraries into 
> the backend. The contract is poorly defined and spread around too much as it 
> is






Comment at: clang/test/CodeGen/amdgcn-link-control-constants.c:2-3
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature --check-globals --include-generated-funcs 
--global-value-regex "__oclc_daz_opt"
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-emit-llvm-bc -o %t.bc -DLIBRARY %s
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a 
-mlink-builtin-bitcode %t.bc -S -emit-llvm -o - %s | FileCheck %s
+

jhuber6 wrote:
> jhuber6 wrote:
> > yaxunl wrote:
> > > This is compiling HIP as host. Please add -fcuda-is-device.
> > This test should only require that the triple is `amdgcn`. I could 
> > potentially make the generation of the constants require HIP or 
> > OpenMPDevice, or OpenCL is enabled if you think that's bad.
> I can also change it to just `-x c` if the HIP is the problem.
We probably want these magic constants for C++ code as well, so keying it off 
the triple (at least triple + that we're using rocm / compute stuff, which I 
think is adequately indicated by hsa in the triple) is better. And likewise 
don't want to emit these constants for non-gpu code, e.g. x64 host hip doesn't 
need the daz_opt constant, which also suggests triple is the right hook.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D135076: [Clang] Make offloading flags accept '-' and '--'

2022-10-03 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Like that a lot, good quality of life improvement.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135076/new/

https://reviews.llvm.org/D135076

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D82087: AMDGPU/clang: Add builtins for llvm.amdgcn.ballot

2022-09-20 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D82087#3803464 , @arsenm wrote:

> In D82087#3797883 , @jdoerfert wrote:
>
>> Can we land this? I'd like to use the new intrinsics as I don't understand 
>> the old ones.
>
> What do you think about using the two separate builtins, vs. one magic 
> builtin that auto-changes the wavesize?

The magic one would also change its return type, or always be i64 with high 
bits (zext or maybe sext or maybe copy of low), so less magic seems clearer


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82087/new/

https://reviews.llvm.org/D82087

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D133726: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU

2022-09-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

I don't like this but will concede it's quicker than changing device libs to 
contain IR that doesn't have to be patched on the fly.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133726/new/

https://reviews.llvm.org/D133726

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D133726: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU

2022-09-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

march for openmp, mcpu for hip seems ok. Notably llc needs to be told this as 
well, using mcpu, which may be an issue for save-temps


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133726/new/

https://reviews.llvm.org/D133726

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D133726: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU

2022-09-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.h:58
+  llvm::SmallVector
+  getHIPDeviceLibs(const llvm::opt::ArgList ) const override;
+

jhuber6 wrote:
> JonChesterfield wrote:
> > Why hip device libs? There's a common set, plus a hip.bc plus a opencl.bc. 
> > I'd expect us to want only the common set. Hip.bc shouldn't have non-hip 
> > stuff in it.
> Existing virtual function, just re-used it.
Rename it rocm perhaps?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133726/new/

https://reviews.llvm.org/D133726

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D133726: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU

2022-09-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.h:58
+  llvm::SmallVector
+  getHIPDeviceLibs(const llvm::opt::ArgList ) const override;
+

Why hip device libs? There's a common set, plus a hip.bc plus a opencl.bc. I'd 
expect us to want only the common set. Hip.bc shouldn't have non-hip stuff in 
it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133726/new/

https://reviews.llvm.org/D133726

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D133726: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU

2022-09-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

We can do this but should expect an increase in code size from having multiple 
internalised copies of the same function. There may be an incidental benefit if 
we can specialise some functions to the call site without additional cloning. 
Address of the same functions from different TUs will be inequal, which is 
wrong, but probably doesn't matter in practice.

It does have the major advantage that mlink-builtin-bitcode patches up the 
invalid IR on the fly, which is likely easier than changing the device libs or 
making IR mcpu-agnostic.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133726/new/

https://reviews.llvm.org/D133726

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D131639: [OpenMP] Remove 'stdbool.h' from OpenMP header wrappers

2022-08-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

We should write down (maybe in the commit message) what this is fixing. The 
linked bug has someone defining bool as a workaround for msvc which shouldn't 
be applied when not compiling with msvc, so superficially it looks like this 
patch works around broken application code. That doesn't seem a good thing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131639/new/

https://reviews.llvm.org/D131639

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D131307: [Clang] Allow downgrading to a warning the diagnostic for setting a non fixed enum to a value outside the range of the enumeration values

2022-08-09 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Did some digging here. The function hsa_agent_get_info takes an argument of 
type hsa_agent_info_t, which has declared values in the range [0 24]. The 
implementation of that (in amd_gpu_agent fwiw) casts that argument to a size_t 
and then switches on it, checking those declared values and a bunch of 
extensions. This is used to provide vendor extensions through a vendor-agnostic 
interface.

This seems to be a case where C and C++ have diverged. As far as I can tell, C 
thinks an enum is an int, and anything that fits in an int can be stored in one 
and retrieved later. C23 lets one specify the underlying type. C++ evidently 
thinks the value stored must be within [min max] of the declaration, which is 
at least more flexible than requiring it be one in the declaration.

So I think the fix here is to change hsa_agent_info_t to include 
`HSA_AGENT_INFO_UNUSED_INCREASE_RANGE_OF_TYPE = INT32_MAX` so the vendor 
extensions remain accessible. It's a header that is usable from C and C++ so it 
needs to do something conforming to both. Does that sound right?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131307/new/

https://reviews.llvm.org/D131307

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

A safer bet is to use the current control flow that links in specific bitcode 
files, but create the global directly instead of linking in the file. That'll 
give us zero semantic change and a clang that ignores those bitcode files if 
present.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-07-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a reviewer: b-sumner.
JonChesterfield added a comment.

Tagging Brian as the code owner of rocm device libs - emitting these in clang 
would simplify that library.

Currently clang reads these commandline flags and conditionally links in 
bitcode files to introduce these symbols. There's existing command line flags 
for controlling which files are linked. I think this patch should probably use 
the existing flags to choose which values to set and delete the existing 
handling.

As written I think this is a no op, in that the libraries will currently be 
linked anyway and override the symbols clang has injected


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D129534: [OpenMP] Do not link static library with `-nogpulib`

2022-07-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129534/new/

https://reviews.llvm.org/D129534

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128914: [HIP] Add support for handling HIP in the linker wrapper

2022-07-11 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Code looks good to me. It's hard to be sure whether it works without running a 
bunch of hip test cases through it, have you already done so? If it doesn't 
work out of the box it should be close enough to fix up post commit, e.g. when 
trying to move hip over to this by default.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128914/new/

https://reviews.llvm.org/D128914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D129435: [Clang] Parse toolchain-specific offloading arguments directly

2022-07-11 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

LG, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129435/new/

https://reviews.llvm.org/D129435

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D127246: [LinkerWrapper] Rework the linker wrapper and use owning binaries

2022-06-21 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

I think this is probably OK. Smaller patches usually get reviewed faster so 
minimising the line noise in the browser is worthwhile.




Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:784
 };
-Conf.PostInternalizeModuleHook = [&](size_t, const Module ) {
+Conf.PostInternalizeModuleHook = [&, Arch](size_t, const Module ) {
   SmallString<128> TempFile;

jhuber6 wrote:
> JonChesterfield wrote:
> > Why call out Arch here? Passing a StringRef by reference via the & should 
> > be fine
> It apparently wasn't fine. For whatever reason when this function was called 
> by the LTO engine this value was getting mutated and pointing to bad memory. 
> If I capture it by-value everything is fine. I didn't care enough to look 
> into it once I figured out the fix.
Interesting. The filename operator+ behaving differently for StringRef vs 
StringRef& seems suspicious. Passing by std::string is presumably safer again, 
but I guess this will do for now.



Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1136
+
+  Triple TheTriple = Triple(Images.front().StringData.lookup("triple"));
+  auto FileOrErr = nvptx::fatbinary(InputFiles, TheTriple);

jhuber6 wrote:
> JonChesterfield wrote:
> > This is close but not quite a copy of existing code, moving it around in 
> > the file at the same time as changing it makes the diff significantly 
> > harder to follow
> Sorry, I don't use a header file here so it's a little bit of a pain to make 
> sure new functions can call the ones they need. This function is 
> fundamentally different from the old one so I wouldn't worry about the old 
> implementations.
You could probably move the old functions further up in the file as a precommit 
change so that they lined up in the functional diff


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127246/new/

https://reviews.llvm.org/D127246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D127246: [LinkerWrapper] Rework the linker wrapper and use owning binaries

2022-06-21 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I've read it but can't promise it's correct -  the diff is large and has some 
spurious noise in it which distracts significantly from the functional changes.

Would you be willing to split this into two patches, one which renames 
variables and moves blocks of code around without changing them and a second 
that has the functional changes?




Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:338
+Contents.getBufferIdentifier());
+auto NewBinaryOrErr = OffloadBinary::create(*BufferCopy);
+if (!NewBinaryOrErr)

This is syntactically a bit weird: `std::move(*NewBinaryOrErr)`

Could you spell out the type? I think based on the previous version it's 
probably an Expected> but my first thought was whether a 
unique_ptr was being heap allocated by ::create



Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:784
 };
-Conf.PostInternalizeModuleHook = [&](size_t, const Module ) {
+Conf.PostInternalizeModuleHook = [&, Arch](size_t, const Module ) {
   SmallString<128> TempFile;

Why call out Arch here? Passing a StringRef by reference via the & should be 
fine



Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:903
 
   // We assume visibility of the whole program if every input file was bitcode.
+  bool WholeProgram = InputFiles.empty();

Existing comment I see, but what justifies that assumption?



Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1136
+
+  Triple TheTriple = Triple(Images.front().StringData.lookup("triple"));
+  auto FileOrErr = nvptx::fatbinary(InputFiles, TheTriple);

This is close but not quite a copy of existing code, moving it around in the 
file at the same time as changing it makes the diff significantly harder to 
follow


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127246/new/

https://reviews.llvm.org/D127246

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-23 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D125970#3531645 , @aaron.ballman 
wrote:

> In D125970#3527685 , 
> @JonChesterfield wrote:
>
>> If it was adding a calling convention, sure - caution warranted. There's no 
>> llvm change here though, an existing CC is exposed to C++. No change to the 
>> type system either.
>
> This is adding a user-facing calling convention to Clang and it changes the 
> type system as a result. For example, lambda function pointer conversion 
> operators sometimes are generated for each calling convention so that you can 
> form a function pointer of the correct type (this might not be impacted by 
> your change here); there's a specific number of bits for representing the 
> enumeration of calling conventions and this uses one of those bits, etc.

It slightly changes the type system of C++ code in that the calling convention 
was previously only available in opencl / openmp etc. I was under the 
impression that the compiler data representation cost of calling conventions 
was in LLVM and thus pre-paid for the calling convention this gives access to. 
There's the `enum CallingConv ` which has gained a field, I didn't realise that 
was input into something of limited bitwidth.

>> I'll propose a patch with some documentation for it if you wish, but it'll 
>> just say "For ad hoc debugging of the amdgpu backend". Undocumented seems to 
>> state that more clearly.
>
> I continue to question whether we want to support such a calling convention. 
> This does not seem to be generally useful enough to warrant inclusion in 
> Clang. The fact that you'd like to leave it undocumented as that's more clear 
> for users is a pretty good indication that this calling convention doesn't 
> meet the bar for an extension.

Strictly speaking this lets people write a GPU kernel that can execute on 
AMDGPU in freestanding C++. I happen to want to do that for testing LLVM in the 
immediate instance but there's arguably wider applicability. However, it looks 
like how arguments are represented in this calling convention has some 
strangeness (see discussion with Sam above), particularly with regard to 
address spaces.

I can revert this patch if necessary, but it'll force me to continue trying to 
test our compiler through the lens of opencl, and rules out programming the 
hardware without the various specific language front ends. I think that would 
be a sad loss.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-20 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

If it was adding a calling convention, sure - caution warranted. There's no 
llvm change here though, an existing CC is exposed to C++. No change to the 
type system either.

I'll propose a patch with some documentation for it if you wish, but it'll just 
say "For ad hoc debugging of the amdgpu backend". Undocumented seems to state 
that more clearly.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-20 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

A clangd buildbot (https://lab.llvm.org/buildbot/#/builders/131/builds/27770) 
failed on this with

  [ RUN  ] SerializationTest.NoCrashOnBadArraySize
  ==384111==ERROR: ThreadSanitizer failed to allocate 0x1 (65536) bytes of 
stack depot (error code: 12)
  ERROR: Failed to mmap

I believe this is coincidence


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-20 Thread Jon Chesterfield via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG83c431fb9e72: [amdgpu] Add amdgpu_kernel calling conv 
attribute to clang (authored by JonChesterfield).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

Files:
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/Specifiers.h
  clang/lib/AST/ItaniumMangle.cpp
  clang/lib/AST/Type.cpp
  clang/lib/AST/TypePrinter.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CGDebugInfo.cpp
  clang/lib/Sema/SemaDeclAttr.cpp
  clang/lib/Sema/SemaType.cpp
  clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
  clang/test/Sema/callingconv.c
  clang/tools/libclang/CXType.cpp

Index: clang/tools/libclang/CXType.cpp
===
--- clang/tools/libclang/CXType.cpp
+++ clang/tools/libclang/CXType.cpp
@@ -677,6 +677,7 @@
   TCALLINGCONV(PreserveMost);
   TCALLINGCONV(PreserveAll);
 case CC_SpirFunction: return CXCallingConv_Unexposed;
+case CC_AMDGPUKernelCall: return CXCallingConv_Unexposed;
 case CC_OpenCLKernel: return CXCallingConv_Unexposed;
   break;
 }
Index: clang/test/Sema/callingconv.c
===
--- clang/test/Sema/callingconv.c
+++ clang/test/Sema/callingconv.c
@@ -54,6 +54,8 @@
 int __attribute__((aarch64_vector_pcs)) aavpcs(void); // expected-warning {{'aarch64_vector_pcs' calling convention is not supported for this target}}
 int __attribute__((aarch64_sve_pcs)) aasvepcs(void);  // expected-warning {{'aarch64_sve_pcs' calling convention is not supported for this target}}
 
+int __attribute__((amdgpu_kernel)) amdgpu_kernel(void); // expected-warning {{'amdgpu_kernel' calling convention is not supported for this target}}
+
 // PR6361
 void ctest3();
 void __attribute__((cdecl)) ctest3() {}
Index: clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
@@ -0,0 +1,83 @@
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -no-opaque-pointers -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck --check-prefixes=COMMON,CHECK %s
+
+// Derived from CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu by deleting references to HOST
+// The original test passes the result through opt O2, but that seems to introduce invalid
+// addrspace casts which are not being fixed as part of the present change.
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel1Pi(i32* {{.*}} %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel1(int *x) {
+  x[0]++;
+}
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel2Ri(i32* {{.*}} nonnull align 4 dereferenceable(4) %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel2(int ) {
+  x++;
+}
+
+// CHECK-LABEL: define{{.*}} amdgpu_kernel void  @_Z7kernel3PU3AS2iPU3AS1i(i32 addrspace(2)*{{.*}} %x, i32 addrspace(1)*{{.*}} %y)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel3(__attribute__((address_space(2))) int *x,
+__attribute__((address_space(1))) int *y) {
+  y[0] = x[0];
+}
+
+// COMMON-LABEL: define{{.*}} void @_Z4funcPi(i32*{{.*}} %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void func(int *x) {
+  x[0]++;
+}
+
+struct S {
+  int *x;
+  float *y;
+};
+// `by-val` struct is passed by-indirect-alias (a mix of by-ref and indirect
+// by-val). However, the enhanced address inferring pass should be able to
+// assume they are global pointers.
+//
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel41S(%struct.S addrspace(4)*{{.*}} byref(%struct.S) align 8 %0)
+__attribute__((amdgpu_kernel)) void kernel4(struct S s) {
+  s.x[0]++;
+  s.y[0] += 1.f;
+}
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel5P1S(%struct.S* {{.*}} %s)
+__attribute__((amdgpu_kernel)) void kernel5(struct S *s) {
+  s->x[0]++;
+  s->y[0] += 1.f;
+}
+
+struct T {
+  float *x[2];
+};
+// `by-val` array is passed by-indirect-alias (a mix of by-ref and indirect
+// by-val). However, the enhanced address inferring pass should be able to
+// assume they are global pointers.
+//
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel61T(%struct.T addrspace(4)*{{.*}} byref(%struct.T) align 8 %0)
+__attribute__((amdgpu_kernel)) void kernel6(struct T t) {
+  t.x[0][0] += 1.f;
+  t.x[1][0] += 2.f;
+}
+
+// Check that coerced pointers retain the noalias attribute when 

[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Unintentionally created this patch against an older version of main and it 
interacted badly with D124998  on the rebase. 
Rerunning tests now, and will leave this open for further comments for a little 
while. Thanks all


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield updated this revision to Diff 430821.
JonChesterfield added a comment.

- Fix git merge misfires


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

Files:
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/Specifiers.h
  clang/lib/AST/ItaniumMangle.cpp
  clang/lib/AST/Type.cpp
  clang/lib/AST/TypePrinter.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CGDebugInfo.cpp
  clang/lib/Sema/SemaDeclAttr.cpp
  clang/lib/Sema/SemaType.cpp
  clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
  clang/test/Sema/callingconv.c
  clang/tools/libclang/CXType.cpp

Index: clang/tools/libclang/CXType.cpp
===
--- clang/tools/libclang/CXType.cpp
+++ clang/tools/libclang/CXType.cpp
@@ -677,6 +677,7 @@
   TCALLINGCONV(PreserveMost);
   TCALLINGCONV(PreserveAll);
 case CC_SpirFunction: return CXCallingConv_Unexposed;
+case CC_AMDGPUKernelCall: return CXCallingConv_Unexposed;
 case CC_OpenCLKernel: return CXCallingConv_Unexposed;
   break;
 }
Index: clang/test/Sema/callingconv.c
===
--- clang/test/Sema/callingconv.c
+++ clang/test/Sema/callingconv.c
@@ -54,6 +54,8 @@
 int __attribute__((aarch64_vector_pcs)) aavpcs(void); // expected-warning {{'aarch64_vector_pcs' calling convention is not supported for this target}}
 int __attribute__((aarch64_sve_pcs)) aasvepcs(void);  // expected-warning {{'aarch64_sve_pcs' calling convention is not supported for this target}}
 
+int __attribute__((amdgpu_kernel)) amdgpu_kernel(void); // expected-warning {{'amdgpu_kernel' calling convention is not supported for this target}}
+
 // PR6361
 void ctest3();
 void __attribute__((cdecl)) ctest3() {}
Index: clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
@@ -0,0 +1,83 @@
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -no-opaque-pointers -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck --check-prefixes=COMMON,CHECK %s
+
+// Derived from CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu by deleting references to HOST
+// The original test passes the result through opt O2, but that seems to introduce invalid
+// addrspace casts which are not being fixed as part of the present change.
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel1Pi(i32* {{.*}} %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel1(int *x) {
+  x[0]++;
+}
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel2Ri(i32* {{.*}} nonnull align 4 dereferenceable(4) %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel2(int ) {
+  x++;
+}
+
+// CHECK-LABEL: define{{.*}} amdgpu_kernel void  @_Z7kernel3PU3AS2iPU3AS1i(i32 addrspace(2)*{{.*}} %x, i32 addrspace(1)*{{.*}} %y)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel3(__attribute__((address_space(2))) int *x,
+__attribute__((address_space(1))) int *y) {
+  y[0] = x[0];
+}
+
+// COMMON-LABEL: define{{.*}} void @_Z4funcPi(i32*{{.*}} %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void func(int *x) {
+  x[0]++;
+}
+
+struct S {
+  int *x;
+  float *y;
+};
+// `by-val` struct is passed by-indirect-alias (a mix of by-ref and indirect
+// by-val). However, the enhanced address inferring pass should be able to
+// assume they are global pointers.
+//
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel41S(%struct.S addrspace(4)*{{.*}} byref(%struct.S) align 8 %0)
+__attribute__((amdgpu_kernel)) void kernel4(struct S s) {
+  s.x[0]++;
+  s.y[0] += 1.f;
+}
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel5P1S(%struct.S* {{.*}} %s)
+__attribute__((amdgpu_kernel)) void kernel5(struct S *s) {
+  s->x[0]++;
+  s->y[0] += 1.f;
+}
+
+struct T {
+  float *x[2];
+};
+// `by-val` array is passed by-indirect-alias (a mix of by-ref and indirect
+// by-val). However, the enhanced address inferring pass should be able to
+// assume they are global pointers.
+//
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel61T(%struct.T addrspace(4)*{{.*}} byref(%struct.T) align 8 %0)
+__attribute__((amdgpu_kernel)) void kernel6(struct T t) {
+  t.x[0][0] += 1.f;
+  t.x[1][0] += 2.f;
+}
+
+// Check that coerced pointers retain the noalias attribute when qualified with __restrict.
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel7Pi(i32* noalias{{.*}} %x)

[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield updated this revision to Diff 430820.
JonChesterfield added a comment.

- Fix git merge misfires


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

Files:
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/Specifiers.h
  clang/lib/AST/ItaniumMangle.cpp
  clang/lib/AST/Type.cpp
  clang/lib/AST/TypePrinter.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CGDebugInfo.cpp
  clang/lib/Sema/SemaDeclAttr.cpp
  clang/lib/Sema/SemaType.cpp
  clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
  clang/test/Sema/callingconv.c
  clang/tools/libclang/CXType.cpp

Index: clang/tools/libclang/CXType.cpp
===
--- clang/tools/libclang/CXType.cpp
+++ clang/tools/libclang/CXType.cpp
@@ -677,6 +677,7 @@
   TCALLINGCONV(PreserveMost);
   TCALLINGCONV(PreserveAll);
 case CC_SpirFunction: return CXCallingConv_Unexposed;
+case CC_AMDGPUKernelCall: return CXCallingConv_Unexposed;
 case CC_OpenCLKernel: return CXCallingConv_Unexposed;
   break;
 }
Index: clang/test/Sema/callingconv.c
===
--- clang/test/Sema/callingconv.c
+++ clang/test/Sema/callingconv.c
@@ -54,6 +54,8 @@
 int __attribute__((aarch64_vector_pcs)) aavpcs(void); // expected-warning {{'aarch64_vector_pcs' calling convention is not supported for this target}}
 int __attribute__((aarch64_sve_pcs)) aasvepcs(void);  // expected-warning {{'aarch64_sve_pcs' calling convention is not supported for this target}}
 
+int __attribute__((amdgpu_kernel)) aavpcs(void); // expected-warning {{'amdgpu_kernel' calling convention is not supported for this target}}
+
 // PR6361
 void ctest3();
 void __attribute__((cdecl)) ctest3() {}
Index: clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
@@ -0,0 +1,83 @@
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -no-opaque-pointers -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck --check-prefixes=COMMON,CHECK %s
+
+// Derived from CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu by deleting references to HOST
+// The original test passes the result through opt O2, but that seems to introduce invalid
+// addrspace casts which are not being fixed as part of the present change.
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel1Pi(i32* {{.*}} %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel1(int *x) {
+  x[0]++;
+}
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel2Ri(i32* {{.*}} nonnull align 4 dereferenceable(4) %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel2(int ) {
+  x++;
+}
+
+// CHECK-LABEL: define{{.*}} amdgpu_kernel void  @_Z7kernel3PU3AS2iPU3AS1i(i32 addrspace(2)*{{.*}} %x, i32 addrspace(1)*{{.*}} %y)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel3(__attribute__((address_space(2))) int *x,
+__attribute__((address_space(1))) int *y) {
+  y[0] = x[0];
+}
+
+// COMMON-LABEL: define{{.*}} void @_Z4funcPi(i32*{{.*}} %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void func(int *x) {
+  x[0]++;
+}
+
+struct S {
+  int *x;
+  float *y;
+};
+// `by-val` struct is passed by-indirect-alias (a mix of by-ref and indirect
+// by-val). However, the enhanced address inferring pass should be able to
+// assume they are global pointers.
+//
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel41S(%struct.S addrspace(4)*{{.*}} byref(%struct.S) align 8 %0)
+__attribute__((amdgpu_kernel)) void kernel4(struct S s) {
+  s.x[0]++;
+  s.y[0] += 1.f;
+}
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel5P1S(%struct.S* {{.*}} %s)
+__attribute__((amdgpu_kernel)) void kernel5(struct S *s) {
+  s->x[0]++;
+  s->y[0] += 1.f;
+}
+
+struct T {
+  float *x[2];
+};
+// `by-val` array is passed by-indirect-alias (a mix of by-ref and indirect
+// by-val). However, the enhanced address inferring pass should be able to
+// assume they are global pointers.
+//
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel61T(%struct.T addrspace(4)*{{.*}} byref(%struct.T) align 8 %0)
+__attribute__((amdgpu_kernel)) void kernel6(struct T t) {
+  t.x[0][0] += 1.f;
+  t.x[1][0] += 2.f;
+}
+
+// Check that coerced pointers retain the noalias attribute when qualified with __restrict.
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel7Pi(i32* noalias{{.*}} %x)
+__attribute__((amdgpu_kernel)) void 

[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield updated this revision to Diff 430818.
JonChesterfield added a comment.

- Rebase on main


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

Files:
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/Specifiers.h
  clang/lib/AST/ItaniumMangle.cpp
  clang/lib/AST/Type.cpp
  clang/lib/AST/TypePrinter.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CGDebugInfo.cpp
  clang/lib/Sema/SemaDeclAttr.cpp
  clang/lib/Sema/SemaType.cpp
  clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
  clang/test/Sema/callingconv.c
  clang/tools/libclang/CXType.cpp

Index: clang/tools/libclang/CXType.cpp
===
--- clang/tools/libclang/CXType.cpp
+++ clang/tools/libclang/CXType.cpp
@@ -677,6 +677,7 @@
   TCALLINGCONV(PreserveMost);
   TCALLINGCONV(PreserveAll);
 case CC_SpirFunction: return CXCallingConv_Unexposed;
+case CC_AMDGPUKernelCall: return CXCallingConv_Unexposed;
 case CC_OpenCLKernel: return CXCallingConv_Unexposed;
   break;
 }
Index: clang/test/Sema/callingconv.c
===
--- clang/test/Sema/callingconv.c
+++ clang/test/Sema/callingconv.c
@@ -54,6 +54,8 @@
 int __attribute__((aarch64_vector_pcs)) aavpcs(void); // expected-warning {{'aarch64_vector_pcs' calling convention is not supported for this target}}
 int __attribute__((aarch64_sve_pcs)) aasvepcs(void);  // expected-warning {{'aarch64_sve_pcs' calling convention is not supported for this target}}
 
+int __attribute__((amdgpu_kernel)) aavpcs(void); // expected-warning {{'amdgpu_kernel' calling convention is not supported for this target}}
+
 // PR6361
 void ctest3();
 void __attribute__((cdecl)) ctest3() {}
Index: clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
@@ -0,0 +1,83 @@
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -no-opaque-pointers -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck --check-prefixes=COMMON,CHECK %s
+
+// Derived from CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu by deleting references to HOST
+// The original test passes the result through opt O2, but that seems to introduce invalid
+// addrspace casts which are not being fixed as part of the present change.
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel1Pi(i32* {{.*}} %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel1(int *x) {
+  x[0]++;
+}
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel2Ri(i32* {{.*}} nonnull align 4 dereferenceable(4) %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel2(int ) {
+  x++;
+}
+
+// CHECK-LABEL: define{{.*}} amdgpu_kernel void  @_Z7kernel3PU3AS2iPU3AS1i(i32 addrspace(2)*{{.*}} %x, i32 addrspace(1)*{{.*}} %y)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel3(__attribute__((address_space(2))) int *x,
+__attribute__((address_space(1))) int *y) {
+  y[0] = x[0];
+}
+
+// COMMON-LABEL: define{{.*}} void @_Z4funcPi(i32*{{.*}} %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void func(int *x) {
+  x[0]++;
+}
+
+struct S {
+  int *x;
+  float *y;
+};
+// `by-val` struct is passed by-indirect-alias (a mix of by-ref and indirect
+// by-val). However, the enhanced address inferring pass should be able to
+// assume they are global pointers.
+//
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel41S(%struct.S addrspace(4)*{{.*}} byref(%struct.S) align 8 %0)
+__attribute__((amdgpu_kernel)) void kernel4(struct S s) {
+  s.x[0]++;
+  s.y[0] += 1.f;
+}
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel5P1S(%struct.S* {{.*}} %s)
+__attribute__((amdgpu_kernel)) void kernel5(struct S *s) {
+  s->x[0]++;
+  s->y[0] += 1.f;
+}
+
+struct T {
+  float *x[2];
+};
+// `by-val` array is passed by-indirect-alias (a mix of by-ref and indirect
+// by-val). However, the enhanced address inferring pass should be able to
+// assume they are global pointers.
+//
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel61T(%struct.T addrspace(4)*{{.*}} byref(%struct.T) align 8 %0)
+__attribute__((amdgpu_kernel)) void kernel6(struct T t) {
+  t.x[0][0] += 1.f;
+  t.x[1][0] += 2.f;
+}
+
+// Check that coerced pointers retain the noalias attribute when qualified with __restrict.
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel7Pi(i32* noalias{{.*}} %x)
+__attribute__((amdgpu_kernel)) void 

[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Thanks for accepting! I'm interested to learn more about how the calling conv 
works, e.g. if parts of it are implemented in clang and parts of it patched on 
the fly by opt, but that's downstream of easy access to writing C tests that 
use it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Added a codegen test for arg passing. It establishes that most arguments are 
left alone, but structs passed by value are handled as an addrspace(4) byref. 
Letting opt -O2 run annotated some argument pointers as being in addrspace(1) 
which I think is wrong.

I have no judgement to pass on the current codegen for this - it might be 
correct, or we might have bugs in opt - but those bugs are exactly those that 
are easier to flush out if we land this patch first. All I want to achieve with 
this patch is a means of tagging a void func(void) with the calling convention.

If we wanted to go further - and turn C++ into a language that is robustly 
usable on the GPU - I think we have a bunch of tasks related to handling 
addrspace annotations to burn through.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield updated this revision to Diff 430801.
JonChesterfield added a comment.

- Add O0 arg passing codegen test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

Files:
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/Specifiers.h
  clang/lib/AST/ItaniumMangle.cpp
  clang/lib/AST/Type.cpp
  clang/lib/AST/TypePrinter.cpp
  clang/lib/Basic/Targets/AMDGPU.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CGDebugInfo.cpp
  clang/lib/Sema/SemaDeclAttr.cpp
  clang/lib/Sema/SemaType.cpp
  clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
  clang/test/Sema/callingconv.c
  clang/tools/libclang/CXType.cpp

Index: clang/tools/libclang/CXType.cpp
===
--- clang/tools/libclang/CXType.cpp
+++ clang/tools/libclang/CXType.cpp
@@ -676,6 +676,7 @@
   TCALLINGCONV(PreserveMost);
   TCALLINGCONV(PreserveAll);
 case CC_SpirFunction: return CXCallingConv_Unexposed;
+case CC_AMDGPUKernelCall: return CXCallingConv_Unexposed;
 case CC_OpenCLKernel: return CXCallingConv_Unexposed;
   break;
 }
Index: clang/test/Sema/callingconv.c
===
--- clang/test/Sema/callingconv.c
+++ clang/test/Sema/callingconv.c
@@ -53,6 +53,8 @@
 
 int __attribute__((aarch64_vector_pcs)) aavpcs(void); // expected-warning {{'aarch64_vector_pcs' calling convention is not supported for this target}}
 
+int __attribute__((amdgpu_kernel)) aavpcs(void); // expected-warning {{'amdgpu_kernel' calling convention is not supported for this target}}
+
 // PR6361
 void ctest3();
 void __attribute__((cdecl)) ctest3() {}
Index: clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp
@@ -0,0 +1,83 @@
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -no-opaque-pointers -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck --check-prefixes=COMMON,CHECK %s
+
+// Derived from CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu by deleting references to HOST
+// The original test passes the result through opt O2, but that seems to introduce invalid
+// addrspace casts which are not being fixed as part of the present change.
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel1Pi(i32* {{.*}} %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel1(int *x) {
+  x[0]++;
+}
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel2Ri(i32* {{.*}} nonnull align 4 dereferenceable(4) %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel2(int ) {
+  x++;
+}
+
+// CHECK-LABEL: define{{.*}} amdgpu_kernel void  @_Z7kernel3PU3AS2iPU3AS1i(i32 addrspace(2)*{{.*}} %x, i32 addrspace(1)*{{.*}} %y)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void kernel3(__attribute__((address_space(2))) int *x,
+__attribute__((address_space(1))) int *y) {
+  y[0] = x[0];
+}
+
+// COMMON-LABEL: define{{.*}} void @_Z4funcPi(i32*{{.*}} %x)
+// CHECK-NOT: ={{.*}} addrspacecast [[TYPE:.*]] addrspace(1)* %{{.*}} to [[TYPE]]*
+__attribute__((amdgpu_kernel)) void func(int *x) {
+  x[0]++;
+}
+
+struct S {
+  int *x;
+  float *y;
+};
+// `by-val` struct is passed by-indirect-alias (a mix of by-ref and indirect
+// by-val). However, the enhanced address inferring pass should be able to
+// assume they are global pointers.
+//
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel41S(%struct.S addrspace(4)*{{.*}} byref(%struct.S) align 8 %0)
+__attribute__((amdgpu_kernel)) void kernel4(struct S s) {
+  s.x[0]++;
+  s.y[0] += 1.f;
+}
+
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel5P1S(%struct.S* {{.*}} %s)
+__attribute__((amdgpu_kernel)) void kernel5(struct S *s) {
+  s->x[0]++;
+  s->y[0] += 1.f;
+}
+
+struct T {
+  float *x[2];
+};
+// `by-val` array is passed by-indirect-alias (a mix of by-ref and indirect
+// by-val). However, the enhanced address inferring pass should be able to
+// assume they are global pointers.
+//
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel61T(%struct.T addrspace(4)*{{.*}} byref(%struct.T) align 8 %0)
+__attribute__((amdgpu_kernel)) void kernel6(struct T t) {
+  t.x[0][0] += 1.f;
+  t.x[1][0] += 2.f;
+}
+
+// Check that coerced pointers retain the noalias attribute when qualified with __restrict.
+// COMMON-LABEL: define{{.*}} amdgpu_kernel void @_Z7kernel7Pi(i32* noalias{{.*}} %x)
+__attribute__((amdgpu_kernel)) void kernel7(int *__restrict x) {
+  x[0]++;
+}
+
+// Single element struct.
+struct SS {
+  float *x;
+};
+// COMMON-LABEL: define{{.*}} 

[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

OK, so that's a different thing. CUDA/HIP has a bunch of rules about implicitly 
tagging things with addrspace(1) at the call boundary. I don't think any of 
that magic should exist for C or C++, the developer gets to spell out the 
address space stuff they want explicitly, and if they want to link it against 
CUDA/HIP/OpenMP code they get to look up the rules. In particular, persuading 
clang to do the extra argument mangling stuff will get in the way of using this 
to create test cases.

I'm writing the corresponding test case for that now


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   8   9   >