from:"Joseph Huber via cfe\-commits"

[clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Add GPU profiling flags to driver (PR #94268)

2024-06-24 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> @jhuber6 The clang format errors are mostly due to my local version of 
> `clang-format` disagreeing with the buildbot's version. Its a bit annoying, 
> but it shouldn't be too much of a problem given I plan on squashing and 
> merging once this gets approved.
> 
> I added new flags for GPU PGO specifically because I didn't want to modify 
> the PGO flags' existing behavior. PGO has a significant runtime cost, so I 
> figured it would be best for the end user experience to only enable PGO on 
> the GPU when it was specifically requested.

Is this something that specifically requires its own flag? Or could we just do 
`-Xarch_device -fprofile-generate`.

https://github.com/llvm/llvm-project/pull/94268
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Add GPU profiling flags to driver (PR #94268)

2024-06-23 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 commented:

Seems to be lots of accidental `clang-format` changes. Why do we need new flags 
for this instead of just using the old ones and changing behavior when the 
target is a known GPU? I.e. SPIR-V, CUDA, or HSA.

https://github.com/llvm/llvm-project/pull/94268
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Incrementing by align is just a bug, of course the size is the real value. 
> Whether we want to continue wasting space is another not-correctness 
> discussion

Struct padding is pretty universal, AMDGPU seems the odd one out here. I 
wouldn't mind it so much if it didn't require me to know which vendor I was 
dealing with in the RPC implementation, but I suppose I could store that 
information somewhere if we want to use a compressed option and we know it 
works.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > Here, because the minimum alignment is 4, we will only increment the
> > buffer by 4,
> 
> It should be incrementing by the size? 4 byte aligned access of 8 byte type 
> should work fine

Guess that's an AMD thing, so I'm going to assume that @JonChesterfield wrote 
this intentionally to save on stack space? I suppose the issue I'm having with 
my `printf` implementation is that we then want to copy this struct and because 
it doesn't follow natural alignment the person printing it doesn't know where 
these are stored in a common sense. I suppose I could change the code to just 
be `ptr += sizeof(T)` instead of doing the alignment, but I feel like some 
architectures require strict alignment for these and it wouldn't work in the 
general case.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-21 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.

Out of curiosity, how badly does this fail when you use `--offload-new-driver` 
w/ HIP? I swear I'll get that passing the internal test suite eventually, 
there's a single case for emitting IR that comgr uses that I can't seem to fix.

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-21 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/96370

Summary:
The variadics lowering for AMDGPU puts all the arguments into a void
pointer struct. The current logic dictates that the minimum alignment is
four regardless of what  the underlying type is. This is incorrect in
the following case.

```c
void foo(int, ...);

void bar() {
  int x;
  void *p;
  foo(0, x, p);
}
```
Here, because the minimum alignment is 4, we will only increment the
buffer by 4, resulting in an incorrect alignment when we then try to
access the void pointer. We need to set a minimum of 4, but increase it
to 8 in cases like this.


>From 5ee5bccb5dd4bd1d78dc04ead3c334d88b86f4fd Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 21 Jun 2024 19:17:42 -0500
Subject: [PATCH] [LLVM] Fix incorrect alignment on AMDGPU variadics

Summary:
The variadics lowering for AMDGPU puts all the arguments into a void
pointer struct. The current logic dictates that the minimum alignment is
four regardless of what  the underlying type is. This is incorrect in
the following case.

```c
void foo(int, ...);

void bar() {
  int x;
  void *p;
  foo(0, x, p);
}
```
Here, because the minimum alignment is 4, we will only increment the
buffer by 4, resulting in an incorrect alignment when we then try to
access the void pointer. We need to set a minimum of 4, but increase it
to 8 in cases like this.
---
 clang/lib/CodeGen/Targets/AMDGPU.cpp  |  11 +-
 clang/test/CodeGen/amdgpu-variadic-call.c |  32 +-
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |   6 +-
 .../CodeGen/AMDGPU/expand-variadic-call.ll| 574 +-
 4 files changed, 316 insertions(+), 307 deletions(-)

diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp 
b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 4d3275e17c386..a169a7d920456 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -121,7 +121,7 @@ void AMDGPUABIInfo::computeInfo(CGFunctionInfo ) const {
 RValue AMDGPUABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
 QualType Ty, AggValueSlot Slot) const {
   const bool IsIndirect = false;
-  const bool AllowHigherAlign = false;
+  const bool AllowHigherAlign = true;
   return emitVoidPtrVAArg(CGF, VAListAddr, Ty, IsIndirect,
   getContext().getTypeInfoInChars(Ty),
   CharUnits::fromQuantity(4), AllowHigherAlign, Slot);
@@ -212,13 +212,8 @@ ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType 
Ty, bool Variadic,
 
   Ty = useFirstFieldIfTransparentUnion(Ty);
 
-  if (Variadic) {
-return ABIArgInfo::getDirect(/*T=*/nullptr,
- /*Offset=*/0,
- /*Padding=*/nullptr,
- /*CanBeFlattened=*/false,
- /*Align=*/0);
-  }
+  if (Variadic)
+return ABIArgInfo::getDirect();
 
   if (isAggregateTypeForABI(Ty)) {
 // Records with non-trivial destructors/copy-constructors should not be
diff --git a/clang/test/CodeGen/amdgpu-variadic-call.c 
b/clang/test/CodeGen/amdgpu-variadic-call.c
index 17eda215211a2..0529d6b3171c8 100644
--- a/clang/test/CodeGen/amdgpu-variadic-call.c
+++ b/clang/test/CodeGen/amdgpu-variadic-call.c
@@ -1,4 +1,3 @@
-// REQUIRES: amdgpu-registered-target
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature
 // RUN: %clang_cc1 -cc1 -std=c23 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s 
-o - | FileCheck %s
 
@@ -179,11 +178,9 @@ typedef struct
 // CHECK-LABEL: define {{[^@]+}}@one_pair_f64
 // CHECK-SAME: (i32 noundef [[F0:%.*]], double noundef [[F1:%.*]], double 
[[V0_COERCE0:%.*]], double [[V0_COERCE1:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:[[DOTFCA_0_INSERT:%.*]] = insertvalue 
[[STRUCT_PAIR_F64:%.*]] poison, double [[V0_COERCE0]], 0
-// CHECK-NEXT:[[DOTFCA_1_INSERT:%.*]] = insertvalue [[STRUCT_PAIR_F64]] 
[[DOTFCA_0_INSERT]], double [[V0_COERCE1]], 1
-// CHECK-NEXT:tail call void (...) @sink_0([[STRUCT_PAIR_F64]] 
[[DOTFCA_1_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], 
[[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef 
[[F1]], i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (...) @sink_0(double [[V0_COERCE0]], double 
[[V0_COERCE1]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], double 
[[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef 
[[F1]], i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]]) 
#[[ATTR2]]
 // CHECK-NEXT:ret void
 //
 void one_pair_f64(int f0, double f1, pair_f64 v0)
@@ -220,10 +217,9 @@ typedef union
 // CHECK-SAME: (i32 noundef

[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-21 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/96369

Summary:
This patch implements the `printf` family of functions on the GPU using
the new variadic support. This patch adapts the old handling in the
`rpc_fprintf` placeholder, but adds an extra RPC call to get the size of
the buffer to copy. This prevents the GPU from needing to parse the
string. While it's theoretically possible for the pass to know the size
of the struct, it's prohibitively difficult to do while maintaining ABI
compatibility with NVIDIA's varargs.

Depends on https://github.com/llvm/llvm-project/pull/96015.


>From 42a7a45c845de377b9b714af39a449fdc49eb768 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 21 Jun 2024 19:10:40 -0500
Subject: [PATCH] [libc] Implement (v|f)printf on the GPU

Summary:
This patch implements the `printf` family of functions on the GPU using
the new variadic support. This patch adapts the old handling in the
`rpc_fprintf` placeholder, but adds an extra RPC call to get the size of
the buffer to copy. This prevents the GPU from needing to parse the
string. While it's theoretically possible for the pass to know the size
of the struct, it's prohibitively difficult to do while maintaining ABI
compatibility with NVIDIA's varargs.

Depends on https://github.com/llvm/llvm-project/pull/96015.
---
 .../ClangLinkerWrapper.cpp|  1 +
 libc/config/gpu/entrypoints.txt   | 19 ++---
 libc/src/__support/arg_list.h |  3 +-
 libc/src/gpu/rpc_fprintf.cpp  |  5 +-
 libc/src/stdio/CMakeLists.txt | 24 +-
 libc/src/stdio/generic/CMakeLists.txt | 25 +++
 libc/src/stdio/{ => generic}/fprintf.cpp  |  0
 libc/src/stdio/{ => generic}/vfprintf.cpp |  0
 libc/src/stdio/gpu/CMakeLists.txt | 48 
 libc/src/stdio/gpu/fprintf.cpp| 32 
 libc/src/stdio/gpu/printf.cpp | 30 
 libc/src/stdio/gpu/vfprintf.cpp   | 29 
 libc/src/stdio/gpu/vfprintf_utils.h   | 73 +++
 libc/src/stdio/gpu/vprintf.cpp| 28 +++
 .../integration/src/stdio/gpu/CMakeLists.txt  |  2 +-
 .../test/integration/src/stdio/gpu/printf.cpp | 43 ---
 libc/utils/gpu/server/rpc_server.cpp  | 24 +-
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  8 +-
 18 files changed, 326 insertions(+), 68 deletions(-)
 rename libc/src/stdio/{ => generic}/fprintf.cpp (100%)
 rename libc/src/stdio/{ => generic}/vfprintf.cpp (100%)
 create mode 100644 libc/src/stdio/gpu/fprintf.cpp
 create mode 100644 libc/src/stdio/gpu/printf.cpp
 create mode 100644 libc/src/stdio/gpu/vfprintf.cpp
 create mode 100644 libc/src/stdio/gpu/vfprintf_utils.h
 create mode 100644 libc/src/stdio/gpu/vprintf.cpp

diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index cdfe8cfbd9379..03fd23ae39c29 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -1671,6 +1671,7 @@ int main(int Argc, char **Argv) {
 NewArgv.push_back(Arg->getValue());
   for (const opt::Arg *Arg : Args.filtered(OPT_offload_opt_eq_minus))
 NewArgv.push_back(Args.MakeArgString(StringRef("-") + Arg->getValue()));
+  llvm::errs() << "asdfasdf\n";
   cl::ParseCommandLineOptions(NewArgv.size(), [0]);
 
   Verbose = Args.hasArg(OPT_verbose);
diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt
index 2217a696fc5d1..de1ca6bfd151f 100644
--- a/libc/config/gpu/entrypoints.txt
+++ b/libc/config/gpu/entrypoints.txt
@@ -1,13 +1,3 @@
-if(LIBC_TARGET_ARCHITECTURE_IS_AMDGPU)
-  set(extra_entrypoints
-  # stdio.h entrypoints
-  libc.src.stdio.sprintf
-  libc.src.stdio.snprintf
-  libc.src.stdio.vsprintf
-  libc.src.stdio.vsnprintf
-  )
-endif()
-
 set(TARGET_LIBC_ENTRYPOINTS
 # assert.h entrypoints
 libc.src.assert.__assert_fail
@@ -185,7 +175,14 @@ set(TARGET_LIBC_ENTRYPOINTS
 libc.src.errno.errno
 
 # stdio.h entrypoints
-${extra_entrypoints}
+libc.src.stdio.printf
+libc.src.stdio.vprintf
+libc.src.stdio.fprintf
+libc.src.stdio.vfprintf
+libc.src.stdio.sprintf
+libc.src.stdio.snprintf
+libc.src.stdio.vsprintf
+libc.src.stdio.vsnprintf
 libc.src.stdio.feof
 libc.src.stdio.ferror
 libc.src.stdio.fseek
diff --git a/libc/src/__support/arg_list.h b/libc/src/__support/arg_list.h
index 0965e12afd562..3a4e5ad0fab3c 100644
--- a/libc/src/__support/arg_list.h
+++ b/libc/src/__support/arg_list.h
@@ -54,7 +54,8 @@ class MockArgList {
   }
 
   template  LIBC_INLINE T next_var() {
-++arg_counter;
+arg_counter =
+((arg_counter + alignof(T) - 1) / alignof(T)) * alignof(T) + sizeof(T);
 return T(arg_counter);
   }
 
diff --git a/libc/src/gpu/rpc_fprintf.cpp b/libc/src/gpu/rpc_fprintf.cpp
index

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-21 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96015

>From 8bd49caa9fa93fd3d0812e0a4315f8ff4956056a Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Jun 2024 15:32:31 -0500
Subject: [PATCH] [NVPTX] Implement variadic functions using IR lowering

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC copmiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
---
 clang/lib/Basic/Targets/NVPTX.h   |   3 +-
 clang/lib/CodeGen/Targets/NVPTX.cpp   |  11 +-
 clang/test/CodeGen/variadic-nvptx.c   |  77 
 libc/config/gpu/entrypoints.txt   |  15 +-
 libc/test/src/__support/CMakeLists.txt|  21 +-
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   2 +
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  43 +-
 llvm/test/CodeGen/NVPTX/variadics-backend.ll  | 427 ++
 llvm/test/CodeGen/NVPTX/variadics-lowering.ll | 348 ++
 9 files changed, 916 insertions(+), 31 deletions(-)
 create mode 100644 clang/test/CodeGen/variadic-nvptx.c
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-backend.ll
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-lowering.ll

diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index f476d49047c01..e30eaf808ca93 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -116,8 +116,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public 
TargetInfo {
   }
 
   BuiltinVaListKind getBuiltinVaListKind() const override {
-// FIXME: implement
-return TargetInfo::CharPtrBuiltinVaList;
+return TargetInfo::VoidPtrBuiltinVaList;
   }
 
   bool isValidCPUName(StringRef Name) const override {
diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index 423485c9ca16e..01a0b07856103 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -203,8 +203,12 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
+
+  unsigned ArgumentsCount = 0;
   for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+I.info = ArgumentsCount++ < FI.getNumRequiredArgs()
+ ? classifyArgumentType(I.type)
+ : ABIArgInfo::getDirect();
 
   // Always honor user-specified calling convention.
   if (FI.getCallingConvention() != llvm::CallingConv::C)
@@ -215,7 +219,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),
+  /*AllowHigherAlign=*/true, Slot);
 }
 
 void NVPTXTargetCodeGenInfo::setTargetAttributes(
diff --git a/clang/test/CodeGen/variadic-nvptx.c 
b/clang/test/CodeGen/variadic-nvptx.c
new file mode 100644
index 0..f2f0768ae31ee
--- /dev/null
+++ b/clang/test/CodeGen/variadic-nvptx.c
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca

[clang] [compiler-rt] [libcxx] [libunwind] [llvm] [openmp] [cmake] switch to CMake's native `check_{compiler,linker}_flag` (PR #96171)

2024-06-20 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

Here's a radical question, do we really want to use CMake's support for this? I 
remember a discussion recently about the increasingly large amount of time 
spent in the CMake configuration step, and most of that time is spent during 
these flag checks which pretty much all compile + link some file with no 
parallelism. I've also had issues working with these flags when trying to 
cross-compile things for the GPU, namely because the compilation flags insist 
on checking the linker so I need to do something like `set(CMAKE_REQUIRED_FLAGS 
"-c -flto")` to prevent it from invoking non-LLVM binaries for NVIDIA 
compilation.

https://github.com/llvm/llvm-project/pull/96171
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-19 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96015

>From 0cae8db24812b2ab5539cc581fbc461af072b5fd Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Jun 2024 15:32:31 -0500
Subject: [PATCH] [NVPTX] Implement variadic functions using IR lowering

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC copmiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
---
 clang/lib/Basic/Targets/NVPTX.h   |   3 +-
 clang/lib/CodeGen/Targets/NVPTX.cpp   |  11 +-
 clang/test/CodeGen/variadic-nvptx.c   |  77 
 libc/config/gpu/entrypoints.txt   |  15 +-
 libc/test/src/__support/CMakeLists.txt|  21 +-
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   2 +
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  43 +-
 llvm/test/CodeGen/NVPTX/variadics-backend.ll  | 427 ++
 llvm/test/CodeGen/NVPTX/variadics-lowering.ll | 348 ++
 9 files changed, 916 insertions(+), 31 deletions(-)
 create mode 100644 clang/test/CodeGen/variadic-nvptx.c
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-backend.ll
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-lowering.ll

diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index f476d49047c01..e30eaf808ca93 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -116,8 +116,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public 
TargetInfo {
   }
 
   BuiltinVaListKind getBuiltinVaListKind() const override {
-// FIXME: implement
-return TargetInfo::CharPtrBuiltinVaList;
+return TargetInfo::VoidPtrBuiltinVaList;
   }
 
   bool isValidCPUName(StringRef Name) const override {
diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index 423485c9ca16e..01a0b07856103 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -203,8 +203,12 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
+
+  unsigned ArgumentsCount = 0;
   for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+I.info = ArgumentsCount++ < FI.getNumRequiredArgs()
+ ? classifyArgumentType(I.type)
+ : ABIArgInfo::getDirect();
 
   // Always honor user-specified calling convention.
   if (FI.getCallingConvention() != llvm::CallingConv::C)
@@ -215,7 +219,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),
+  /*AllowHigherAlign=*/true, Slot);
 }
 
 void NVPTXTargetCodeGenInfo::setTargetAttributes(
diff --git a/clang/test/CodeGen/variadic-nvptx.c 
b/clang/test/CodeGen/variadic-nvptx.c
new file mode 100644
index 0..f2f0768ae31ee
--- /dev/null
+++ b/clang/test/CodeGen/variadic-nvptx.c
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-19 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96015

>From a05b24a06429c1ad6c4988f232442d53010e79a9 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Jun 2024 15:32:31 -0500
Subject: [PATCH] [NVPTX] Implement variadic functions using IR lowering

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC copmiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
---
 clang/lib/CodeGen/Targets/NVPTX.cpp   |  11 +-
 clang/test/CodeGen/variadic-nvptx.c   |  77 
 libc/config/gpu/entrypoints.txt   |  15 +-
 libc/test/src/__support/CMakeLists.txt|  21 +-
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   2 +
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  43 +-
 llvm/test/CodeGen/NVPTX/variadics-backend.ll  | 427 ++
 llvm/test/CodeGen/NVPTX/variadics-lowering.ll | 348 ++
 8 files changed, 915 insertions(+), 29 deletions(-)
 create mode 100644 clang/test/CodeGen/variadic-nvptx.c
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-backend.ll
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-lowering.ll

diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index 423485c9ca16e..01a0b07856103 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -203,8 +203,12 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
+
+  unsigned ArgumentsCount = 0;
   for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+I.info = ArgumentsCount++ < FI.getNumRequiredArgs()
+ ? classifyArgumentType(I.type)
+ : ABIArgInfo::getDirect();
 
   // Always honor user-specified calling convention.
   if (FI.getCallingConvention() != llvm::CallingConv::C)
@@ -215,7 +219,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),
+  /*AllowHigherAlign=*/true, Slot);
 }
 
 void NVPTXTargetCodeGenInfo::setTargetAttributes(
diff --git a/clang/test/CodeGen/variadic-nvptx.c 
b/clang/test/CodeGen/variadic-nvptx.c
new file mode 100644
index 0..f2f0768ae31ee
--- /dev/null
+++ b/clang/test/CodeGen/variadic-nvptx.c
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca [[STRUCT_ANON:%.*]], align 4
+// CHECK-NEXT:[[V:%.*]] = alloca <4 x i32>, align 16
+// CHECK-NEXT:store i8 1, ptr [[C]], align 1
+// CHECK-NEXT:store i16 1, ptr [[S]], align 2
+// CHECK-NEXT:store i32 1, ptr [[I]], align 4
+// CHECK-NEXT:store i64 1, ptr [[L]], align 8
+// CHECK-NEXT:store float 1.00e+00, ptr [[F]], align 4
+// CHECK-NEXT:store double 1.00e+00, ptr [[D]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load i8, ptr [[C]], align 1
+// CHECK-NEXT:[[CONV:%.*]] = sext i8 [[TMP0]] to i32
+// CHECK-NEXT:[[TMP1:%.*]] = load i16, ptr [[S]], align

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-19 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> With the possible exception of some alignment handling this looks about as 
> I'd expect it to. Ideally we'd get some feedback from nvptx-associated people 
> but fixing libc is a good sign

Yep, I believe @Artem-B is on vacation, so hopefully @AlexMaclean can chime in. 
This should be ABI compatible with NVIDIA as far as I'm aware.

https://github.com/llvm/llvm-project/pull/96015
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-19 Thread Joseph Huber via cfe-commits



@@ -938,6 +938,37 @@ struct Amdgpu final : public VariadicABIInfo {
   }
 };
 
+struct NVPTX final : public VariadicABIInfo {
+
+  bool enableForTarget() override { return true; }
+
+  bool vaListPassedInSSARegister() override { return true; }
+
+  Type *vaListType(LLVMContext ) override {
+return PointerType::getUnqual(Ctx);
+  }
+
+  Type *vaListParameterType(Module ) override {
+return PointerType::getUnqual(M.getContext());
+  }
+
+  Value *initializeVaList(Module , LLVMContext , IRBuilder<> ,
+  AllocaInst *, Value *Buffer) override {
+return Builder.CreateAddrSpaceCast(Buffer, vaListParameterType(M));
+  }
+
+  VAArgSlotInfo slotInfo(const DataLayout , Type *Parameter) override {
+// NVPTX doesn't apply minimum alignment to types present in structs. Types
+// with alignment less than four should be promoted by the compiler and 
will
+// get the proper minimum alignment in those cases.
+const unsigned MinAlign = 1;

jhuber6 wrote:

So, the standard varargs handling will automatically promote things like shorts 
to ints and floats to doubles. What the comment means is that `clang` already 
handled the size / alignment in those cases, so we need to use a minimum 
alignment of 1 so we respect the alignment for things that clang didn't modify.

https://github.com/llvm/llvm-project/pull/96015
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-19 Thread Joseph Huber via cfe-commits



@@ -17,6 +17,8 @@
 #define MODULE_PASS(NAME, CREATE_PASS)
 #endif
 MODULE_PASS("generic-to-nvvm", GenericToNVVMPass())
+MODULE_PASS("expand-variadics",

jhuber6 wrote:

Couldn't remember if adding it to `addIRPasses` applied to all uses. I remember 
something like different pipeline configurations using different things. I'll 
try to figure it out.

https://github.com/llvm/llvm-project/pull/96015
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-18 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96015

>From bf6f8852621f4a5ac58e6d062d7c78e5eb639c1a Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Jun 2024 15:32:31 -0500
Subject: [PATCH] [NVPTX] Implement variadic functions using IR lowering

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC copmiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
---
 clang/lib/CodeGen/Targets/NVPTX.cpp   |  11 +-
 clang/test/CodeGen/variadic-nvptx.c   |  77 
 libc/config/gpu/entrypoints.txt   |  15 +-
 libc/test/src/__support/CMakeLists.txt|  21 +-
 llvm/lib/Target/NVPTX/NVPTXPassRegistry.def   |   2 +
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   2 +
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  44 +-
 llvm/test/CodeGen/NVPTX/variadics-backend.ll  | 427 ++
 llvm/test/CodeGen/NVPTX/variadics-lowering.ll | 348 ++
 9 files changed, 918 insertions(+), 29 deletions(-)
 create mode 100644 clang/test/CodeGen/variadic-nvptx.c
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-backend.ll
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-lowering.ll

diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index 423485c9ca16e..01a0b07856103 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -203,8 +203,12 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
+
+  unsigned ArgumentsCount = 0;
   for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+I.info = ArgumentsCount++ < FI.getNumRequiredArgs()
+ ? classifyArgumentType(I.type)
+ : ABIArgInfo::getDirect();
 
   // Always honor user-specified calling convention.
   if (FI.getCallingConvention() != llvm::CallingConv::C)
@@ -215,7 +219,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),
+  /*AllowHigherAlign=*/true, Slot);
 }
 
 void NVPTXTargetCodeGenInfo::setTargetAttributes(
diff --git a/clang/test/CodeGen/variadic-nvptx.c 
b/clang/test/CodeGen/variadic-nvptx.c
new file mode 100644
index 0..f2f0768ae31ee
--- /dev/null
+++ b/clang/test/CodeGen/variadic-nvptx.c
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca [[STRUCT_ANON:%.*]], align 4
+// CHECK-NEXT:[[V:%.*]] = alloca <4 x i32>, align 16
+// CHECK-NEXT:store i8 1, ptr [[C]], align 1
+// CHECK-NEXT:store i16 1, ptr [[S]], align 2
+// CHECK-NEXT:store i32 1, ptr [[I]], align 4
+// CHECK-NEXT:store i64 1, ptr [[L]], align 8
+// CHECK-NEXT:store float 1.00e+00, ptr [[F]], align 4
+// CHECK-NEXT:store double 1.00e+00, ptr [[D]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load i8, ptr [[C]], align 1
+// CHECK-NEXT:[[CONV:%.*]] = sext i8 [[TMP0]] to i32
+//

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-18 Thread Joseph Huber via cfe-commits



@@ -203,8 +203,15 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
-  for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+
+  unsigned ArgumentsCount = 0;
+  for (auto  : FI.arguments()) {
+if (FI.isVariadic() && ArgumentsCount > 0)

jhuber6 wrote:

You're right, this needs to account for all fixed arguments, not just the first 
(guaranteed) one. NVIDIA seems to handle it where the fixed arguments are 
passed using the regular ABI (can be indirect or direct) while the variadic 
arguments are always direct. Is there an easy way to check if an argument is 
part of the variadic set? Maybe if the argument number > number of arguments?

https://github.com/llvm/llvm-project/pull/96015
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-18 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/96015

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC copmiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.


>From 01d101dff102e4465ec284818f234152cd09c8da Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Jun 2024 15:32:31 -0500
Subject: [PATCH] [NVPTX] Implement variadic functions using IR lowering

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC copmiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
---
 clang/lib/CodeGen/Targets/NVPTX.cpp   |  16 +-
 clang/test/CodeGen/variadic-nvptx.c   |  77 
 libc/config/gpu/entrypoints.txt   |  15 +-
 libc/test/src/__support/CMakeLists.txt|  21 +-
 llvm/lib/Target/NVPTX/NVPTXPassRegistry.def   |   2 +
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   2 +
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  44 +-
 llvm/test/CodeGen/NVPTX/variadics-backend.ll  | 427 ++
 llvm/test/CodeGen/NVPTX/variadics-lowering.ll | 348 ++
 9 files changed, 922 insertions(+), 30 deletions(-)
 create mode 100644 clang/test/CodeGen/variadic-nvptx.c
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-backend.ll
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-lowering.ll

diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index 423485c9ca16e..1a5205eb4dabc 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -203,8 +203,15 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
-  for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+
+  unsigned ArgumentsCount = 0;
+  for (auto  : FI.arguments()) {
+if (FI.isVariadic() && ArgumentsCount > 0)
+  I.info = ABIArgInfo::getDirect();
+else
+  I.info = classifyArgumentType(I.type);
+++ArgumentsCount;
+  }
 
   // Always honor user-specified calling convention.
   if (FI.getCallingConvention() != llvm::CallingConv::C)
@@ -215,7 +222,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),
+  /*AllowHigherAlign=*/true, Slot);
 }
 
 void NVPTXTargetCodeGenInfo::setTargetAttributes(
diff --git a/clang/test/CodeGen/variadic-nvptx.c 
b/clang/test/CodeGen/variadic-nvptx.c
new file mode 100644
index 0..b47a5d7a2670d
--- /dev/null
+++ b/clang/test/CodeGen/variadic-nvptx.c
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1

[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-18 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-18 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.

LG overall, the growing number of "Is gpu target and some vendor" in the Driver 
is concerning.

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-18 Thread Joseph Huber via cfe-commits



@@ -907,7 +907,8 @@ void CodeGenModule::Release() {
   if (Context.getTargetInfo().getTriple().isWasm())
 EmitMainVoidAlias();
 
-  if (getTriple().isAMDGPU()) {
+  if (getTriple().isAMDGPU() ||
+  (getTriple().isSPIRV() && getTriple().getVendor() == llvm::Triple::AMD)) 
{

jhuber6 wrote:

I'm wondering if we should add `isAMD` to `llvm::Triple` or something.

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

If you really need this, perhaps you can check if the Triple will invoke the 
fallback toolchain or something? Would be a lack of vendor in the Triple.

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > I thought that clang accepted `-rpath `? I see that format when I try 
> > CPU offloading.
> 
> Yeah, but when running `--target=x86_64` and underlying gcc command is issued 
> and complains about `-rpath `

Oh, I see. When using `-fopenmp-targets=x86_64` it goes through the default GCC 
toolchain because you gave it no information. I'm wondering if we should bother 
supporting that since it's supposed to be 
`-fopenmp-targets=x86-64-unknown-linux-gnu` or similar. The GCC fallback isn't 
really guaranteed to work.

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

I remember intentionally using the clang argument format instead of 
`-Wl,-rpath,` because the `-Wl` format would try to forward it to things 
like `nvlink` which don't support it.

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits





jhuber6 wrote:

The tests use an option that causes nothing to actually run, so it only uses 
the filename.

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits





jhuber6 wrote:

What is this?

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 commented:

I thought that clang accepted `-rpath `? I see that format when I try CPU 
offloading.

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-10 Thread Joseph Huber via cfe-commits



@@ -128,12 +128,13 @@ enum class CudaArch {
   GFX12_GENERIC,
   GFX1200,
   GFX1201,
+  AMDGCNSPIRV,
   Generic, // A processor model named 'generic' if the target backend defines a
// public one.
   LAST,
 
   CudaDefault = CudaArch::SM_52,
-  HIPDefault = CudaArch::GFX906,
+  HIPDefault = CudaArch::AMDGCNSPIRV,

jhuber6 wrote:

Yeah, makes sense. But doesn't the SPIR-V toolchain require extra tools?

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-10 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-10 Thread Joseph Huber via cfe-commits



@@ -128,12 +128,13 @@ enum class CudaArch {
   GFX12_GENERIC,
   GFX1200,
   GFX1201,
+  AMDGCNSPIRV,
   Generic, // A processor model named 'generic' if the target backend defines a
// public one.
   LAST,
 
   CudaDefault = CudaArch::SM_52,
-  HIPDefault = CudaArch::GFX906,
+  HIPDefault = CudaArch::AMDGCNSPIRV,

jhuber6 wrote:

Why was the default changed here? The default here is just what HIP compilation 
gives without an explicit arch. Is this intentional?

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

2024-06-07 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Ooh... I think I know exactly what may be causing this.

I've observed this a few times. For my case it's usually when some application 
hangs on the GPU and no one notices, then these tools hang forever and it takes 
awhile to notice. Figured an error is friendlier since I highly doubt these 
tools will take over ten seconds to run even in the worst case.

> On machines where NVIDIA GPUs are used for compute only (e.g. a headless 
> server machine), NVIDIA drivers are not always loaded by default and may not 
> have driver persistence enabled.

What's the config to set this by default without any graphics? Would be nice to 
not need to worry about it on my dev machine.

> For the GPU detection, we may be able to work around the issue by leaving the 
> detection app running for the duration of the compilation, and prevent driver 
> unloading, but it's a rather gross hack.

I know for AMD stuff we used to just probe the PCI connections, but that leaked 
a lot of information so this is the easier way to do it. I wonder what 
`__nvcc_device_query` does internally.

https://github.com/llvm/llvm-project/pull/94751
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [OpenMP] Fix passing target id features to AMDGPU offloading (PR #94765)

2024-06-07 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/94765
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [OpenMP] Fix passing target id features to AMDGPU offloading (PR #94765)

2024-06-07 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/94765

Summary:
AMDGPU supports a `target-id` feature which is used to qualify targets
with different incompatible features. These are both rules and target
features. Currently, we pass `-target-cpu` twice when offloading to
OpenMP, and do not pass the target-id features at all. The effect was
that passing something like `--offload-arch=gfx90a:xnack+` would show up
as `-target-cpu=gfx90a:xnack+ -target-cpu=gfx90a`. Thus ignoring the
xnack completely and passing it twice. This patch fixes that to pass it
once and then separate it like how HIP does.


>From 9c91b92e62c60720fbd142660fd723dd1838400a Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 7 Jun 2024 10:59:26 -0500
Subject: [PATCH] [OpenMP] Fix passing target id features to AMDGPU offloading

Summary:
AMDGPU supports a `target-id` feature which is used to qualify targets
with different incompatible features. These are both rules and target
features. Currently, we pass `-target-cpu` twice when offloading to
OpenMP, and do not pass the target-id features at all. The effect was
that passing something like `--offload-arch=gfx90a:xnack+` would show up
as `-target-cpu=gfx90a:xnack+ -target-cpu=gfx90a`. Thus ignoring the
xnack completely and passing it twice. This patch fixes that to pass it
once and then separate it like how HIP does.
---
 clang/lib/Driver/ToolChains/AMDGPU.cpp   | 6 +-
 clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp | 5 -
 clang/test/Driver/amdgpu-openmp-toolchain.c  | 3 ++-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp 
b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 11a98a0ec314d..20f879e2f75cb 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -645,7 +645,11 @@ void amdgpu::getAMDGPUTargetFeatures(const Driver ,
  std::vector ) {
   // Add target ID features to -target-feature options. No diagnostics should
   // be emitted here since invalid target ID is diagnosed at other places.
-  StringRef TargetID = Args.getLastArgValue(options::OPT_mcpu_EQ);
+  StringRef TargetID;
+  if (Args.hasArg(options::OPT_mcpu_EQ))
+TargetID = Args.getLastArgValue(options::OPT_mcpu_EQ);
+  else if (Args.hasArg(options::OPT_march_EQ))
+TargetID = Args.getLastArgValue(options::OPT_march_EQ);
   if (!TargetID.empty()) {
 llvm::StringMap FeatureMap;
 auto OptionalGpuArch = parseTargetID(Triple, TargetID, );
diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp 
b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index cca18431ff773..d17ecb15c8208 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -44,14 +44,9 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions(
 Action::OffloadKind DeviceOffloadingKind) const {
   HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);
 
-  StringRef GPUArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
-  assert(!GPUArch.empty() && "Must have an explicit GPU arch.");
-
   assert(DeviceOffloadingKind == Action::OFK_OpenMP &&
  "Only OpenMP offloading kinds are supported.");
 
-  CC1Args.push_back("-target-cpu");
-  CC1Args.push_back(DriverArgs.MakeArgStringRef(GPUArch));
   CC1Args.push_back("-fcuda-is-device");
 
   if (DriverArgs.hasArg(options::OPT_nogpulib))
diff --git a/clang/test/Driver/amdgpu-openmp-toolchain.c 
b/clang/test/Driver/amdgpu-openmp-toolchain.c
index ef58c2c4e3f3a..49af04acc4639 100644
--- a/clang/test/Driver/amdgpu-openmp-toolchain.c
+++ b/clang/test/Driver/amdgpu-openmp-toolchain.c
@@ -7,7 +7,7 @@
 
 // verify the tools invocations
 // CHECK: "-cc1" "-triple" 
"x86_64-unknown-linux-gnu"{{.*}}"-emit-llvm-bc"{{.*}}"-x" "c"
-// CHECK: "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" 
"x86_64-unknown-linux-gnu"{{.*}}"-target-cpu" 
"gfx906"{{.*}}"-fcuda-is-device"{{.*}}
+// CHECK: "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" 
"x86_64-unknown-linux-gnu"{{.*}}"-fcuda-is-device"{{.*}}"-target-cpu" "gfx906"
 // CHECK: "-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-emit-obj"
 // CHECK: clang-linker-wrapper{{.*}} "-o" "a.out"
 
@@ -63,6 +63,7 @@
 
 // RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a:sramecc-:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID
+// CHECK-TARGET-ID: "-cc1" "-triple" "amdgcn-amd-amdhsa" {{.*}} "-target-cpu" 
"gfx90a" "-target-feature" "-sramecc" "-target-feature" "+xnack"
 // CHECK-TARGET-ID: 
clang-offload-packager{{.*}}arch=gfx90a:sramecc-:xnack+,kind=openmp,feature=-sramecc,feature=+xnack
 
 // RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a,gfx90a:xnack+ \

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

2024-06-07 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/94751
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

2024-06-07 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/94751

>From 0e367c72a1cc163fd781f98b9fac809d90f4beb7 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 7 Jun 2024 08:15:06 -0500
Subject: [PATCH] [Clang] Add timeout for GPU detection utilities

Summary:
The utilities `nvptx-arch` and `amdgpu-arch` are used to support
`--offload-arch=native` among other utilities in clang. However, these
rely on the GPU drivers to query the features. In certain cases these
drivers can become locked up, which will lead to indefinate hangs on any
compiler jobs running in the meantime.

This patch adds a ten second timeout period for these utilities before
it kills the job and errors out.
---
 clang/include/clang/Driver/ToolChain.h | 3 ++-
 clang/lib/Driver/ToolChain.cpp | 8 
 clang/lib/Driver/ToolChains/AMDGPU.cpp | 2 +-
 clang/lib/Driver/ToolChains/Cuda.cpp   | 2 +-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/clang/include/clang/Driver/ToolChain.h 
b/clang/include/clang/Driver/ToolChain.h
index a4f9cad98aa8b..9789cfacafd78 100644
--- a/clang/include/clang/Driver/ToolChain.h
+++ b/clang/include/clang/Driver/ToolChain.h
@@ -205,7 +205,8 @@ class ToolChain {
 
   /// Executes the given \p Executable and returns the stdout.
   llvm::Expected>
-  executeToolChainProgram(StringRef Executable) const;
+  executeToolChainProgram(StringRef Executable,
+  unsigned SecondsToWait = 0) const;
 
   void setTripleEnvironment(llvm::Triple::EnvironmentType Env);
 
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 0e86bc07e0ea2..40ab2e91125d1 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -104,7 +104,8 @@ ToolChain::ToolChain(const Driver , const llvm::Triple ,
 }
 
 llvm::Expected>
-ToolChain::executeToolChainProgram(StringRef Executable) const {
+ToolChain::executeToolChainProgram(StringRef Executable,
+   unsigned SecondsToWait) const {
   llvm::SmallString<64> OutputFile;
   llvm::sys::fs::createTemporaryFile("toolchain-program", "txt", OutputFile);
   llvm::FileRemover OutputRemover(OutputFile.c_str());
@@ -115,9 +116,8 @@ ToolChain::executeToolChainProgram(StringRef Executable) 
const {
   };
 
   std::string ErrorMessage;
-  if (llvm::sys::ExecuteAndWait(Executable, {}, {}, Redirects,
-/* SecondsToWait */ 0,
-/*MemoryLimit*/ 0, ))
+  if (llvm::sys::ExecuteAndWait(Executable, {}, {}, Redirects, SecondsToWait,
+/*MemoryLimit=*/0, ))
 return llvm::createStringError(std::error_code(),
Executable + ": " + ErrorMessage);
 
diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp 
b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 9ffea57b005de..11a98a0ec314d 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -877,7 +877,7 @@ AMDGPUToolChain::getSystemGPUArchs(const ArgList ) 
const {
   else
 Program = GetProgramPath("amdgpu-arch");
 
-  auto StdoutOrErr = executeToolChainProgram(Program);
+  auto StdoutOrErr = executeToolChainProgram(Program, /*SecondsToWait=*/10);
   if (!StdoutOrErr)
 return StdoutOrErr.takeError();
 
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index bbc8be91fd70b..2dfc7457b0ac7 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -826,7 +826,7 @@ NVPTXToolChain::getSystemGPUArchs(const ArgList ) 
const {
   else
 Program = GetProgramPath("nvptx-arch");
 
-  auto StdoutOrErr = executeToolChainProgram(Program);
+  auto StdoutOrErr = executeToolChainProgram(Program, /*SecondsToWait=*/10);
   if (!StdoutOrErr)
 return StdoutOrErr.takeError();
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

2024-06-07 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

No active test because I have no clue how you would, but I intentionally made 
it time out and it returns a 'Child timed out` error as expected.

https://github.com/llvm/llvm-project/pull/94751
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

2024-06-07 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/94751

Summary:
The utilities `nvptx-arch` and `amdgpu-arch` are used to support
`--offload-arch=native` among other utilities in clang. However, these
rely on the GPU drivers to query the features. In certain cases these
drivers can become locked up, which will lead to indefinate hangs on any
compiler jobs running in the meantime.

This patch adds a ten second timeout period for these utilities before
it kills the job and errors out.


>From a7bcd6b0568b00d0cac9bf0a6f9b17ca681425f2 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 7 Jun 2024 08:15:06 -0500
Subject: [PATCH] [Clang] Add timeout for GPU detection utilities

Summary:
The utilities `nvptx-arch` and `amdgpu-arch` are used to support
`--offload-arch=native` among other utilities in clang. However, these
rely on the GPU drivers to query the features. In certain cases these
drivers can become locked up, which will lead to indefinate hangs on any
compiler jobs running in the meantime.

This patch adds a ten second timeout period for these utilities before
it kills the job and errors out.
---
 clang/include/clang/Driver/ToolChain.h | 2 +-
 clang/lib/Driver/ToolChain.cpp | 8 
 clang/lib/Driver/ToolChains/AMDGPU.cpp | 2 +-
 clang/lib/Driver/ToolChains/Cuda.cpp   | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/clang/include/clang/Driver/ToolChain.h 
b/clang/include/clang/Driver/ToolChain.h
index a4f9cad98aa8b..87a5034dfd78b 100644
--- a/clang/include/clang/Driver/ToolChain.h
+++ b/clang/include/clang/Driver/ToolChain.h
@@ -205,7 +205,7 @@ class ToolChain {
 
   /// Executes the given \p Executable and returns the stdout.
   llvm::Expected>
-  executeToolChainProgram(StringRef Executable) const;
+  executeToolChainProgram(StringRef Executable, unsigned Timeout = 0) const;
 
   void setTripleEnvironment(llvm::Triple::EnvironmentType Env);
 
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 0e86bc07e0ea2..8c746ac8066cb 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -104,7 +104,8 @@ ToolChain::ToolChain(const Driver , const llvm::Triple ,
 }
 
 llvm::Expected>
-ToolChain::executeToolChainProgram(StringRef Executable) const {
+ToolChain::executeToolChainProgram(StringRef Executable,
+   unsigned Timeout) const {
   llvm::SmallString<64> OutputFile;
   llvm::sys::fs::createTemporaryFile("toolchain-program", "txt", OutputFile);
   llvm::FileRemover OutputRemover(OutputFile.c_str());
@@ -115,9 +116,8 @@ ToolChain::executeToolChainProgram(StringRef Executable) 
const {
   };
 
   std::string ErrorMessage;
-  if (llvm::sys::ExecuteAndWait(Executable, {}, {}, Redirects,
-/* SecondsToWait */ 0,
-/*MemoryLimit*/ 0, ))
+  if (llvm::sys::ExecuteAndWait(Executable, {}, {}, Redirects, Timeout,
+/*MemoryLimit=*/0, ))
 return llvm::createStringError(std::error_code(),
Executable + ": " + ErrorMessage);
 
diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp 
b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 9ffea57b005de..92895d8186e83 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -877,7 +877,7 @@ AMDGPUToolChain::getSystemGPUArchs(const ArgList ) 
const {
   else
 Program = GetProgramPath("amdgpu-arch");
 
-  auto StdoutOrErr = executeToolChainProgram(Program);
+  auto StdoutOrErr = executeToolChainProgram(Program, /*Timeout=*/10);
   if (!StdoutOrErr)
 return StdoutOrErr.takeError();
 
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index bbc8be91fd70b..47dac0e439f10 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -826,7 +826,7 @@ NVPTXToolChain::getSystemGPUArchs(const ArgList ) 
const {
   else
 Program = GetProgramPath("nvptx-arch");
 
-  auto StdoutOrErr = executeToolChainProgram(Program);
+  auto StdoutOrErr = executeToolChainProgram(Program, /*Timeout=*/10);
   if (!StdoutOrErr)
 return StdoutOrErr.takeError();
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-06-05 Thread Joseph Huber via cfe-commits



@@ -8,10 +8,15 @@ add_custom_target(libc-long-running-tests)
 
 add_subdirectory(UnitTest)
 
-if(LIBC_TARGET_OS_IS_GPU AND

jhuber6 wrote:

Done

https://github.com/llvm/llvm-project/pull/93362
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-06-05 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> Early exit on lack of va_start will be incorrect in the lowering case, which 
> is the only one enabled by default. I believe existing comments are all 
> addressed.

Figured if there's no `va_start` there's nothing for the pass to do anyway.

> Precommit the cmake diagnostic tweak sounds good, would you like to land that?

Sure

https://github.com/llvm/llvm-project/pull/93362
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-06-05 Thread Joseph Huber via cfe-commits



@@ -8,10 +8,15 @@ add_custom_target(libc-long-running-tests)
 
 add_subdirectory(UnitTest)
 
-if(LIBC_TARGET_OS_IS_GPU AND

jhuber6 wrote:

Can we precommit or move this to a separate patch

https://github.com/llvm/llvm-project/pull/93362
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-06-05 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/93362
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-06-05 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 commented:

Overall I think this is good. Address the existing comments and I think we 
should be able to land it. Potentially we should be able to check for the 
existence of `va_start` in the module to early-exit like you said, which will 
keep functional changes to a minimum.

FWIW if you use `clangd` with `clang-tidy` it should let you automatically fix 
the LLVM style issues.

https://github.com/llvm/llvm-project/pull/93362
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-06-05 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> An offline suggestion from Pierre is that this should early-exit if there are 
> no variadic functions in the module. That's a good thing, I'd like to 
> consider it another of the increase-complexity-for-decreased-compile-time to 
> implement after something has landed.

I thought any use of varargs would introduce LLVM intrinsics. Shouldn't it be 
trivial to look if any of those exist in the module?

https://github.com/llvm/llvm-project/pull/93362
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-06-03 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

I can confirm that it passes the tests against the `libc` targets, namely basic 
`stdarg.h` implementations and `sprintf`.

https://github.com/llvm/llvm-project/pull/93362
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Driver] Remove a bunch of unnecessary REQUIRES constraints (PR #94055)

2024-05-31 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.

I've wondered about these as well, there might also be some OpenMP tests that 
have `requries powerpc-registered-target` or similar  that could be removed. I 
guess we'll see what  the CI thinks with this patch.

https://github.com/llvm/llvm-project/pull/94055
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [OpenMP] Remove dependency on `libffi` from offloading runtime (PR #91264)

2024-05-29 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

ping

https://github.com/llvm/llvm-project/pull/91264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [OpenMP] clang/Driver/Options.td - fix typo in fopenmp-force-usm HelpText (PR #93599)

2024-05-28 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/93599
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Offload] Move HIP and CUDA to new driver by default (PR #84420)

2024-05-21 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/84420

>From b0dc390bc52059d7a31b5f0878ffb8024201774d Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 7 Mar 2024 15:48:00 -0600
Subject: [PATCH] [Offload] Move HIP and CUDA to new driver by default

Summary:
This patch updates the `--offload-new-driver` flag to be default for all
current offloading languages. This mostly just required updating a lot
of tests to use the old format. I tried to update them where possible,
but some were directly checking the old format.

This is not intended to be landed immediately, but to allow for greater
testing. One potential issue I've discovered is the lack of SPIR-V
support or handling for `--offload`.
---
 clang/lib/Driver/Driver.cpp   |  6 ++---
 clang/lib/Driver/ToolChains/Clang.cpp | 10 ---
 clang/test/Driver/cl-offload.cu   |  5 ++--
 clang/test/Driver/cuda-arch-translation.cu| 26 +--
 clang/test/Driver/cuda-bindings.cu| 24 -
 clang/test/Driver/cuda-options.cu | 23 
 clang/test/Driver/cuda-output-asm.cu  |  4 ---
 clang/test/Driver/cuda-version-check.cu   |  6 ++---
 clang/test/Driver/hip-gz-options.hip  |  1 -
 clang/test/Driver/hip-invalid-target-id.hip   |  4 +--
 clang/test/Driver/hip-macros.hip  |  3 ---
 clang/test/Driver/hip-offload-arch.hip|  2 +-
 clang/test/Driver/hip-options.hip |  8 ++
 clang/test/Driver/hip-sanitize-options.hip|  2 +-
 clang/test/Driver/hip-save-temps.hip  | 12 -
 .../test/Driver/hip-toolchain-device-only.hip |  4 ---
 clang/test/Driver/hip-toolchain-mllvm.hip |  2 --
 clang/test/Driver/invalid-offload-options.cpp |  2 +-
 clang/test/Preprocessor/cuda-preprocess.cu|  8 +++---
 clang/unittests/Tooling/ToolingTest.cpp   |  6 ++---
 20 files changed, 71 insertions(+), 87 deletions(-)

diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 2868b4f2b02e9..0b5283ffa5bcc 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4146,9 +4146,9 @@ void Driver::BuildActions(Compilation , DerivedArgList 
,
   handleArguments(C, Args, Inputs, Actions);
 
   bool UseNewOffloadingDriver =
-  C.isOffloadingHostKind(Action::OFK_OpenMP) ||
+  C.getActiveOffloadKinds() != Action::OFK_None &&
   Args.hasFlag(options::OPT_offload_new_driver,
-   options::OPT_no_offload_new_driver, false);
+   options::OPT_no_offload_new_driver, true);
 
   // Builder to be used to build offloading actions.
   std::unique_ptr OffloadBuilder =
@@ -4857,7 +4857,7 @@ Action *Driver::ConstructPhaseAction(
offloadDeviceOnly() ||
(TargetDeviceOffloadKind == Action::OFK_HIP &&
 !Args.hasFlag(options::OPT_offload_new_driver,
-  options::OPT_no_offload_new_driver, false)))
+  options::OPT_no_offload_new_driver, true)))
   ? types::TY_LLVM_IR
   : types::TY_LLVM_BC;
   return C.MakeAction(Input, Output);
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index 6d2015b2cd156..3bed0d4d785d9 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -4808,8 +4808,9 @@ void Clang::ConstructJob(Compilation , const JobAction 
,
   bool IsHostOffloadingAction =
   JA.isHostOffloading(Action::OFK_OpenMP) ||
   (JA.isHostOffloading(C.getActiveOffloadKinds()) &&
+   C.getActiveOffloadKinds() != Action::OFK_None &&
Args.hasFlag(options::OPT_offload_new_driver,
-options::OPT_no_offload_new_driver, false));
+options::OPT_no_offload_new_driver, true));
 
   bool IsRDCMode =
   Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false);
@@ -5133,7 +5134,7 @@ void Clang::ConstructJob(Compilation , const JobAction 
,
 if (IsUsingLTO) {
   if (IsDeviceOffloadAction && !JA.isDeviceOffloading(Action::OFK_OpenMP) 
&&
   !Args.hasFlag(options::OPT_offload_new_driver,
-options::OPT_no_offload_new_driver, false) &&
+options::OPT_no_offload_new_driver, true) &&
   !Triple.isAMDGPU()) {
 D.Diag(diag::err_drv_unsupported_opt_for_target)
 << Args.getLastArg(options::OPT_foffload_lto,
@@ -6660,8 +6661,9 @@ void Clang::ConstructJob(Compilation , const JobAction 
,
   }
 
   // Forward the new driver to change offloading code generation.
-  if (Args.hasFlag(options::OPT_offload_new_driver,
-   options::OPT_no_offload_new_driver, false))
+  if (C.getActiveOffloadKinds() != Action::OFK_None &&
+  Args.hasFlag(options::OPT_offload_new_driver,
+   options::OPT_no_offload_new_driver, true))

[clang] [Clang][OpenMP] Fix multi arch compilation for -march option (PR #92290)

2024-05-15 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> > > If `-march` is the wrong option then let's start deprecating it and 
> > > remove it altogether in the next llvm release. But, as long as it is 
> > > here, it should be equivalent to `--offload-arch`.
> > 
> > 
> > Honestly not a bad idea. I could make a patch warning users to use 
> > `--offload-arch` instead for now.
> 
> Sure, let's do that. But, let this land as long as this option is supported.

That doesn't track, LLVM has never supported `-march` to support multiple 
options and there's no reason to add it now when we're talking about 
deprecating it.

https://github.com/llvm/llvm-project/pull/92290
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][OpenMP] Fix multi arch compilation for -march option (PR #92290)

2024-05-15 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> If `-march` is the wrong option then let's start deprecating it and remove it 
> altogether in the next llvm release. But, as long as it is here, it should be 
> equivalent to `--offload-arch`.

Honestly not a bad idea. I could make a patch warning users to use 
`--offload-arch` instead for now. 

https://github.com/llvm/llvm-project/pull/92290
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][OpenMP] Fix multi arch compilation for -march option (PR #92290)

2024-05-15 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> > I don't think we want to support this. `-march` was the wrong option to use 
> > in the first place, and upstream LLVM never supported specifying multiple 
> > device images with `-march` so there isn't a legacy argument in trunk. 
> > However, AOMP did support this and if it's deemed too disruptive to request 
> > users move to `--offload-arch=a,b,c` then we can carry that change in the 
> > fork.
> > > It will fix tests like: 
> > > [targetid_multi_image](https://github.com/ROCm/aomp/tree/aomp-dev/test/smoke/targetid_multi_image)
> > 
> > 
> > I think the easier way to fix this is to update the Makefile.
> 
> Irrespective of what AOMP does, I think it makes sense to ensure parity 
> between the two ways of specifying architecture. People have been 
> historically using `-Xopenmp-target -march` style, and using the same for 
> multiple architectures seems to be the most obvious choice. Isn't it quite 
> confusing to tell the users that they can use `offload-arch` style for single 
> as well as multiple archs, but can use `-march` style only for single arch?

`-march` was the wrong option to use for this from the beginning. It's supposed 
to be an overriding option and it shouldn't be overloaded to mean something 
different here. In LLVM / trunk we never supported multiple architectures with 
the `-march` option so I don't see any reason to start now. `--offload-arch=` 
is a complete replacement for this behavior and I consider the single `-march` 
option to be legacy. Even within this it's divergent because HIP / OpenCL / 
AMDGPU use `-mcpu` but the OpenMP toolchain ignored that and uses `-march=`.

Using `--offload-arch=` is a direct replacement for `-march` in all use-cases. 
It's also easier to use and interoperable with CUDA. I would just change the 
test, you can replace every use of `-march` with `--offload-arch` and it will 
work. See the following.
```console
> clang input.c -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda 
> -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx1030 
> -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a 
> -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_89 
> llvm-objdump --offloading a.out


a.out:  file format elf64-x86-64

OFFLOADING IMAGE [0]:
kindelf
archsm_89
triple  nvptx64-nvidia-cuda
produceropenmp

OFFLOADING IMAGE [1]:
kindelf
archgfx90a
triple  amdgcn-amd-amdhsa
produceropenmp

OFFLOADING IMAGE [2]:
kindelf
archgfx1030
triple  amdgcn-amd-amdhsa
produceropenmp

```

https://github.com/llvm/llvm-project/pull/92290
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][OpenMP] Fix multi arch compilation for -march option (PR #92290)

2024-05-15 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

I don't think we want to support this. `-march` was the wrong option to use in 
the first place, and upstream LLVM never supported specifying multiple device 
images with `-march` so there isn't a legacy argument in trunk. However, AOMP 
did support this and if it's deemed too disruptive to request users move to 
`--offload-arch=a,b,c` then we can carry that change in the fork.

> It will fix tests like: 
> [targetid_multi_image](https://github.com/ROCm/aomp/tree/aomp-dev/test/smoke/targetid_multi_image)

I think the easier way to fix this is to update the Makefile.

https://github.com/llvm/llvm-project/pull/92290
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] c5cd049 - [Clang][Fixup] Fix deleted constructor on older compilers

2024-05-14 Thread Joseph Huber via cfe-commits


Author: Joseph Huber
Date: 2024-05-14T18:43:42-05:00
New Revision: c5cd049566a795ba5de88dfbb2eb563cad4a9d8a

URL: 
https://github.com/llvm/llvm-project/commit/c5cd049566a795ba5de88dfbb2eb563cad4a9d8a
DIFF: 
https://github.com/llvm/llvm-project/commit/c5cd049566a795ba5de88dfbb2eb563cad4a9d8a.diff

LOG: [Clang][Fixup] Fix deleted constructor on older compilers

Added: 


Modified: 
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp

Removed: 




diff  --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index aee98c5a524ad..07a8d53c04b16 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -1195,7 +1195,7 @@ Expected> linkAndWrapDeviceFiles(
   // Initialize the images with any overriding inputs.
   if (Args.hasArg(OPT_override_image))
 if (Error Err = handleOverrideImages(Args, Images))
-  return Err;
+  return std::move(Err);
 
   auto Err = parallelForEachError(LinkerInputFiles, [&](auto ) -> Error {
 llvm::TimeTraceScope TimeScope("Link device input");



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [LinkerWrapper] Add an overriding option for debugging (PR #91984)

2024-05-14 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/91984
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [LinkerWrapper] Add an overriding option for debugging (PR #91984)

2024-05-14 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> would it be more useful to allow swapping the output by environment variable 
> and MD5 hash, e.g.
> 
> CLANG_LINK_WRAPPER_SWAP_OUTPUT=hash1:file1,hash2:file2
> 
> it calculates the MD5 hash of the output file, if matching, swap it with the 
> specified file. This way, we can set an env var to swap any linker wrapper 
> output in a normal build.

I don't think we need an environment variable since the output will always come 
from the link phase, which you can pass arguments to. While I can see it being 
more convenient, I don't really see this being used as a legitimate tool for 
anything but debugging small applications.

Manipulating the hash is an interesting idea, since we can have multiple files 
it might be nice to only replace one of them. However, this is a pretty big 
hammer so I'm unsure if we'll need this kind of fine-grained control for what I 
only intend to use for triaging bugs.

https://github.com/llvm/llvm-project/pull/91984
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [LinkerWrapper] Add an overriding option for debugging (PR #91984)

2024-05-14 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/91984

>From 4c60b32a4c1916a3ba575d4edc6d79f9b194ab03 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 13 May 2024 10:53:55 -0500
Subject: [PATCH] [LinkerWrapper] Add an overriding option for debugging

Summary:
One of the downsides of the linker wrapper is that it made debugging
more difficult. It is very powerful in that it can resolve a lot of
input matching and library handling that could not be done before.
However, the old method allowed users to simply copy-paste the script
files to modify the output and test it.

This patch attempts to make it easier to debug changes by letting the
user override all the linker inputs. That is, we provide a user-created
binary that is treated like the final output of the device link step.
The intended use-case is for using `-save-temps` to get some IR, then
modifying the IR and sticking it back in to see if it exhibits the old
failures.
---
 clang/docs/ClangLinkerWrapper.rst | 38 
 clang/test/Driver/linker-wrapper.c|  7 +++
 .../ClangLinkerWrapper.cpp| 43 +++
 .../clang-linker-wrapper/LinkerWrapperOpts.td |  4 ++
 4 files changed, 92 insertions(+)

diff --git a/clang/docs/ClangLinkerWrapper.rst 
b/clang/docs/ClangLinkerWrapper.rst
index 3bef558475735..99352863b4773 100644
--- a/clang/docs/ClangLinkerWrapper.rst
+++ b/clang/docs/ClangLinkerWrapper.rst
@@ -46,6 +46,8 @@ only for the linker wrapper will be forwarded to the wrapped 
linker job.
 -lSearch for library 
 --opt-level=
Optimization level for LTO
+--override-image=
+Uses the provided file as if it were the output of 
the device link step
 -o   Path to file to write output
 --pass-remarks-analysis=
Pass remarks for LTO
@@ -87,6 +89,42 @@ other. Generally, this requires that the target triple and 
architecture match.
 An exception is made when the architecture is listed as ``generic``, which will
 cause it be linked with any other device code with the same target triple.
 
+Debugging
+=
+
+The linker wrapper performs a lot of steps internally, such as input matching,
+symbol resolution, and image registration. This makes it difficult to debug in
+some scenarios. The behavior of the linker-wrapper is controlled mostly through
+metadata, described in `clang documentation
+`_. Intermediate output can
+be obtained from the linker-wrapper using the ``--save-temps`` flag. These 
files
+can then be modified.
+
+.. code-block:: sh
+
+  $> clang openmp.c -fopenmp --offload-arch=gfx90a -c
+  $> clang openmp.o -fopenmp --offload-arch=gfx90a -Wl,--save-temps
+  $> ; Modify temp files.
+  $> llvm-objcopy --update-section=.llvm.offloading=out.bc openmp.o
+
+Doing this will allow you to override one of the input files by replacing its
+embedded offloading metadata with a user-modified version. However, this will 
be
+more difficult when there are multiple input files. For a very large hammer, 
the
+``--override-image==`` flag can be used.
+
+In the following example, we use the ``--save-temps`` to obtain the LLVM-IR 
just
+before running the backend. We then modify it to test altered behavior, and 
then
+compile it to a binary. This can then be passed to the linker-wrapper which 
will
+then ignore all embedded metadata and use the provided image as if it were the
+result of the device linking phase.
+
+.. code-block:: sh
+
+  $> clang openmp.c -fopenmp --offload-arch=gfx90a -Wl,--save-temps
+  $> ; Modify temp files.
+  $> clang --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib out.bc -o a.out
+  $> clang openmp.c -fopenmp --offload-arch=gfx90a 
-Wl,--override-image=openmp=a.out
+
 Example
 ===
 
diff --git a/clang/test/Driver/linker-wrapper.c 
b/clang/test/Driver/linker-wrapper.c
index 51bf98b2ed39d..0d05f913aad63 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -226,3 +226,10 @@ __attribute__((visibility("protected"), used)) int x;
 // RELOCATABLE-LINK-CUDA: fatbinary{{.*}} -64 --create {{.*}}.fatbin 
--image=profile=sm_89,file={{.*}}.img
 // RELOCATABLE-LINK-CUDA: /usr/bin/ld.lld{{.*}}-r
 // RELOCATABLE-LINK-CUDA: llvm-objcopy{{.*}}a.out --remove-section 
.llvm.offloading
+
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN:   --linker-path=/usr/bin/ld --override=image=openmp=%t.o %t.o -o a.out 
2>&1 \
+// RUN: | FileCheck %s --check-prefix=OVERRIDE
+// OVERRIDE-NOT: clang
+// OVERRIDE: /usr/bin/ld
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 69d8cb446fad1..aee98c5a524ad 100644
---

[clang] [LinkerWrapper] Add an overriding option for debugging (PR #91984)

2024-05-13 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/91984

Summary:
One of the downsides of the linker wrapper is that it made debugging
more difficult. It is very powerful in that it can resolve a lot of
input matching and library handling that could not be done before.
However, the old method allowed users to simply copy-paste the script
files to modify the output and test it.

This patch attempts to make it easier to debug changes by letting the
user override all the linker inputs. That is, we provide a user-created
binary that is treated like the final output of the device link step.
The intended use-case is for using `-save-temps` to get some IR, then
modifying the IR and sticking it back in to see if it exhibits the old
failures.


>From 4164203d3afbd2930a98e720ba29b6da1935626a Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 13 May 2024 10:53:55 -0500
Subject: [PATCH] [LinkerWrapper] Add an overriding option for debugging

Summary:
One of the downsides of the linker wrapper is that it made debugging
more difficult. It is very powerful in that it can resolve a lot of
input matching and library handling that could not be done before.
However, the old method allowed users to simply copy-paste the script
files to modify the output and test it.

This patch attempts to make it easier to debug changes by letting the
user override all the linker inputs. That is, we provide a user-created
binary that is treated like the final output of the device link step.
The intended use-case is for using `-save-temps` to get some IR, then
modifying the IR and sticking it back in to see if it exhibits the old
failures.
---
 clang/docs/ClangLinkerWrapper.rst | 38 
 clang/test/Driver/linker-wrapper.c|  7 +++
 .../ClangLinkerWrapper.cpp| 43 +++
 .../clang-linker-wrapper/LinkerWrapperOpts.td |  4 ++
 4 files changed, 92 insertions(+)

diff --git a/clang/docs/ClangLinkerWrapper.rst 
b/clang/docs/ClangLinkerWrapper.rst
index 3bef558475735..5b6c1b1e362f1 100644
--- a/clang/docs/ClangLinkerWrapper.rst
+++ b/clang/docs/ClangLinkerWrapper.rst
@@ -46,6 +46,8 @@ only for the linker wrapper will be forwarded to the wrapped 
linker job.
 -lSearch for library 
 --opt-level=
Optimization level for LTO
+--override-image=
+Uses the provided file as if it were the output of 
the device link step
 -o   Path to file to write output
 --pass-remarks-analysis=
Pass remarks for LTO
@@ -87,6 +89,42 @@ other. Generally, this requires that the target triple and 
architecture match.
 An exception is made when the architecture is listed as ``generic``, which will
 cause it be linked with any other device code with the same target triple.
 
+Debugging
+=
+
+The linker wrapper performs a lot of steps internally, such as input matching, 
+symbol resolution, and image registration. This makes it difficult to debug in 
+some scenarios. The behavior of the linker-wrapper is controlled mostly through
+metadata, described in `clang documentation
+`_. Intermediate output can 
+be obtained from the linker-wrapper using the ``--save-temps`` flag. These 
files 
+can then be modified.
+
+.. code-block:: sh
+
+  $> clang openmp.c -fopenmp --offload-arch=gfx90a -c
+  $> clang openmp.o -fopenmp --offload-arch=gfx90a -Wl,--save-temps
+  $> ; Modify temp files.
+  $> llvm-objcopy --update-section=.llvm.offloading=out.bc openmp.o
+
+Doing this will allow you to override one of the input files by replacing its 
+embedded offloading metadata with a user-modified version. However, this will 
be 
+more difficult when there are multiple input files. For a very large hammer, 
the 
+``--override-image==`` flag can be used.
+
+In the following example, we use the ``--save-temps`` to obtain the LLVM-IR 
just 
+before running the backend. We then modify it to test altered behavior, and 
then 
+compile it to a binary. This can then be passed to the linker-wrapper which 
will 
+then ignore all embedded metadata and use the provided image as if it were the 
+result of the device linking phase.
+
+.. code-block:: sh
+
+  $> clang openmp.c -fopenmp --offload-arch=gfx90a -Wl,--save-temps
+  $> ; Modify temp files.
+  $> clang --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib out.bc -o a.out
+  $> clang openmp.c -fopenmp --offload-arch=gfx90a 
-Wl,--override-image=openmp=a.out
+
 Example
 ===
 
diff --git a/clang/test/Driver/linker-wrapper.c 
b/clang/test/Driver/linker-wrapper.c
index 51bf98b2ed39d..0d05f913aad63 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -226,3 +226,10 @@ __attribute__((visibility("protected"), used)) int x;
 // RELOCATABLE-LINK-CUDA: fatbinary{{.*}} -64 --create {{.*}}.fatbin 
--image=profile=sm_89,file={{.*}}.img
 //

[clang] [llvm] [OpenMP] Remove dependency on `libffi` from offloading runtime (PR #91264)

2024-05-10 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/91264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [ClangOffloadBundler] make hipv4 and hip compatible (PR #91637)

2024-05-09 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/91637
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [OpenMP] Remove dependency on `libffi` from offloading runtime (PR #91264)

2024-05-09 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

I hacked around it in the runtime itself. Obviously this is very OpenMP 
specific behavior but so was the old method. Passes all tests now.

https://github.com/llvm/llvm-project/pull/91264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [OpenMP] Remove dependency on `libffi` from offloading runtime (PR #91264)

2024-05-09 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/91264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [ClangOffloadBundler] make hipv4 and hip compatible (PR #91637)

2024-05-09 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 commented:

There's some code in the `clang-linker-wrapper` that creates the offloadbundler 
format for HIP offloading. I think it and the tests use `hipv4` which we could 
presumably remove now?

https://github.com/llvm/llvm-project/pull/91637
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] fa9e90f - [Reland][Libomptarget] Statically link all plugin runtimes (#87009)

2024-05-09 Thread Joseph Huber via cfe-commits


Author: Joseph Huber
Date: 2024-05-09T09:38:22-05:00
New Revision: fa9e90f5d23312587b3a17920941334e0d1a58a1

URL: 
https://github.com/llvm/llvm-project/commit/fa9e90f5d23312587b3a17920941334e0d1a58a1
DIFF: 
https://github.com/llvm/llvm-project/commit/fa9e90f5d23312587b3a17920941334e0d1a58a1.diff

LOG: [Reland][Libomptarget] Statically link all plugin runtimes (#87009)

This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971
https://github.com/llvm/llvm-project/pull/86875
https://github.com/llvm/llvm-project/pull/86868

Added: 


Modified: 
clang/test/Driver/linker-wrapper-image.c
llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
offload/include/PluginManager.h
offload/include/device.h
offload/plugins-nextgen/CMakeLists.txt
offload/plugins-nextgen/amdgpu/CMakeLists.txt
offload/plugins-nextgen/amdgpu/src/rtl.cpp
offload/plugins-nextgen/common/CMakeLists.txt
offload/plugins-nextgen/common/include/PluginInterface.h
offload/plugins-nextgen/common/include/Utils/ELF.h
offload/plugins-nextgen/common/src/JIT.cpp
offload/plugins-nextgen/common/src/PluginInterface.cpp
offload/plugins-nextgen/cuda/CMakeLists.txt
offload/plugins-nextgen/cuda/src/rtl.cpp
offload/plugins-nextgen/host/CMakeLists.txt
offload/plugins-nextgen/host/src/rtl.cpp
offload/src/CMakeLists.txt
offload/src/OffloadRTL.cpp
offload/src/OpenMP/InteropAPI.cpp
offload/src/PluginManager.cpp
offload/src/device.cpp
offload/src/interface.cpp
offload/tools/kernelreplay/llvm-omp-kernel-replay.cpp
offload/unittests/Plugins/NextgenPluginsTest.cpp

Removed: 




diff  --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index d01445e3aed04..5d5d62805e174 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 

diff  --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 7241d15ed1c67..8b6f9ea1f4cca 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,12 +232,13 @@ void createRegisterFunction(Module , GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
+  Builder.CreateCall(RegFuncC, BinDesc);
+
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done before the runtime is called and registers its own.
+  // This needs to be done after plugin initialization to ensure that it is
+  // called before the plugin runtime is destroyed.
   Builder.CreateCall(AtExit, UnregFunc);
-
-  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.

diff  --git a/offload/include/PluginManager.h b/offload/include/PluginManager.h
index eece7525e25e7..1d6804da75d92 100644
--- a/offload/include/PluginManager.h
+++ b/offload/include/PluginManager.h
@@ -13,10 +13,11 @@

[clang] e5e6607 - Revert "[Libomptarget] Statically link all plugin runtimes (#87009)"

2024-05-09 Thread Joseph Huber via cfe-commits


Author: Joseph Huber
Date: 2024-05-09T07:05:23-05:00
New Revision: e5e66073c3d404f4dedf1b0be160b7815ccf8903

URL: 
https://github.com/llvm/llvm-project/commit/e5e66073c3d404f4dedf1b0be160b7815ccf8903
DIFF: 
https://github.com/llvm/llvm-project/commit/e5e66073c3d404f4dedf1b0be160b7815ccf8903.diff

LOG: Revert "[Libomptarget] Statically link all plugin runtimes (#87009)"

Caused failures on build-bots, reverting to investigate.

This reverts commit 80f9e814ec896fdc57ee84afad8ac4cb1f8e4627.

Added: 


Modified: 
clang/test/Driver/linker-wrapper-image.c
llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
offload/include/PluginManager.h
offload/include/device.h
offload/plugins-nextgen/CMakeLists.txt
offload/plugins-nextgen/amdgpu/CMakeLists.txt
offload/plugins-nextgen/amdgpu/src/rtl.cpp
offload/plugins-nextgen/common/CMakeLists.txt
offload/plugins-nextgen/common/include/PluginInterface.h
offload/plugins-nextgen/common/include/Utils/ELF.h
offload/plugins-nextgen/common/src/JIT.cpp
offload/plugins-nextgen/common/src/PluginInterface.cpp
offload/plugins-nextgen/cuda/CMakeLists.txt
offload/plugins-nextgen/cuda/src/rtl.cpp
offload/plugins-nextgen/host/CMakeLists.txt
offload/plugins-nextgen/host/src/rtl.cpp
offload/src/CMakeLists.txt
offload/src/OffloadRTL.cpp
offload/src/OpenMP/InteropAPI.cpp
offload/src/PluginManager.cpp
offload/src/device.cpp
offload/src/interface.cpp
offload/tools/kernelreplay/llvm-omp-kernel-replay.cpp
offload/unittests/Plugins/NextgenPluginsTest.cpp

Removed: 




diff  --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index 5d5d62805e174..d01445e3aed04 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
 // OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
+// OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 

diff  --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 8b6f9ea1f4cca..7241d15ed1c67 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,13 +232,12 @@ void createRegisterFunction(Module , GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
-  Builder.CreateCall(RegFuncC, BinDesc);
-
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done after plugin initialization to ensure that it is
-  // called before the plugin runtime is destroyed.
+  // This needs to be done before the runtime is called and registers its own.
   Builder.CreateCall(AtExit, UnregFunc);
+
+  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.

diff  --git a/offload/include/PluginManager.h b/offload/include/PluginManager.h
index 1d6804da75d92..eece7525e25e7 100644
--- a/offload/include/PluginManager.h
+++ b/offload/include/PluginManager.h
@@ -13,11 +13,10 @@
 #ifndef OMPTARGET_PLUGIN_MANAGER_H
 #define OMPTARGET_PLUGIN_MANAGER_H
 
-#include "PluginInterface.h"
-
 #include "DeviceImage.h"
 #include "ExclusiveAccess.h"
 #include "Shared/APITypes.h"
+#include "Shared/PluginAPI.h"
 #include "Shared/Requirements.h"
 
 #include "device.h"
@@ -35,7 +34,38 @@
 #include 
 #include 
 
-using GenericPluginTy = llvm::omp::target::plugin::GenericPluginTy;
+struct PluginManager;
+
+/// Plugin adaptors should be created via `PluginAdaptorTy::create` which will
+/// invoke the constructor and call `PluginAdaptorTy::init`. Eventual errors 
are
+/// reported back to the caller, otherwise a valid and initialized adaptor is
+/// returned.
+struct PluginAdaptorTy {
+  /// Try to create a plugin adaptor from a filename.
+  static llvm::Expected>
+  create(const std::string );
+
+  /// Name of the shared object file representing the plugin.
+  std::string Name;
+
+  /// Access to the shared object file representing the plugin.
+  std::unique_ptr LibraryHandler;
+
+#define PLUGIN_API_HANDLE(NAME)
\
+  using NAME##_ty = decltype(__tgt_rtl_##NAME);
\
+  NAME##_ty *NAME = nullptr;
+
+#include "Shared/PluginAPI.inc"
+#undef PLUGIN_API_HANDLE
+
+  /// Create a plugin adaptor for filename \p Name with a dynamic library \p 
DL.
+  PluginAdaptorTy(const std::string ,
+

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-09 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [CUDA] Mark CUDA-12.4 as supported and introduce ptx 8.4. (PR #91516)

2024-05-08 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/91516
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [WIP][OpenMP] Remove dependency on `libffi` from offloading runtime (PR #91264)

2024-05-07 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> Hmm, hard to tell, need to debug it.

Somehow when I print it in the runtime it shows up as garbage, but the actual 
region seems to get correct values. There shouldn't be anything in-between the 
arguments I'm printing and the kernel launch however so I'm stumped.

https://github.com/llvm/llvm-project/pull/91264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [WIP][OpenMP] Remove dependency on `libffi` from offloading runtime (PR #91264)

2024-05-07 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

I'm getting the same kind of output on `main`, but the warning is mysteriously 
absent.

https://github.com/llvm/llvm-project/pull/91264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [WIP][OpenMP] Remove dependency on `libffi` from offloading runtime (PR #91264)

2024-05-07 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

>
> Maybe. The message is emitted on the host, so there is something wrong with 
> the host code or runtime library.

This might be some issue with the host codegen actually.
```console
> clang malloc.c -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu  
>   
> ./a.out 
10
131675107360774
131675107360778
110294760161286
18446744073709551615
10
131675107360774
131675107360778
110294760161286
OMP: Warning #96: Cannot form a team with 48 threads, using 21 instead.
OMP: Hint Consider unsetting KMP_DEVICE_THREAD_LIMIT (KMP_ALL_THREADS), 
KMP_TEAMS_THREAD_LIMIT, and OMP_THREAD_LIMIT (if any are set).
> clang malloc.c -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu -O3
> ./a.out 
10
6
10
6
18446744073709551615
10
6
10
6
```

With optimization on, I see what I expect. With `-O0` it seems to give me 
garbage. Looking at the ASM also suggests that only the `0x10` value is written 
for some reason? https://godbolt.org/z/86hTjjaa8 is the host-IR I get without 
optimizations. 

https://github.com/llvm/llvm-project/pull/91264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][CodeGen] Omit pre-opt link when post-opt is link requested (PR #85672)

2024-05-07 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.

Hopefully in the future we can handle this in the linker better.

https://github.com/llvm/llvm-project/pull/85672
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [WIP][OpenMP] Remove dependency on `libffi` from offloading runtime (PR #91264)

2024-05-07 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > > > ```llvm
> > > > struct.anon
> > > > ```
> > > 
> > > 
> > > Can you provide full IR dump here?
> > 
> > 
> > https://godbolt.org/z/48h5s3W6v
> 
> It does not look like the issue of the target code, I don't see any wrong 
> access for __context. Мост probably something wrong with the host 
> code/runtime.

Yeah, I think that's correct. Looking at the IR it seems to add the two extra 
arguments and call them as I'd expect, but for some reason it gets corrupted in 
the runtime layer. It might be doing something weird with the arguments.

https://github.com/llvm/llvm-project/pull/91264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [WIP][OpenMP] Remove dependency on `libffi` from offloading runtime (PR #91264)

2024-05-07 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> > ```llvm
> > struct.anon
> > ```
> 
> Can you provide full IR dump here?

https://godbolt.org/z/48h5s3W6v

https://github.com/llvm/llvm-project/pull/91264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-07 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From 6dfa6dc2956ca714e98bf24b176315da42446553 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 28 Mar 2024 16:18:19 -0500
Subject: [PATCH] [Libomptarget] Statically link all plugin runtimes

Summary:
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971 
https://github.com/llvm/llvm-project/pull/86875 
https://github.com/llvm/llvm-project/pull/86868
---
 clang/test/Driver/linker-wrapper-image.c  |   2 +-
 .../Frontend/Offloading/OffloadWrapper.cpp|   7 +-
 offload/include/PluginManager.h   |  61 ++
 offload/include/device.h  |   8 +-
 offload/plugins-nextgen/CMakeLists.txt|  19 +-
 offload/plugins-nextgen/amdgpu/CMakeLists.txt |   5 -
 offload/plugins-nextgen/amdgpu/src/rtl.cpp|  14 +-
 offload/plugins-nextgen/common/CMakeLists.txt |   5 +-
 .../common/include/PluginInterface.h  |  94 +---
 .../common/include/Utils/ELF.h|   2 -
 offload/plugins-nextgen/common/src/JIT.cpp|  40 ++--
 .../common/src/PluginInterface.cpp| 205 --
 offload/plugins-nextgen/cuda/CMakeLists.txt   |   5 -
 offload/plugins-nextgen/cuda/src/rtl.cpp  |  14 +-
 offload/plugins-nextgen/host/CMakeLists.txt   |   8 -
 offload/plugins-nextgen/host/src/rtl.cpp  |  14 +-
 offload/src/CMakeLists.txt|   4 +
 offload/src/OffloadRTL.cpp|   1 +
 offload/src/OpenMP/InteropAPI.cpp |   4 +-
 offload/src/PluginManager.cpp | 129 ---
 offload/src/device.cpp|   3 +-
 offload/src/interface.cpp |   2 -
 .../kernelreplay/llvm-omp-kernel-replay.cpp   |   2 -
 .../unittests/Plugins/NextgenPluginsTest.cpp  |   1 -
 24 files changed, 125 insertions(+), 524 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index d01445e3aed04e..5d5d62805e174d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 7241d15ed1c670..8b6f9ea1f4cca3 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,12 +232,13 @@ void createRegisterFunction(Module , GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
+  Builder.CreateCall(RegFuncC, BinDesc);
+
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done before the runtime is called and registers its own.
+  // This needs to be done after plugin initialization to ensure that it is
+  // called before the plugin runtime is destroyed.
   Builder.CreateCall(AtExit, UnregFunc);
-
-  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.
diff --git a/offload/include/PluginManager.h

[clang] [llvm] [WIP][OpenMP] Remove dependency on `libffi` from offloading runtime (PR #91264)

2024-05-07 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> > ```llvm
> > = load i32, ptr %.capture_expr., align 4
> > ```
> 
> Why do you think it reads beyond __context? %2 = getelementptr inbounds 
> %struct.anon, ptr %1, i32 0, i32 0 points to the first element in the 
> __context, if I'm not missing something. If it has the wrong value, looks 
> like it is not written correctly

I think I copied the wrong code somehow,
```llvm
; Function Attrs: convergent noinline norecurse nounwind optnone uwtable
define weak_odr protected void @__omp_offloading_10302_adc9471_main_l10(ptr 
noalias noundef %dyn_ptr, ptr noalias noundef %__context) #0 {
entry:
  %dyn_ptr.addr = alloca ptr, align 8
  %__context.addr = alloca ptr, align 8
  %Teams = alloca i32, align 4
  %Threads = alloca i32, align 4
  %.capture_expr. = alloca i32, align 4
  %.capture_expr.1 = alloca i32, align 4
  %Teams.casted = alloca i64, align 8
  %Threads.casted = alloca i64, align 8
  %0 = call i32 @__kmpc_global_thread_num(ptr @3)
  store ptr %dyn_ptr, ptr %dyn_ptr.addr, align 8
  store ptr %__context, ptr %__context.addr, align 8
  %1 = load ptr, ptr %__context.addr, align 8
  %2 = getelementptr inbounds %struct.anon, ptr %1, i32 0, i32 0
  %3 = load i32, ptr %2, align 4
  store i32 %3, ptr %Teams, align 4
  %4 = getelementptr inbounds %struct.anon, ptr %1, i32 0, i32 1
  %5 = load i32, ptr %4, align 4
  store i32 %5, ptr %Threads, align 4
  %6 = getelementptr inbounds %struct.anon, ptr %1, i32 0, i32 2
  %7 = load i32, ptr %6, align 4
  store i32 %7, ptr %.capture_expr., align 4
  %8 = getelementptr inbounds %struct.anon, ptr %1, i32 0, i32 3
  %9 = load i32, ptr %8, align 4
  store i32 %9, ptr %.capture_expr.1, align 4
  %10 = load i32, ptr %.capture_expr., align 4
  %11 = load i32, ptr %.capture_expr.1, align 4
  call void @__kmpc_push_num_teams(ptr @3, i32 %0, i32 %10, i32 %11)
  %12 = load i32, ptr %Teams, align 4
  store i32 %12, ptr %Teams.casted, align 4
  %13 = load i64, ptr %Teams.casted, align 8
  %14 = load i32, ptr %Threads, align 4
  store i32 %14, ptr %Threads.casted, align 4
  %15 = load i64, ptr %Threads.casted, align 8
  call void (ptr, i32, ptr, ...) @__kmpc_fork_teams(ptr @3, i32 2, ptr 
@__omp_offloading_10302_adc9471_main_l10.omp_outlined, i64 %13, i64 %15)
  ret void
}
```
This is what I get from the corresponding C code.
```c
#include 
#include 
#include 

int main() {
  int Threads = 6;
  int Teams = 10;

  long unsigned s = 0;
#pragma omp target teams distribute parallel for num_teams(Teams)  \
thread_limit(Threads)
  for (int i = 0; i < Threads * Teams; ++i) {
assert(Teams == 10);
  }

  return 0;
}
```
When I compile run it, I get the following. So it warns on some nonsense team 
value (It will be even more corrupt with other cases, but this was the simplest 
I could get).
```console
> clang malloc.c -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu  
>
> ./a.out 
OMP: Warning #96: Cannot form a team with 48 threads, using 21 instead.
OMP: Hint Consider unsetting KMP_DEVICE_THREAD_LIMIT (KMP_ALL_THREADS), 
KMP_TEAMS_THREAD_LIMIT, and OMP_THREAD_LIMIT (if any are set).
```
The LLVM-IR is confusing to me because it's doing a GEP up to 3, which is 
suggesting that the Teams / Threads values are appended but the number of 
arguments isn't expected to be that big.

https://github.com/llvm/llvm-project/pull/91264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-06 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> I did not build upstream but looking at downstream, I think for some reason 
> they don't show up as duplicate symbols. But looking at the code, they should 
> be removed. There are uses of those variables in the plugin, so there should 
> be only 1 definition.

Does this apply for anything OMPT related inside of the plugin? There's a few 
places where we mark callbacks, but those can all be moved into `libomptarget` 
once this lands.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-06 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> There are duplicate definitions of the following
> 
> ```
> bool llvm::omp::target::ompt::Initialized = false;
> 
> ompt_get_callback_t llvm::omp::target::ompt::lookupCallbackByCode = nullptr;
> ompt_function_lookup_t llvm::omp::target::ompt::lookupCallbackByName = 
> nullptr;
> ```
> 
> in src/OpenMP/OMPT/Callback.cpp and 
> plugins-nextgen/common/OMPT/OmptCallback.cpp
> 
> Can you remove the ones in the plugin? Otherwise, it's not clear which 
> definition is being used.

Sure, do you get that upstream? If so, wonder why I didn't get it.

I guess we would just have it to where the `libomptarget` instance handles all 
the OMPT.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [WIP][OpenMP] Remove dependency on `libffi` from offloading runtime (PR #91264)

2024-05-06 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

I'm unsure how to resolve the issue of `CGF.EmitScalarExpr(NumTeams)` not 
returning the correct value now. For the following code
```c
#include 
#include 

int main() {
  int Teams = 10;
#pragma omp target teams distribute parallel for num_teams(Teams)
  for (int i = 0; i < 1; ++i)
;

  return 0;
}
```
I get this LLVM-IR, which suggests that it's reading beyond the expected 
`__context` struct.
```llvm
; Function Attrs: convergent noinline norecurse nounwind optnone uwtable
define weak_odr protected void @__omp_offloading_10302_af886a3_main_l9(ptr 
noalias noundef %dyn_ptr, ptr noalias noundef %__context) #0 {
entry:
  %dyn_ptr.addr = alloca ptr, align 8
  %__context.addr = alloca ptr, align 8
  %.capture_expr. = alloca i32, align 4
  %0 = call i32 @__kmpc_global_thread_num(ptr @3)
  store ptr %dyn_ptr, ptr %dyn_ptr.addr, align 8
  store ptr %__context, ptr %__context.addr, align 8
  %1 = load ptr, ptr %__context.addr, align 8
  %2 = getelementptr inbounds %struct.anon, ptr %1, i32 0, i32 0
  %3 = load i32, ptr %2, align 4
  store i32 %3, ptr %.capture_expr., align 4
  %4 = load i32, ptr %.capture_expr., align 4
  call void @__kmpc_push_num_teams(ptr @3, i32 %0, i32 %4, i32 0)
  call void (ptr, i32, ptr, ...) @__kmpc_fork_teams(ptr @3, i32 0, ptr 
@__omp_offloading_10302_af886a3_main_l9.omp_outlined)
  ret void
}
```
Any idea how to resolve this? I'm assuming the way we do this now is no longer 
valid somehow because of the struct indirection.

https://github.com/llvm/llvm-project/pull/91264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Allow the `__builtin_flt_rounds` functions on AMDGPU (PR #90994)

2024-05-03 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/90994
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Allow the `__builtin_flt_rounds` functions on AMDGPU (PR #90994)

2024-05-03 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/90994

Summary:
Previous patches added support for the LLVM rounding intrinsic
functions. This patch allows them to me emitted using the clang builtins
when targeting AMDGPU.


>From abceb892df93ccfbfe9392fc7de8c93822e85f92 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 3 May 2024 13:55:12 -0500
Subject: [PATCH] [AMDGPU] Allow the `__builtin_flt_rounds` functions on AMDGPU

Summary:
Previous patches added support for the LLVM rounding intrinsic
functions. This patch allows them to me emitted using the clang builtins
when targeting AMDGPU.
---
 clang/lib/Sema/SemaChecking.cpp | 16 
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl | 12 
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index cf8840c63024d4..f5af0de57b1628 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -2535,18 +2535,18 @@ Sema::CheckBuiltinFunctionCall(FunctionDecl *FDecl, 
unsigned BuiltinID,
   case Builtin::BI_bittestandset64:
   case Builtin::BI_interlockedbittestandreset64:
   case Builtin::BI_interlockedbittestandset64:
-if (CheckBuiltinTargetInSupported(*this, BuiltinID, TheCall,
-  {llvm::Triple::x86_64, llvm::Triple::arm,
-   llvm::Triple::thumb,
-   llvm::Triple::aarch64}))
+if (CheckBuiltinTargetInSupported(
+*this, BuiltinID, TheCall,
+{llvm::Triple::x86_64, llvm::Triple::arm, llvm::Triple::thumb,
+ llvm::Triple::aarch64, llvm::Triple::amdgcn}))
   return ExprError();
 break;
 
   case Builtin::BI__builtin_set_flt_rounds:
-if (CheckBuiltinTargetInSupported(*this, BuiltinID, TheCall,
-  {llvm::Triple::x86, llvm::Triple::x86_64,
-   llvm::Triple::arm, llvm::Triple::thumb,
-   llvm::Triple::aarch64}))
+if (CheckBuiltinTargetInSupported(
+*this, BuiltinID, TheCall,
+{llvm::Triple::x86, llvm::Triple::x86_64, llvm::Triple::arm,
+ llvm::Triple::thumb, llvm::Triple::aarch64, 
llvm::Triple::amdgcn}))
   return ExprError();
 break;
 
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
index bdca97c8878670..338d6bc95655a3 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -839,6 +839,18 @@ unsigned test_wavefrontsize() {
   return __builtin_amdgcn_wavefrontsize();
 }
 
+// CHECK-LABEL test_flt_rounds(
+unsigned test_flt_rounds() {
+
+  // CHECK: call i32 @llvm.get.rounding()
+  unsigned mode = __builtin_flt_rounds();
+
+  // CHECK: call void @llvm.set.rounding(i32 %0)
+  __builtin_set_flt_rounds(mode);
+
+  return mode;
+}
+
 // CHECK-LABEL test_get_fpenv(
 unsigned long test_get_fpenv() {
   // CHECK: call i64 @llvm.get.fpenv.i64()

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-03 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From 3ea2ae0f5c438b38d0480cfb38a72d2f7a60142c Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 28 Mar 2024 16:18:19 -0500
Subject: [PATCH] [Libomptarget] Statically link all plugin runtimes

Summary:
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971 
https://github.com/llvm/llvm-project/pull/86875 
https://github.com/llvm/llvm-project/pull/86868
---
 clang/test/Driver/linker-wrapper-image.c  |   2 +-
 .../Frontend/Offloading/OffloadWrapper.cpp|   7 +-
 offload/include/PluginManager.h   |  61 ++
 offload/include/device.h  |   8 +-
 offload/plugins-nextgen/CMakeLists.txt|  19 +-
 offload/plugins-nextgen/amdgpu/CMakeLists.txt |   5 -
 offload/plugins-nextgen/amdgpu/src/rtl.cpp|  14 +-
 offload/plugins-nextgen/common/CMakeLists.txt |   4 +-
 .../common/include/PluginInterface.h  |  94 +---
 .../common/include/Utils/ELF.h|   2 -
 offload/plugins-nextgen/common/src/JIT.cpp|  40 ++--
 .../common/src/PluginInterface.cpp| 205 --
 offload/plugins-nextgen/cuda/CMakeLists.txt   |   5 -
 offload/plugins-nextgen/cuda/src/rtl.cpp  |  14 +-
 offload/plugins-nextgen/host/CMakeLists.txt   |   8 -
 offload/plugins-nextgen/host/src/rtl.cpp  |  14 +-
 offload/src/CMakeLists.txt|   4 +
 offload/src/OffloadRTL.cpp|   1 +
 offload/src/OpenMP/InteropAPI.cpp |   4 +-
 offload/src/PluginManager.cpp | 129 ---
 offload/src/device.cpp|   3 +-
 offload/src/interface.cpp |   2 -
 .../kernelreplay/llvm-omp-kernel-replay.cpp   |   2 -
 .../unittests/Plugins/NextgenPluginsTest.cpp  |   1 -
 24 files changed, 125 insertions(+), 523 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index d01445e3aed04e..5d5d62805e174d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 7241d15ed1c670..8b6f9ea1f4cca3 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,12 +232,13 @@ void createRegisterFunction(Module , GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
+  Builder.CreateCall(RegFuncC, BinDesc);
+
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done before the runtime is called and registers its own.
+  // This needs to be done after plugin initialization to ensure that it is
+  // called before the plugin runtime is destroyed.
   Builder.CreateCall(AtExit, UnregFunc);
-
-  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.
diff --git a/offload/include/PluginManager.h

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-03 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

Going to land this soon.

@jplehr @estewart08 @ronlieb Applied this on the AMD fork, here the diff.
 https://gist.github.com/jhuber6/e856fbe9c73acea13b6d30b20605c73e

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From 8c4b7ffb49c8f91768af3bec00669bac5433ec0f Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 28 Mar 2024 16:18:19 -0500
Subject: [PATCH] [Libomptarget] Statically link all plugin runtimes

Summary:
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971 
https://github.com/llvm/llvm-project/pull/86875 
https://github.com/llvm/llvm-project/pull/86868
---
 clang/test/Driver/linker-wrapper-image.c  |   2 +-
 .../Frontend/Offloading/OffloadWrapper.cpp|   7 +-
 offload/include/PluginManager.h   |  61 ++
 offload/include/device.h  |   8 +-
 offload/plugins-nextgen/CMakeLists.txt|  19 +-
 offload/plugins-nextgen/amdgpu/CMakeLists.txt |   5 -
 offload/plugins-nextgen/amdgpu/src/rtl.cpp|  14 +-
 offload/plugins-nextgen/common/CMakeLists.txt |   4 +-
 .../common/include/PluginInterface.h  |  94 +---
 .../common/include/Utils/ELF.h|   2 -
 offload/plugins-nextgen/common/src/JIT.cpp|  40 ++--
 .../common/src/PluginInterface.cpp| 205 --
 offload/plugins-nextgen/cuda/CMakeLists.txt   |   5 -
 offload/plugins-nextgen/cuda/src/rtl.cpp  |  14 +-
 offload/plugins-nextgen/host/CMakeLists.txt   |   8 -
 offload/plugins-nextgen/host/src/rtl.cpp  |  14 +-
 offload/src/CMakeLists.txt|   4 +
 offload/src/OffloadRTL.cpp|   1 +
 offload/src/OpenMP/InteropAPI.cpp |   4 +-
 offload/src/PluginManager.cpp | 129 ---
 offload/src/device.cpp|   3 +-
 offload/src/interface.cpp |   2 +-
 .../kernelreplay/llvm-omp-kernel-replay.cpp   |   2 -
 .../unittests/Plugins/NextgenPluginsTest.cpp  |   1 -
 24 files changed, 126 insertions(+), 522 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index d01445e3aed04e..5d5d62805e174d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 7241d15ed1c670..8b6f9ea1f4cca3 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,12 +232,13 @@ void createRegisterFunction(Module , GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
+  Builder.CreateCall(RegFuncC, BinDesc);
+
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done before the runtime is called and registers its own.
+  // This needs to be done after plugin initialization to ensure that it is
+  // called before the plugin runtime is destroyed.
   Builder.CreateCall(AtExit, UnregFunc);
-
-  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.
diff --git a/offload/include/PluginManager.h

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From 473a4b9bad09bd9af8186932984be7696711692e Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 28 Mar 2024 16:18:19 -0500
Subject: [PATCH] [Libomptarget] Statically link all plugin runtimes

Summary:
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971 
https://github.com/llvm/llvm-project/pull/86875 
https://github.com/llvm/llvm-project/pull/86868
---
 clang/test/Driver/linker-wrapper-image.c  |   2 +-
 .../Frontend/Offloading/OffloadWrapper.cpp|   7 +-
 offload/include/PluginManager.h   |  61 ++
 offload/include/device.h  |   8 +-
 offload/plugins-nextgen/CMakeLists.txt|  19 +-
 offload/plugins-nextgen/amdgpu/CMakeLists.txt |   5 -
 offload/plugins-nextgen/amdgpu/src/rtl.cpp|  14 +-
 offload/plugins-nextgen/common/CMakeLists.txt |   4 +-
 .../common/include/PluginInterface.h  |  94 +---
 .../common/include/Utils/ELF.h|   2 -
 offload/plugins-nextgen/common/src/JIT.cpp|  40 ++--
 .../common/src/PluginInterface.cpp| 205 --
 offload/plugins-nextgen/cuda/CMakeLists.txt   |   5 -
 offload/plugins-nextgen/cuda/src/rtl.cpp  |  14 +-
 offload/plugins-nextgen/host/CMakeLists.txt   |   8 -
 offload/plugins-nextgen/host/src/rtl.cpp  |  14 +-
 offload/src/CMakeLists.txt|   4 +
 offload/src/OffloadRTL.cpp|   1 +
 offload/src/OpenMP/InteropAPI.cpp |   4 +-
 offload/src/PluginManager.cpp | 129 ---
 offload/src/device.cpp|   3 +-
 offload/src/interface.cpp |   2 +-
 .../kernelreplay/llvm-omp-kernel-replay.cpp   |   2 -
 .../unittests/Plugins/NextgenPluginsTest.cpp  |   1 -
 24 files changed, 127 insertions(+), 521 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index d01445e3aed04e..5d5d62805e174d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 7241d15ed1c670..8b6f9ea1f4cca3 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,12 +232,13 @@ void createRegisterFunction(Module , GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
+  Builder.CreateCall(RegFuncC, BinDesc);
+
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done before the runtime is called and registers its own.
+  // This needs to be done after plugin initialization to ensure that it is
+  // called before the plugin runtime is destroyed.
   Builder.CreateCall(AtExit, UnregFunc);
-
-  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.
diff --git a/offload/include/PluginManager.h

[clang] [clang][CodeGen] Omit pre-opt link when post-opt is link requested (PR #85672)

2024-05-01 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> > Currently if a user doesn't supply the new "-link-builtin-bitcodes-postopt" 
> > option, linking builtin bitcodes happens first, then the optimization 
> > pipeline follows. Does that cover the case you're talking about?
> 
> I'm thinking of an option that developers can use. If 
> -link-builtin-bitcodes-postopt, becomes the default, how can developers 
> disable it?

Presumably you'd add `--no-link-builtin-bitcodes-postopt`.

https://github.com/llvm/llvm-project/pull/85672
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Joseph Huber via cfe-commits



@@ -3476,3 +3472,9 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, 
TargetAllocTy Kind) {
 } // namespace target
 } // namespace omp
 } // namespace llvm
+
+extern "C" {
+llvm::omp::target::plugin::GenericPluginTy *createPlugin_amdgpu() {

jhuber6 wrote:

I tried that, but couldn't figure out how to set a name from a macro within a 
macro.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-04-29 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From 4fd1510c2013fd975ac2ad94b3d201bcd5a9d029 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 28 Mar 2024 16:18:19 -0500
Subject: [PATCH] [Libomptarget] Statically link all plugin runtimes

Summary:
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971 
https://github.com/llvm/llvm-project/pull/86875 
https://github.com/llvm/llvm-project/pull/86868
---
 clang/test/Driver/linker-wrapper-image.c  |   2 +-
 .../Frontend/Offloading/OffloadWrapper.cpp|   7 +-
 offload/include/PluginManager.h   |  61 ++
 offload/include/device.h  |   8 +-
 offload/plugins-nextgen/CMakeLists.txt|  19 +-
 offload/plugins-nextgen/amdgpu/CMakeLists.txt |   5 -
 offload/plugins-nextgen/amdgpu/src/rtl.cpp|  14 +-
 offload/plugins-nextgen/common/CMakeLists.txt |   4 +-
 .../common/include/PluginInterface.h  |  94 +---
 .../common/include/Utils/ELF.h|   2 -
 offload/plugins-nextgen/common/src/JIT.cpp|  40 ++--
 .../common/src/PluginInterface.cpp| 205 --
 offload/plugins-nextgen/cuda/CMakeLists.txt   |   5 -
 offload/plugins-nextgen/cuda/src/rtl.cpp  |  14 +-
 offload/plugins-nextgen/host/CMakeLists.txt   |   8 -
 offload/plugins-nextgen/host/src/rtl.cpp  |  14 +-
 offload/src/CMakeLists.txt|   4 +
 offload/src/OffloadRTL.cpp|   1 +
 offload/src/OpenMP/InteropAPI.cpp |   4 +-
 offload/src/PluginManager.cpp | 129 ---
 offload/src/device.cpp|   3 +-
 offload/src/interface.cpp |   2 +-
 .../kernelreplay/llvm-omp-kernel-replay.cpp   |   2 -
 .../unittests/Plugins/NextgenPluginsTest.cpp  |   1 -
 24 files changed, 127 insertions(+), 521 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index d01445e3aed04e..5d5d62805e174d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 7241d15ed1c670..8b6f9ea1f4cca3 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,12 +232,13 @@ void createRegisterFunction(Module , GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
+  Builder.CreateCall(RegFuncC, BinDesc);
+
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done before the runtime is called and registers its own.
+  // This needs to be done after plugin initialization to ensure that it is
+  // called before the plugin runtime is destroyed.
   Builder.CreateCall(AtExit, UnregFunc);
-
-  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.
diff --git a/offload/include/PluginManager.h

[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-04-28 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.

LG

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-04-28 Thread Joseph Huber via cfe-commits



@@ -54,3 +56,76 @@ void SPIRV64TargetInfo::getTargetDefines(const LangOptions 
,
   BaseSPIRVTargetInfo::getTargetDefines(Opts, Builder);
   DefineStd(Builder, "SPIRV64", Opts);
 }
+
+static const AMDGPUTargetInfo AMDGPUTI(llvm::Triple("amdgcn-amd-amdhsa"), {});
+
+ArrayRef SPIRV64AMDGCNTargetInfo::getGCCRegNames() const {
+  return AMDGPUTI.getGCCRegNames();
+}
+
+bool SPIRV64AMDGCNTargetInfo::initFeatureMap(
+llvm::StringMap , DiagnosticsEngine , StringRef,
+const std::vector ) const {
+  llvm::AMDGPU::fillAMDGPUFeatureMap({}, getTriple(), Features);
+
+  return TargetInfo::initFeatureMap(Features, Diags, {}, FeatureVec);
+}
+
+bool SPIRV64AMDGCNTargetInfo::validateAsmConstraint(
+const char *, TargetInfo::ConstraintInfo ) const {
+  return AMDGPUTI.validateAsmConstraint(Name, Info);
+}
+
+std::string
+SPIRV64AMDGCNTargetInfo::convertConstraint(const char *) const {
+  return AMDGPUTI.convertConstraint(Constraint);
+}
+
+ArrayRef SPIRV64AMDGCNTargetInfo::getTargetBuiltins() const {
+  return AMDGPUTI.getTargetBuiltins();
+}
+
+void SPIRV64AMDGCNTargetInfo::getTargetDefines(const LangOptions ,
+   MacroBuilder ) const {
+  BaseSPIRVTargetInfo::getTargetDefines(Opts, Builder);
+  DefineStd(Builder, "SPIRV64", Opts);
+
+  Builder.defineMacro("__AMD__");
+  Builder.defineMacro("__AMDGPU__");
+  Builder.defineMacro("__AMDGCN__");

jhuber6 wrote:

I've always thought defining those for both targets was obtuse, since those 
should act like architecture macros, i.e. (__X86__). But considering this is 
how it's done already, I suppose it's a necessary evil.

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-04-28 Thread Joseph Huber via cfe-commits



@@ -54,3 +56,76 @@ void SPIRV64TargetInfo::getTargetDefines(const LangOptions 
,
   BaseSPIRVTargetInfo::getTargetDefines(Opts, Builder);
   DefineStd(Builder, "SPIRV64", Opts);
 }
+
+static const AMDGPUTargetInfo AMDGPUTI(llvm::Triple("amdgcn-amd-amdhsa"), {});
+
+ArrayRef SPIRV64AMDGCNTargetInfo::getGCCRegNames() const {
+  return AMDGPUTI.getGCCRegNames();
+}
+
+bool SPIRV64AMDGCNTargetInfo::initFeatureMap(
+llvm::StringMap , DiagnosticsEngine , StringRef,
+const std::vector ) const {
+  llvm::AMDGPU::fillAMDGPUFeatureMap({}, getTriple(), Features);
+
+  return TargetInfo::initFeatureMap(Features, Diags, {}, FeatureVec);
+}
+
+bool SPIRV64AMDGCNTargetInfo::validateAsmConstraint(
+const char *, TargetInfo::ConstraintInfo ) const {
+  return AMDGPUTI.validateAsmConstraint(Name, Info);
+}
+
+std::string
+SPIRV64AMDGCNTargetInfo::convertConstraint(const char *) const {
+  return AMDGPUTI.convertConstraint(Constraint);
+}
+
+ArrayRef SPIRV64AMDGCNTargetInfo::getTargetBuiltins() const {
+  return AMDGPUTI.getTargetBuiltins();
+}
+
+void SPIRV64AMDGCNTargetInfo::getTargetDefines(const LangOptions ,
+   MacroBuilder ) const {
+  BaseSPIRVTargetInfo::getTargetDefines(Opts, Builder);
+  DefineStd(Builder, "SPIRV64", Opts);
+
+  Builder.defineMacro("__AMD__");
+  Builder.defineMacro("__AMDGPU__");
+  Builder.defineMacro("__AMDGCN__");

jhuber6 wrote:

Are these defined on both the host and device? I remember having a quite 
annoying time with these macros because HIP was defining stuff like 
`__AMDGCN_WAVEFRONT_SIZE` on the host.

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-04-28 Thread Joseph Huber via cfe-commits



@@ -309,7 +309,45 @@ StringRef AMDGPU::getCanonicalArchName(const Triple , 
StringRef Arch) {
 void AMDGPU::fillAMDGPUFeatureMap(StringRef GPU, const Triple ,
   StringMap ) {
   // XXX - What does the member GPU mean if device name string passed here?
-  if (T.isAMDGCN()) {
+  if (T.isSPIRV() && T.getOS() == Triple::OSType::AMDHSA) {
+// AMDGCN SPIRV must support the union of all AMDGCN features.
+Features["atomic-ds-pk-add-16-insts"] = true;
+Features["atomic-flat-pk-add-16-insts"] = true;
+Features["atomic-buffer-global-pk-add-f16-insts"] = true;
+Features["atomic-global-pk-add-bf16-inst"] = true;
+Features["atomic-fadd-rtn-insts"] = true;
+Features["ci-insts"] = true;
+Features["dot1-insts"] = true;
+Features["dot2-insts"] = true;
+Features["dot3-insts"] = true;
+Features["dot4-insts"] = true;
+Features["dot5-insts"] = true;
+Features["dot7-insts"] = true;
+Features["dot8-insts"] = true;
+Features["dot9-insts"] = true;
+Features["dot10-insts"] = true;
+Features["dot11-insts"] = true;
+Features["dl-insts"] = true;
+Features["16-bit-insts"] = true;
+Features["dpp"] = true;
+Features["gfx8-insts"] = true;
+Features["gfx9-insts"] = true;
+Features["gfx90a-insts"] = true;
+Features["gfx940-insts"] = true;
+Features["gfx10-insts"] = true;
+Features["gfx10-3-insts"] = true;
+Features["gfx11-insts"] = true;
+Features["gfx12-insts"] = true;

jhuber6 wrote:

I see, so it's basically just pushing any target specific errors into when it's 
actually compiled to a binary.

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-04-28 Thread Joseph Huber via cfe-commits



@@ -309,7 +309,45 @@ StringRef AMDGPU::getCanonicalArchName(const Triple , 
StringRef Arch) {
 void AMDGPU::fillAMDGPUFeatureMap(StringRef GPU, const Triple ,
   StringMap ) {
   // XXX - What does the member GPU mean if device name string passed here?
-  if (T.isAMDGCN()) {
+  if (T.isSPIRV() && T.getOS() == Triple::OSType::AMDHSA) {
+// AMDGCN SPIRV must support the union of all AMDGCN features.
+Features["atomic-ds-pk-add-16-insts"] = true;
+Features["atomic-flat-pk-add-16-insts"] = true;
+Features["atomic-buffer-global-pk-add-f16-insts"] = true;
+Features["atomic-global-pk-add-bf16-inst"] = true;
+Features["atomic-fadd-rtn-insts"] = true;
+Features["ci-insts"] = true;
+Features["dot1-insts"] = true;
+Features["dot2-insts"] = true;
+Features["dot3-insts"] = true;
+Features["dot4-insts"] = true;
+Features["dot5-insts"] = true;
+Features["dot7-insts"] = true;
+Features["dot8-insts"] = true;
+Features["dot9-insts"] = true;
+Features["dot10-insts"] = true;
+Features["dot11-insts"] = true;
+Features["dl-insts"] = true;
+Features["16-bit-insts"] = true;
+Features["dpp"] = true;
+Features["gfx8-insts"] = true;
+Features["gfx9-insts"] = true;
+Features["gfx90a-insts"] = true;
+Features["gfx940-insts"] = true;
+Features["gfx10-insts"] = true;
+Features["gfx10-3-insts"] = true;
+Features["gfx11-insts"] = true;
+Features["gfx12-insts"] = true;

jhuber6 wrote:

What do these features even mean in the context of SPIR-V? It's basically a 
format for JIT, so can we really say stuff like GFX12 is available?

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-04-28 Thread Joseph Huber via cfe-commits



@@ -6088,6 +6088,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
   StringRef Prefix =
   llvm::Triple::getArchTypePrefix(getTarget().getTriple().getArch());
   if (!Prefix.empty()) {
+if (Prefix == "spv" &&
+getTarget().getTriple().getOS() == llvm::Triple::OSType::AMDHSA)
+  Prefix = "amdgcn";

jhuber6 wrote:

So I was just wondering if it would make more sense to put this in 
`Triple::getArchTypePrefix(ArchType Kind)` because I wasn't sure if this logic 
is the expected return value there.

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-04-28 Thread Joseph Huber via cfe-commits



@@ -6088,6 +6088,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
   StringRef Prefix =
   llvm::Triple::getArchTypePrefix(getTarget().getTriple().getArch());
   if (!Prefix.empty()) {
+if (Prefix == "spv" &&
+getTarget().getTriple().getOS() == llvm::Triple::OSType::AMDHSA)
+  Prefix = "amdgcn";

jhuber6 wrote:

What is this required for? I'm wondering why we'd need to reset the prefix here 
instead of updating the logic somewhere else.

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-04-28 Thread Joseph Huber via cfe-commits



@@ -54,3 +56,77 @@ void SPIRV64TargetInfo::getTargetDefines(const LangOptions 
,
   BaseSPIRVTargetInfo::getTargetDefines(Opts, Builder);
   DefineStd(Builder, "SPIRV64", Opts);
 }
+
+namespace {
+const AMDGPUTargetInfo AMDGPUTI(llvm::Triple("amdgcn-amd-amdhsa"), {});
+
+} // anonymous namespace

jhuber6 wrote:

```suggestion
static const AMDGPUTargetInfo AMDGPUTI(llvm::Triple("amdgcn-amd-amdhsa"), {});
```

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-04-28 Thread Joseph Huber via cfe-commits



@@ -54,3 +56,77 @@ void SPIRV64TargetInfo::getTargetDefines(const LangOptions 
,
   BaseSPIRVTargetInfo::getTargetDefines(Opts, Builder);
   DefineStd(Builder, "SPIRV64", Opts);
 }
+
+namespace {
+const AMDGPUTargetInfo AMDGPUTI(llvm::Triple("amdgcn-amd-amdhsa"), {});
+
+} // anonymous namespace
+
+ArrayRef SPIRV64AMDGCNTargetInfo::getGCCRegNames() const {
+  return AMDGPUTI.getGCCRegNames();
+}
+
+bool SPIRV64AMDGCNTargetInfo::initFeatureMap(
+llvm::StringMap , DiagnosticsEngine , StringRef,
+const std::vector ) const {
+  llvm::AMDGPU::fillAMDGPUFeatureMap({}, getTriple(), Features);
+
+  return TargetInfo::initFeatureMap(Features, Diags, {}, FeatureVec);
+}
+
+bool SPIRV64AMDGCNTargetInfo::validateAsmConstraint(
+const char *, TargetInfo::ConstraintInfo ) const {
+  return AMDGPUTI.validateAsmConstraint(Name, Info);
+}
+
+std::string
+SPIRV64AMDGCNTargetInfo::convertConstraint(const char *) const {
+  return AMDGPUTI.convertConstraint(Constraint);
+}
+
+ArrayRef SPIRV64AMDGCNTargetInfo::getTargetBuiltins() const {
+  return AMDGPUTI.getTargetBuiltins();
+}
+
+void SPIRV64AMDGCNTargetInfo::getTargetDefines(const LangOptions ,
+   MacroBuilder ) const {
+  BaseSPIRVTargetInfo::getTargetDefines(Opts, Builder);
+  DefineStd(Builder, "SPIRV64", Opts);
+
+  Builder.defineMacro("__AMD__");
+  Builder.defineMacro("__AMDGPU__");
+  Builder.defineMacro("__AMDGCN__");
+}
+
+void SPIRV64AMDGCNTargetInfo::setAuxTarget(const TargetInfo *Aux) {

jhuber6 wrote:

Is `AUX` guaranteed non-null here? I know in the NVPTX target we only have an 
`Aux` when compiling for CUDA and use that to make sure type widths match up. 
However, if you have a direct compilation via `--target=nvptx64-nvidia-cuda` it 
will be null and not used.

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (PR #89796)

2024-04-28 Thread Joseph Huber via cfe-commits



@@ -673,8 +673,12 @@ std::unique_ptr AllocateTarget(const 
llvm::Triple ,
   }
   case llvm::Triple::spirv64: {
 if (os != llvm::Triple::UnknownOS ||
-Triple.getEnvironment() != llvm::Triple::UnknownEnvironment)
+Triple.getEnvironment() != llvm::Triple::UnknownEnvironment) {
+  if (os == llvm::Triple::OSType::AMDHSA)
+return std::make_unique(Triple, Opts);
+

jhuber6 wrote:

```suggestion
```

https://github.com/llvm/llvm-project/pull/89796
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] bfd269d - [AMDGPU] Fix test failing on Windows for `ld.lld.exe`

2024-04-28 Thread Joseph Huber via cfe-commits


Author: Joseph Huber
Date: 2024-04-28T06:36:09-05:00
New Revision: bfd269d0d0d6cb58235a838eb659eef97e4f2ebf

URL: 
https://github.com/llvm/llvm-project/commit/bfd269d0d0d6cb58235a838eb659eef97e4f2ebf
DIFF: 
https://github.com/llvm/llvm-project/commit/bfd269d0d0d6cb58235a838eb659eef97e4f2ebf.diff

LOG: [AMDGPU] Fix test failing on Windows for `ld.lld.exe`

Added: 


Modified: 
clang/test/Driver/amdgpu-toolchain.c

Removed: 




diff  --git a/clang/test/Driver/amdgpu-toolchain.c 
b/clang/test/Driver/amdgpu-toolchain.c
index faaff05004f6de..8ab6a071314745 100644
--- a/clang/test/Driver/amdgpu-toolchain.c
+++ b/clang/test/Driver/amdgpu-toolchain.c
@@ -27,4 +27,4 @@
 
 // RUN: %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx906 -nogpulib \
 // RUN:   -fuse-ld=ld %s 2>&1 | FileCheck -check-prefixes=LD %s
-// LD: ld.lld"
+// LD: ld.lld



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CMake] Change GCC_INSTALL_PREFIX from warning to fatal error (PR #85891)

2024-04-24 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> I disagree, `--gcc-install-dir` is sure an improvement over 
> `--gcc-toolchain`, but they're both weaker than the compile time option 
> `GCC_INSTALL_PREFIX` because of runtimes.
> 
> You're looking to remove `GCC_INSTALL_PREFIX`, then give a clear alternative 
> that's equivalent. The current error message about config files is misleading 
> since it potentially runtimes will use an undesired gcc version when they're 
> built, which is rather opaque.
> 
> A different point: `flang-new` does not accept `--gcc-install-dir`. Is that 
> an oversight?

Should've been added in https://github.com/llvm/llvm-project/pull/87360. I 
agree overall that this change made made runtime builds unnecessarily complex.

https://github.com/llvm/llvm-project/pull/85891
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] eaa2eac - [AMDGPU] Fix linker test on platforms without BFD

2024-04-24 Thread Joseph Huber via cfe-commits


Author: Joseph Huber
Date: 2024-04-24T07:03:51-05:00
New Revision: eaa2eac8ec73a0473655f2da73f347906d14b00f

URL: 
https://github.com/llvm/llvm-project/commit/eaa2eac8ec73a0473655f2da73f347906d14b00f
DIFF: 
https://github.com/llvm/llvm-project/commit/eaa2eac8ec73a0473655f2da73f347906d14b00f.diff

LOG: [AMDGPU] Fix linker test on platforms without BFD

Added: 


Modified: 
clang/test/Driver/amdgpu-toolchain.c

Removed: 




diff  --git a/clang/test/Driver/amdgpu-toolchain.c 
b/clang/test/Driver/amdgpu-toolchain.c
index d21ce857f3c57a..faaff05004f6de 100644
--- a/clang/test/Driver/amdgpu-toolchain.c
+++ b/clang/test/Driver/amdgpu-toolchain.c
@@ -25,8 +25,6 @@
 // LTO: clang{{.*}} "-flto=full"{{.*}}"-fconvergent-functions"
 // MCPU: ld.lld{{.*}}"-L."{{.*}}"-plugin-opt=mcpu=gfx906"
 
-// We do not suppor the BFD linker, but we should be able to override the
-// default even if it will error during linking.
 // RUN: %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx906 -nogpulib \
-// RUN:   -fuse-ld=bfd %s 2>&1 | FileCheck -check-prefixes=LD %s
-// LD: ld.bfd"
+// RUN:   -fuse-ld=ld %s 2>&1 | FileCheck -check-prefixes=LD %s
+// LD: ld.lld"



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Correctly determine the toolchain linker (PR #89803)

2024-04-24 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/89803
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1296 matches

Mail list logo