@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui",
"nc")
//===--===//
TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32")
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/80066
>From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Tue, 30 Jan 2024 15:34:22 -0600
Subject: [PATCH] [LinkerWrapper] Support relocatable linking for offloading
@@ -181,5 +181,6 @@ __attribute__((visibility("protected"), used)) int x;
// RUN: --linker-path=/usr/bin/ld.lld -- -r --whole-archive %t.a
--no-whole-archive \
// RUN: %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=RELOCATABLE-LINK
jhuber6 wrote:
https://github.com/jhuber6 created
https://github.com/llvm/llvm-project/pull/80183
Summary:
Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64
targets even though it is valid. This is relevant for making library
code that can handle both without needing to check the
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/80066
>From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Tue, 30 Jan 2024 15:34:22 -0600
Subject: [PATCH 1/3] [LinkerWrapper] Support relocatable linking for
offloading
jhuber6 wrote:
> > I'm assuming you're talking about GPU-side constructors? I don't think the
> > CUDA runtime supports those, but OpenMP runs them when the image is loaded,
> > so it would handle both independently.
>
> Yes. I'm thinking of the expectations from a C++ user standpoint, and
jhuber6 wrote:
> Supporting such mixed mode opens an interesting set of issues we may need to
> consider going forward:
>
> who/where/how runs initializers in the fully linked parts?
I'm assuming you're talking about GPU-side constructors? I don't think the CUDA
runtime supports those, but
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui",
"nc")
//===--===//
TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32")
jhuber6 wrote:
> > the idea is that it would be the desired effect if someone went out of
> > their way to do this GPU subset linking thing.
>
> That would only be true when someone owns the whole build. That will not be
> the case in practice. A large enough project is usually a bunch of
https://github.com/jhuber6 approved this pull request.
Do we have any tests for this kind of stuff? We really should have some mock
ROCm installation in one of the `Inputs/` directories and then do
`--rocm-path=` or something.
https://github.com/llvm/llvm-project/pull/80190
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/80066
>From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Tue, 30 Jan 2024 15:34:22 -0600
Subject: [PATCH 1/4] [LinkerWrapper] Support relocatable linking for
offloading
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/80066
>From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Tue, 30 Jan 2024 15:34:22 -0600
Subject: [PATCH 1/5] [LinkerWrapper] Support relocatable linking for
offloading
https://github.com/jhuber6 closed
https://github.com/llvm/llvm-project/pull/79660
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
jhuber6 wrote:
> > This seems to have perturbed the HIP build.
> > https://lab.llvm.org/staging/#/builders/22/builds/22
> > The problem is that we used to set `__AMDGCN_WAVEFRONTSIZE` for the host
> > compilation as well in a bunch of the wave function macros. I think that
> > this is just
jhuber6 wrote:
Reverted. I don't think there's a "proper" solution here since this seems to
have leaked into the headers due to whoever set this up initially not properly
setting these on the host. That seems to be endemic now, so the best we can do
is just set it to some dummy values I
jhuber6 wrote:
This seems to have perturbed the HIP build.
https://lab.llvm.org/staging/#/builders/22/builds/22
The problem is that we used to set `__AMDGCN_WAVEFRONTSIZE` for the host
compilation as well in a bunch of the wave function macros. I think that this
is just poor programming,
https://github.com/jhuber6 edited
https://github.com/llvm/llvm-project/pull/79765
Author: Joseph Huber
Date: 2024-01-29T11:11:25-06:00
New Revision: 72d4fc1b4d5cfc4f7d50cc5cf1b315543c088f4d
URL:
https://github.com/llvm/llvm-project/commit/72d4fc1b4d5cfc4f7d50cc5cf1b315543c088f4d
DIFF:
https://github.com/llvm/llvm-project/commit/72d4fc1b4d5cfc4f7d50cc5cf1b315543c088f4d.diff
https://github.com/jhuber6 created
https://github.com/llvm/llvm-project/pull/79892
Summary:
We recently added builtin support for this function.
>From 5f316d30a179dd21cfadd50d232de622d394ccea Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 29 Jan 2024 14:28:35 -0600
Subject: [PATCH]
jhuber6 wrote:
> https://bugs.llvm.org/show_bug.cgi?id=35249
Yeah, there are constant issues with convergence analysis. I included one of the
tests to try to show that it won't merge with the convergent attribute, since
this is a general issue for all of these things. In the past I usually add
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/79768
>From 2c7049defef3b62de7017640948cccfb07ff756c Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Sun, 28 Jan 2024 14:57:05 -0600
Subject: [PATCH 1/2] [NVPTX] Add 'activemask' builtin and intrinsic support
jhuber6 wrote:
Added side effects attribute, I believe this matches the current behavior of
the inline asm better.
https://github.com/llvm/llvm-project/pull/79768
@@ -65,7 +65,7 @@ def : Proc<"sm_61", [SM61, PTX50]>;
def : Proc<"sm_62", [SM62, PTX50]>;
def : Proc<"sm_70", [SM70, PTX60]>;
def : Proc<"sm_72", [SM72, PTX61]>;
-def : Proc<"sm_75", [SM75, PTX63]>;
+def : Proc<"sm_75", [SM75, PTX62, PTX63]>;
jhuber6 wrote:
@@ -4599,6 +4599,14 @@ def int_nvvm_vote_ballot_sync :
[IntrInaccessibleMemOnly, IntrConvergent, IntrNoCallback],
"llvm.nvvm.vote.ballot.sync">,
ClangBuiltin<"__nvvm_vote_ballot_sync">;
+//
+// ACTIVEMASK
+//
+def int_nvvm_activemask :
+
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/79765
>From 5c4fc3dd207e91210f76c158e9c99e9591dccb96 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 29 Jan 2024 08:12:35 -0600
Subject: [PATCH] [NVPTX} Add builtin support for 'globaltimer'
Summary:
This
https://github.com/jhuber6 closed
https://github.com/llvm/llvm-project/pull/79765
https://github.com/jhuber6 closed
https://github.com/llvm/llvm-project/pull/79777
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/79768
>From 2c7049defef3b62de7017640948cccfb07ff756c Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Sun, 28 Jan 2024 14:57:05 -0600
Subject: [PATCH 1/3] [NVPTX] Add 'activemask' builtin and intrinsic support
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/79777
>From ea3b32593dd0f2035020313176c6e1a131ef8eb4 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Sun, 28 Jan 2024 21:27:37 -0600
Subject: [PATCH] [NVPTX] Add builtin for 'exit' handling
Summary:
The PTX ISA has
jhuber6 wrote:
> Relying on something _not_ being defined is probably not the best way to
> handle 'generic' target. For starters it makes it hard or impossible to
> recreate the same compilation state by undoing already-specified option. It
> also breaks established assumption that there
https://github.com/jhuber6 created
https://github.com/llvm/llvm-project/pull/79873
Summary:
The NVPTX tools require an architecture to be used, however if we are
creating generic LLVM-IR we should be able to leave it unspecified. This
will result in the `target-cpu` attributes not being set on
@@ -65,7 +65,7 @@ def : Proc<"sm_61", [SM61, PTX50]>;
def : Proc<"sm_62", [SM62, PTX50]>;
def : Proc<"sm_70", [SM70, PTX60]>;
def : Proc<"sm_72", [SM72, PTX61]>;
-def : Proc<"sm_75", [SM75, PTX63]>;
+def : Proc<"sm_75", [SM75, PTX62, PTX63]>;
jhuber6 wrote:
jhuber6 wrote:
> Unlike the other PRs, this one has a CUDA function, `__activemask()`.
> Presumably we should make that one work by hacking our headers?
That is currently defined here
https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_intrinsics.h#L214.
I was
https://github.com/jhuber6 created
https://github.com/llvm/llvm-project/pull/79888
Summary:
This patch adds a builtin for the `nanosleep` PTX function. It takes
either an immediate or a register and sleeps for [0, 2t] nanoseconds
given t. More information at the documentation:
jhuber6 wrote:
> > I think there's some precedent from both vendors to treat missing
> > attributes as a more generic target.
>
> It sounds more like a bug than a feature to me.
>
> The major difference between "you get sm_xx by default" and this "you get
> generic by default" is that With
jhuber6 wrote:
I've actually encountered some really strange behavior when trying to update
`libc` to use the new intrinsic. The following returns a common 64-bit value to
be compatible with AMDGPU's 64 lane wide mode. When I run this against the test
suite, it fails on tests that
jhuber6 wrote:
> > I was planning on updating this to use the new intrinsic for the newer
> > version. Alternatively we could make __activemask the builtin which expands
> > to both versions, but I'm somewhat averse since we should target the
> > instruction directly I feel.
>
> Yes, I
@@ -65,7 +65,7 @@ def : Proc<"sm_61", [SM61, PTX50]>;
def : Proc<"sm_62", [SM62, PTX50]>;
def : Proc<"sm_70", [SM70, PTX60]>;
def : Proc<"sm_72", [SM72, PTX61]>;
-def : Proc<"sm_75", [SM75, PTX63]>;
+def : Proc<"sm_75", [SM75, PTX62, PTX63]>;
jhuber6 wrote:
https://github.com/jhuber6 closed
https://github.com/llvm/llvm-project/pull/79888
https://github.com/jhuber6 closed
https://github.com/llvm/llvm-project/pull/79768
https://github.com/jhuber6 approved this pull request.
https://github.com/llvm/llvm-project/pull/80190
@@ -4,13 +4,10 @@
// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature
-wavefrontsize64 -verify -S -o - %s
// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s
+// expected-no-diagnostics
+
typedef unsigned long ulong;
void
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/80183
>From 26b75cdba1aebc881e52dc82ca61e1082ef67a5e Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Wed, 31 Jan 2024 13:18:04 -0600
Subject: [PATCH] [AMDGPU] Allow w64 ballot to be used on w32 targets
Summary:
@@ -4,13 +4,10 @@
// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature
-wavefrontsize64 -verify -S -o - %s
// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s
+// expected-no-diagnostics
+
typedef unsigned long ulong;
void
jhuber6 wrote:
> After this change is there any value in having two different builtins? You
> could just have one that always return 64 bits.
I personally think it would be better to just have the one, but I figured that
decision was made earlier and it would break backwards compatibility.
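To illustrate why a single 64-bit ballot result is convenient for library code, here is a minimal host-side sketch. It does not call the real builtins: `simulate_ballot_w64` and `count_voting_lanes` are hypothetical helpers, and the sketch assumes that a 64-bit ballot on a 32-lane wavefront simply leaves the upper 32 bits zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of a wavefront ballot (not a Clang builtin).
// Bit i of the result is set when lane i's predicate is true. The size of
// `lanes` (32 or 64) stands in for the wavefront size.
inline std::uint64_t simulate_ballot_w64(const std::vector<bool> &lanes) {
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < lanes.size() && i < 64; ++i)
    if (lanes[i])
      mask |= std::uint64_t{1} << i;
  return mask;
}

// Library-style helper written once against the 64-bit result, so it does
// not need to know whether the wavefront has 32 or 64 lanes.
inline int count_voting_lanes(std::uint64_t ballot) {
  int n = 0;
  for (; ballot != 0; ballot &= ballot - 1) // clear the lowest set bit
    ++n;
  return n;
}
```

With 32 lanes the upper half of the mask stays zero, so the same `count_voting_lanes` works unchanged for both wavefront sizes.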
@@ -199,7 +199,7 @@ static int initLibrary(DeviceTy ) {
Entry.size) != OFFLOAD_SUCCESS)
REPORT("Failed to write symbol for USM %s\n", Entry.name);
}
-} else {
+} else if (Entry.addr) {
jhuber6 wrote:
This is related to the discussions at the
https://github.com/llvm/llvm-project/issues/77018 issue.
https://github.com/llvm/llvm-project/pull/80066
https://github.com/jhuber6 created
https://github.com/llvm/llvm-project/pull/80066
Summary:
The standard GPU compilation process embeds each intermediate object
file into the host file at the `.llvm.offloading` section so it can be
linked later. We also use a special section called something
https://github.com/jhuber6 closed
https://github.com/llvm/llvm-project/pull/79873
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/79873
>From 35e12c3d83f3be93618805ffaf05e3424689f32f Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 29 Jan 2024 11:08:04 -0600
Subject: [PATCH 1/2] [NVPTX] Allow compiling LLVM-IR without `-march` set
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/79873
>From 35e12c3d83f3be93618805ffaf05e3424689f32f Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 29 Jan 2024 11:08:04 -0600
Subject: [PATCH 1/3] [NVPTX] Allow compiling LLVM-IR without `-march` set
https://github.com/jhuber6 closed
https://github.com/llvm/llvm-project/pull/79892
Author: Joseph Huber
Date: 2024-01-29T17:33:38-06:00
New Revision: 0a2b5b03c4084ac1fefd0e62db2ba49f5ac24ab9
URL:
https://github.com/llvm/llvm-project/commit/0a2b5b03c4084ac1fefd0e62db2ba49f5ac24ab9
DIFF:
https://github.com/llvm/llvm-project/commit/0a2b5b03c4084ac1fefd0e62db2ba49f5ac24ab9.diff
jhuber6 wrote:
Scratch that, I missed `Ui` in the builtin definition. I'll do a quick fix.
https://github.com/llvm/llvm-project/pull/79892
jhuber6 wrote:
> > This method of compilation is not like CUDA, so we can't target all the
> > GPUs at the same time.
>
> I think this is the key fact I was missing. If the patch is only for a
> standalone compilation which does not do multi-GPU compilation in principle,
> then your approach
jhuber6 wrote:
> On the other hand, I'd be OK with providing --offload-arch=native translating
> into "compile for all present GPU variants", with a possibility to further
> adjust the selected set with the usual --no-offload-arch-foo, if the user
> needs to. This will at least produce code
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/79373
>From 145b7bc932ce3ffa46545cd7af29b1c93981429c Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Wed, 24 Jan 2024 15:34:00 -0600
Subject: [PATCH 1/3] [NVPTX] Add support for -march=native in standalone NVPTX
jhuber6 wrote:
> User confusion is only part of the issue here. With any single GPU choice we
> would still potentially produce a nonworking binary, if our GPU choice does
> not match what the user wants.
>
> "all GPUs" has the advantage of always producing the binary that's guaranteed
> to
jhuber6 wrote:
> > This method of compilation is not like CUDA, so we can't target all the
> > GPUs at the same time.
>
> Can you clarify for me -- what are you compiling where it's impossible to
> target multiple GPUs in the binary? I'm confused because Art is understanding
> that it's not
jhuber6 wrote:
> I...think I understand.
>
> Is the output of this compilation step a cubin, then?
Yes, it will spit out a simple `cubin` instead of a fatbinary. The NVIDIA
toolchain is much worse about this stuff than the AMD one, but in general it
works. You can check with `-###` or
https://github.com/jhuber6 closed
https://github.com/llvm/llvm-project/pull/79373
jhuber6 wrote:
> > I think the semantics of native on other architectures are clear enough
> > here.
>
> I don't think we have the same idea about that. Let's spell it out, so
> there's no confusion.
>
> [GCC
> manual](https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-march-16)
>
jhuber6 wrote:
> Got it, okay, thanks.
>
> Since this change only applies to `--target=nvptx64-nvidia-cuda`, fine by me.
> Thanks for putting up with our scrutiny. :)
No problem, I probably should have been clearer in my commit messages.
https://github.com/llvm/llvm-project/pull/79373
jhuber6 wrote:
Some interesting points, I'll try to clarify some things.
> This option may not work as well as one would hope.
>
> Problem #1 is that it will drastically slow down compilation for some users.
> NVIDIA GPU drivers are loaded on demand, and the process takes a while
> (O(second),
https://github.com/jhuber6 approved this pull request.
https://github.com/llvm/llvm-project/pull/78333
@@ -6872,35 +6883,6 @@ void OpenMPIRBuilder::loadOffloadInfoMetadata(StringRef
HostFilePath) {
loadOffloadInfoMetadata(*M.get());
}
-Function *OpenMPIRBuilder::createRegisterRequires(StringRef Name) {
jhuber6 wrote:
It was a very obvious problem. I mixed
Author: Joseph Huber
Date: 2024-02-05T09:08:31-06:00
New Revision: d1722868d34a69df8466b72098176f54a7af8823
URL:
https://github.com/llvm/llvm-project/commit/d1722868d34a69df8466b72098176f54a7af8823
DIFF:
https://github.com/llvm/llvm-project/commit/d1722868d34a69df8466b72098176f54a7af8823.diff
https://github.com/jhuber6 closed
https://github.com/llvm/llvm-project/pull/80183
https://github.com/jhuber6 closed
https://github.com/llvm/llvm-project/pull/80741
@@ -832,6 +832,13 @@ void test_atomic_inc_dec(local uint *lptr, global uint
*gptr, uint val) {
res = __builtin_amdgcn_atomic_dec32((volatile global uint*)gptr, val,
__ATOMIC_SEQ_CST, "");
}
+// CHECK-LABEL test_wavefrontsize(
+unsigned test_wavefrontsize() {
https://github.com/jhuber6 created
https://github.com/llvm/llvm-project/pull/80741
Summary:
The backend supports the wavefrontsize intrinsic, and suggests that it
is tied to a corresponding clang builtin, but it is not actually
present. This simply adds it in so it can be used from clang. This
https://github.com/jhuber6 edited
https://github.com/llvm/llvm-project/pull/78333
https://github.com/jhuber6 commented:
You should add a test that checks the output of `-ccc-print-phases` and
`-ccc-print-bindings`.
https://github.com/llvm/llvm-project/pull/78333
jhuber6 wrote:
> FYI. There is a failure in linker-wrapper.c in
> https://buildkite.com/llvm-project/github-pull-requests/builds/30337#018d1aaa-8225-4630-a5f0-527d1c7c129d
>
> ```
> # note: command had no output on stdout or stderr
> | # error: command failed with exit status: 1
> | #
jhuber6 wrote:
> > Right now if you specify target-cpu you get target-cpu attributes, which is
> > what we don't want.
>
> I'm fine handling 'generic' in a special way under the hood and not
> specifying target-CPU.
>
> My concern is about user-facing interface. Command line options must be
@@ -175,6 +175,8 @@ Predefined Macros
- Defined when the GPU default stream is set to per-thread mode.
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
- Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
+ * - ``__AMDGCN_WAVEFRONT_SIZE__``
https://github.com/jhuber6 updated
https://github.com/llvm/llvm-project/pull/80035
>From f606aaa9c711d2ece6b1600160a61232abb69eb4 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 29 Jan 2024 08:46:14 -0600
Subject: [PATCH 1/2] [AMDGPU] Do not emit arch dependent macros with
unspecified
@@ -175,6 +175,8 @@ Predefined Macros
- Defined when the GPU default stream is set to per-thread mode.
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
- Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
+ * - ``__AMDGCN_WAVEFRONT_SIZE__``
https://github.com/jhuber6 created
https://github.com/llvm/llvm-project/pull/80035
Summary:
Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means
to create a sort of "generic" IR. The resulting IR will not contain any
target dependent attributes and can then be inserted into
jhuber6 wrote:
Rework of https://github.com/llvm/llvm-project/pull/79660 to handle old
behavior of these being defined for the host.
https://github.com/llvm/llvm-project/pull/80035
https://github.com/jhuber6 closed
https://github.com/llvm/llvm-project/pull/80035
Author: Joseph Huber
Date: 2024-01-30T13:17:02-06:00
New Revision: 626fe71fa5ed79cbd41b7b29582560d7adb1220e
URL:
https://github.com/llvm/llvm-project/commit/626fe71fa5ed79cbd41b7b29582560d7adb1220e
DIFF:
https://github.com/llvm/llvm-project/commit/626fe71fa5ed79cbd41b7b29582560d7adb1220e.diff
jhuber6 wrote:
> This seems to break tests: http://45.33.8.238/linux/129493/step_7.txt
>
> Please take a look and revert for now if it takes a while to fix.
Is it still broken? I pushed a fix because I'm pretty sure the problem was not
passing `-nogpulib` `-nogpuinc` so the test runs on
jhuber6 wrote:
> i.e. it helped with Clang :: Preprocessor/predefined-arch-macros.c but not
> with:
>
> Failed Tests (2): Clang :: Driver/amdgpu-macros.cl Clang ::
> Driver/target-id-macros.cl
Thanks, seeing it locally now. I'll try to fix it quick and revert if it's not
working soon.
Author: Joseph Huber
Date: 2024-01-30T13:45:01-06:00
New Revision: 6fecfbc7b62f54bd633e83c22630d7c2a3e5741e
URL:
https://github.com/llvm/llvm-project/commit/6fecfbc7b62f54bd633e83c22630d7c2a3e5741e
DIFF:
https://github.com/llvm/llvm-project/commit/6fecfbc7b62f54bd633e83c22630d7c2a3e5741e.diff
jhuber6 wrote:
> i.e. it helped with Clang :: Preprocessor/predefined-arch-macros.c but not
> with:
>
> Failed Tests (2): Clang :: Driver/amdgpu-macros.cl Clang ::
> Driver/target-id-macros.cl
Pushed a fix, `check-clang` passes on my machine now. Let me know if it's still
broken.
@@ -205,6 +220,56 @@ class AtomicScopeHIPModel : public AtomicScopeModel {
}
};
+/// Defines the generic atomic scope model.
+class AtomicScopeGenericModel : public AtomicScopeModel {
+public:
+ /// The enum values match predefined built-in macros __ATOMIC_SCOPE_*.
+ enum
@@ -54,6 +59,16 @@ enum class SyncScope {
inline llvm::StringRef getAsString(SyncScope S) {
jhuber6 wrote:
I think it's because this is for AST printing purposes, while the backend
strings vary per target.
https://github.com/llvm/llvm-project/pull/72280
jhuber6 wrote:
> Is there any actual difference now between these and the HIP/OpenCL flavors
> other than dropping the language from the name?
Yes, these directly copy the GNU functions and names. The OpenCL / HIP ones use
a different format.
https://github.com/llvm/llvm-project/pull/72280
@@ -798,6 +798,13 @@ static void InitializePredefinedMacros(const TargetInfo
,
Builder.defineMacro("__ATOMIC_ACQ_REL", "4");
Builder.defineMacro("__ATOMIC_SEQ_CST", "5");
+ // Define macros for the clang atomic scopes.
+ Builder.defineMacro("__MEMORY_SCOPE_SYSTEM",
https://github.com/jhuber6 created
https://github.com/llvm/llvm-project/pull/72442
Summary:
Currently the linker wrapper strictly assigns a single input binary to a
single link job based off of its input architecture. This is not
sufficient to implement the AMDGPU target ID correctly as this
https://github.com/jhuber6 commented:
This being in clang instead seems like a good change. Are there no CodeGen
tests changed? We should add one if so. Probably just take your `libomptarget`
test and run `update_cc_test_checks` on it with the arguments found in other
test files.
jhuber6 wrote:
> Overall I think it is the right way to go. Memory scope has been used by
> different offloading languages and the atomic clang builtins are essentially
> the same. Adding a generic clang atomic builtins with memory scope allows
> code sharing among offloading languages.
I
@@ -904,6 +904,32 @@ BUILTIN(__atomic_signal_fence, "vi", "n")
BUILTIN(__atomic_always_lock_free, "bzvCD*", "nE")
BUILTIN(__atomic_is_lock_free, "bzvCD*", "nE")
+// GNU atomic builtins with atomic scopes.
+ATOMIC_BUILTIN(__scoped_atomic_load, "v.", "t")
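As a rough usage sketch of the scoped builtins this hunk introduces: the code below assumes a Clang that provides the `_n` variants `__scoped_atomic_load_n` and `__scoped_atomic_fetch_add` together with the `__MEMORY_SCOPE_SYSTEM` macro, and otherwise falls back to the ordinary GNU `__atomic_*` builtins. The wrappers `load_counter` and `bump_counter` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Older preprocessors may lack __has_builtin; treat every query as "no".
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical wrapper: sequentially consistent load, system scope if available.
inline std::uint32_t load_counter(std::uint32_t *p) {
#if __has_builtin(__scoped_atomic_load_n) && defined(__MEMORY_SCOPE_SYSTEM)
  return __scoped_atomic_load_n(p, __ATOMIC_SEQ_CST, __MEMORY_SCOPE_SYSTEM);
#else
  // Fallback: the unscoped GNU builtin.
  return __atomic_load_n(p, __ATOMIC_SEQ_CST);
#endif
}

// Hypothetical wrapper: atomically add one and return the previous value.
inline std::uint32_t bump_counter(std::uint32_t *p) {
#if __has_builtin(__scoped_atomic_fetch_add) && defined(__MEMORY_SCOPE_SYSTEM)
  return __scoped_atomic_fetch_add(p, 1u, __ATOMIC_SEQ_CST,
                                   __MEMORY_SCOPE_SYSTEM);
#else
  return __atomic_fetch_add(p, 1u, __ATOMIC_SEQ_CST);
#endif
}
```

The scoped forms take the same arguments as their GNU counterparts plus a trailing scope argument, which is why the fallback path differs only in the builtin name and the missing last parameter.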
https://github.com/jhuber6 created
https://github.com/llvm/llvm-project/pull/72889
Summary:
The linker wrapper is a utility used to create offloading programs from
single-source offloading languages such as OpenMP or CUDA. This is done
by embedding device code into the host object, then feeding