[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-01-31 Thread Joseph Huber via cfe-commits
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") //===--===// TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32")

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80066 >From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 30 Jan 2024 15:34:22 -0600 Subject: [PATCH] [LinkerWrapper] Support relocatable linking for offloading

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
@@ -181,5 +181,6 @@ __attribute__((visibility("protected"), used)) int x; // RUN: --linker-path=/usr/bin/ld.lld -- -r --whole-archive %t.a --no-whole-archive \ // RUN: %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=RELOCATABLE-LINK jhuber6 wrote:

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/80183 Summary: Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64 targets even though it is valid. This is relevant for making library code that can handle both without needing to check the

[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80066 >From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 30 Jan 2024 15:34:22 -0600 Subject: [PATCH 1/3] [LinkerWrapper] Support relocatable linking for offloading

[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > I'm assuming you're talking about GPU-side constructors? I don't think the > > CUDA runtime supports those, but OpenMP runs them when the image is loaded, > > so it would handle both independently. > > Yes. I'm thinking of the expectations from a C++ user standpoint, and

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Supporting such mixed mode opens an interesting set of issues we may need to > consider going forward: > > who/where/how runs initializers in the fully linked parts? I'm assuming you're talking about GPU-side constructors? I don't think the CUDA runtime supports those, but

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-01-31 Thread Joseph Huber via cfe-commits
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") //===--===// TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32")

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-01-31 Thread Joseph Huber via cfe-commits
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") //===--===// TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32")

[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > the idea is that it would be the desired effect if someone went out of > > their way to do this GPU subset linking thing. > > That would only be true when someone owns the whole build. That will not be > the case in practice. A large enough project is usually a bunch of

[clang] [HIP] fix HIP detection for /usr (PR #80190)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 approved this pull request. Do we have any tests for this kind of stuff? We really should have some mock ROCm installation in one of the `Inputs/` directories and then do `--rocm-path=` or something. https://github.com/llvm/llvm-project/pull/80190

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80066 >From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 30 Jan 2024 15:34:22 -0600 Subject: [PATCH 1/4] [LinkerWrapper] Support relocatable linking for offloading

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80066 >From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 30 Jan 2024 15:34:22 -0600 Subject: [PATCH 1/5] [LinkerWrapper] Support relocatable linking for offloading

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #79660)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/79660 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #79660)

2024-01-29 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > This seems to have perturbed the HIP build. > > https://lab.llvm.org/staging/#/builders/22/builds/22 > > The problem is that we used to set `__AMDGCN_WAVEFRONTSIZE` for the host > > compilation as well in a bunch of the wave function macros. I think that > > this is just

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #79660)

2024-01-29 Thread Joseph Huber via cfe-commits
jhuber6 wrote: Reverted. I don't think there's a "proper" solution here since this seems to have leaked into the headers due to whoever set this up initially not properly setting these on the host. That seems to be endemic now, so the best we can do is just set it to some dummy values I

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #79660)

2024-01-29 Thread Joseph Huber via cfe-commits
jhuber6 wrote: This seems to have perturbed the HIP build. https://lab.llvm.org/staging/#/builders/22/builds/22 The problem is that we used to set `__AMDGCN_WAVEFRONTSIZE` for the host compilation as well in a bunch of the wave function macros. I think that this is just poor programming,

[clang] [llvm] [NVPTX] Add builtin support for 'globaltimer' (PR #79765)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/79765

[clang] 72d4fc1 - Revert "[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660)"

2024-01-29 Thread Joseph Huber via cfe-commits
Author: Joseph Huber Date: 2024-01-29T11:11:25-06:00 New Revision: 72d4fc1b4d5cfc4f7d50cc5cf1b315543c088f4d URL: https://github.com/llvm/llvm-project/commit/72d4fc1b4d5cfc4f7d50cc5cf1b315543c088f4d DIFF: https://github.com/llvm/llvm-project/commit/72d4fc1b4d5cfc4f7d50cc5cf1b315543c088f4d.diff

[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/79892 Summary: We recently added builtin support for this function. >From 5f316d30a179dd21cfadd50d232de622d394ccea Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 29 Jan 2024 14:28:35 -0600 Subject: [PATCH]

[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > https://bugs.llvm.org/show_bug.cgi?id=35249 Yeah, there are constant issues with convergence analysis. I included one of the tests to try to show that it won't merge with the convergent attribute, since this is a general issue for all of these things. In the past I usually add

[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/79768 >From 2c7049defef3b62de7017640948cccfb07ff756c Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sun, 28 Jan 2024 14:57:05 -0600 Subject: [PATCH 1/2] [NVPTX] Add 'activemask' builtin and intrinsic support

[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
jhuber6 wrote: Added side effects attribute, I believe this matches the current behavior of the inline asm better. https://github.com/llvm/llvm-project/pull/79768

[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
@@ -65,7 +65,7 @@ def : Proc<"sm_61", [SM61, PTX50]>; def : Proc<"sm_62", [SM62, PTX50]>; def : Proc<"sm_70", [SM70, PTX60]>; def : Proc<"sm_72", [SM72, PTX61]>; -def : Proc<"sm_75", [SM75, PTX63]>; +def : Proc<"sm_75", [SM75, PTX62, PTX63]>; jhuber6 wrote:

[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
@@ -4599,6 +4599,14 @@ def int_nvvm_vote_ballot_sync : [IntrInaccessibleMemOnly, IntrConvergent, IntrNoCallback], "llvm.nvvm.vote.ballot.sync">, ClangBuiltin<"__nvvm_vote_ballot_sync">; +// +// ACTIVEMASK +// +def int_nvvm_activemask : +

[llvm] [clang] [NVPTX] Add builtin support for 'globaltimer' (PR #79765)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/79765 >From 5c4fc3dd207e91210f76c158e9c99e9591dccb96 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 29 Jan 2024 08:12:35 -0600 Subject: [PATCH] [NVPTX} Add builtin support for 'globaltimer' Summary: This

[llvm] [clang] [NVPTX] Add builtin support for 'globaltimer' (PR #79765)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/79765

[llvm] [clang] [NVPTX] Add builtin for 'exit' handling (PR #79777)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/79777

[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/79768 >From 2c7049defef3b62de7017640948cccfb07ff756c Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sun, 28 Jan 2024 14:57:05 -0600 Subject: [PATCH 1/3] [NVPTX] Add 'activemask' builtin and intrinsic support

[llvm] [clang] [NVPTX] Add builtin for 'exit' handling (PR #79777)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/79777 >From ea3b32593dd0f2035020313176c6e1a131ef8eb4 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sun, 28 Jan 2024 21:27:37 -0600 Subject: [PATCH] [NVPTX] Add builtin for 'exit' handling Summary: The PTX ISA has

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-29 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Relying on something _not_ being defined is probably not the best way to > handle 'generic' target. For starters it makes it hard or impossible to > recreate the same compilation state by undoing already-specified option. It > also breaks established assumption that there

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/79873 Summary: The NVPTX tools require an architecture to be used, however if we are creating generic LLVM-IR we should be able to leave it unspecified. This will result in the `target-cpu` attributes not being set on

[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
@@ -65,7 +65,7 @@ def : Proc<"sm_61", [SM61, PTX50]>; def : Proc<"sm_62", [SM62, PTX50]>; def : Proc<"sm_70", [SM70, PTX60]>; def : Proc<"sm_72", [SM72, PTX61]>; -def : Proc<"sm_75", [SM75, PTX63]>; +def : Proc<"sm_75", [SM75, PTX62, PTX63]>; jhuber6 wrote:

[clang] [llvm] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Unlike the other PRs, this one has a CUDA function, `__activemask()`. > Presumably we should make that one work by hacking our headers? That is currently defined here https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_intrinsics.h#L214. I was

[llvm] [clang] [NVPTX] Add builtin support for 'nanosleep' PTX instrunction (PR #79888)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/79888 Summary: This patch adds a builtin for the `nanosleep` PTX function. It takes either an immediate or a register and sleeps for [0, 2t] nanoseconds given t. More information at the documentation:

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-29 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > I think there's some precedent from both vendors to treat missing > > attributes as a more generic target. > > It sounds more like a bug than a feature to me. > > The major difference between "you get sm_xx by default" and this "you get > generic by default" is that With

[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)

2024-01-29 Thread Joseph Huber via cfe-commits
jhuber6 wrote: I've actually encountered some really strange behavior when trying to update `libc` to use the new intrinsic. The following returns a common 64-bit value to be compatible with AMDGPU's 64 lane wide mode. When I run this against the test suite, it fails on tests that

[clang] [llvm] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > I was planning on updating this to use the new intrinsic for the newer > > version. Alternatively we could make __activemask the builtin which expands > > to both versions, but I'm somewhat averse since we should target the > > instruction directly I feel. > > Yes, I

[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
@@ -65,7 +65,7 @@ def : Proc<"sm_61", [SM61, PTX50]>; def : Proc<"sm_62", [SM62, PTX50]>; def : Proc<"sm_70", [SM70, PTX60]>; def : Proc<"sm_72", [SM72, PTX61]>; -def : Proc<"sm_75", [SM75, PTX63]>; +def : Proc<"sm_75", [SM75, PTX62, PTX63]>; jhuber6 wrote:

[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
@@ -65,7 +65,7 @@ def : Proc<"sm_61", [SM61, PTX50]>; def : Proc<"sm_62", [SM62, PTX50]>; def : Proc<"sm_70", [SM70, PTX60]>; def : Proc<"sm_72", [SM72, PTX61]>; -def : Proc<"sm_75", [SM75, PTX63]>; +def : Proc<"sm_75", [SM75, PTX62, PTX63]>; jhuber6 wrote:

[llvm] [clang] [NVPTX] Add builtin support for 'nanosleep' PTX instrunction (PR #79888)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/79888

[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/79768

[clang] [HIP] fix HIP detection for /usr (PR #80190)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/80190

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-01 Thread Joseph Huber via cfe-commits
@@ -4,13 +4,10 @@ // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature -wavefrontsize64 -verify -S -o - %s // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s +// expected-no-diagnostics + typedef unsigned long ulong; void

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-01 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80183 >From 26b75cdba1aebc881e52dc82ca61e1082ef67a5e Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Wed, 31 Jan 2024 13:18:04 -0600 Subject: [PATCH] [AMDGPU] Allow w64 ballot to be used on w32 targets Summary:

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-01 Thread Joseph Huber via cfe-commits
@@ -4,13 +4,10 @@ // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature -wavefrontsize64 -verify -S -o - %s // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s +// expected-no-diagnostics + typedef unsigned long ulong; void

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-01 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > After this change is there any value in having two different builtins? You > could just have one that always return 64 bits. I personally think it would be better to just have the one, but I figured that decision was made earlier and it would break backwards compatibility.
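To illustrate the "one builtin that always returns 64 bits" idea discussed above, here is a minimal host-side C++ sketch. It is a model only, not device code and not the builtin's implementation: the names `ballot_w64_model` and `lanes` are hypothetical, and it assumes that on a 32-lane wavefront the upper 32 bits of the 64-bit result are simply zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of a 64-bit ballot; not the real builtin.
// lanes[i] holds the predicate of lane i; a wave has 32 or 64 lanes.
std::uint64_t ballot_w64_model(const std::vector<bool> &lanes) {
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < lanes.size() && i < 64; ++i)
    if (lanes[i])
      mask |= std::uint64_t{1} << i; // set bit i when lane i votes true
  return mask; // bits at or above lanes.size() stay zero
}
```

Under that assumption, library code can consume the 64-bit mask without checking the wavefront size, because a 32-lane wave never sets the high bits.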

[llvm] [openmp] [clang] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-02 Thread Joseph Huber via cfe-commits
@@ -199,7 +199,7 @@ static int initLibrary(DeviceTy ) { Entry.size) != OFFLOAD_SUCCESS) REPORT("Failed to write symbol for USM %s\n", Entry.name); } -} else { +} else if (Entry.addr) {

[openmp] [clang] [llvm] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-02 Thread Joseph Huber via cfe-commits
@@ -199,7 +199,7 @@ static int initLibrary(DeviceTy ) { Entry.size) != OFFLOAD_SUCCESS) REPORT("Failed to write symbol for USM %s\n", Entry.name); } -} else { +} else if (Entry.addr) {

[llvm] [openmp] [clang] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-02 Thread Joseph Huber via cfe-commits
@@ -199,7 +199,7 @@ static int initLibrary(DeviceTy ) { Entry.size) != OFFLOAD_SUCCESS) REPORT("Failed to write symbol for USM %s\n", Entry.name); } -} else { +} else if (Entry.addr) {

[openmp] [clang] [llvm] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-02 Thread Joseph Huber via cfe-commits
@@ -199,7 +199,7 @@ static int initLibrary(DeviceTy ) { Entry.size) != OFFLOAD_SUCCESS) REPORT("Failed to write symbol for USM %s\n", Entry.name); } -} else { +} else if (Entry.addr) {

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: This is related to the discussions at the https://github.com/llvm/llvm-project/issues/77018 issue. https://github.com/llvm/llvm-project/pull/80066

[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/80066 Summary: The standard GPU compilation process embeds each intermediate object file into the host file at the `.llvm.offloading` section so it can be linked later. We also use a special section called something

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/79873

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/79873 >From 35e12c3d83f3be93618805ffaf05e3424689f32f Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 29 Jan 2024 11:08:04 -0600 Subject: [PATCH 1/2] [NVPTX] Allow compiling LLVM-IR without `-march` set

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/79873 >From 35e12c3d83f3be93618805ffaf05e3424689f32f Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 29 Jan 2024 11:08:04 -0600 Subject: [PATCH 1/3] [NVPTX] Allow compiling LLVM-IR without `-march` set

[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)

2024-01-29 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/79892

[clang] 0a2b5b0 - [NVPTX][Fix] Ensure the return value of 'activemask' is unsigned

2024-01-29 Thread Joseph Huber via cfe-commits
Author: Joseph Huber Date: 2024-01-29T17:33:38-06:00 New Revision: 0a2b5b03c4084ac1fefd0e62db2ba49f5ac24ab9 URL: https://github.com/llvm/llvm-project/commit/0a2b5b03c4084ac1fefd0e62db2ba49f5ac24ab9 DIFF: https://github.com/llvm/llvm-project/commit/0a2b5b03c4084ac1fefd0e62db2ba49f5ac24ab9.diff

[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)

2024-01-29 Thread Joseph Huber via cfe-commits
jhuber6 wrote: Scratch that, I missed `Ui` in the builtin definition. I'll do a quick fix. https://github.com/llvm/llvm-project/pull/79892

[flang] [clang] [clang-tools-extra] [llvm] [compiler-rt] [libcxx] [libc] [lldb] [lld] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

2024-01-25 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > This method of compilation is not like CUDA, so we can't target all the > > GPUs at the same time. > > I think this is the key fact I was missing. If the patch is only for a > standalone compilation which does not do multi-GPU compilation in principle, > then your approach

[lld] [lldb] [llvm] [compiler-rt] [clang-tools-extra] [libc] [clang] [flang] [libcxx] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

2024-01-25 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > On the other hand, I'd be OK with providing --offload-arch=native translating > into "compile for all present GPU variants", with a possibility to further > adjust the selected set with the usual --no-offload-arch-foo, if the user > needs to. This will at least produce code

[compiler-rt] [flang] [libcxx] [clang] [llvm] [clang-tools-extra] [lldb] [lld] [libc] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

2024-01-25 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/79373 >From 145b7bc932ce3ffa46545cd7af29b1c93981429c Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Wed, 24 Jan 2024 15:34:00 -0600 Subject: [PATCH 1/3] [NVPTX] Add support for -march=native in standalone NVPTX

[clang] [lld] [libcxx] [flang] [compiler-rt] [libc] [clang-tools-extra] [llvm] [lldb] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

2024-01-25 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > User confusion is only part of the issue here. With any single GPU choice we > would still potentially produce a nonworking binary, if our GPU choice does > not match what the user wants. > > "all GPUs" has the advantage of always producing the binary that's guaranteed > to

[lld] [lldb] [libcxx] [compiler-rt] [clang-tools-extra] [llvm] [libc] [clang] [flang] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

2024-01-25 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > This method of compilation is not like CUDA, so we can't target all the > > GPUs at the same time. > > Can you clarify for me -- what are you compiling where it's impossible to > target multiple GPUs in the binary? I'm confused because Art is understanding > that it's not

[lld] [lldb] [libcxx] [compiler-rt] [clang-tools-extra] [llvm] [libc] [clang] [flang] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

2024-01-25 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > I...think I understand. > > Is the output of this compilation step a cubin, then? Yes, it will spit out a simple `cubin` instead of a fatbinary. The NVIDIA toolchain is much worse about this stuff than the AMD one, but in general it works. You can check with `-###` or

[clang-tools-extra] [llvm] [libc] [clang] [libcxx] [lldb] [lld] [flang] [compiler-rt] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

2024-01-25 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/79373

[clang] [clang-tools-extra] [lldb] [libc] [libcxx] [lld] [llvm] [flang] [compiler-rt] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

2024-01-25 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > I think the semantics of native on other architectures are clear enough > > here. > > I don't think we have the same idea about that. Let's spell it out, so > there's no confusion. > > [GCC > manual](https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-march-16) >

[flang] [clang] [libc] [compiler-rt] [clang-tools-extra] [llvm] [lld] [lldb] [libcxx] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

2024-01-25 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Got it, okay, thanks. > > Since this change only applies to `--target=nvptx64-nvidia-cuda`, fine by me. > Thanks for putting up with our scrutiny. :) No problem, I probably should have been clearer in my commit messages. https://github.com/llvm/llvm-project/pull/79373

[clang] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

2024-01-24 Thread Joseph Huber via cfe-commits
jhuber6 wrote: Some interesting points, I'll try to clarify some things. > This option may not work as well as one would hope. > > Problem #1 is that it will drastically slow down compilation for some users. > NVIDIA GPU drivers are loaded on demand, and the process takes a while > (O(second),

[clang] [Clang][Driver] Fix `--save-temps` for OpenCL AoT compilation (PR #78333)

2024-01-23 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/78333

[mlir] [clang] [llvm] [openmp] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-05 Thread Joseph Huber via cfe-commits
@@ -6872,35 +6883,6 @@ void OpenMPIRBuilder::loadOffloadInfoMetadata(StringRef HostFilePath) { loadOffloadInfoMetadata(*M.get()); } -Function *OpenMPIRBuilder::createRegisterRequires(StringRef Name) { jhuber6 wrote: It was a very obvious problem. I mixed

[clang] d172286 - [Clang] Make AMDGPU OpenCL tests require AMD registered target

2024-02-05 Thread Joseph Huber via cfe-commits
Author: Joseph Huber Date: 2024-02-05T09:08:31-06:00 New Revision: d1722868d34a69df8466b72098176f54a7af8823 URL: https://github.com/llvm/llvm-project/commit/d1722868d34a69df8466b72098176f54a7af8823 DIFF: https://github.com/llvm/llvm-project/commit/d1722868d34a69df8466b72098176f54a7af8823.diff

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-05 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/80183

[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)

2024-02-05 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/80741

[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)

2024-02-05 Thread Joseph Huber via cfe-commits
@@ -832,6 +832,13 @@ void test_atomic_inc_dec(local uint *lptr, global uint *gptr, uint val) { res = __builtin_amdgcn_atomic_dec32((volatile global uint*)gptr, val, __ATOMIC_SEQ_CST, ""); } +// CHECK-LABEL test_wavefrontsize( +unsigned test_wavefrontsize() {

[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)

2024-02-05 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/80741 Summary: The backend supports the wavefrontsize intrinsic, and suggests that it is tied to a corresponding clang builtin, but it is not actually present. This simply adds it in so it can be used from clang. This

[clang] [Clang][Driver] Fix `--save-temps` for OpenCL AoT compilation (PR #78333)

2024-01-22 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/78333

[clang] [Clang][Driver] Fix `--save-temps` for OpenCL AoT compilation (PR #78333)

2024-01-22 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 commented: You should add a test that checks the output of `-ccc-print-phases` and `-ccc-print-bindings`. https://github.com/llvm/llvm-project/pull/78333

[llvm] [clang] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-22 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > FYI. There is a failure in linker-wrapper.c in > https://buildkite.com/llvm-project/github-pull-requests/builds/30337#018d1aaa-8225-4630-a5f0-527d1c7c129d > > ``` > # note: command had no output on stdout or stderr > | # error: command failed with exit status: 1 > | #

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > Right now if you specify target-cpu you get target-cpu attributes, which is > > what we don't want. > > I'm fine handling 'generic' in a special way under the hood and not > specifying target-CPU. > > My concern is about user-facing interface. Command line options must be

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
@@ -175,6 +175,8 @@ Predefined Macros - Defined when the GPU default stream is set to per-thread mode. * - ``HIP_API_PER_THREAD_DEFAULT_STREAM`` - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated. + * - ``__AMDGCN_WAVEFRONT_SIZE__``

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80035 >From f606aaa9c711d2ece6b1600160a61232abb69eb4 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 29 Jan 2024 08:46:14 -0600 Subject: [PATCH 1/2] [AMDGPU] Do not emit arch dependent macros with unspecified

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
@@ -175,6 +175,8 @@ Predefined Macros - Defined when the GPU default stream is set to per-thread mode. * - ``HIP_API_PER_THREAD_DEFAULT_STREAM`` - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated. + * - ``__AMDGCN_WAVEFRONT_SIZE__``

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/80035 Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulting IR will not contain any target dependent attributes and can then be inserted into

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: Rework of https://github.com/llvm/llvm-project/pull/79660 to handle old behavior of these being defined for the host. https://github.com/llvm/llvm-project/pull/80035

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/80035

[clang] 626fe71 - [Clang] Fix test failing on systems without ROCm installed

2024-01-30 Thread Joseph Huber via cfe-commits
Author: Joseph Huber Date: 2024-01-30T13:17:02-06:00 New Revision: 626fe71fa5ed79cbd41b7b29582560d7adb1220e URL: https://github.com/llvm/llvm-project/commit/626fe71fa5ed79cbd41b7b29582560d7adb1220e DIFF: https://github.com/llvm/llvm-project/commit/626fe71fa5ed79cbd41b7b29582560d7adb1220e.diff

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > This seems to break tests: http://45.33.8.238/linux/129493/step_7.txt > > Please take a look and revert for now if it takes a while to fix. Is it still broken? I pushed a fix because I'm pretty sure the problem was not passing `-nogpulib` `-nogpuinc` so the test runs on

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > i.e. it helped with Clang :: Preprocessor/predefined-arch-macros.c but not > with: > > Failed Tests (2): Clang :: Driver/amdgpu-macros.cl Clang :: > Driver/target-id-macros.cl Thanks, seeing it locally now. I'll try to fix it quick and revert if it's not working soon.

[clang] 6fecfbc - [AMDGPU] Correctly exclude the HIP host from arch macros

2024-01-30 Thread Joseph Huber via cfe-commits
Author: Joseph Huber Date: 2024-01-30T13:45:01-06:00 New Revision: 6fecfbc7b62f54bd633e83c22630d7c2a3e5741e URL: https://github.com/llvm/llvm-project/commit/6fecfbc7b62f54bd633e83c22630d7c2a3e5741e DIFF: https://github.com/llvm/llvm-project/commit/6fecfbc7b62f54bd633e83c22630d7c2a3e5741e.diff

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > i.e. it helped with Clang :: Preprocessor/predefined-arch-macros.c but not > with: > > Failed Tests (2): Clang :: Driver/amdgpu-macros.cl Clang :: > Driver/target-id-macros.cl Pushed a fix, `check-clang` passes on my machine now. Let me know if it's still broken.

[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-15 Thread Joseph Huber via cfe-commits
@@ -205,6 +220,56 @@ class AtomicScopeHIPModel : public AtomicScopeModel { } }; +/// Defines the generic atomic scope model. +class AtomicScopeGenericModel : public AtomicScopeModel { +public: + /// The enum values match predefined built-in macros __ATOMIC_SCOPE_*. + enum

[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-15 Thread Joseph Huber via cfe-commits
@@ -54,6 +59,16 @@ enum class SyncScope { inline llvm::StringRef getAsString(SyncScope S) { jhuber6 wrote: I think it's because this is for AST printing purposes, while the backend strings vary per target. https://github.com/llvm/llvm-project/pull/72280

[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-15 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Is there any actual difference now between these and the HIP/OpenCL flavors > other than dropping the language from the name? Yes, these directly copy the GNU functions and names. The OpenCL / HIP ones use a different format. https://github.com/llvm/llvm-project/pull/72280

[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-15 Thread Joseph Huber via cfe-commits
@@ -798,6 +798,13 @@ static void InitializePredefinedMacros(const TargetInfo , Builder.defineMacro("__ATOMIC_ACQ_REL", "4"); Builder.defineMacro("__ATOMIC_SEQ_CST", "5"); + // Define macros for the clang atomic scopes. + Builder.defineMacro("__MEMORY_SCOPE_SYSTEM",

[llvm] [clang] [LinkerWrapper] Support device binaries in multiple link jobs (PR #72442)

2023-11-15 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72442 Summary: Currently the linker wrapper strictly assigns a single input binary to a single link job based off of its input architecture. This is not sufficient to implement the AMDGPU target ID correctly as this

[clang] [openmp] [Clang][OpenMP] Fix ordering of processing of map clauses when mapping a struct. (PR #72410)

2023-11-15 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 commented: This being in clang instead seems like a good change. Are there no CodeGen tests changed? We should add one if so. Probably just take your `libomptarget` test and run `update_cc_test_checks` on it with the arguments found in other test files.

[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-15 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Overall I think it is the right way to go. Memory scope has been used by > different offloading languages and the atomic clang builtins are essentially > the same. Adding a generic clang atomic builtins with memory scope allows > code sharing among offloading languages. I

[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-15 Thread Joseph Huber via cfe-commits
@@ -904,6 +904,32 @@ BUILTIN(__atomic_signal_fence, "vi", "n") BUILTIN(__atomic_always_lock_free, "bzvCD*", "nE") BUILTIN(__atomic_is_lock_free, "bzvCD*", "nE") +// GNU atomic builtins with atomic scopes. +ATOMIC_BUILTIN(__scoped_atomic_load, "v.", "t")

[clang] [LinkerWrapper] Accept some needed COFF linker arguments (PR #72889)

2023-11-20 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72889 Summary: The linker wrapper is a utility used to create offloading programs from single-source offloading languages such as OpenMP or CUDA. This is done by embedding device code into the host object, then feeding
