[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/79892 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)
jhuber6 wrote: Scratch that, I missed `Ui` in the builtin definition. I'll do a quick fix. https://github.com/llvm/llvm-project/pull/79892 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)
jhuber6 wrote: I've actually encountered some really strange behavior when trying to update `libc` to use the new intrinsic. The following returns a common 64-bit value to be compatible with AMDGPU's 64 lane wide mode. When I run this against the test suite, it fails on tests that specifically check against divergence. This works ```c++ [[clang::convergent, gnu::noinline]] uint64_t get_lane_mask() { uint32_t mask; mask = __nvvm_activemask(); return mask; } ``` But this does not ```c++ [[clang::convergent, gnu::noinline]] uint64_t get_lane_mask() { return __nvvm_activemask(); } ``` If I check the PTX, the main difference seems to be the `cvt` instruction, here's the output respectively. ```asm .weak .func (.param .b64 func_retval0) _ZN22__llvm_libc_19_0_0_git3gpu13get_lane_maskEv() { .reg .b32 %r<2>; .reg .b64 %rd<2>; // %bb.0: // %entry activemask.b32 %r1; cvt.u64.u32 %rd1, %r1; st.param.b64 [func_retval0+0], %rd1; ret; } ``` ```asm .weak .func (.param .b64 func_retval0) _ZN22__llvm_libc_19_0_0_git3gpu13get_lane_maskEv() { .reg .b32 %r<2>; .reg .b64 %rd<2>; // %bb.0: // %entry activemask.b32 %r1; cvt.s64.s32 %rd1, %r1; st.param.b64 [func_retval0+0], %rd1; ret; } ``` So, the difference is that the version that works uses `cvt.u64.u32` while the version that's broken uses `cvt.s64.s32`. This means that likely this is returning a "signed" value, and the conversion is treating it like a negative number when all threads are active. @Artem-B is there a correct way to assert that this is unsigned so it does the correct thing? https://github.com/llvm/llvm-project/pull/79892 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/79892 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)
https://github.com/jlebar approved this pull request. https://github.com/llvm/llvm-project/pull/79892 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Joseph Huber (jhuber6) Changes Summary: We recently added builitin support for this function. --- Full diff: https://github.com/llvm/llvm-project/pull/79892.diff 1 Files Affected: - (modified) clang/lib/Headers/__clang_cuda_intrinsics.h (+1-3) ``diff diff --git a/clang/lib/Headers/__clang_cuda_intrinsics.h b/clang/lib/Headers/__clang_cuda_intrinsics.h index 3c3948863c1d453..a04e8b6de44d053 100644 --- a/clang/lib/Headers/__clang_cuda_intrinsics.h +++ b/clang/lib/Headers/__clang_cuda_intrinsics.h @@ -215,9 +215,7 @@ inline __device__ unsigned int __activemask() { #if CUDA_VERSION < 9020 return __nvvm_vote_ballot(1); #else - unsigned int mask; - asm volatile("activemask.b32 %0;" : "=r"(mask)); - return mask; + return __nvvm_activemask(); #endif } `` https://github.com/llvm/llvm-project/pull/79892 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/79892 Summary: We recently added builitin support for this function. >From 5f316d30a179dd21cfadd50d232de622d394ccea Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 29 Jan 2024 14:28:35 -0600 Subject: [PATCH] [CUDA] Change '__activemask' to use '__nvvm_activemask()' Summary: We recently added builitin support for this function. --- clang/lib/Headers/__clang_cuda_intrinsics.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/clang/lib/Headers/__clang_cuda_intrinsics.h b/clang/lib/Headers/__clang_cuda_intrinsics.h index 3c3948863c1d45..a04e8b6de44d05 100644 --- a/clang/lib/Headers/__clang_cuda_intrinsics.h +++ b/clang/lib/Headers/__clang_cuda_intrinsics.h @@ -215,9 +215,7 @@ inline __device__ unsigned int __activemask() { #if CUDA_VERSION < 9020 return __nvvm_vote_ballot(1); #else - unsigned int mask; - asm volatile("activemask.b32 %0;" : "=r"(mask)); - return mask; + return __nvvm_activemask(); #endif } ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits