[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)

2024-01-29 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/79892
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)

2024-01-29 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Scratch that, I missed `Ui` in the builtin definition. I'll do a quick fix.

https://github.com/llvm/llvm-project/pull/79892
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)

2024-01-29 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

I've actually encountered some really strange behavior when trying to update 
`libc` to use the new intrinsic. The following returns a common 64-bit value to 
be compatible with AMDGPU's 64 lane wide mode. When I run this against the test 
suite, it fails on tests that specifically check against divergence.

This works
```c++
[[clang::convergent, gnu::noinline]]  uint64_t get_lane_mask() {
  uint32_t mask;  
  mask = __nvvm_activemask();
  return mask;   
} 
```

But this does not
```c++
[[clang::convergent, gnu::noinline]] uint64_t get_lane_mask() {
  return __nvvm_activemask(); 
} 
```

If I check the PTX, the main difference seems to be the `cvt` instruction, 
here's the output respectively.

```asm
.weak .func  (.param .b64 func_retval0) 
_ZN22__llvm_libc_19_0_0_git3gpu13get_lane_maskEv()
{
  .reg .b32   %r<2>;
  .reg .b64   %rd<2>;

// %bb.0:   // %entry
  activemask.b32  %r1;
  cvt.u64.u32   %rd1, %r1;
  st.param.b64  [func_retval0+0], %rd1;
  ret;
}
```

```asm
.weak .func  (.param .b64 func_retval0) 
_ZN22__llvm_libc_19_0_0_git3gpu13get_lane_maskEv()
{
  .reg .b32   %r<2>;
  .reg .b64   %rd<2>;

// %bb.0:   // %entry
  activemask.b32  %r1;
  cvt.s64.s32   %rd1, %r1;
  st.param.b64  [func_retval0+0], %rd1;
  ret;
}
```

So, the difference is that the version that works uses `cvt.u64.u32` while the 
version that's broken uses `cvt.s64.s32`. This means that likely this is 
returning a "signed" value, and the conversion is treating it like a negative 
number when all threads are active. @Artem-B is there a correct way to assert 
that this is unsigned so it does the correct thing?

https://github.com/llvm/llvm-project/pull/79892
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)

2024-01-29 Thread Artem Belevich via cfe-commits

https://github.com/Artem-B approved this pull request.


https://github.com/llvm/llvm-project/pull/79892
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)

2024-01-29 Thread Justin Lebar via cfe-commits

https://github.com/jlebar approved this pull request.


https://github.com/llvm/llvm-project/pull/79892
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)

2024-01-29 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)


Changes

Summary:
We recently added builitin support for this function.


---
Full diff: https://github.com/llvm/llvm-project/pull/79892.diff


1 Files Affected:

- (modified) clang/lib/Headers/__clang_cuda_intrinsics.h (+1-3) 


``diff
diff --git a/clang/lib/Headers/__clang_cuda_intrinsics.h 
b/clang/lib/Headers/__clang_cuda_intrinsics.h
index 3c3948863c1d453..a04e8b6de44d053 100644
--- a/clang/lib/Headers/__clang_cuda_intrinsics.h
+++ b/clang/lib/Headers/__clang_cuda_intrinsics.h
@@ -215,9 +215,7 @@ inline __device__ unsigned int __activemask() {
 #if CUDA_VERSION < 9020
   return __nvvm_vote_ballot(1);
 #else
-  unsigned int mask;
-  asm volatile("activemask.b32 %0;" : "=r"(mask));
-  return mask;
+  return __nvvm_activemask();
 #endif
 }
 

``




https://github.com/llvm/llvm-project/pull/79892
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)

2024-01-29 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/79892

Summary:
We recently added builitin support for this function.


>From 5f316d30a179dd21cfadd50d232de622d394ccea Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 29 Jan 2024 14:28:35 -0600
Subject: [PATCH] [CUDA] Change '__activemask' to use '__nvvm_activemask()'

Summary:
We recently added builitin support for this function.
---
 clang/lib/Headers/__clang_cuda_intrinsics.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/clang/lib/Headers/__clang_cuda_intrinsics.h 
b/clang/lib/Headers/__clang_cuda_intrinsics.h
index 3c3948863c1d45..a04e8b6de44d05 100644
--- a/clang/lib/Headers/__clang_cuda_intrinsics.h
+++ b/clang/lib/Headers/__clang_cuda_intrinsics.h
@@ -215,9 +215,7 @@ inline __device__ unsigned int __activemask() {
 #if CUDA_VERSION < 9020
   return __nvvm_vote_ballot(1);
 #else
-  unsigned int mask;
-  asm volatile("activemask.b32 %0;" : "=r"(mask));
-  return mask;
+  return __nvvm_activemask();
 #endif
 }
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits