[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)

2024-02-09 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/81033
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)

2024-02-07 Thread Artem Belevich via cfe-commits

Artem-B wrote:

> Okay, `__nvvm_reflect` doesn't work fully here because the `nanosleep` 
> builtin I added requires `sm_70` at the clang level. Either means I'd need to 
> go back to inline assembly or remove that requirement at least from clang so 
> it's a backend failure.

The question is -- who's going to provide a fallback implementation for the 
nanosleepbuiltin for the older GPUs. I do not think it's LLVM's job, so 
constraining the builtin is appropriate. However, nothing stops you from 
providing your own implementation in libc using inline asm. Something along 
these lines:
```
__device__ void my_nanosleep(int N) {
  if (__nvvm_reflect(SM_70)) {
asm volatile("nanosleep")
  } else {
 while(N--) {
volatile asm("something unoptimizable")
 }
  }
}
```

https://github.com/llvm/llvm-project/pull/81033
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)

2024-02-07 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Okay, `__nvvm_reflect` doesn't work fully here because the `nanosleep` builtin 
I added requires `sm_70` at the clang level. Either means I'd need to go back 
to inline assembly or remove that requirement at least from clang so it's a 
backend failure.

https://github.com/llvm/llvm-project/pull/81033
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)

2024-02-07 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > This patch, which simply makes it legal on all architectures but do nothing 
> > is it's older than sm_70.
> 
> I do not think this is the right thing to do. "do nothing" is not what one 
> would expect from a `nanosleep`.

Thanks, I made this a draft because I figured it wasn't the correct thing to do 
but wanted to pose the question.

> Let's unpack your problem a bit.
> 
> __nvvm_reflect() is probably closest to what you would need. However, IIUIC, 
> if you use it to provide nanosleep-based variant and an alternative for the 
> older GPUs, the `nanosleep` variant code will still hang off the dead branch 
> of if(__nvvm_reflect()) and if it's not eliminated by DCE (which it would not 
> if optimizations are off), the resulting PTX will be invalid for the older 
> GPUs.
> 
> In other words, pushing nanosleep implementation into an intrinsic makes 
> things compile everywhere at the expense of doing a wrong thing on the older 
> GPUs. I do not think it's a good trade-off.
> 
> Perhaps a better approach would be to incorporate dead branch elimination 
> onto NVVMReflect pass itself. We do know that it is the explicit intent of 
> `__nvvm_reflect()`. If NVVMReflect explicitly guarantees that the dead branch 
> will be gone, it should allow you to use approach `#1` w/o concerns for 
> whether optimizations are enabled and you should be able to provide whatever 
> alternative implementation you need (even if it's a null one), without 
> affecting correctness of LLVM itself.

I think that would be a good solution if possible. Would this simply mean 
scheduling a global DCE pass right after the NVVM reflect pass? Since that 
seems to be run at `O0` that seems like the easiest solution, though it 
somewhat breaks `O0` semantics.

Or, maybe we just have a really shallow implementation in the NVVM reflect pass 
that collapses the branch?

https://github.com/llvm/llvm-project/pull/81033
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)

2024-02-07 Thread Artem Belevich via cfe-commits

Artem-B wrote:

> This patch, which simply makes it legal on all architectures but do nothing 
> is it's older than sm_70.

I do not think this is the right thing to do. "do nothing" is not what one 
would expect from a `nanosleep`.

Let's unpack your problem a bit.

__nvvm_reflect() is probably closest to what you would need. However, IIUIC, if 
you use it to provide nanosleep-based variant and an alternative for the older 
GPUs, the `nanosleep` variant code will still hang off the dead branch of 
if(__nvvm_reflect()) and if it's not eliminated by DCE (which it would not if 
optimizations are off), the resulting PTX will be invalid for the older GPUs.

In other words, pushing nanosleep implementation into an intrinsic makes things 
compile everywhere at the expense of doing a wrong thing on the older GPUs. I 
do not think it's a good trade-off.

Perhaps a better approach would be to incorporate dead branch elimination onto 
NVVMReflect pass itself. We do know that it is the explicit intent of 
`__nvvm_reflect()`. If NVVMReflect explicitly guarantees that the dead branch 
will be gone, it should allow you to use approach `#1` w/o concerns for whether 
optimizations are enabled and you should be able to provide whatever 
alternative implementation you need (even if it's a null one), without 
affecting correctness of LLVM itself. 



https://github.com/llvm/llvm-project/pull/81033
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)

2024-02-07 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/81033

Summary;
The LLVM C library currently uses `nanosleep` in the RPC interface and
for the C library `nanosleep` function. We build the LLVM C library for
every single NVPTX architecture individually currently, which is not
ideal. The goal is to make the LLVM-IR target independent, unfortunately
the one snag is the `nanosleep` function which will crash if used on a
GPU older than sm_70. There are three possible solutions to this.

1. Use `__nvvm_reflect(__CUDA_ARCH__)` like the libdevice functions.
   This will work as long as optimizations are on, not ideal.
2. Get rid of the use of nanosleep in `libc`. This isn't ideal as
   sleeping during the busy-wait loops is helpful for thread scheduling
   and it prevents us from providing `nanosleep` as a C library
   function.
3. This patch, which simply makes it legal on all architectures but do
   nothing is it's older than sm_70.

This is a draft to question if this is an acceptable hack, as an
intrinsic silently doing nothing is not always a good idea. Potentially
a new intrinsic could be added instead, but there is also a desire to
have intrinsics map 1-to-1 with hardware.


>From 10447352c68c666c51cfba7d84a06cb23327bc8a Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 7 Feb 2024 14:03:00 -0600
Subject: [PATCH] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported

Summary;
The LLVM C library currently uses `nanosleep` in the RPC interface and
for the C library `nanosleep` function. We build the LLVM C library for
every single NVPTX architecture individually currently, which is not
ideal. The goal is to make the LLVM-IR target independent, unfortunately
the one snag is the `nanosleep` function which will crash if used on a
GPU older than sm_70. There are three possible solutions to this.

1. Use `__nvvm_reflect(__CUDA_ARCH__)` like the libdevice functions.
   This will work as long as optimizations are on, not ideal.
2. Get rid of the use of nanosleep in `libc`. This isn't ideal as
   sleeping during the busy-wait loops is helpful for thread scheduling
   and it prevents us from providing `nanosleep` as a C library
   function.
3. This patch, which simply makes it legal on all architectures but do
   nothing is it's older than sm_70.

This is a draft to question if this is an acceptable hack, as an
intrinsic silently doing nothing is not always a good idea. Potentially
a new intrinsic could be added instead, but there is also a desire to
have intrinsics map 1-to-1 with hardware.
---
 clang/include/clang/Basic/BuiltinsNVPTX.def | 2 +-
 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td| 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def 
b/clang/include/clang/Basic/BuiltinsNVPTX.def
index 7819e71d7fe2aa..5fd17a1f5b8552 100644
--- a/clang/include/clang/Basic/BuiltinsNVPTX.def
+++ b/clang/include/clang/Basic/BuiltinsNVPTX.def
@@ -159,7 +159,7 @@ BUILTIN(__nvvm_read_ptx_sreg_pm3, "i", "n")
 
 BUILTIN(__nvvm_prmt, "UiUiUiUi", "")
 BUILTIN(__nvvm_exit, "v", "r")
-TARGET_BUILTIN(__nvvm_nanosleep, "vUi", "n", AND(SM_70, PTX63))
+TARGET_BUILTIN(__nvvm_nanosleep, "vUi", "n", PTX63)
 
 // Min Max
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td 
b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 2330d7213c26dc..fd786a12c78eba 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -646,6 +646,15 @@ def INT_NVVM_NANOSLEEP_I : NVPTXInst<(outs), (ins 
i32imm:$i), "nanosleep.u32 \t$
 def INT_NVVM_NANOSLEEP_R : NVPTXInst<(outs), (ins Int32Regs:$i), 
"nanosleep.u32 \t$i;",
  [(int_nvvm_nanosleep Int32Regs:$i)]>,
 Requires<[hasPTX<63>, hasSM<70>]>;
+
+// Make 'nanosleep' a no-op on older architectures.
+def INT_NVVM_NANOSLEEP_I_NOOP : NVPTXInst<(outs), (ins i32imm:$i), "/* no-op 
*/",
+ [(int_nvvm_nanosleep imm:$i)]>,
+Requires<[hasPTX<63>]>;
+def INT_NVVM_NANOSLEEP_R_NOOP : NVPTXInst<(outs), (ins Int32Regs:$i), "/* 
no-op */",
+ [(int_nvvm_nanosleep Int32Regs:$i)]>,
+Requires<[hasPTX<63>]>;
+
 //
 // Min Max
 //

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits