https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/141036
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -177,6 +177,7 @@ let Attributes = [NoReturn] in {
}
let Attributes = [NoThrow] in {
def __nvvm_nanosleep : NVPTXBuiltinSMAndPTX<"void(unsigned int)", SM_70, PTX63>;
+ def __nvvm_pm_event_mask : NVPTXBuiltin<"void(unsigned short)">;
Artem-B wrote:
The ar
https://github.com/Artem-B approved this pull request.
Builtin signature needs a fix, but LGTM otherwise.
https://github.com/llvm/llvm-project/pull/141278
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/141278
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/141143
@@ -1349,6 +1349,10 @@ static bool upgradeIntrinsicFunction1(Function *F,
Function *&NewFn,
else if (Name == "clz.ll" || Name == "popc.ll" || Name == "h2f" ||
Name == "swap.lo.hi.b64")
Expand = true;
+ else if (Name == "barrier0" || Name == "b
@@ -170,6 +170,8 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public
TargetInfo {
Opts["cl_khr_global_int32_extended_atomics"] = true;
Opts["cl_khr_local_int32_base_atomics"] = true;
Opts["cl_khr_local_int32_extended_atomics"] = true;
+
+Opts["__opencl_c_
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/138706
@@ -2927,6 +2928,20 @@ void Verifier::visitFunction(const Function &F) {
"Calling convention does not support varargs or "
"perfect forwarding!",
&F);
+if (F.getCallingConv() == CallingConv::PTX_Kernel &&
+TT.getOS() == Triple::CUDA) {
@@ -5734,6 +5734,9 @@ def nobuiltininc : Flag<["-"], "nobuiltininc">,
def nogpuinc : Flag<["-"], "nogpuinc">, Group,
HelpText<"Do not add include paths for CUDA/HIP and"
" do not include the default CUDA/HIP wrapper headers">;
+def gpuinc : Flag<["-"], "gpuinc">, Group,
+
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/140106
https://github.com/Artem-B commented:
Being able to override a flag is a good thing to have, IMO. There are builds
where the owners of the leaf targets do not have much control over which options
are set by the "default" compilation, so they need to rely on being able to
override preceding opti
@@ -1399,19 +1399,27 @@ void NVPTXAsmPrinter::emitFunctionParamList(const
Function *F, raw_ostream &O) {
if (PTy) {
O << "\t.param .u" << PTySizeInBits << " .ptr";
+bool IsCUDA = static_cast(TM).getDrvInterface() ==
+ NVPTX::CUDA;
https://github.com/Artem-B closed
https://github.com/llvm/llvm-project/pull/139164
Artem-B wrote:
No wrappers -- no problems. :-)
https://github.com/llvm/llvm-project/pull/139164
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/139164
https://github.com/Artem-B updated
https://github.com/llvm/llvm-project/pull/139164
>From a1d60feed11174b9d2106b57ee15ff6d9bc56fa4 Mon Sep 17 00:00:00 2001
From: Artem Belevich
Date: Thu, 8 May 2025 14:43:47 -0700
Subject: [PATCH] [CUDA] remove obsolete GPU-side __constexpr* wrappers
libc++ no
Artem-B wrote:
> Right now this checks for `libc++` less than 14. Is that still relevant
> following that change?
That's a very good point. Looks like those `__constexpr_fmin/fmax` are gone now
and we do not need them any more.
https://github.com/llvm/llvm-project/pull/139164
Artem-B wrote:
@jhuber6 @ldionne One concern I have for this change is that it will break
folks who will use older libc++ with the new Clang + wrapper headers.
Is older libc++ expected to work with non-matching clang version? If the
expectation is that libc++ and clang are from the same versio
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/139244
Artem-B wrote:
@cgmb
> I would suggest that we should either (a) change the default GPU target to
> native and make the failure to detect the user’s GPU into a hard compiler
> error, or (b) change the default GPU target to SPIR-V so that it works on
> every machine.
The thing is that the se
Artem-B wrote:
@jhuber6 do you think we can use `native` instead? I think it would be a
somewhat better option here.
If we have to choose a GPU variant by default, we may as well choose the actual
GPU, rather than a conditional choice between generic SPIR-V or an old GPU,
which has the disadva
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/138162
@@ -109,3 +109,48 @@ void func2(void) {
void func3(void) {
float a[16][1] = {{0.}};
}
+
+// CL12-LABEL: define dso_local void @wrong_store_type_private_pointer_alloca(
+// CL12-SAME: ) #[[ATTR0]] {
+// CL12-NEXT: [[ENTRY:.*:]]
+// CL12-NEXT:[[PLONG:%.*]] = alloca i64, al
@@ -2376,9 +2376,14 @@ NamedDecl *Sema::LazilyCreateBuiltin(IdentifierInfo *II,
unsigned ID,
return nullptr;
}
+ // Warn for implicit uses of header dependent libraries,
+ // except in system headers.
if (!ForRedeclaration &&
(Context.BuiltinInfo.isPredefine
Artem-B wrote:
OK. This makes sense.
> sorry this change is so drawn out :)
What matters is that you're making progress, and I appreciate your work on
getting this issue sorted out the right way.
https://github.com/llvm/llvm-project/pull/138205
Artem-B wrote:
Something does not add up here. AFAICT, using builtins w/o explicitly declaring
them is something that's done all the time. https://godbolt.org/z/ha47W53dh
In that sense, we should not be needing to filter out the diagnostics coming
from the system headers only. There should not
@@ -0,0 +1,23 @@
+// expected-no-diagnostics
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -aux-triple amdgcn-amd-amdhsa -fsyntax-only -verify -xhip %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fsyntax-only -fcuda-is-device -verify -xhip %s
+
+#include "Inputs/cuda
https://github.com/Artem-B commented:
LGTM in principle.
Now the question is -- how do we test it? There are multiple libstdc++ library
versions in the wild and we must not break any of them. We do have some testing
on CUDA test bots (which I've just discovered to be silently broken for a whil
@@ -0,0 +1,35 @@
+// libstdc++ uses the non-constexpr function std::__glibcxx_assert_fail()
+// to trigger compilation errors when the __glibcxx_assert(cond) macro
+// is used in a constexpr context.
+// Compilation fails when using code from the libstdc++ (such as std::array) on
https://github.com/Artem-B approved this pull request.
LGTM w/ a nit.
https://github.com/llvm/llvm-project/pull/136645
@@ -1100,3 +1101,49 @@ std::string SemaCUDA::getConfigureFuncName() const {
// Legacy CUDA kernel configuration call
return "cudaConfigureCall";
}
+
+// Record any local constexpr variables that are passed one way on the host
+// and another on the device.
+void SemaCUDA::r
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/136645
@@ -25,6 +25,7 @@ enum AddressSpace : unsigned {
ADDRESS_SPACE_CONST = 4,
ADDRESS_SPACE_LOCAL = 5,
ADDRESS_SPACE_TENSOR = 6,
+ ADDRESS_SPACE_SHARED_CLUSTER = 7,
Artem-B wrote:
PTX docs say:
```
If no sub-qualifier is specified with the .shared state sp
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/128222
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/136133
@@ -36,6 +36,28 @@ typedef __SIZE_TYPE__ size_t;
#include
+#ifdef __ARM_ACLE
+// arm_acle.h needs some stdint types, but -ffreestanding prevents us from
Artem-B wrote:
Shouldn't that be fixed in arm_acle.h itself so it includes the headers with
the types i
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/136133
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) {
case ADDRESS_SPACE_SHARED:
Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared;
break;
-case ADDRESS_SPACE_DSHARED:
- Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 :
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/135644
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/135644
@@ -1034,6 +1034,10 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned
BuiltinID,
case NVPTX::BI__nvvm_fmin_xorsign_abs_f16x2:
return MakeHalfType(Intrinsic::nvvm_fmin_xorsign_abs_f16x2, BuiltinID, E,
*this);
+ case NVPTX::BI__nvvm_abs_bf16
@@ -411,6 +412,13 @@ static Instruction
*convertNvvmIntrinsicToLlvm(InstCombiner &IC,
}
return nullptr;
}
+ case SPC_Fabs: {
+if (!II->getType()->isDoubleTy())
+ return nullptr;
+auto *Fabs = Intrinsic::getOrInsertDeclaration(
+II->getModule(),
Artem-B wrote:
I wish PTX would be a bit more consistent about naming things. Documentation
calls it distributed shared memory (and it is distributed, and is shared), but
the PTX instructions, compiler builtins and intrinsics use shared::cluster (as
opposed to regular shared AKA shared::cta).
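To make the naming mismatch concrete, here is a minimal sketch that maps the two shared address spaces onto the PTX sub-qualifier spellings mentioned in the comment. It is only an illustration: `ptxSharedQualifier` is a hypothetical helper, not an LLVM API, and the `GENERIC`, `GLOBAL` and `SHARED` values are assumed from the NVPTX backend's usual numbering, while the remaining values are the ones visible in the `ADDRESS_SPACE_SHARED_CLUSTER` diff earlier in this listing.

```cpp
#include <string_view>

// CONST, LOCAL, TENSOR and SHARED_CLUSTER match the enum in the diff quoted
// earlier; GENERIC, GLOBAL and SHARED are assumed values for illustration.
enum AddressSpace : unsigned {
  ADDRESS_SPACE_GENERIC = 0,
  ADDRESS_SPACE_GLOBAL = 1,
  ADDRESS_SPACE_SHARED = 3,
  ADDRESS_SPACE_CONST = 4,
  ADDRESS_SPACE_LOCAL = 5,
  ADDRESS_SPACE_TENSOR = 6,
  ADDRESS_SPACE_SHARED_CLUSTER = 7,
};

// Hypothetical helper (not part of LLVM): returns the PTX state-space
// spelling for the two shared address spaces, or an empty view otherwise.
constexpr std::string_view ptxSharedQualifier(AddressSpace AS) {
  switch (AS) {
  case ADDRESS_SPACE_SHARED:
    return ".shared::cta"; // "regular" shared memory
  case ADDRESS_SPACE_SHARED_CLUSTER:
    return ".shared::cluster"; // what the docs call distributed shared memory
  default:
    return "";
  }
}

// Compile-time checks of the mapping.
static_assert(ptxSharedQualifier(ADDRESS_SPACE_SHARED) == ".shared::cta");
static_assert(ptxSharedQualifier(ADDRESS_SPACE_SHARED_CLUSTER) ==
              ".shared::cluster");
```

The sketch keeps the enum name `SHARED_CLUSTER` aligned with the PTX spelling rather than with the "distributed shared memory" term used in the documentation.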
@@ -703,6 +703,41 @@ let hasSideEffects = false in {
defm CVT_to_tf32_rz_satf : CVT_TO_TF32<"rz.satfinite", [hasPTX<86>,
hasSM<100>]>;
defm CVT_to_tf32_rn_relu_satf : CVT_TO_TF32<"rn.relu.satfinite",
[hasPTX<86>, hasSM<100>]>;
defm CVT_to_tf32_rz_relu_satf : CVT_TO_TF
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/134345
https://github.com/Artem-B approved this pull request.
LGTM in general, with an intrinsic naming nit.
https://github.com/llvm/llvm-project/pull/134345
@@ -596,6 +605,28 @@ def __nvvm_e4m3x2_to_f16x2_rn_relu :
NVPTXBuiltinSMAndPTX<"_Vector<2, __fp16>(sh
def __nvvm_e5m2x2_to_f16x2_rn : NVPTXBuiltinSMAndPTX<"_Vector<2,
__fp16>(short)", SM_89, PTX81>;
def __nvvm_e5m2x2_to_f16x2_rn_relu : NVPTXBuiltinSMAndPTX<"_Vector<2,
__fp16>
https://github.com/Artem-B approved this pull request.
LGTM.
https://github.com/llvm/llvm-project/pull/134416
@@ -1233,6 +1233,10 @@ def offload_compression_level_EQ : Joined<["--"],
"offload-compression-level=">,
Flags<[HelpHidden]>,
HelpText<"Compression level for offload device binaries (HIP only)">;
+def offload_jobs_EQ : Joined<["--"], "offload-jobs=">,
+ HelpText<"Set the
@@ -1,26 +1,53 @@
-; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with
-; the appropriate command line values.
+; Test the NVVM reflect pass functionality: verifying that reflect calls are replaced with
+; appropriate values b
https://github.com/Artem-B commented:
Almost there. A few more test nits.
https://github.com/llvm/llvm-project/pull/134416
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/134416
@@ -0,0 +1,26 @@
+; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with
+; the appropriate command line values.
+
+declare i32 @__nvvm_reflect(ptr)
+@ftz = private unnamed_addr addrspace(1) constant [11 x i8] c"__CUDA_FTZ\00"
+@ar
@@ -0,0 +1,26 @@
+; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with
Artem-B wrote:
The test is functionally fine, but it also makes me stop and think "what
exactly are we doing here and why?".
Two points:
-
@@ -39,186 +39,201 @@
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Local.h"
-#include
#define NVVM_REFLECT_FUNCTION "__nvvm_reflect"
#define NVVM_REFLECT_OCL_FUNCTION "__nvvm_reflect_ocl"
+// Argument
https://github.com/Artem-B approved this pull request.
LGTM for the code. Tests could use a bit more polishing.
https://github.com/llvm/llvm-project/pull/134416
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/134416
Artem-B wrote:
@AustinSchuh One thing I've missed during review is that the test
clang/test/CodeGen/nvptx-surface.cu should probably go into
clang/test/CodeGenCUDA
This would also obviate the need for #134459.
Can you send the patch to move the test to the right location?
https://github.com
@@ -0,0 +1,24 @@
+// REQUIRES: amdgpu-registered-target
Artem-B wrote:
I've just checked using the experimental target `csky-unknown-elf`, which is not
enabled by default, and clang indeed errors out if we attempt to generate code,
but works OK with `-fsyntax-only`.
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/134758
https://github.com/Artem-B approved this pull request.
LGTM. Thank you!
https://github.com/llvm/llvm-project/pull/134758
@@ -2,6 +2,170 @@
// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s
#include "Inputs/cuda.h"
+struct char1 {
Artem-B wrote:
See above. propagate-attributes.cu just needs to apply `extern "C"` to the
fu
@@ -2,6 +2,170 @@
// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s
#include "Inputs/cuda.h"
+struct char1 {
Artem-B wrote:
Those are actually *useful* failures and expose real issues in those tests.
-
@@ -2,6 +2,170 @@
// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s
#include "Inputs/cuda.h"
+struct char1 {
Artem-B wrote:
These type declarations should go into Inputs/cuda.h
https://github.com/llvm/
https://github.com/Artem-B requested changes to this pull request.
Hold on a sec.
https://github.com/llvm/llvm-project/pull/134758
Artem-B wrote:
Fixes test break introduced by #134459
https://github.com/llvm/llvm-project/pull/134706
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/134706
Artem-B wrote:
@AlexMaclean who authored #89417 and possibly other NVIDIA folks may have some
thoughts on this.
In general, making it per-function attribute makes sense on LLVM level.
We will also need to reconcile it with the
https://github.com/llvm/llvm-project/blob/10bef367a5643bc41d0172b0
https://github.com/Artem-B unassigned
https://github.com/llvm/llvm-project/pull/134244
https://github.com/Artem-B closed
https://github.com/llvm/llvm-project/pull/132883
https://github.com/Artem-B approved this pull request.
Nice.
Now we're missing the last two steps:
- that ptxas accepts the inline asm instructions we generate
- that those instructions actually do what they are intended to do.
Can you manually verify that the test file actually compiles to a G
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/134111
https://github.com/Artem-B edited
https://github.com/llvm/llvm-project/pull/133522
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/134459
@@ -1021,6 +1036,174 @@ __device__ void nvvm_cvt_sm89() {
__nvvm_e5m2x2_to_f16x2_rn(0x4c4c);
// CHECK_PTX81_SM89: call <2 x half> @llvm.nvvm.e5m2x2.to.f16x2.rn.relu(i16
19532)
__nvvm_e5m2x2_to_f16x2_rn_relu(0x4c4c);
+
+ // CHECK_PTX81_SM89: call i32 @llvm.nvvm.f2tf32.rn
@@ -580,6 +580,15 @@ def __nvvm_f2bf16_rz :
NVPTXBuiltinSMAndPTX<"__bf16(float)", SM_80, PTX70>;
def __nvvm_f2bf16_rz_relu : NVPTXBuiltinSMAndPTX<"__bf16(float)", SM_80,
PTX70>;
def __nvvm_f2tf32_rna : NVPTXBuiltinSMAndPTX<"int32_t(float)", SM_80, PTX70>;
+def __nvvm_f2tf32_
@@ -703,6 +703,53 @@ let hasSideEffects = false in {
defm CVT_to_tf32_rz_satf : CVT_TO_TF32<"rz.satfinite", [hasPTX<86>,
hasSM<100>]>;
defm CVT_to_tf32_rn_relu_satf : CVT_TO_TF32<"rn.relu.satfinite",
[hasPTX<86>, hasSM<100>]>;
defm CVT_to_tf32_rz_relu_satf : CVT_TO_TF
@@ -0,0 +1,3329 @@
+// RUN: %clang_cc1 -triple nvptx-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s
+// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s
+#include "../Headers/Inputs/include/cuda.h"
---
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/133590
@@ -6397,7 +6397,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction
&JA,
Args.AddLastArg(CmdArgs, options::OPT_fconvergent_functions,
options::OPT_fno_convergent_functions);
- addPGOAndCoverageFlags(TC, C, JA, Output, Args, SanitizeArgs, CmdArg
https://github.com/Artem-B approved this pull request.
LGTM with a comment nit.
https://github.com/llvm/llvm-project/pull/133522
Artem-B wrote:
The crash is blocking our compiler updates. If nothing depends on this change
yet, it would be great to revert the patch and re-land it once it's fixed.
https://github.com/llvm/llvm-project/pull/94268
Artem-B wrote:
@jhuber6 @jdoerfert I propose reverting the change, unless it can be quickly
fixed forward so it does not affect CUDA/NVPTX.
https://github.com/llvm/llvm-project/pull/94268
Artem-B wrote:
This is breaking CUDA/NVPTX.
Enabling PGO results in compiler generating PGO-related data which references
itself, and NVPTX can't compile those.
E.g. we see data like this which includes a reference to itself:
```
@__profd__ZN12cuda_helpers13memcmp_kernelEPjS0_mPb = protected g
https://github.com/Artem-B approved this pull request.
https://github.com/llvm/llvm-project/pull/133296
Artem-B wrote:
> > LGTM in principle, but it could use some tests. The change is surprisingly
> > nicely compact. Thank you for filling in one of the long-standing gaps in
> > clang's cuda support story.
>
> I might need some hints on where to start. How would you go about testing
> this, or
Artem-B wrote:
> > @AustinSchuh would you like me to merge the change for you, once the checks
> > are done?
>
> That would be wonderful. I don't know how to merge it (happy to learn, but I
> suspect I won't do it more than a couple of times)
https://llvm.org/docs/DeveloperPolicy.html#obtaini
https://github.com/Artem-B closed
https://github.com/llvm/llvm-project/pull/132881