[clang] [clang] Move opt level in clang toolchain to clang::ConstructJob start (PR #141036)

2025-05-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/141036 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Add pm_event intrinsics (PR #141278)

2025-05-27 Thread Artem Belevich via cfe-commits
@@ -177,6 +177,7 @@ let Attributes = [NoReturn] in { } let Attributes = [NoThrow] in { def __nvvm_nanosleep : NVPTXBuiltinSMAndPTX<"void(unsigned int)", SM_70, PTX63>; + def __nvvm_pm_event_mask : NVPTXBuiltin<"void(unsigned short)">; Artem-B wrote: The ar

[clang] [llvm] [NVPTX] Add pm_event intrinsics (PR #141278)

2025-05-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. Builtin signature needs a fix, but LGTM otherwise. https://github.com/llvm/llvm-project/pull/141278 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/li

[clang] [llvm] [NVPTX] Add pm_event intrinsics (PR #141278)

2025-05-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/141278 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [mlir] Reland "[NVPTX] Unify and extend barrier{.cta} intrinsic support" (PR #141143)

2025-05-22 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/141143 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-19 Thread Artem Belevich via cfe-commits
@@ -1349,6 +1349,10 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn, else if (Name == "clz.ll" || Name == "popc.ll" || Name == "h2f" || Name == "swap.lo.hi.b64") Expand = true; + else if (Name == "barrier0" || Name == "b

[clang] [NVPTX] Support the OpenCL generic addrspace feature by default (PR #137940)

2025-05-19 Thread Artem Belevich via cfe-commits
@@ -170,6 +170,8 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public TargetInfo { Opts["cl_khr_global_int32_extended_atomics"] = true; Opts["cl_khr_local_int32_base_atomics"] = true; Opts["cl_khr_local_int32_extended_atomics"] = true; + +Opts["__opencl_c_

[clang] [llvm] [NVPTX] Add errors for incorrect CUDA addrpaces (PR #138706)

2025-05-19 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/138706 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Add errors for incorrect CUDA addrpaces (PR #138706)

2025-05-19 Thread Artem Belevich via cfe-commits
@@ -2927,6 +2928,20 @@ void Verifier::visitFunction(const Function &F) { "Calling convention does not support varargs or " "perfect forwarding!", &F); +if (F.getCallingConv() == CallingConv::PTX_Kernel && +TT.getOS() == Triple::CUDA) {

[clang] [CUDA][HIP] add option -gpuinc (PR #140106)

2025-05-15 Thread Artem Belevich via cfe-commits
@@ -5734,6 +5734,9 @@ def nobuiltininc : Flag<["-"], "nobuiltininc">, def nogpuinc : Flag<["-"], "nogpuinc">, Group, HelpText<"Do not add include paths for CUDA/HIP and" " do not include the default CUDA/HIP wrapper headers">; +def gpuinc : Flag<["-"], "gpuinc">, Group, +

[clang] [CUDA][HIP] add option -gpuinc (PR #140106)

2025-05-15 Thread Artem Belevich via cfe-commits
@@ -5734,6 +5734,9 @@ def nobuiltininc : Flag<["-"], "nobuiltininc">, def nogpuinc : Flag<["-"], "nogpuinc">, Group, HelpText<"Do not add include paths for CUDA/HIP and" " do not include the default CUDA/HIP wrapper headers">; +def gpuinc : Flag<["-"], "gpuinc">, Group, +

[clang] [CUDA][HIP] add option -gpuinc (PR #140106)

2025-05-15 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/140106 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] add option -gpuinc (PR #140106)

2025-05-15 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B commented: Being able to override a flag is a good thing to have, IMO. There are builds where the owner of the leaf targets do not have much control over which options are set by the "default" compilation, so they need to rely on being able to override preceding opti

[clang] [llvm] [NVPTX] Add errors for incorrect CUDA addrpaces (PR #138706)

2025-05-13 Thread Artem Belevich via cfe-commits
@@ -1399,19 +1399,27 @@ void NVPTXAsmPrinter::emitFunctionParamList(const Function *F, raw_ostream &O) { if (PTy) { O << "\t.param .u" << PTySizeInBits << " .ptr"; +bool IsCUDA = static_cast(TM).getDrvInterface() == + NVPTX::CUDA;

[clang] [CUDA] Remove obsolete GPU-side __constexpr_* wrappers. (PR #139164)

2025-05-12 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B closed https://github.com/llvm/llvm-project/pull/139164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA] Remove obsolete GPU-side __constexpr_* wrappers. (PR #139164)

2025-05-12 Thread Artem Belevich via cfe-commits
Artem-B wrote: No wrappers -- no problems. :-) https://github.com/llvm/llvm-project/pull/139164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA] Remove obsolete GPU-side __constexpr_* wrappers. (PR #139164)

2025-05-12 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/139164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA] fix wrapper cmath header to match #136101 (PR #139164)

2025-05-12 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B updated https://github.com/llvm/llvm-project/pull/139164 >From a1d60feed11174b9d2106b57ee15ff6d9bc56fa4 Mon Sep 17 00:00:00 2001 From: Artem Belevich Date: Thu, 8 May 2025 14:43:47 -0700 Subject: [PATCH] [CUDA] remove obsolete GPU-side __constexpr* wrappers libc++ no

[clang] [CUDA] fix wrapper cmath header to match #136101 (PR #139164)

2025-05-12 Thread Artem Belevich via cfe-commits
Artem-B wrote: > Right now this checks for `libc++` less than 14. Is that still relevant > following that change? That's a very good point. Looks like those `__constexpr_fmin/fmax` are gone now and we do not heed them any more. https://github.com/llvm/llvm-project/pull/139164

[clang] [CUDA] fix wrapper cmath header to match #136101 (PR #139164)

2025-05-12 Thread Artem Belevich via cfe-commits
Artem-B wrote: @jhuber6 @ldionne One concern I have for this change is that it will break folks who will use older libc++ with the new Clang + wrapper headers. Is older libc++ expected to work with non-matching clang version? If the expectation is that libc++ and clang are from the same versio

[clang] [llvm] [NVPTX] Add intrinsics and clang builtins for conversions of f4x2 type (PR #139244)

2025-05-09 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/139244 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [HIP] change default offload archs (PR #139281)

2025-05-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: @cgmb > I would suggest that we should either (a) change the default GPU target to > native and make the failure to detect the user’s GPU into a hard compiler > error, or (b) change the default GPU target to SPIR-V so that it works on > every machine. The thing is that the se

[clang] [HIP] change default offload archs (PR #139281)

2025-05-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: @jhuber6 do you think can we use `native` instead? I think it would be a somewhat better option here. If we have to choose a GPU variant by default, we may as well choose the actual GPU, rather than a conditional choice between generic SPIR-V or an old GPU, which has the disadva

[clang] [CUDA][HIP] Fix host/device attribute of builtin (PR #138162)

2025-05-07 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/138162 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] clang/OpenCL: Add baseline test showing broken codegen (PR #138862)

2025-05-07 Thread Artem Belevich via cfe-commits
@@ -109,3 +109,48 @@ void func2(void) { void func3(void) { float a[16][1] = {{0.}}; } + +// CL12-LABEL: define dso_local void @wrong_store_type_private_pointer_alloca( +// CL12-SAME: ) #[[ATTR0]] { +// CL12-NEXT: [[ENTRY:.*:]] +// CL12-NEXT:[[PLONG:%.*]] = alloca i64, al

[clang] clang/OpenCL: Add baseline test showing broken codegen (PR #138862)

2025-05-07 Thread Artem Belevich via cfe-commits
@@ -109,3 +109,48 @@ void func2(void) { void func3(void) { float a[16][1] = {{0.}}; } + +// CL12-LABEL: define dso_local void @wrong_store_type_private_pointer_alloca( +// CL12-SAME: ) #[[ATTR0]] { +// CL12-NEXT: [[ENTRY:.*:]] +// CL12-NEXT:[[PLONG:%.*]] = alloca i64, al

[clang] [clang][Sema] Don't warn for implicit uses of builtins in system headers (PR #138205)

2025-05-02 Thread Artem Belevich via cfe-commits
@@ -2376,9 +2376,14 @@ NamedDecl *Sema::LazilyCreateBuiltin(IdentifierInfo *II, unsigned ID, return nullptr; } + // Warn for implicit uses of header dependent libraries, + // except in system headers. if (!ForRedeclaration && (Context.BuiltinInfo.isPredefine

[clang] [clang][Sema] Don't warn for implicit uses of builtins in system headers (PR #138205)

2025-05-02 Thread Artem Belevich via cfe-commits
Artem-B wrote: OK. This makes sense. > sorry this change is so drawn out :) What matters is that you're making progress, and I appreciate your work on getting this issue sorted out the right way. https://github.com/llvm/llvm-project/pull/138205 _

[clang] [clang][Sema] Don't warn for implicit uses of builtins in system headers (PR #138205)

2025-05-02 Thread Artem Belevich via cfe-commits
Artem-B wrote: Something does not add up here. AFAICT, using builtins w/o explicitly declaring them is something that's done all the time. https://godbolt.org/z/ha47W53dh In that sense, we should not be needing to filter out the diagnostics coming from the system headers only. There should not

[clang] [CUDA][HIP] Fix implicit attribute of builtin (PR #138162)

2025-05-01 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,23 @@ +// expected-no-diagnostics + +// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -aux-triple amdgcn-amd-amdhsa -fsyntax-only -verify -xhip %s +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fsyntax-only -fcuda-is-device -verify -xhip %s + +#include "Inputs/cuda

[clang] [CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail() (PR #136133)

2025-04-30 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B commented: LGTM in principle. Now the question is -- how do we test it? There are multiple libstdc++ library versions in the wild and we must not break any of them. We do have some testing on CUDA test bots (which I've just discovered to be silently broken for a whil

[clang] [CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail() (PR #136133)

2025-04-30 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,35 @@ +// libstdc++ uses the non-constexpr function std::__glibcxx_assert_fail() +// to trigger compilation errors when the __glibcxx_assert(cond) macro +// is used in a constexpr context. +// Compilation fails when using code from the libstdc++ (such as std::array) on

[clang] [CUDA][HIP] capture possible ODR-used var (PR #136645)

2025-04-22 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM w/ a nit. https://github.com/llvm/llvm-project/pull/136645 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] capture possible ODR-used var (PR #136645)

2025-04-22 Thread Artem Belevich via cfe-commits
@@ -1100,3 +1101,49 @@ std::string SemaCUDA::getConfigureFuncName() const { // Legacy CUDA kernel configuration call return "cudaConfigureCall"; } + +// Record any local constexpr variables that are passed one way on the host +// and another on the device. +void SemaCUDA::r

[clang] [CUDA][HIP] capture possible ODR-used var (PR #136645)

2025-04-22 Thread Artem Belevich via cfe-commits
@@ -1100,3 +1101,49 @@ std::string SemaCUDA::getConfigureFuncName() const { // Legacy CUDA kernel configuration call return "cudaConfigureCall"; } + +// Record any local constexpr variables that are passed one way on the host +// and another on the device. +void SemaCUDA::r

[clang] [CUDA][HIP] capture possible ODR-used var (PR #136645)

2025-04-22 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/136645 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-21 Thread Artem Belevich via cfe-commits
@@ -25,6 +25,7 @@ enum AddressSpace : unsigned { ADDRESS_SPACE_CONST = 4, ADDRESS_SPACE_LOCAL = 5, ADDRESS_SPACE_TENSOR = 6, + ADDRESS_SPACE_SHARED_CLUSTER = 7, Artem-B wrote: PTX docs say: ``` If no sub-qualifier is specified with the .shared state sp

[clang] [clang][ARM][AArch64] Define intrinsics guarded by __has_builtin on all platforms (PR #128222)

2025-04-21 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/128222 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail() (PR #136133)

2025-04-21 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,35 @@ +// libstdc++ uses the non-constexpr function std::__glibcxx_assert_fail() +// to trigger compilation errors when the __glibcxx_assert(cond) macro +// is used in a constexpr context. +// Compilation fails when using code from the libstdc++ (such as std::array) on

[clang] [CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail() (PR #136133)

2025-04-18 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/136133 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][ARM][AArch64] Define intrinsics guarded by __has_builtin on all platforms (PR #128222)

2025-04-17 Thread Artem Belevich via cfe-commits
@@ -36,6 +36,28 @@ typedef __SIZE_TYPE__ size_t; #include +#ifdef __ARM_ACLE +// arm_acle.h needs some stdint types, but -ffreestanding prevents us from Artem-B wrote: Shouldn't that be fixed in arm_acle.h itself so it includes the headers with the types i

[clang] [CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail() (PR #136133)

2025-04-17 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/136133 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail() (PR #136133)

2025-04-17 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,35 @@ +// libstdc++ uses the non-constexpr function std::__glibcxx_assert_fail() +// to trigger compilation errors when the __glibcxx_assert(cond) macro +// is used in a constexpr context. +// Compilation fails when using code from the libstdc++ (such as std::array) on

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-16 Thread Artem Belevich via cfe-commits
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 :

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-16 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/135644 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-16 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/135644 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-15 Thread Artem Belevich via cfe-commits
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 :

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Artem Belevich via cfe-commits
@@ -1034,6 +1034,10 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_fmin_xorsign_abs_f16x2: return MakeHalfType(Intrinsic::nvvm_fmin_xorsign_abs_f16x2, BuiltinID, E, *this); + case NVPTX::BI__nvvm_abs_bf16

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Artem Belevich via cfe-commits
@@ -1034,6 +1034,10 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_fmin_xorsign_abs_f16x2: return MakeHalfType(Intrinsic::nvvm_fmin_xorsign_abs_f16x2, BuiltinID, E, *this); + case NVPTX::BI__nvvm_abs_bf16

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Artem Belevich via cfe-commits
@@ -411,6 +412,13 @@ static Instruction *convertNvvmIntrinsicToLlvm(InstCombiner &IC, } return nullptr; } + case SPC_Fabs: { +if (!II->getType()->isDoubleTy()) + return nullptr; +auto *Fabs = Intrinsic::getOrInsertDeclaration( +II->getModule(),

[clang] [llvm] [mlir] [NVPTX] Add support for Distributed Shared Memory address space. (PR #135444)

2025-04-11 Thread Artem Belevich via cfe-commits
Artem-B wrote: I wish PTX would be a bit more consistent about naming things. Documentation calls it distributed shared memory (and it is distributed, and is shared), but the PTX instructions, compiler builtins and intrinsics use shared::cluster (as opposed to regular shared AKA shared::cta).

[clang] [llvm] [clang][NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-11 Thread Artem Belevich via cfe-commits
@@ -703,6 +703,41 @@ let hasSideEffects = false in { defm CVT_to_tf32_rz_satf : CVT_TO_TF32<"rz.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rn_relu_satf : CVT_TO_TF32<"rn.relu.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rz_relu_satf : CVT_TO_TF

[clang] [llvm] [clang][NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-11 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/134345 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-11 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM in general, with an intrinsic naming nit. https://github.com/llvm/llvm-project/pull/134345 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listin

[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -596,6 +605,28 @@ def __nvvm_e4m3x2_to_f16x2_rn_relu : NVPTXBuiltinSMAndPTX<"_Vector<2, __fp16>(sh def __nvvm_e5m2x2_to_f16x2_rn : NVPTXBuiltinSMAndPTX<"_Vector<2, __fp16>(short)", SM_89, PTX81>; def __nvvm_e5m2x2_to_f16x2_rn_relu : NVPTXBuiltinSMAndPTX<"_Vector<2, __fp16>

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/134416 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] add option --offload-jobs=N (PR #135229)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -1233,6 +1233,10 @@ def offload_compression_level_EQ : Joined<["--"], "offload-compression-level=">, Flags<[HelpHidden]>, HelpText<"Compression level for offload device binaries (HIP only)">; +def offload_jobs_EQ : Joined<["--"], "offload-jobs=">, + HelpText<"Set the

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -1,26 +1,53 @@ -; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with -; the appropriate command line values. +; Test the NVVM reflect pass functionality: verifying that reflect calls are replaced with +; appropriate values b

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -1,26 +1,53 @@ -; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with -; the appropriate command line values. +; Test the NVVM reflect pass functionality: verifying that reflect calls are replaced with +; appropriate values b

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B commented: Almost there. Few more test nits. https://github.com/llvm/llvm-project/pull/134416 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/134416 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -1,26 +1,53 @@ -; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with -; the appropriate command line values. +; Test the NVVM reflect pass functionality: verifying that reflect calls are replaced with +; appropriate values b

[clang] [Clang] add option --offload-jobs=N (PR #135229)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -1233,6 +1233,10 @@ def offload_compression_level_EQ : Joined<["--"], "offload-compression-level=">, Flags<[HelpHidden]>, HelpText<"Compression level for offload device binaries (HIP only)">; +def offload_jobs_EQ : Joined<["--"], "offload-jobs=">, + HelpText<"Set the

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,26 @@ +; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with +; the appropriate command line values. + +declare i32 @__nvvm_reflect(ptr) +@ftz = private unnamed_addr addrspace(1) constant [11 x i8] c"__CUDA_FTZ\00" +@ar

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,26 @@ +; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with Artem-B wrote: The test is functionally fine, but it also makes me stop and think "what exactly are we doing here and why?". Two points: -

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -39,186 +39,201 @@ #include "llvm/Transforms/Scalar.h" #include "llvm/Transforms/Utils/BasicBlockUtils.h" #include "llvm/Transforms/Utils/Local.h" -#include #define NVVM_REFLECT_FUNCTION "__nvvm_reflect" #define NVVM_REFLECT_OCL_FUNCTION "__nvvm_reflect_ocl" +// Argument

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM for the code. Tests could use a bit more polishing. https://github.com/llvm/llvm-project/pull/134416 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mail

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/134416 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-04-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: @AustinSchuh One thing I've missed during review is that the test clang/test/CodeGen/nvptx-surface.cu should probably go into clang/test/CodeGenCUDA This would also obviate the need for #134459. Can you send the patch to move the test to the right location? https://github.com

[clang] [Clang][AMDGPU] Accept builtins in lambda declarations (PR #135027)

2025-04-09 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,24 @@ +// REQUIRES: amdgpu-registered-target Artem-B wrote: I've just checked using experimental target `csky-unknown-elf ` that's not enabled by default, and clang indeed errors out if we attempt to generate code, but works OK with `-fsyntax-only`.

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/134758 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM. Thank you! https://github.com/llvm/llvm-project/pull/134758 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
@@ -2,6 +2,170 @@ // RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s #include "Inputs/cuda.h" +struct char1 { Artem-B wrote: See above. propagate-attributes.cu just needs to apply `extern "C"` to the fu

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
@@ -2,6 +2,170 @@ // RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s #include "Inputs/cuda.h" +struct char1 { Artem-B wrote: Those are actually *useful* failures and expose real issues in those tests. -

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
@@ -2,6 +2,170 @@ // RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s #include "Inputs/cuda.h" +struct char1 { Artem-B wrote: These type declarations should go into Inputs/cuda.h https://github.com/llvm/

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B requested changes to this pull request. Hold on a sec. https://github.com/llvm/llvm-project/pull/134758 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] Move CodeGen cuda.h to Inputs from include (PR #134706)

2025-04-07 Thread Artem Belevich via cfe-commits
Artem-B wrote: Fixes test break introduced by #134459 https://github.com/llvm/llvm-project/pull/134706 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] Move CodeGen cuda.h to Inputs from include (PR #134706)

2025-04-07 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/134706 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [Clang][NVVM] Support `-f[no-]cuda-prec-sqrt` and propagate precision flag to `NVVMReflect` (PR #134244)

2025-04-07 Thread Artem Belevich via cfe-commits
Artem-B wrote: @AlexMaclean who authored #89417 and possibly other NVIDIA folks may have some thoughts on this. In general, making it per-function attribute makes sense on LLVM level. We will also need to reconcile it with the https://github.com/llvm/llvm-project/blob/10bef367a5643bc41d0172b0

[clang] [llvm] [Clang][NVVM] Support `-f[no-]cuda-prec-sqrt` and propagate precision flag to `NVVMReflect` (PR #134244)

2025-04-07 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B unassigned https://github.com/llvm/llvm-project/pull/134244 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-04-05 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B closed https://github.com/llvm/llvm-project/pull/132883 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-04-05 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. Nice. Now we're missing the two last steps; - that ptxas accepts the inline asm instructions we generate - that those instructions actually do what they are intended to do. Can you manually verify that the test file actually compiles to a G

[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)

2025-04-05 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/134111 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [PGO][Offload] Disable PGO on NVPTX (PR #133522)

2025-04-05 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/133522 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] cuda clang: Clean up test dependency for CUDA surfaces (PR #134459)

2025-04-04 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/134459 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Add intrinsics for cvt .f6x2 and .ue8m0x2 variants (PR #134345)

2025-04-04 Thread Artem Belevich via cfe-commits
@@ -1021,6 +1036,174 @@ __device__ void nvvm_cvt_sm89() { __nvvm_e5m2x2_to_f16x2_rn(0x4c4c); // CHECK_PTX81_SM89: call <2 x half> @llvm.nvvm.e5m2x2.to.f16x2.rn.relu(i16 19532) __nvvm_e5m2x2_to_f16x2_rn_relu(0x4c4c); + + // CHECK_PTX81_SM89: call i32 @llvm.nvvm.f2tf32.rn

[clang] [llvm] [NVPTX] Add intrinsics for cvt .f6x2 and .ue8m0x2 variants (PR #134345)

2025-04-04 Thread Artem Belevich via cfe-commits
@@ -596,6 +605,28 @@ def __nvvm_e4m3x2_to_f16x2_rn_relu : NVPTXBuiltinSMAndPTX<"_Vector<2, __fp16>(sh def __nvvm_e5m2x2_to_f16x2_rn : NVPTXBuiltinSMAndPTX<"_Vector<2, __fp16>(short)", SM_89, PTX81>; def __nvvm_e5m2x2_to_f16x2_rn_relu : NVPTXBuiltinSMAndPTX<"_Vector<2, __fp16>

[clang] [llvm] [NVPTX] Add intrinsics for cvt .f6x2 and .ue8m0x2 variants (PR #134345)

2025-04-04 Thread Artem Belevich via cfe-commits
@@ -580,6 +580,15 @@ def __nvvm_f2bf16_rz : NVPTXBuiltinSMAndPTX<"__bf16(float)", SM_80, PTX70>; def __nvvm_f2bf16_rz_relu : NVPTXBuiltinSMAndPTX<"__bf16(float)", SM_80, PTX70>; def __nvvm_f2tf32_rna : NVPTXBuiltinSMAndPTX<"int32_t(float)", SM_80, PTX70>; +def __nvvm_f2tf32_

[clang] [llvm] [NVPTX] Add intrinsics for cvt .f6x2 and .ue8m0x2 variants (PR #134345)

2025-04-04 Thread Artem Belevich via cfe-commits
@@ -703,6 +703,53 @@ let hasSideEffects = false in { defm CVT_to_tf32_rz_satf : CVT_TO_TF32<"rz.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rn_relu_satf : CVT_TO_TF32<"rn.relu.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rz_relu_satf : CVT_TO_TF

[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-04-04 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,3329 @@ +// RUN: %clang_cc1 -triple nvptx-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s +// RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s +#include "../Headers/Inputs/include/cuda.h" ---

[clang] Reland [HIP] fix host min/max in header (PR #133590)

2025-03-31 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/133590 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [PGO][Offload] Disable PGO on NVPTX (PR #133522)

2025-03-28 Thread Artem Belevich via cfe-commits
@@ -6397,7 +6397,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, Args.AddLastArg(CmdArgs, options::OPT_fconvergent_functions, options::OPT_fno_convergent_functions); - addPGOAndCoverageFlags(TC, C, JA, Output, Args, SanitizeArgs, CmdArg

[clang] [llvm] [PGO][Offload] Disable PGO on NVPTX (PR #133522)

2025-03-28 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM with a comment nit. https://github.com/llvm/llvm-project/pull/133522 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [compiler-rt] [llvm] [PGO][Offload] Allow PGO flags to be used on GPU targets (PR #94268)

2025-03-28 Thread Artem Belevich via cfe-commits
Artem-B wrote: The crash is blocking our compiler updates. If nothing depends on this change yet, it would be great to revert the patch and re-land it once it's fixed. https://github.com/llvm/llvm-project/pull/94268 ___ cfe-commits mailing list cfe-co

[clang] [compiler-rt] [llvm] [PGO][Offload] Allow PGO flags to be used on GPU targets (PR #94268)

2025-03-28 Thread Artem Belevich via cfe-commits
Artem-B wrote: @jhuber6 @jdoerfert I propose reverting the change, unless it can be quickly fixed forward so it does not affect CUDA/NVPTX. https://github.com/llvm/llvm-project/pull/94268 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https:/

[clang] [compiler-rt] [llvm] [PGO][Offload] Allow PGO flags to be used on GPU targets (PR #94268)

2025-03-28 Thread Artem Belevich via cfe-commits
Artem-B wrote: This is breaking CUDA/NVPTX. Enabling PGO results in compiler generating PGO-related data which references itself, and NVPTX can't compile those. E.g. we see data like this which includes a reference to itself: ``` @__profd__ZN12cuda_helpers13memcmp_kernelEPjS0_mPb = protected g

[clang] [Clang] Add 'Joseph Huber' as offloading driver maintainer (PR #133296)

2025-03-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/133296 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-03-26 Thread Artem Belevich via cfe-commits
Artem-B wrote: > > LGTM in principle, but it could use some tests. The change is surprisingly > > nicely compact. Thank you for filling in one of the long-standing gaps in > > clang's cuda support story. > > I might need some hints on where to start. How would you go about testing > this, or

[clang] [llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

2025-03-26 Thread Artem Belevich via cfe-commits
Artem-B wrote: > > @AustinSchuh would you like me to merge the change for you, once the checks > > are done? > > That would be wonderful. I don't know how to merge it (happy to learn, but I > suspect I won't do it more than a couple of times) https://llvm.org/docs/DeveloperPolicy.html#obtaini

[clang] [llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

2025-03-26 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B closed https://github.com/llvm/llvm-project/pull/132881 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

  1   2   3   4   5   6   7   8   9   10   >