[clang] [llvm] [CUDA] Mark CUDA-12.4 as supported and introduce ptx 8.4. (PR #91516)
https://github.com/Artem-B closed https://github.com/llvm/llvm-project/pull/91516 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [CUDA] Mark CUDA-12.4 as supported and introduce ptx 8.4. (PR #91516)
https://github.com/Artem-B created https://github.com/llvm/llvm-project/pull/91516 None >From 6bb4800a5ed7c5f2ffeaded874d72f7624539122 Mon Sep 17 00:00:00 2001 From: Artem Belevich Date: Wed, 8 May 2024 11:07:34 -0700 Subject: [PATCH] [CUDA] Mark CUDA-12.4 as supported and introduce ptx 8.4. --- clang/docs/ReleaseNotes.rst | 1 + clang/include/clang/Basic/BuiltinsNVPTX.def | 5 - clang/include/clang/Basic/Cuda.h| 3 ++- clang/lib/Basic/Cuda.cpp| 5 +++-- clang/lib/Driver/ToolChains/Cuda.cpp| 3 +++ llvm/lib/Target/NVPTX/NVPTX.td | 2 +- 6 files changed, 14 insertions(+), 5 deletions(-) diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 0f9728c00e648..a3c8e4141ca54 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -798,6 +798,7 @@ CUDA/HIP Language Changes CUDA Support +- Clang now supports CUDA SDK up to 12.4 AIX Support ^^^ diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def b/clang/include/clang/Basic/BuiltinsNVPTX.def index 8d3c5e69d55cf..9e243d740ed7a 100644 --- a/clang/include/clang/Basic/BuiltinsNVPTX.def +++ b/clang/include/clang/Basic/BuiltinsNVPTX.def @@ -61,7 +61,9 @@ #pragma push_macro("PTX81") #pragma push_macro("PTX82") #pragma push_macro("PTX83") -#define PTX83 "ptx83" +#pragma push_macro("PTX84") +#define PTX84 "ptx84" +#define PTX83 "ptx83|" PTX84 #define PTX82 "ptx82|" PTX83 #define PTX81 "ptx81|" PTX82 #define PTX80 "ptx80|" PTX81 @@ -1091,3 +1093,4 @@ TARGET_BUILTIN(__nvvm_getctarank_shared_cluster, "iv*3", "", AND(SM_90,PTX78)) #pragma pop_macro("PTX81") #pragma pop_macro("PTX82") #pragma pop_macro("PTX83") +#pragma pop_macro("PTX84") diff --git a/clang/include/clang/Basic/Cuda.h b/clang/include/clang/Basic/Cuda.h index ba0e4465a0f5a..2d67c4181d129 100644 --- a/clang/include/clang/Basic/Cuda.h +++ b/clang/include/clang/Basic/Cuda.h @@ -41,9 +41,10 @@ enum class CudaVersion { CUDA_121, CUDA_122, CUDA_123, + CUDA_124, FULLY_SUPPORTED = CUDA_123, PARTIALLY_SUPPORTED = - CUDA_123, // Partially 
supported. Proceed with a warning. + CUDA_124, // Partially supported. Proceed with a warning. NEW = 1, // Too new. Issue a warning, but allow using it. }; const char *CudaVersionToString(CudaVersion V); diff --git a/clang/lib/Basic/Cuda.cpp b/clang/lib/Basic/Cuda.cpp index 113483db5729b..e8ce15eb0decb 100644 --- a/clang/lib/Basic/Cuda.cpp +++ b/clang/lib/Basic/Cuda.cpp @@ -14,7 +14,7 @@ struct CudaVersionMapEntry { }; #define CUDA_ENTRY(major, minor) \ { \ -#major "." #minor, CudaVersion::CUDA_##major##minor, \ +#major "." #minor, CudaVersion::CUDA_##major##minor, \ llvm::VersionTuple(major, minor) \ } @@ -41,6 +41,7 @@ static const CudaVersionMapEntry CudaNameVersionMap[] = { CUDA_ENTRY(12, 1), CUDA_ENTRY(12, 2), CUDA_ENTRY(12, 3), +CUDA_ENTRY(12, 4), {"", CudaVersion::NEW, llvm::VersionTuple(std::numeric_limits::max())}, {"unknown", CudaVersion::UNKNOWN, {}} // End of list tombstone. }; @@ -241,7 +242,7 @@ CudaVersion MaxVersionForCudaArch(CudaArch A) { } } -bool CudaFeatureEnabled(llvm::VersionTuple Version, CudaFeature Feature) { +bool CudaFeatureEnabled(llvm::VersionTuple Version, CudaFeature Feature) { return CudaFeatureEnabled(ToCudaVersion(Version), Feature); } diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp index 6634e6d818b33..d5f93c9c830fa 100644 --- a/clang/lib/Driver/ToolChains/Cuda.cpp +++ b/clang/lib/Driver/ToolChains/Cuda.cpp @@ -82,6 +82,8 @@ CudaVersion getCudaVersion(uint32_t raw_version) { return CudaVersion::CUDA_122; if (raw_version < 12040) return CudaVersion::CUDA_123; + if (raw_version < 12050) +return CudaVersion::CUDA_124; return CudaVersion::NEW; } @@ -688,6 +690,7 @@ void NVPTX::getNVPTXTargetFeatures(const Driver , const llvm::Triple , case CudaVersion::CUDA_##CUDA_VER: \ PtxFeature = "+ptx" #PTX_VER; \ break; +CASE_CUDA_VERSION(124, 84); CASE_CUDA_VERSION(123, 83); CASE_CUDA_VERSION(122, 82); CASE_CUDA_VERSION(121, 81); diff --git a/llvm/lib/Target/NVPTX/NVPTX.td 
b/llvm/lib/Target/NVPTX/NVPTX.td index 6aa98543e5e22..05457c71cd392 100644 --- a/llvm/lib/Target/NVPTX/NVPTX.td +++ b/llvm/lib/Target/NVPTX/NVPTX.td @@ -41,7 +41,7 @@ foreach sm = [20, 21, 30, 32, 35, 37, 50, 52, 53, def SM90a: FeatureSM<"90a", 901>; foreach version = [32, 40, 41, 42, 43, 50, 60, 61, 62, 63, 64, 65, - 70, 71, 72, 73, 74, 75, 76, 77, 78, 80, 81, 82, 83] in + 70, 71, 72, 73, 74, 75, 76, 77,
[clang] [CUDA][HIP] Fix record layout on Windows (PR #87651)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/87651
[clang] [CUDA] Rename SM_32 to SM_32_ to work around AIX headers (PR #88779)
@@ -86,7 +88,7 @@ static const CudaArchToStringMap arch_names[] = { // clang-format off {CudaArch::UNUSED, "", ""}, SM2(20, "compute_20"), SM2(21, "compute_20"), // Fermi -SM(30), SM(32), SM(35), SM(37), // Kepler +SM(30), SM3(32, "compute_32"), SM(35), SM(37), // Kepler Artem-B wrote: Nit: we don't really need SM3 here. For a one-off we could just use `{CudaArch::SM_32_, "sm_32", "compute_32"}` https://github.com/llvm/llvm-project/pull/88779
[clang] [CUDA] Rename SM_32 to SM_32_ to work around AIX headers (PR #88779)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/88779
[clang] [clang] Fix name conflict with `sys/mac.h` on AIX (PR #88644)
@@ -50,6 +50,10 @@ const char *CudaVersionToString(CudaVersion V); // Input is "Major.Minor" CudaVersion CudaStringToVersion(const llvm::Twine ); +// We have a name conflict with sys/mac.h on AIX +#ifdef SM_32 +#undef SM_32 +#endif Artem-B wrote: SGTM. Thank you for taking care of this issue. On a side note, do we know if there's a way to file a bug against AIX? They should not be setting macros with names that could conceivably be defined by a user; in theory, such macros should normally be double-underscore-prefixed. https://github.com/llvm/llvm-project/pull/88644
[clang] [clang] Fix name conflict with `sys/mac.h` on AIX (PR #88644)
@@ -50,6 +50,10 @@ const char *CudaVersionToString(CudaVersion V); // Input is "Major.Minor" CudaVersion CudaStringToVersion(const llvm::Twine ); +// We have a name conflict with sys/mac.h on AIX +#ifdef SM_32 +#undef SM_32 +#endif Artem-B wrote: Deprecating and removing support for old GPUs needs to be done, but it's not going to happen here and now, so we still need a better short-term fix. Undefining a macro set by external headers is not it. https://github.com/llvm/llvm-project/pull/88644
[clang] [clang] Fix name conflict with `sys/mac.h` on AIX (PR #88644)
@@ -50,6 +50,10 @@ const char *CudaVersionToString(CudaVersion V); // Input is "Major.Minor" CudaVersion CudaStringToVersion(const llvm::Twine ); +// We have a name conflict with sys/mac.h on AIX +#ifdef SM_32 +#undef SM_32 +#endif Artem-B wrote: > We could always just make all of these lower case instead? That would be odd; LLVM style wants them CamelCased. This enum is rarely used, so renaming the values to something more CUDA/NVPTX-specific, e.g. `NVSM_32`, would be best, IMO. Or we could rename only `SM_32`; the constant is rather inconsequential and is used in only a few places. Renaming it to `_SM_32`, with a comment that AIX headers define `SM_32`, would suffice. https://github.com/llvm/llvm-project/pull/88644
[clang] [clang] Fix name conflict with `sys/mac.h` on AIX (PR #88644)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/88644
[clang] [clang] Fix name conflict with `sys/mac.h` on AIX (PR #88644)
@@ -50,6 +50,10 @@ const char *CudaVersionToString(CudaVersion V); // Input is "Major.Minor" CudaVersion CudaStringToVersion(const llvm::Twine ); +// We have a name conflict with sys/mac.h on AIX +#ifdef SM_32 +#undef SM_32 +#endif Artem-B wrote: Ugh. What could possibly go wrong if someone who needed the original definition of SM_32 ends up transitively including this header and losing the macro definition? A better way to handle it as a workaround would be to push the macro definition, undef it, and then pop it back at the end of the header. Even better would be to add prefixes to the macros and/or the enum here to disambiguate them. https://github.com/llvm/llvm-project/pull/88644
[clang] [clang] Introduce `SemaCUDA` (PR #88559)
https://github.com/Artem-B approved this pull request. LGTM. The changes appear to be mechanical in nature, so `check clang` tests should be sufficient to verify we've re-connected things correctly. https://github.com/llvm/llvm-project/pull/88559
[clang] [Offload] Do not pass `-fcf-protection=` for offloading (PR #88402)
@@ -6867,8 +6867,14 @@ void Clang::ConstructJob(Compilation , const JobAction , CmdArgs.push_back("-nogpulib"); if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) { -CmdArgs.push_back( -Args.MakeArgString(Twine("-fcf-protection=") + A->getValue())); +// Do not pass this argument to the offloading device if the target does not +// support it. +// TODO: We need a better way to detect incompatible options for offloading. +if (JA.getOffloadingDeviceKind() == Action::OFK_None || +(!TC.getTriple().isAMDGPU() && !TC.getTriple().isNVPTX() && + !TC.getTriple().isSPIRV())) Artem-B wrote: +1. We have grown too many offloading cases all over the place over time. It was fine when there was only CUDA/NVPTX, was sort of OK when AMDGPU got added, now it gets to be a bit too much. https://github.com/llvm/llvm-project/pull/88402 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Offload] Do not pass `-fcf-protection=` for offloading (PR #88402)
https://github.com/Artem-B commented: LGTM in principle. https://github.com/llvm/llvm-project/pull/88402
[clang] [Offload] Do not pass `-fcf-protection=` for offloading (PR #88402)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/88402
[clang] [Offload] Do not pass `-fcf-protection=` for offloading (PR #88402)
@@ -6867,8 +6867,14 @@ void Clang::ConstructJob(Compilation , const JobAction , CmdArgs.push_back("-nogpulib"); if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) { -CmdArgs.push_back( -Args.MakeArgString(Twine("-fcf-protection=") + A->getValue())); +// Do not pass this argument to the offloading device if the target does not +// support it. +// TODO: We need a better way to detect incompatible options for offloading. +if (JA.getOffloadingDeviceKind() == Action::OFK_None || +(!TC.getTriple().isAMDGPU() && !TC.getTriple().isNVPTX() && Artem-B wrote: Nit: I'd collapse negations into one: ``` !(TC.getTriple().isAMDGPU() || TC.getTriple().isNVPTX() || TC.getTriple().isSPIRV()) ``` https://github.com/llvm/llvm-project/pull/88402 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/86830
[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)
@@ -186,57 +186,62 @@ GlobalVariable *createBinDesc(Module , ArrayRef> Bufs, ".omp_offloading.descriptor" + Suffix); } -void createRegisterFunction(Module , GlobalVariable *BinDesc, -StringRef Suffix) { +Function *createUnregisterFunction(Module , GlobalVariable *BinDesc, + StringRef Suffix) { LLVMContext = M.getContext(); auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false); - auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage, -".omp_offloading.descriptor_reg" + Suffix, ); + auto *Func = + Function::Create(FuncTy, GlobalValue::InternalLinkage, + ".omp_offloading.descriptor_unreg" + Suffix, ); Func->setSection(".text.startup"); - // Get __tgt_register_lib function declaration. - auto *RegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M), - /*isVarArg*/ false); - FunctionCallee RegFuncC = - M.getOrInsertFunction("__tgt_register_lib", RegFuncTy); + // Get __tgt_unregister_lib function declaration. + auto *UnRegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M), +/*isVarArg*/ false); + FunctionCallee UnRegFuncC = + M.getOrInsertFunction("__tgt_unregister_lib", UnRegFuncTy); // Construct function body IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func)); - Builder.CreateCall(RegFuncC, BinDesc); + Builder.CreateCall(UnRegFuncC, BinDesc); Builder.CreateRetVoid(); - // Add this function to constructors. - // Set priority to 1 so that __tgt_register_lib is executed AFTER - // __tgt_register_requires (we want to know what requirements have been - // asked for before we load a libomptarget plugin so that by the time the - // plugin is loaded it can report how many devices there are which can - // satisfy these requirements). 
- appendToGlobalCtors(M, Func, /*Priority*/ 1); + return Func; } -void createUnregisterFunction(Module , GlobalVariable *BinDesc, - StringRef Suffix) { +void createRegisterFunction(Module , GlobalVariable *BinDesc, +StringRef Suffix) { LLVMContext = M.getContext(); auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false); - auto *Func = - Function::Create(FuncTy, GlobalValue::InternalLinkage, - ".omp_offloading.descriptor_unreg" + Suffix, ); + auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage, +".omp_offloading.descriptor_reg" + Suffix, ); Func->setSection(".text.startup"); - // Get __tgt_unregister_lib function declaration. - auto *UnRegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M), -/*isVarArg*/ false); - FunctionCallee UnRegFuncC = - M.getOrInsertFunction("__tgt_unregister_lib", UnRegFuncTy); + // Get __tgt_register_lib function declaration. + auto *RegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M), + /*isVarArg*/ false); + FunctionCallee RegFuncC = + M.getOrInsertFunction("__tgt_register_lib", RegFuncTy); + + auto *AtExitTy = FunctionType::get( + Type::getInt32Ty(C), PointerType::getUnqual(C), /*isVarArg=*/false); + FunctionCallee AtExit = M.getOrInsertFunction("atexit", AtExitTy); + + Function *UnregFunc = createUnregisterFunction(M, BinDesc, Suffix); // Construct function body IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func)); - Builder.CreateCall(UnRegFuncC, BinDesc); + + // Register the destructors with 'atexit', This is expected by the CUDA Artem-B wrote: Typo. `,` -> `.` https://github.com/llvm/llvm-project/pull/86830 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)
@@ -186,57 +186,62 @@ GlobalVariable *createBinDesc(Module , ArrayRef> Bufs, ".omp_offloading.descriptor" + Suffix); } -void createRegisterFunction(Module , GlobalVariable *BinDesc, -StringRef Suffix) { +Function *createUnregisterFunction(Module , GlobalVariable *BinDesc, + StringRef Suffix) { LLVMContext = M.getContext(); auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false); - auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage, -".omp_offloading.descriptor_reg" + Suffix, ); + auto *Func = + Function::Create(FuncTy, GlobalValue::InternalLinkage, + ".omp_offloading.descriptor_unreg" + Suffix, ); Func->setSection(".text.startup"); - // Get __tgt_register_lib function declaration. - auto *RegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M), - /*isVarArg*/ false); - FunctionCallee RegFuncC = - M.getOrInsertFunction("__tgt_register_lib", RegFuncTy); + // Get __tgt_unregister_lib function declaration. + auto *UnRegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M), +/*isVarArg*/ false); + FunctionCallee UnRegFuncC = + M.getOrInsertFunction("__tgt_unregister_lib", UnRegFuncTy); // Construct function body IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func)); - Builder.CreateCall(RegFuncC, BinDesc); + Builder.CreateCall(UnRegFuncC, BinDesc); Builder.CreateRetVoid(); - // Add this function to constructors. - // Set priority to 1 so that __tgt_register_lib is executed AFTER - // __tgt_register_requires (we want to know what requirements have been - // asked for before we load a libomptarget plugin so that by the time the - // plugin is loaded it can report how many devices there are which can - // satisfy these requirements). 
- appendToGlobalCtors(M, Func, /*Priority*/ 1); + return Func; } -void createUnregisterFunction(Module , GlobalVariable *BinDesc, - StringRef Suffix) { +void createRegisterFunction(Module , GlobalVariable *BinDesc, +StringRef Suffix) { LLVMContext = M.getContext(); auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false); - auto *Func = - Function::Create(FuncTy, GlobalValue::InternalLinkage, - ".omp_offloading.descriptor_unreg" + Suffix, ); + auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage, +".omp_offloading.descriptor_reg" + Suffix, ); Func->setSection(".text.startup"); - // Get __tgt_unregister_lib function declaration. - auto *UnRegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M), -/*isVarArg*/ false); - FunctionCallee UnRegFuncC = - M.getOrInsertFunction("__tgt_unregister_lib", UnRegFuncTy); + // Get __tgt_register_lib function declaration. + auto *RegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M), + /*isVarArg*/ false); + FunctionCallee RegFuncC = + M.getOrInsertFunction("__tgt_register_lib", RegFuncTy); + + auto *AtExitTy = FunctionType::get( + Type::getInt32Ty(C), PointerType::getUnqual(C), /*isVarArg=*/false); + FunctionCallee AtExit = M.getOrInsertFunction("atexit", AtExitTy); + + Function *UnregFunc = createUnregisterFunction(M, BinDesc, Suffix); // Construct function body IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func)); - Builder.CreateCall(UnRegFuncC, BinDesc); + + // Register the destructors with 'atexit', This is expected by the CUDA Artem-B wrote: > This is expected by the CUDA runtime I'd add a reference to clang/lib/CodeGen/CGCUDANV.cpp which provides some history why we switched to `atexit`. https://github.com/llvm/llvm-project/pull/86830 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/86830
[clang] [HIP][NFC] Refactor managed var codegen (PR #85976)
https://github.com/Artem-B approved this pull request. LGTM, sans the "NFC" part in the description. https://github.com/llvm/llvm-project/pull/85976
[clang] [HIP][NFC] Refactor managed var codegen (PR #85976)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/85976
[clang] [HIP][NFC] Refactor managed var codegen (PR #85976)
@@ -1160,9 +1152,8 @@ void CGNVCUDARuntime::createOffloadingEntries() { // Returns module constructor to be added. llvm::Function *CGNVCUDARuntime::finalizeModule() { + transformManagedVars(); Artem-B wrote: This does not look like "NFC" as we now perform the transform for the host compilation, too. I assume we do have existing tests covering generation of the variables. https://github.com/llvm/llvm-project/pull/85976
[clang] [llvm] [InstCombine] Canonicalize `(sitofp x)` -> `(uitofp x)` if `x >= 0` (PR #82404)
Artem-B wrote: We happen to have a back-end where we do not have conversion instructions between unsigned int and FP, so this patch complicates things. Would it make sense to enable this canonicalization only if the target wants it? https://github.com/llvm/llvm-project/pull/82404
[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)
@@ -2863,3 +2863,18 @@ void tools::addOutlineAtomicsArgs(const Driver , const ToolChain , CmdArgs.push_back("+outline-atomics"); } } + +void tools::addOffloadCompressArgs(const llvm::opt::ArgList , + llvm::opt::ArgStringList ) { + if (TCArgs.hasFlag(options::OPT_offload_compress, + options::OPT_no_offload_compress, false)) +CmdArgs.push_back("-compress"); + if (TCArgs.hasArg(options::OPT_v)) +CmdArgs.push_back("-verbose"); + if (auto *Arg = + TCArgs.getLastArg(options::OPT_offload_compression_level_EQ)) { +std::string CompressionLevelArg = +std::string("-compression-level=") + Arg->getValue(); +CmdArgs.push_back(TCArgs.MakeArgString(CompressionLevelArg)); Artem-B wrote: This may be collapsed to just ``` CmdArgs.push_back(TCArgs.MakeArgString("-compression-level=" + Arg->getValue()))`. ``` Maybe with a `Twine` or `StringRef` wrapping the string literal. https://github.com/llvm/llvm-project/pull/83605 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)
https://github.com/Artem-B approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/83605
[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/83605
[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)
https://github.com/Artem-B approved this pull request. LGTM overall, with docs/comment nits. https://github.com/llvm/llvm-project/pull/84367
[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)
@@ -503,18 +503,20 @@ void NVPTX::Assembler::ConstructJob(Compilation , const JobAction , Exec, CmdArgs, Inputs, Output)); } -static bool shouldIncludePTX(const ArgList , const char *gpu_arch) { - bool includePTX = true; - for (Arg *A : Args) { -if (!(A->getOption().matches(options::OPT_cuda_include_ptx_EQ) || - A->getOption().matches(options::OPT_no_cuda_include_ptx_EQ))) - continue; +static bool shouldIncludePTX(const ArgList , StringRef InputArch) { + // The new driver does not include PTX by default. + bool includePTX = !Args.hasFlag(options::OPT_offload_new_driver, Artem-B wrote: I'd add a comment on why we're making this decision based on the new vs old driver. https://github.com/llvm/llvm-project/pull/84367 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)
Artem-B wrote: > > > Should I make `shouldIncludePTX` default to `false` for the new driver? > > > > > > Yes, I think that's a better default. > > Done, now requires `--cuda-include-ptx=`. This may be worth adding to the release notes. https://github.com/llvm/llvm-project/pull/84367
[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)
Artem-B wrote: > Should I make `shouldIncludePTX` default to `false` for the new driver? Yes, I think that's a better default. https://github.com/llvm/llvm-project/pull/84367
[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation , DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind); OffloadAction::DeviceDependences DDep; DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind); + + // Compiling CUDA in non-RDC mode uses the PTX output if available. Artem-B wrote: Do we still respect `--cuda-include-ptx=...` ? https://github.com/llvm/llvm-project/pull/84367 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CUDA] Include PTX in non-RDC mode using the new driver (PR #84367)
@@ -4625,7 +4625,15 @@ Action *Driver::BuildOffloadingActions(Compilation , DDeps.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind); OffloadAction::DeviceDependences DDep; DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind); + + // Compiling CUDA in non-RDC mode uses the PTX output if available. + for (Action *Input : A->getInputs()) +if (Kind == Action::OFK_Cuda && A->getType() == types::TY_Object && +!Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, Artem-B wrote: I'm not quite sure why we would need to include PTX for RDC compilation. In retrospect, including PTX by default with all compilations turned out to be a wrong default choice. It's just a waste of space for most of the users, and it allows problems to go unnoticed for longer than they should (e.g. something was compiled for a wrong GPU). Switching to the new driver is a good point to make a better choice. I would argue that we should not be including PTX by default or, if we do deem that it may be useful, only add it for the most recent chosen GPU variant, to provide some forward compatibility, not for all of them. https://github.com/llvm/llvm-project/pull/84367 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][CUDA] Disable float128 diagnostics for device compilation (PR #83918)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/83918
[clang] [clang][CUDA] Disable float128 diagnostics for device compilation (PR #83918)
@@ -0,0 +1,9 @@ +// CPU-side compilation on x86 (no errors expected). +// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -aux-triple nvptx64 -x cuda -fsyntax-only -verify %s + +// GPU-side compilation on x86 (no errors expected) +// RUN: %clang_cc1 -triple nvptx64 -aux-triple x86_64-unknown-linux-gnu -fcuda-is-device -x cuda -fsyntax-only -verify %s Artem-B wrote: I'd add a test verifying that we do emit diagnostics if fp128 is used in the GPU code. It would probably need to be done somewhere in the codegen tests as it will not fire in the syntax-only checks. https://github.com/llvm/llvm-project/pull/83918
[clang] [clang][CUDA] Disable float128 diagnostics for device compilation (PR #83918)
@@ -4877,7 +4877,9 @@ void Sema::AddModeAttr(Decl *D, const AttributeCommonInfo , NewElemTy = Context.getRealTypeForBitwidth(DestWidth, ExplicitType); if (NewElemTy.isNull()) { -Diag(AttrLoc, diag::err_machine_mode) << 1 /*Unsupported*/ << Name; +// Only emit diagnostic on host for 128-bit mode attribute Artem-B wrote: OK. As long as you're sure that the remaining diag covers all possible uses of fp128 on the GPU, it should be fine. https://github.com/llvm/llvm-project/pull/83918 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CUDA] Correctly set CUDA default architecture (PR #84017)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/84017
[clang] [clang][CUDA] Disable float128 diagnostics for device compilation (PR #83918)
@@ -4877,7 +4877,9 @@ void Sema::AddModeAttr(Decl *D, const AttributeCommonInfo , NewElemTy = Context.getRealTypeForBitwidth(DestWidth, ExplicitType); if (NewElemTy.isNull()) { -Diag(AttrLoc, diag::err_machine_mode) << 1 /*Unsupported*/ << Name; +// Only emit diagnostic on host for 128-bit mode attribute Artem-B wrote: > This is going to error out like this: > > ``` > error: 'a' requires 128 bit size '__float128' type support, but target > 'nvptx64-nvidia-cuda' does not support it > ``` Something does not add up. How would we get `target 'nvptx64-nvidia-cuda'` if the diag below only fires if we're compiling for the host? https://github.com/llvm/llvm-project/pull/83918 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][CUDA] Disable float128 diagnostics for device compilation (PR #83918)
@@ -4877,7 +4877,9 @@ void Sema::AddModeAttr(Decl *D, const AttributeCommonInfo , NewElemTy = Context.getRealTypeForBitwidth(DestWidth, ExplicitType); if (NewElemTy.isNull()) { -Diag(AttrLoc, diag::err_machine_mode) << 1 /*Unsupported*/ << Name; +// Only emit diagnostic on host for 128-bit mode attribute Artem-B wrote: What do you expect to see if __float128 is used from a GPU function. Can you check on a toy example. ``` __attribute__((device)) __float128 f(__float128 a, float b) { __float128 c = b + 1.0; return a + c; } ``` https://github.com/llvm/llvm-project/pull/83918 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/83605
[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)
@@ -906,6 +906,16 @@ CreateFileHandler(MemoryBuffer , } OffloadBundlerConfig::OffloadBundlerConfig() { + if (llvm::compression::zstd::isAvailable()) { +CompressionFormat = llvm::compression::Format::Zstd; +// Use a high zstd compress level by default for better size reduction. Artem-B wrote: Also, I've just discovered that zstd already has https://github.com/facebook/zstd/blob/b293d2ebc3a5d29309390a70b3e7861b6f5133ec/lib/zstd.h#L394 ``` ZSTD_c_enableLongDistanceMatching=160, /* Enable long distance matching. * This parameter is designed to improve compression ratio * for large inputs, by finding large matches at long distance. * It increases memory usage and window size. * Note: enabling this parameter increases default ZSTD_c_windowLog to 128 MB * except when expressly set to a different value. * Note: will be enabled by default if ZSTD_c_windowLog >= 128 MB and * compression strategy >= ZSTD_btopt (== compression level 16+) */ ``` This sounds like something we could use here. https://github.com/llvm/llvm-project/pull/83605 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] fix host-used external kernel (PR #83870)
@@ -24,6 +24,7 @@ // NEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel2v // NEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel3v +// XEG-NOT: @__clang_gpu_used_external = {{.*}} @_Z7kernel5v Artem-B wrote: Did you mean `NEG-NOT` ? https://github.com/llvm/llvm-project/pull/83870 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] fix host-used external kernel (PR #83870)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/83870
[clang] [HIP] fix host-used external kernel (PR #83870)
https://github.com/Artem-B approved this pull request. LGTM in principle, but I'd run it by someone with more familiarity with linking quirks. @MaskRay PTAL, when you get a chance. https://github.com/llvm/llvm-project/pull/83870
[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)
@@ -906,6 +906,16 @@ CreateFileHandler(MemoryBuffer , } OffloadBundlerConfig::OffloadBundlerConfig() { + if (llvm::compression::zstd::isAvailable()) { +CompressionFormat = llvm::compression::Format::Zstd; +// Use a high zstd compress level by default for better size reduction.

Artem-B wrote: I'd add more details here. While higher compression levels usually do improve the compression ratio, in a typical use case it's an incremental improvement. Here, we do it to achieve a dramatic increase in compression ratio by exploiting the fact that we carry multiple sets of very similar large bitcode blobs, and that we need a compression level high enough to fit one complete blob into the compression window. At least that's the theory. Should we print a warning (or just document it?) when the compression level ends up being below what we'd expect? Considering that good compression starts at zstd-20, I suspect that the compression ratio will drop back to ~2.5x if the binary for one GPU doubles in size and no longer fits. On top of that, compression time will also increase, a lot. That will be a rather unpleasant surprise for whoever runs into it. ZSTD's current compression parameters are set this way: https://github.com/facebook/zstd/blob/dev/lib/compress/clevels.h#L47

```
{ 23, 24, 22, 7, 3,256, ZSTD_btultra2},  /* level 19 */
{ 25, 25, 23, 7, 3,256, ZSTD_btultra2},  /* level 20 */
```

The first three numbers are log2 of (largest match distance, fully searched segment, dispatch table). 2^25 = 32MB, which happens to be about the size of the single GPU binary in your example. I'm pretty sure this explains why `zstd-20` works so well on it, while zstd-19 does not. It will work well for smaller binaries, but I'm pretty sure it will regress for a slightly larger binary. I think it may be worth experimenting with fine-tuning the compression settings: instead of blindly setting `zstd-20`, consider the size of the binary we need to deal with, and adjust only windowLog/chainLog appropriately. 
Or we could set the default to a lower compression level + a large windowLog. This should still give us most of the compression benefits for the binaries that fit into the window, but would avoid the performance cliff if the binary is too large. I may be overcomplicating it, too. If someone does run into the problem, they now have a way to work around it by tweaking the compression level. https://github.com/llvm/llvm-project/pull/83605
[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)
@@ -942,20 +942,28 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer , Input.getBuffer().size()); llvm::compression::Format CompressionFormat; + int Level; - if (llvm::compression::zstd::isAvailable()) + if (llvm::compression::zstd::isAvailable()) { CompressionFormat = llvm::compression::Format::Zstd; - else if (llvm::compression::zlib::isAvailable()) +// Use a high zstd compress level by default for better size reduction. +const int DefaultZstdLevel = 20; Artem-B wrote: > compiling kernels to bitcode for 6 GPU takes 30s. compression with zstd level > 20 takes 2s. This looks acceptable for me. > unless zstd can be parallelized. zstd does support multithreaded compression, but enabling it would run into the same issue we had with enabling multi-threaded compilation -- it will interfere with the build system's idea of resource usage. https://github.com/llvm/llvm-project/pull/83605 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [HIP] change compress level (PR #83605)
@@ -942,20 +942,28 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer , Input.getBuffer().size()); llvm::compression::Format CompressionFormat; + int Level; - if (llvm::compression::zstd::isAvailable()) + if (llvm::compression::zstd::isAvailable()) { CompressionFormat = llvm::compression::Format::Zstd; - else if (llvm::compression::zlib::isAvailable()) +// Use a high zstd compress level by default for better size reduction. +const int DefaultZstdLevel = 20; Artem-B wrote: What's the default compression level for zstd? It would be great if we could override the compression level. I'm somewhat reluctant to impose max compression level on everyone by default, without any way out, if it turns out to be a problem. @MaskRay WDYT? Max compression level may be fine. If we produce enough stuff for compression to take long, compilation time itself will likely dwarf the compression time. For the small TUs, even slow compression may be fine. @yxsamliu how long the compilation w/o compression takes in your benchmarks? https://github.com/llvm/llvm-project/pull/83605 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] fix host min/max in header (PR #82956)
Artem-B wrote: > Probably I need to define those functions with mixed args by default to avoid > regressions. Are there any other regressions? Can hipCUB be fixed instead? While their use case is probably benign, I'd rather fix the user code than propagate CUDA bugs into HIP. https://github.com/llvm/llvm-project/pull/82956
[clang] [HIP] fix host min/max in header (PR #82956)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/82956
[clang] [HIP] fix host min/max in header (PR #82956)
@@ -1306,15 +1306,73 @@ float min(float __x, float __y) { return __builtin_fminf(__x, __y); } __DEVICE__ double min(double __x, double __y) { return __builtin_fmin(__x, __y); } -#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) -__host__ inline static int min(int __arg1, int __arg2) { - return __arg1 < __arg2 ? __arg1 : __arg2; +// Define host min/max functions. +#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) && \ +!defined(__HIP_NO_HOST_MIN_MAX_IN_GLOBAL_NAMESPACE__) + +#pragma push_macro("DEFINE_MIN_MAX_FUNCTIONS") +#pragma push_macro("DEFINE_MIN_MAX_FUNCTIONS") +#define DEFINE_MIN_MAX_FUNCTIONS(ret_type, type1, type2) \ + inline ret_type min(const type1 __a, const type2 __b) { \ +return (__a < __b) ? __a : __b; \ + } \ + inline ret_type max(const type1 __a, const type2 __b) { \ +return (__a > __b) ? __a : __b; \ + } + +// Define min and max functions for same type comparisons +DEFINE_MIN_MAX_FUNCTIONS(int, int, int) +DEFINE_MIN_MAX_FUNCTIONS(unsigned int, unsigned int, unsigned int) +DEFINE_MIN_MAX_FUNCTIONS(long, long, long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long, unsigned long, unsigned long) +DEFINE_MIN_MAX_FUNCTIONS(long long, long long, long long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, unsigned long long, + unsigned long long) + +// CUDA defines host min/max functions with mixed signed/unsgined integer +// parameters where signed integers are casted to unsigned integers. However, +// this may not be users' intention. Therefore do not define them by default +// unless users specify -D__HIP_DEFINE_MIXED_HOST_MIN_MAX__. Artem-B wrote: Nit: signed integers are implicitly promoted to unsigned ones due to the integer promotion rules. Cast would imply intentional cast and we're not doing that. I'd rephrase it a bit along the lines of: The routines below will perform unsigned comparison, which may produce invalid results if a signed integer was passed unintentionally. 
We do not want it to happen silently, and do not provide these overloads by default. However, for compatibility with CUDA, we allow them if explicitly requested by the user by defining `__HIP_DEFINE_MIXED_HOST_MIN_MAX__`. https://github.com/llvm/llvm-project/pull/82956
[clang] [HIP] fix host min/max in header (PR #82956)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/82956
[clang] [HIP] fix host min/max in header (PR #82956)
@@ -1306,15 +1306,68 @@ float min(float __x, float __y) { return __builtin_fminf(__x, __y); } __DEVICE__ double min(double __x, double __y) { return __builtin_fmin(__x, __y); } -#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) -__host__ inline static int min(int __arg1, int __arg2) { - return __arg1 < __arg2 ? __arg1 : __arg2; +// Define host min/max functions. +#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) && \ +!defined(__HIP_NO_HOST_MIN_MAX_IN_GLOBAL_NAMESPACE__) + +#pragma push_macro("DEFINE_MIN_MAX_FUNCTIONS") +#pragma push_macro("DEFINE_MIN_MAX_FUNCTIONS") +#define DEFINE_MIN_MAX_FUNCTIONS(ret_type, type1, type2) \ + static inline ret_type min(const type1 __a, const type2 __b) { \ +return (__a < __b) ? __a : __b; \ + } \ + static inline ret_type max(const type1 __a, const type2 __b) { \ +return (__a > __b) ? __a : __b; \ + } + +// Define min and max functions for same type comparisons +DEFINE_MIN_MAX_FUNCTIONS(int, int, int) +DEFINE_MIN_MAX_FUNCTIONS(unsigned int, unsigned int, unsigned int) +DEFINE_MIN_MAX_FUNCTIONS(long, long, long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long, unsigned long, unsigned long) +DEFINE_MIN_MAX_FUNCTIONS(long long, long long, long long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, unsigned long long, + unsigned long long) + +// Define min and max functions for all mixed type comparisons +DEFINE_MIN_MAX_FUNCTIONS(unsigned int, int, unsigned int) +DEFINE_MIN_MAX_FUNCTIONS(unsigned int, unsigned int, int) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long, long, unsigned long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long, unsigned long, long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, long long, unsigned long long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, unsigned long long, long long) Artem-B wrote: Not everything CUDA does is the right model to follow. This may be one of the cases where we should improve things, if we can, instead of just copying the broken behavior. 
Not adding problematic things is easier than removing them later, when they are used, intentionally or not. The fact that HIP currently does not have those functions suggests that there is probably no existing HIP code depending on them. Existing CUDA code which may need those functions will need some amount of porting to HIP anyway, so fixing the source code could be done as part of the porting effort. We could put those mixed min/max functions under some preprocessor guard, which would keep them disabled by default. If someone desperately needs them, they would have to specify `-DPLEASE_ENABLE_BROKEN_MINMAX_ON_MIXED_SIGNED_UNSIGNED_TYPES`. https://github.com/llvm/llvm-project/pull/82956
[clang] [HIP] fix host min/max in header (PR #82956)
@@ -1306,15 +1306,68 @@ float min(float __x, float __y) { return __builtin_fminf(__x, __y); } __DEVICE__ double min(double __x, double __y) { return __builtin_fmin(__x, __y); } -#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) -__host__ inline static int min(int __arg1, int __arg2) { - return __arg1 < __arg2 ? __arg1 : __arg2; +// Define host min/max functions. +#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) && \ +!defined(__HIP_NO_HOST_MIN_MAX_IN_GLOBAL_NAMESPACE__) + +#pragma push_macro("DEFINE_MIN_MAX_FUNCTIONS") +#pragma push_macro("DEFINE_MIN_MAX_FUNCTIONS") +#define DEFINE_MIN_MAX_FUNCTIONS(ret_type, type1, type2) \ + static inline ret_type min(const type1 __a, const type2 __b) { \ +return (__a < __b) ? __a : __b; \ + } \ + static inline ret_type max(const type1 __a, const type2 __b) { \ +return (__a > __b) ? __a : __b; \ + } + +// Define min and max functions for same type comparisons +DEFINE_MIN_MAX_FUNCTIONS(int, int, int) +DEFINE_MIN_MAX_FUNCTIONS(unsigned int, unsigned int, unsigned int) +DEFINE_MIN_MAX_FUNCTIONS(long, long, long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long, unsigned long, unsigned long) +DEFINE_MIN_MAX_FUNCTIONS(long long, long long, long long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, unsigned long long, + unsigned long long) + +// Define min and max functions for all mixed type comparisons +DEFINE_MIN_MAX_FUNCTIONS(unsigned int, int, unsigned int) +DEFINE_MIN_MAX_FUNCTIONS(unsigned int, unsigned int, int) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long, long, unsigned long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long, unsigned long, long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, long long, unsigned long long) +DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, unsigned long long, long long) Artem-B wrote: I assume these are needed in order to avoid errors about ambiguous overload resolution when we pass signed/unsigned arguments. 
Normally, if we were to use the `std::min()` function, the user would have to explicitly cast the arguments or use `std::min<T>()` with an explicit template argument to resolve the issue. Implicitly converting int->unsigned under the hood is probably not a good idea here, as we do not know what the user needs/wants and whether it's working as intended or an error. For min/max, converting a negative argument into an unsigned one would probably be an error. I think we do need to force users to use one of the all-signed or all-unsigned variants here, too, same as with std::min/max. https://github.com/llvm/llvm-project/pull/82956
[clang] [NVPTX] Enable the _Float16 type for NVPTX compilation (PR #82436)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/82436
[clang] [Clang][NVPTX] Allow passing arguments to the linker while standalone (PR #73030)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/73030
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
https://github.com/Artem-B approved this pull request. Overall LGTM. Please wait for @jhuber6 to double-check the details of the partial linking mechanics. https://github.com/llvm/llvm-project/pull/81700
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple , : T.normalize(); } +// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all +// input object or archive files. +class HIPUndefinedFatBinSymbols { +public: + HIPUndefinedFatBinSymbols(const Compilation ) + : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID( + DiagnosticsEngine::Error, + "Error collecting HIP undefined fatbin symbols: %0")), +Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)), +Verbose(C.getArgs().hasArg(options::OPT_v)) { +populateSymbols(); +if (Verbose) { + for (auto Name : FatBinSymbols) +llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n"; + for (auto Name : GPUBinHandleSymbols) +llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name + << "\n"; +} + } + + const std::set () const { +return FatBinSymbols; + } + + const std::set () const { +return GPUBinHandleSymbols; + } + +private: + const Compilation + unsigned DiagID; + bool Quiet; + bool Verbose; + std::set FatBinSymbols; + std::set GPUBinHandleSymbols; + const std::string FatBinPrefix = "__hip_fatbin"; + const std::string GPUBinHandlePrefix = "__hip_gpubin_handle"; + + void populateSymbols() { +std::deque WorkList; +std::set Visited; + +for (const auto : C.getActions()) { + WorkList.push_back(Action); +} + +while (!WorkList.empty()) { + const Action *CurrentAction = WorkList.front(); + WorkList.pop_front(); + + if (!CurrentAction || !Visited.insert(CurrentAction).second) +continue; + + if (const auto *IA = dyn_cast(CurrentAction)) { +std::string ID = IA->getId().str(); +if (!ID.empty()) { + ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true); + FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str()); + GPUBinHandleSymbols.insert( + Twine(GPUBinHandlePrefix + "_" + ID).str()); + continue; +} +const char *Filename = IA->getInputArg().getValue(); +auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename); +// Input action could be options to 
linker, therefore ignore it +// if cannot read it. +if (!BufferOrErr) + continue; + +processInput(BufferOrErr.get()->getMemBufferRef()); + } else +WorkList.insert(WorkList.end(), CurrentAction->getInputs().begin(), +CurrentAction->getInputs().end()); +} + } + + void processInput(const llvm::MemoryBufferRef ) { +// Try processing as object file first. +auto ObjFileOrErr = llvm::object::ObjectFile::createObjectFile(Buffer); +if (ObjFileOrErr) { + processSymbols(**ObjFileOrErr); + return; +} + +// Then try processing as archive files. +llvm::consumeError(ObjFileOrErr.takeError()); +auto ArchiveOrErr = llvm::object::Archive::create(Buffer); +if (ArchiveOrErr) { + llvm::Error Err = llvm::Error::success(); + llvm::object::Archive = *ArchiveOrErr.get(); + for (auto : Archive.children(Err)) { +auto ChildBufOrErr = Child.getMemoryBufferRef(); +if (ChildBufOrErr) + processInput(*ChildBufOrErr); +else + errorHandler(ChildBufOrErr.takeError()); + } + + if (Err) +errorHandler(std::move(Err)); + return; +} + +// Ignore other files. +llvm::consumeError(ArchiveOrErr.takeError()); + } + void processSymbols(const llvm::object::ObjectFile ) { +for (const auto : Obj.symbols()) { + auto FlagOrErr = Symbol.getFlags(); + if (!FlagOrErr) { +errorHandler(FlagOrErr.takeError()); +continue; + } + + // Filter only undefined symbols + if (!(FlagOrErr.get() & llvm::object::SymbolRef::SF_Undefined)) { Artem-B wrote: style nit: remove `{}` around single-statement body. Applies here and in a handful of other places throughout the patch. https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple , : T.normalize(); } +// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all +// input object or archive files. +class HIPUndefinedFatBinSymbols { +public: + HIPUndefinedFatBinSymbols(const Compilation ) + : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID( + DiagnosticsEngine::Error, + "Error collecting HIP undefined fatbin symbols: %0")), +Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)), +Verbose(C.getArgs().hasArg(options::OPT_v)) { +populateSymbols(); +if (Verbose) { + for (auto Name : FatBinSymbols) +llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n"; + for (auto Name : GPUBinHandleSymbols) +llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name + << "\n"; +} + } + + const std::set () const { +return FatBinSymbols; + } + + const std::set () const { +return GPUBinHandleSymbols; + } + +private: + const Compilation + unsigned DiagID; + bool Quiet; + bool Verbose; + std::set FatBinSymbols; + std::set GPUBinHandleSymbols; + const std::string FatBinPrefix = "__hip_fatbin"; + const std::string GPUBinHandlePrefix = "__hip_gpubin_handle"; + + void populateSymbols() { +std::deque WorkList; +std::set Visited; + +for (const auto : C.getActions()) { + WorkList.push_back(Action); +} + +while (!WorkList.empty()) { + const Action *CurrentAction = WorkList.front(); + WorkList.pop_front(); + + if (!CurrentAction || !Visited.insert(CurrentAction).second) +continue; + + if (const auto *IA = dyn_cast(CurrentAction)) { +std::string ID = IA->getId().str(); +if (!ID.empty()) { + ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true); + FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str()); + GPUBinHandleSymbols.insert( + Twine(GPUBinHandlePrefix + "_" + ID).str()); + continue; +} +const char *Filename = IA->getInputArg().getValue(); +auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename); +// Input action could be options to 
linker, therefore ignore it +// if cannot read it. Artem-B wrote: Comment could use some editing. `therefore, ignore an error if we fail to read the file`. This makes me ask -- what if the argument *is* an input file, and we do fail to read it. How do we tell apart the linker options from the input file? Relying on a failure to read it does not seem to be a good way to handle it. https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple , : T.normalize(); } +// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all +// input object or archive files. +class HIPUndefinedFatBinSymbols { +public: + HIPUndefinedFatBinSymbols(const Compilation ) + : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID( + DiagnosticsEngine::Error, + "Error collecting HIP undefined fatbin symbols: %0")), +Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)), +Verbose(C.getArgs().hasArg(options::OPT_v)) { +populateSymbols(); +if (Verbose) { + for (auto Name : FatBinSymbols) +llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n"; + for (auto Name : GPUBinHandleSymbols) +llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name + << "\n"; +} + } + + const std::set () const { +return FatBinSymbols; + } + + const std::set () const { +return GPUBinHandleSymbols; + } + +private: + const Compilation + unsigned DiagID; + bool Quiet; + bool Verbose; + std::set FatBinSymbols; + std::set GPUBinHandleSymbols; + const std::string FatBinPrefix = "__hip_fatbin"; + const std::string GPUBinHandlePrefix = "__hip_gpubin_handle"; + + void populateSymbols() { +std::deque WorkList; +std::set Visited; + +for (const auto : C.getActions()) { + WorkList.push_back(Action); +} + +while (!WorkList.empty()) { + const Action *CurrentAction = WorkList.front(); + WorkList.pop_front(); + + if (!CurrentAction || !Visited.insert(CurrentAction).second) +continue; + + if (const auto *IA = dyn_cast(CurrentAction)) { +std::string ID = IA->getId().str(); +if (!ID.empty()) { + ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true); + FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str()); + GPUBinHandleSymbols.insert( + Twine(GPUBinHandlePrefix + "_" + ID).str()); + continue; +} +const char *Filename = IA->getInputArg().getValue(); +auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename); +// Input action could be options to 
linker, therefore ignore it +// if cannot read it. +if (!BufferOrErr) + continue; + +processInput(BufferOrErr.get()->getMemBufferRef()); + } else +WorkList.insert(WorkList.end(), CurrentAction->getInputs().begin(), +CurrentAction->getInputs().end()); +} + } + + void processInput(const llvm::MemoryBufferRef ) { +// Try processing as object file first. +auto ObjFileOrErr = llvm::object::ObjectFile::createObjectFile(Buffer); +if (ObjFileOrErr) { + processSymbols(**ObjFileOrErr); + return; +} + +// Then try processing as archive files. +llvm::consumeError(ObjFileOrErr.takeError()); +auto ArchiveOrErr = llvm::object::Archive::create(Buffer); +if (ArchiveOrErr) { + llvm::Error Err = llvm::Error::success(); + llvm::object::Archive = *ArchiveOrErr.get(); + for (auto : Archive.children(Err)) { +auto ChildBufOrErr = Child.getMemoryBufferRef(); +if (ChildBufOrErr) + processInput(*ChildBufOrErr); +else + errorHandler(ChildBufOrErr.takeError()); + } + + if (Err) +errorHandler(std::move(Err)); + return; +} + +// Ignore other files. +llvm::consumeError(ArchiveOrErr.takeError()); + } + void processSymbols(const llvm::object::ObjectFile ) { Artem-B wrote: Nit -- add an empty line to separate functions. https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/81700
[clang] [llvm] [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (PR #81331)
@@ -104,6 +104,7 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const { case ISD::ATOMIC_STORE: return "AtomicStore"; case ISD::PCMARKER: return "PCMarker"; case ISD::READCYCLECOUNTER: return "ReadCycleCounter"; + case ISD::READSTEADYCOUNTER: return "ReadFixedTimer"; Artem-B wrote: Should it be "ReadSteadyCounter" ? Also, whitespace/alignment looks off. https://github.com/llvm/llvm-project/pull/81331 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (PR #81331)
@@ -2764,6 +2764,37 @@ Query for this feature with ``__has_builtin(__builtin_readcyclecounter)``. Note that even if present, its use may depend on run-time privilege or other OS controlled state. +``__builtin_readsteadycounter`` +------------------------------- + +``__builtin_readsteadycounter`` is used to access the fixed frequency counter +register (or a similar steady-rate clock) on those targets that support it. +The function is similar to ``__builtin_readcyclecounter`` above except that the +frequency is fixed, making it suitable for measuring elapsed time. Artem-B wrote: Should we mention that we do not guarantee any particular frequency, just that it's stable, and that it's up to the user to figure out the actual frequency if they need to? https://github.com/llvm/llvm-project/pull/81331
[clang] [llvm] [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (PR #81331)
https://github.com/Artem-B commented: LGTM with few nits for general and NVPTX parts. https://github.com/llvm/llvm-project/pull/81331 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (PR #81331)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/81331
[clang] [llvm] [NVPTX] Add builtin support for 'globaltimer' (PR #79765)
@@ -140,6 +140,17 @@ define void @test_exit() { ret void } +; CHECK-LABEL: test_globaltimer +define i64 @test_globaltimer() { +; CHECK: mov.u64 %r{{.*}}, %globaltimer; + %a = tail call i64 @llvm.nvvm.read.ptx.sreg.globaltimer() Artem-B wrote: These need sm_30+. Right now the test runs with sm_20. LLVM does compile these intrinsics, but ptxas fails because the register is not available on sm_20. The test needs to be updated to use a reasonably new GPU target. Probably sm_60 is the oldest one anybody still cares about. https://github.com/llvm/llvm-project/pull/79765
[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)
Artem-B wrote: LGTM https://github.com/llvm/llvm-project/pull/81277
[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)
@@ -159,6 +159,7 @@ BUILTIN(__nvvm_read_ptx_sreg_pm3, "i", "n") BUILTIN(__nvvm_prmt, "UiUiUiUi", "") BUILTIN(__nvvm_exit, "v", "r") +BUILTIN(__nvvm_reflect, "UicC*", "r") Artem-B wrote: Now that we're exposing it to the end users, we should probably document what it does. Probably somewhere in https://clang.llvm.org/docs/LanguageExtensions.html https://github.com/llvm/llvm-project/pull/81277
[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)
@@ -1624,8 +1624,9 @@ def int_nvvm_compiler_error : def int_nvvm_compiler_warn : Intrinsic<[], [llvm_anyptr_ty], [], "llvm.nvvm.compiler.warn">; -def int_nvvm_reflect : - Intrinsic<[llvm_i32_ty], [llvm_anyptr_ty], [IntrNoMem], "llvm.nvvm.reflect">; +def int_nvvm_reflect : + Intrinsic<[llvm_i32_ty], [llvm_ptr_ty], [IntrNoMem], "llvm.nvvm.reflect">, + ClangBuiltin<"__nvvm_reflect">; Artem-B wrote: I vaguely recall that OpenCL folks had to use it with a slightly different signature. I think their pointer argument was in an unusual address space, where OCL keeps their string constants. It would be great to double check that the new builtin does not break them. https://github.com/llvm/llvm-project/pull/81277 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/81277
[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)
https://github.com/Artem-B approved this pull request. LGTM overall. https://github.com/llvm/llvm-project/pull/81277
[clang] [llvm] [NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (PR #81277)
Artem-B wrote: > We should expose it as an intrinsic I think you mean `builtin` here. https://github.com/llvm/llvm-project/pull/81277
[clang] [llvm] [LinkerWrapper] Allow 'all' as a generic bundled architecture (PR #81193)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/81193
[clang] [llvm] [LinkerWrapper] Allow 'all' as a generic bundled architecture (PR #81193)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/81193 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)
Artem-B wrote: > Okay, `__nvvm_reflect` doesn't work fully here because the `nanosleep` > builtin I added requires `sm_70` at the clang level. Either means I'd need to > go back to inline assembly or remove that requirement at least from clang so > it's a backend failure. The question is -- who's going to provide a fallback implementation of the `nanosleep` builtin for the older GPUs. I do not think it's LLVM's job, so constraining the builtin is appropriate. However, nothing stops you from providing your own implementation in libc using inline asm. Something along these lines (`__nvvm_reflect` takes a string, and `"__CUDA_ARCH"` reflects to the SM version times 10):
```
__device__ void my_nanosleep(unsigned N) {
  if (__nvvm_reflect("__CUDA_ARCH") >= 700) {
    asm volatile("nanosleep.u32 %0;" ::"r"(N));
  } else {
    while (N--)
      asm volatile(""); // something unoptimizable
  }
}
```
https://github.com/llvm/llvm-project/pull/81033 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)
Artem-B wrote: > This patch, which simply makes it legal on all architectures but does nothing > if it's older than sm_70. I do not think this is the right thing to do. "do nothing" is not what one would expect from a `nanosleep`. Let's unpack your problem a bit. __nvvm_reflect() is probably closest to what you would need. However, IIUIC, if you use it to provide a nanosleep-based variant and an alternative for the older GPUs, the `nanosleep` variant code will still hang off the dead branch of `if (__nvvm_reflect())`, and if it's not eliminated by DCE (which it will not be if optimizations are off), the resulting PTX will be invalid for the older GPUs. In other words, pushing the nanosleep implementation into an intrinsic makes things compile everywhere at the expense of doing the wrong thing on the older GPUs. I do not think it's a good trade-off. Perhaps a better approach would be to incorporate dead branch elimination into the NVVMReflect pass itself. We do know that it is the explicit intent of `__nvvm_reflect()`. If NVVMReflect explicitly guarantees that the dead branch will be gone, it should allow you to use approach `#1` without concern for whether optimizations are enabled, and you should be able to provide whatever alternative implementation you need (even if it's a null one), without affecting the correctness of LLVM itself. https://github.com/llvm/llvm-project/pull/81033 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [flang] [InstCombine] Canonicalize constant GEPs to i8 source element type (PR #68882)
Artem-B wrote: Another corner case here. Untyped GEP resulted in SimplifyCFG producing a `load(gep(argptr, cond ? 24 : 0))` instead of the `load(cond ? gep(argptr, 24) : argptr)` it produced before the patch, and that eventually prevented SROA from processing that load. While it's not a bug in this patch, the consequence is a pretty serious performance regression in some GPU code. And we do not have a workaround. :-/ Minimized reproducer: ``` # opt -passes='inline,simplifycfg,instcombine,sroa' -S https://github.com/llvm/llvm-project/pull/68882 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Diagnose unaligned atomic (PR #80322)
https://github.com/Artem-B approved this pull request. You may want to check that we can still disable the error with `-Wno-error=atomic-alignment` passed via top-level options. Other than that LGTM. https://github.com/llvm/llvm-project/pull/80322 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)
Artem-B wrote: > the idea is that it would be the desired effect if someone went out of their > way to do this GPU subset linking thing. That would only be true when someone owns the whole build. That will not be the case in practice. A large enough project is usually a bunch of libraries created by different teams and vendors. They may or may not be built together and how a particular library is built is often controlled by its owner and may not be visible to the end user. The owners may consider switching to device linking to be benign or irrelevant to the end users, but it will be observable by those upstream users. Being aware of the quirks introduced by device linking will be required for the owners of those libraries. You do know how it all works under the hood. Pretty much nobody else on the planet does. :-) Anyways. I think we're in agreement that we do need to document possible implications. clang-linker-wrapper docs would do. https://github.com/llvm/llvm-project/pull/80066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)
Artem-B wrote: > I'm assuming you're talking about GPU-side constructors? I don't think the > CUDA runtime supports those, but OpenMP runs them when the image is loaded, > so it would handle both independently. Yes. I'm thinking of the expectations from a C++ user standpoint, and this is one of the areas where there will be observable differences. First, because there will be subsets of the code that are no longer part of the main GPU-side executable. Second, the side effects of the initializers will be different depending on whether we link such subsets separately or not. E.g. the initializer call order will change. The global state changes in one subset will not be visible in the other. Weak symbol resolution will produce different results. Etc. > The idea is that users already get C++-like behavior with the new driver and > -fgpu-rdc generally Yes. That will set the default expectation that things work just like in C++, which is a great thing. But the introduction of partial subset linking will break some of those "just works" assumptions, and it may be triggered by parts of the build outside of the user's control (e.g. by a third-party library). Side note: we do need a good term for this kind of subset linking. "partial linking" already has an established meaning, and it's not a good fit here as we actually produce a fully linked GPU executable. > we don't need to worry about people being confused so long as we document > what it does. We do need to document how it works. Documenting what does not work, or works differently, is also important, IMO. We *do* need to worry about users and their expectations. https://github.com/llvm/llvm-project/pull/80066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)
Artem-B wrote: Supporting such a mixed mode opens an interesting set of issues we may need to consider going forward:
* Who/where/how runs the initializers in the fully linked parts?
* Are public functions in the fully linked parts visible to the functions in the partially linked parts? In the full-rdc mode they would be, as in a plain C++ compilation. In the partial mode they would not, as the main GPU executable and the partial parts will be in separate executables. This would be OK for something like CUDA, where cross-TU references are usually limited to the host, but would be surprising for someone who expects C++-like behavior, which is sort of the ultimate goal for the offloading use case.
This will eventually become a problem if/when we grow a large enough set of independent offload-enabled libraries. The top-level user will have a hard time figuring out what's visible and what is not, unless the libraries deliberately expose only host-level APIs if/when they fully link GPU-side code. https://github.com/llvm/llvm-project/pull/80066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)
@@ -265,6 +329,11 @@ Error runLinker(ArrayRef Files, const ArgList ) { LinkerArgs.push_back(Arg); if (Error Err = executeCommands(LinkerPath, LinkerArgs)) return Err; + + if (Args.hasArg(OPT_relocatable)) +if (Error Err = relocateOffloadSection(Args, ExecutableName)) Artem-B wrote: We could just `return relocateOffloadSection(Args, ExecutableName)` https://github.com/llvm/llvm-project/pull/80066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)
@@ -20,10 +20,12 @@ using EntryArrayTy = std::pair; /// \param EntryArray Optional pair pointing to the `__start` and `__stop` /// symbols holding the `__tgt_offload_entry` array. /// \param Suffix An optional suffix appended to the emitted symbols. +/// \param Relocatable Indicate if we need to change the offloading section. Artem-B wrote: Nit: "Indicate whether the binary is a relocatable object" may work a bit better for describing intent. Current description seems to describe an implementation detail. https://github.com/llvm/llvm-project/pull/80066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)
https://github.com/Artem-B approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/80066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/80066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)
Artem-B wrote: So, the idea is to carry two separate embedded offloading sections -- one for already fully linked GPU executables, and another for GPU objects to be linked at the final link stage. > We also use a sepcial section called something like omp_offloading_entries Typo in 'special' in the description. https://github.com/llvm/llvm-project/pull/80066 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/79873 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)
Artem-B wrote: Considering that it's for stand-alone compilation only, I'm not going to block this patch. That said, please add a `TODO` somewhere to address the issue with explicitly targeting the generic variant. https://github.com/llvm/llvm-project/pull/79873 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)
Artem-B wrote: > Right now if you specify target-cpu you get target-cpu attributes, which is > what we don't want. I'm fine with handling 'generic' in a special way under the hood and not specifying a target CPU. My concern is about the user-facing interface. Command-line options must be overridable. For the CPU, I would be able to specify the variant that matches the default. For the GPU, I'll have no way to explicitly pick 'generic' as the target. I think this is important. https://github.com/llvm/llvm-project/pull/79873 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)
Artem-B wrote: > I think there's some precedent from both vendors to treat missing attributes > as a more generic target. It sounds more like a bug than a feature to me. The major difference between "you get sm_xx by default" and this "you get generic by default" is that with a specific sm_XX, I can override it both ways -- I can enable/disable it as needed, regardless of how it was specified before my overriding options. With the magic unnameable 'generic' target, I can only disable it by specifying a concrete architecture, but there's no way to enable it once a preceding option names some specific architecture. It makes little difference when you control the complete build, but that is not the case for all builds. E.g. TensorFlow builds with Bazel, and the end user does not have access to whatever compiler flags the global build rules may set. So if you want to build for the generic GPU target, you will have to jump through way more hoops than is reasonable, as opposed to specifying a few overriding options you're interested in. I'm fine with defaulting to such a generic target, but I do believe we need to handle it the same way as specific targets. https://github.com/llvm/llvm-project/pull/79873 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CUDA] Change '__activemask' to use '__nvvm_activemask()' (PR #79892)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/79892 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)
Artem-B wrote: Relying on something *not* being defined is probably not the best way to handle the 'generic' target. For starters, it makes it hard or impossible to recreate the same compilation state by undoing an already-specified option. It also breaks the established assumption that there *is* a default target CPU/GPU. If we do want to have a generic GPU target, then we should grow an explicit 'generic' GPU variant, IMO. It would be the functional opposite of 'native'. https://github.com/llvm/llvm-project/pull/79873 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [NVPTX] Add builtin support for 'globaltimer' (PR #79765)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/79765 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [NVPTX] Add builtin support for 'nanosleep' PTX instruction (PR #79888)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/79888 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/79768 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)
@@ -4599,6 +4599,14 @@ def int_nvvm_vote_ballot_sync : [IntrInaccessibleMemOnly, IntrConvergent, IntrNoCallback], "llvm.nvvm.vote.ballot.sync">, ClangBuiltin<"__nvvm_vote_ballot_sync">; +// +// ACTIVEMASK +// +def int_nvvm_activemask : + Intrinsic<[llvm_i32_ty], [], +[IntrInaccessibleMemOnly, IntrConvergent, IntrNoCallback, IntrHasSideEffects], "llvm.nvvm.activemask">, + ClangBuiltin<"__nvvm_activemask">; Artem-B wrote: Separate patch is fine, too. https://github.com/llvm/llvm-project/pull/79768 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)
@@ -65,7 +65,7 @@ def : Proc<"sm_61", [SM61, PTX50]>; def : Proc<"sm_62", [SM62, PTX50]>; def : Proc<"sm_70", [SM70, PTX60]>; def : Proc<"sm_72", [SM72, PTX61]>; -def : Proc<"sm_75", [SM75, PTX63]>; +def : Proc<"sm_75", [SM75, PTX62, PTX63]>; Artem-B wrote: I'm a bit confused here. Constraints on the PTX version for a GPU and for instructions are independent. You need both satisfied in order to use a given instruction on a given GPU. So, to use activemask on sm_75, you do need PTX63. To use it on sm_52, you only need PTX62. You do not need to change anything here. You already have the correct predicates applied to the instruction itself and to the target builtin. https://github.com/llvm/llvm-project/pull/79768 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)
@@ -65,7 +65,7 @@ def : Proc<"sm_61", [SM61, PTX50]>; def : Proc<"sm_62", [SM62, PTX50]>; def : Proc<"sm_70", [SM70, PTX60]>; def : Proc<"sm_72", [SM72, PTX61]>; -def : Proc<"sm_75", [SM75, PTX63]>; +def : Proc<"sm_75", [SM75, PTX62, PTX63]>; Artem-B wrote: What are you trying to do with the PTX62 feature in the first place? Why do you need to add it here? In general, the features will be supplied externally. This particular place just sets the minimum required to support this particular GPU variant. https://github.com/llvm/llvm-project/pull/79768 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)
@@ -65,7 +65,7 @@ def : Proc<"sm_61", [SM61, PTX50]>; def : Proc<"sm_62", [SM62, PTX50]>; def : Proc<"sm_70", [SM70, PTX60]>; def : Proc<"sm_72", [SM72, PTX61]>; -def : Proc<"sm_75", [SM75, PTX63]>; +def : Proc<"sm_75", [SM75, PTX62, PTX63]>; Artem-B wrote: Why are we adding PTX62 here? According to [PTX docs](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes-ptx-release-history) sm_75 has been introduced in PTX ISA 6.3 in CUDA-10.0. https://github.com/llvm/llvm-project/pull/79768 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)
@@ -4599,6 +4599,14 @@ def int_nvvm_vote_ballot_sync : [IntrInaccessibleMemOnly, IntrConvergent, IntrNoCallback], "llvm.nvvm.vote.ballot.sync">, ClangBuiltin<"__nvvm_vote_ballot_sync">; +// +// ACTIVEMASK +// +def int_nvvm_activemask : + Intrinsic<[llvm_i32_ty], [], +[IntrInaccessibleMemOnly, IntrConvergent, IntrNoCallback, IntrHasSideEffects], "llvm.nvvm.activemask">, + ClangBuiltin<"__nvvm_activemask">; Artem-B wrote: Should we change `__activemask` to use the new builtin instead of inline asm? https://github.com/llvm/llvm-project/blob/eac8d713a6682417d06f5ee7f90a8ce54a281df8/clang/lib/Headers/__clang_cuda_intrinsics.h#L214 https://github.com/llvm/llvm-project/pull/79768 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)
Artem-B wrote: https://bugs.llvm.org/show_bug.cgi?id=35249 https://github.com/llvm/llvm-project/pull/79768 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [NVPTX] Add 'activemask' builtin and intrinsic support (PR #79768)
Artem-B wrote: 'activemask' is a rather peculiar instruction which may not be a good candidate for exposing it to LLVM. The problem is that it can 'observe' the past branch decisions and reflects the state of not-yet-reconverged conditional branches. LLVM does not take it into account. Opaque inline assembly is the sledgehammer which stops LLVM from doing anything fancy with it. The intrinsic will need to have appropriately conservative attributes, at the very least. I think we've had a bug about that and, if I recall correctly, we could not come up with a good way to handle activemask. Let me try finding the details. https://github.com/llvm/llvm-project/pull/79768 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits