[PATCH] D140663: CUDA/HIP: Use kernel name to map to symbol

2023-03-17 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D140663#4203604 , @tra wrote:

> It appears that this patch may be causing a use-after free when we attempt to 
> generate kernel registration code. 
> The root cause is that the value we insert into `KernelHandles` by name is 
> later on replaced by a different instance of the global value with the same 
> name.
> AFAICT, the invalidation issue was present before but we accidentally avoided 
> it because we only looked up the still-valid new entries. The dangling 
> references were still in the map, but not accessed.

Agree. I think the reason is that the `F` we passed into 
`CGNVCUDARuntime::getKernelHandle` may be replaced by a new function with the 
same name. Luckily, the new function should be passed to 
`CGNVCUDARuntime::getKernelHandle` again, therefore we get a chance to update 
our maps.




Comment at: clang/lib/CodeGen/CGCUDANV.cpp:1198
+  auto Loc = KernelHandles.find(F->getName());
   if (Loc != KernelHandles.end())
 return Loc->second;

It is possible that F is replaced with a new function with the same name. In 
this case, we need to update our map, so add a condition `&& Loc->second == F` 
to the above condition.





Comment at: clang/lib/CodeGen/CGCUDANV.cpp:1207-1215
   auto *Var = new llvm::GlobalVariable(
   TheModule, F->getType(), /*isConstant=*/true, F->getLinkage(),
   /*Initializer=*/nullptr,
   CGM.getMangledName(
   GD.getWithKernelReferenceKind(KernelReferenceKind::Kernel)));
   Var->setAlignment(CGM.getPointerAlign().getAsAlign());
   Var->setDSOLocal(F->isDSOLocal());

Add a condition `if (Loc == KernelHandles.end())` to the above code for 
creating and modifying `Var`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140663/new/

https://reviews.llvm.org/D140663

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140663: CUDA/HIP: Use kernel name to map to symbol

2023-03-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

It appears that this patch may be causing a use-after free when we attempt to 
generate kernel registration code. 
The root cause is that the value we insert into `KernelHandles` by name is 
later on replaced by a different instance of the global value with the same 
name.
AFAICT, the invalidation issue was present before but we accidentally avoided 
it because we only looked up the still-valid new entries. The dangling 
references were still in the map, but not accessed.

It's reproducible on this example: https://godbolt.org/z/qGYTr3Ej5

Here's the stack trace for the call path which frees the old entry:

  #3  0x55e5564e75e2 in llvm::User::operator delete (Usr=0x55e55f6ff438) at 
/usr/local/google/home/tra/work/llvm/repo/llvm/lib/IR/User.cpp:190
  #4  0x55e5563a0a70 in 
llvm::ilist_alloc_traits::deleteNode (V=0x55e55f6ff438) at 
/usr/local/google/home/tra/work/llvm/repo/llvm/include/llvm/ADT/ilist.h:42
  #5  0x55e55639d875 in 
llvm::iplist_impl, 
llvm::SymbolTableListTraits >::erase (this=0x55e55f75a3e8, 
where=...) at 
/usr/local/google/home/tra/work/llvm/repo/llvm/include/llvm/ADT/ilist.h:269
  #6  0x55e55637c173 in llvm::Function::eraseFromParent 
(this=0x55e55f6ff438) at 
/usr/local/google/home/tra/work/llvm/repo/llvm/lib/IR/Function.cpp:367
  #7  0x55e5563a4ad5 in llvm::GlobalValue::eraseFromParent 
(this=0x55e55f6ff438) at 
/usr/local/google/home/tra/work/llvm/repo/llvm/include/llvm/IR/Value.def:76
  #8  0x55e5570dd747 in 
clang::CodeGen::CodeGenModule::applyGlobalValReplacements (this=0x55e55f7a06d0) 
at 
/usr/local/google/home/tra/work/llvm/repo/clang/lib/CodeGen/CodeGenModule.cpp:315
  #9  0x55e5570deb3a in clang::CodeGen::CodeGenModule::Release 
(this=0x55e55f7a06d0) at 
/usr/local/google/home/tra/work/llvm/repo/clang/lib/CodeGen/CodeGenModule.cpp:540
  #10 0x55e5581e5ede in (anonymous 
namespace)::CodeGeneratorImpl::HandleTranslationUnit (this=0x55e55f79aa40, 
Ctx=...) at 
/usr/local/google/home/tra/work/llvm/repo/clang/lib/CodeGen/ModuleBuilder.cpp:287
  #11 0x55e5581de64e in clang::BackendConsumer::HandleTranslationUnit 
(this=0x55e55f79a7a0, C=...) at 
/usr/local/google/home/tra/work/llvm/repo/clang/lib/CodeGen/CodeGenAction.cpp:308
  #12 0x55e55b063273 in clang::ParseAST (S=..., PrintStats=false, 
SkipFunctionBodies=false) at 
/usr/local/google/home/tra/work/llvm/repo/clang/lib/Parse/ParseAST.cpp:175
  #13 0x55e557ff26ec in clang::ASTFrontendAction::ExecuteAction 
(this=0x55e55f75aa20) at 
/usr/local/google/home/tra/work/llvm/repo/clang/lib/Frontend/FrontendAction.cpp:1168
  #14 0x55e5581da604 in clang::CodeGenAction::ExecuteAction 
(this=0x55e55f75aa20) at 
/usr/local/google/home/tra/work/llvm/repo/clang/lib/CodeGen/CodeGenAction.cpp:1172
  #15 0x55e557ff20ec in clang::FrontendAction::Execute 
(this=0x55e55f75aa20) at 
/usr/local/google/home/tra/work/llvm/repo/clang/lib/Frontend/FrontendAction.cpp:1058
  #16 0x55e557f1b618 in clang::CompilerInstance::ExecuteAction 
(this=0x55e55f7564c0, Act=...) at 
/usr/local/google/home/tra/work/llvm/repo/clang/lib/Frontend/CompilerInstance.cpp:1048
  #17 0x55e5581c35c7 in clang::ExecuteCompilerInvocation 
(Clang=0x55e55f7564c0) at 
/usr/local/google/home/tra/work/llvm/repo/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:264
  #18 0x55e5533464b0 in cc1_main (Argv=llvm::ArrayRef of length 86 = {...}, 
Argv0=0x7ffd95ce621e 
"/usr/local/google/home/tra/work/llvm/build/debug/bin/clang-15", 
MainAddr=0x55e5533305f0 )
  at 
/usr/local/google/home/tra/work/llvm/repo/clang/tools/driver/cc1_main.cpp:251
  #19 0x55e553331dca in ExecuteCC1Tool (ArgV=llvm::SmallVector of Size 87, 
Capacity 256 = {...}, ToolContext=...) at 
/usr/local/google/home/tra/work/llvm/repo/clang/tools/driver/driver.cpp:366
  #20 0x55e553330aec in clang_main (Argc=87, Argv=0x7ffd95ce4a68, 
ToolContext=...) at 
/usr/local/google/home/tra/work/llvm/repo/clang/tools/driver/driver.cpp:407
  #21 0x55e553369d5d in main (argc=87, argv=0x7ffd95ce4a68) at 
tools/clang/tools/driver/clang-driver.cpp:15


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140663/new/

https://reviews.llvm.org/D140663

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140663: CUDA/HIP: Use kernel name to map to symbol

2023-01-19 Thread Daniele Castagna via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG32c26e27b6fc: CUDA/HIP: Use kernel name to map to symbol 
(authored by dcastagna).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140663/new/

https://reviews.llvm.org/D140663

Files:
  clang/lib/CodeGen/CGCUDANV.cpp
  clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu

Index: clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu
@@ -0,0 +1,27 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -x hip %s -o - \
+// RUN: | FileCheck %s
+
+#define __global__ __attribute__((global))
+// CHECK: @_Z4kern7TempValIjE = constant ptr @_Z19__device_stub__kern7TempValIjE, align 8
+// CHECK: @0 = private unnamed_addr constant [19 x i8] c"_Z4kern7TempValIjE\00", align 1
+template 
+struct TempVal {
+  type value;
+};
+
+__global__ void kern(TempVal in_val);
+
+int main(int argc, char ** argv) {
+  auto* fptr = &(kern);
+// CHECK:   store ptr @_Z4kern7TempValIjE, ptr %fptr, align 8
+  return 0;
+}
+// CHECK:  define dso_local void @_Z19__device_stub__kern7TempValIjE(i32 %in_val.coerce) #1 {
+// CHECK:  %2 = call i32 @hipLaunchByPtr(ptr @_Z4kern7TempValIjE)
+
+// CHECK:  define internal void @__hip_register_globals(ptr %0) {
+// CHECK:%1 = call i32 @__hipRegisterFunction(ptr %0, ptr @_Z4kern7TempValIjE, ptr @0, ptr @0, i32 -1, ptr null, ptr null, ptr null, ptr null, ptr null)
+
+__global__ void kern(TempVal in_val) {
+}
+
Index: clang/lib/CodeGen/CGCUDANV.cpp
===
--- clang/lib/CodeGen/CGCUDANV.cpp
+++ clang/lib/CodeGen/CGCUDANV.cpp
@@ -49,10 +49,10 @@
 const Decl *D;
   };
   llvm::SmallVector EmittedKernels;
-  // Map a device stub function to a symbol for identifying kernel in host code.
+  // Map a kernel mangled name to a symbol for identifying kernel in host code
   // For CUDA, the symbol for identifying the kernel is the same as the device
   // stub function. For HIP, they are different.
-  llvm::DenseMap KernelHandles;
+  llvm::DenseMap KernelHandles;
   // Map a kernel handle to the kernel stub.
   llvm::DenseMap KernelStubs;
   struct VarInfo {
@@ -310,7 +310,8 @@
 void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction ,
  FunctionArgList ) {
   EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});
-  if (auto *GV = dyn_cast(KernelHandles[CGF.CurFn])) {
+  if (auto *GV =
+  dyn_cast(KernelHandles[CGF.CurFn->getName()])) {
 GV->setLinkage(CGF.CurFn->getLinkage());
 GV->setInitializer(CGF.CurFn);
   }
@@ -400,8 +401,8 @@
ShmemSize.getPointer(), Stream.getPointer()});
 
   // Emit the call to cudaLaunch
-  llvm::Value *Kernel =
-  CGF.Builder.CreatePointerCast(KernelHandles[CGF.CurFn], VoidPtrTy);
+  llvm::Value *Kernel = CGF.Builder.CreatePointerCast(
+  KernelHandles[CGF.CurFn->getName()], VoidPtrTy);
   CallArgList LaunchKernelArgs;
   LaunchKernelArgs.add(RValue::get(Kernel),
cudaLaunchKernelFD->getParamDecl(0)->getType());
@@ -456,8 +457,8 @@
 
   // Emit the call to cudaLaunch
   llvm::FunctionCallee cudaLaunchFn = getLaunchFn();
-  llvm::Value *Arg =
-  CGF.Builder.CreatePointerCast(KernelHandles[CGF.CurFn], CharPtrTy);
+  llvm::Value *Arg = CGF.Builder.CreatePointerCast(
+  KernelHandles[CGF.CurFn->getName()], CharPtrTy);
   CGF.EmitRuntimeCallOrInvoke(cudaLaunchFn, Arg);
   CGF.EmitBranch(EndBlock);
 
@@ -551,7 +552,7 @@
 llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
 llvm::Value *Args[] = {
 ,
-Builder.CreateBitCast(KernelHandles[I.Kernel], VoidPtrTy),
+Builder.CreateBitCast(KernelHandles[I.Kernel->getName()], VoidPtrTy),
 KernelName,
 KernelName,
 llvm::ConstantInt::get(IntTy, -1),
@@ -1130,7 +1131,7 @@
   StringRef Section = CGM.getLangOpts().HIP ? "hip_offloading_entries"
 : "cuda_offloading_entries";
   for (KernelInfo  : EmittedKernels)
-OMPBuilder.emitOffloadingEntry(KernelHandles[I.Kernel],
+OMPBuilder.emitOffloadingEntry(KernelHandles[I.Kernel->getName()],
getDeviceSideName(cast(I.D)), 0,
DeviceVarFlags::OffloadGlobalEntry, Section);
 
@@ -1193,12 +1194,12 @@
 
 llvm::GlobalValue *CGNVCUDARuntime::getKernelHandle(llvm::Function *F,
 GlobalDecl GD) {
-  auto Loc = KernelHandles.find(F);
+  auto Loc = KernelHandles.find(F->getName());
   if (Loc != KernelHandles.end())
 return Loc->second;
 
   if (!CGM.getLangOpts().HIP) {
-KernelHandles[F] = F;
+KernelHandles[F->getName()] = F;
 KernelStubs[F] = F;

[PATCH] D140663: CUDA/HIP: Use kernel name to map to symbol

2023-01-19 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl accepted this revision.
yaxunl added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140663/new/

https://reviews.llvm.org/D140663

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140663: CUDA/HIP: Use kernel name to map to symbol

2023-01-19 Thread Daniele Castagna via Phabricator via cfe-commits
dcastagna updated this revision to Diff 490644.
dcastagna added a comment.

Add a check for device side kernel name


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140663/new/

https://reviews.llvm.org/D140663

Files:
  clang/lib/CodeGen/CGCUDANV.cpp
  clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu

Index: clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu
@@ -0,0 +1,27 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -x hip %s -o - \
+// RUN: | FileCheck %s
+
+#define __global__ __attribute__((global))
+// CHECK: @_Z4kern7TempValIjE = constant ptr @_Z19__device_stub__kern7TempValIjE, align 8
+// CHECK: @0 = private unnamed_addr constant [19 x i8] c"_Z4kern7TempValIjE\00", align 1
+template 
+struct TempVal {
+  type value;
+};
+
+__global__ void kern(TempVal in_val);
+
+int main(int argc, char ** argv) {
+  auto* fptr = &(kern);
+// CHECK:   store ptr @_Z4kern7TempValIjE, ptr %fptr, align 8
+  return 0;
+}
+// CHECK:  define dso_local void @_Z19__device_stub__kern7TempValIjE(i32 %in_val.coerce) #1 {
+// CHECK:  %2 = call i32 @hipLaunchByPtr(ptr @_Z4kern7TempValIjE)
+
+// CHECK:  define internal void @__hip_register_globals(ptr %0) {
+// CHECK:%1 = call i32 @__hipRegisterFunction(ptr %0, ptr @_Z4kern7TempValIjE, ptr @0, ptr @0, i32 -1, ptr null, ptr null, ptr null, ptr null, ptr null)
+
+__global__ void kern(TempVal in_val) {
+}
+
Index: clang/lib/CodeGen/CGCUDANV.cpp
===
--- clang/lib/CodeGen/CGCUDANV.cpp
+++ clang/lib/CodeGen/CGCUDANV.cpp
@@ -49,10 +49,10 @@
 const Decl *D;
   };
   llvm::SmallVector EmittedKernels;
-  // Map a device stub function to a symbol for identifying kernel in host code.
+  // Map a kernel mangled name to a symbol for identifying kernel in host code
   // For CUDA, the symbol for identifying the kernel is the same as the device
   // stub function. For HIP, they are different.
-  llvm::DenseMap KernelHandles;
+  llvm::DenseMap KernelHandles;
   // Map a kernel handle to the kernel stub.
   llvm::DenseMap KernelStubs;
   struct VarInfo {
@@ -310,7 +310,8 @@
 void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction ,
  FunctionArgList ) {
   EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});
-  if (auto *GV = dyn_cast(KernelHandles[CGF.CurFn])) {
+  if (auto *GV =
+  dyn_cast(KernelHandles[CGF.CurFn->getName()])) {
 GV->setLinkage(CGF.CurFn->getLinkage());
 GV->setInitializer(CGF.CurFn);
   }
@@ -400,8 +401,8 @@
ShmemSize.getPointer(), Stream.getPointer()});
 
   // Emit the call to cudaLaunch
-  llvm::Value *Kernel =
-  CGF.Builder.CreatePointerCast(KernelHandles[CGF.CurFn], VoidPtrTy);
+  llvm::Value *Kernel = CGF.Builder.CreatePointerCast(
+  KernelHandles[CGF.CurFn->getName()], VoidPtrTy);
   CallArgList LaunchKernelArgs;
   LaunchKernelArgs.add(RValue::get(Kernel),
cudaLaunchKernelFD->getParamDecl(0)->getType());
@@ -456,8 +457,8 @@
 
   // Emit the call to cudaLaunch
   llvm::FunctionCallee cudaLaunchFn = getLaunchFn();
-  llvm::Value *Arg =
-  CGF.Builder.CreatePointerCast(KernelHandles[CGF.CurFn], CharPtrTy);
+  llvm::Value *Arg = CGF.Builder.CreatePointerCast(
+  KernelHandles[CGF.CurFn->getName()], CharPtrTy);
   CGF.EmitRuntimeCallOrInvoke(cudaLaunchFn, Arg);
   CGF.EmitBranch(EndBlock);
 
@@ -551,7 +552,7 @@
 llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
 llvm::Value *Args[] = {
 ,
-Builder.CreateBitCast(KernelHandles[I.Kernel], VoidPtrTy),
+Builder.CreateBitCast(KernelHandles[I.Kernel->getName()], VoidPtrTy),
 KernelName,
 KernelName,
 llvm::ConstantInt::get(IntTy, -1),
@@ -1130,7 +1131,7 @@
   StringRef Section = CGM.getLangOpts().HIP ? "hip_offloading_entries"
 : "cuda_offloading_entries";
   for (KernelInfo  : EmittedKernels)
-OMPBuilder.emitOffloadingEntry(KernelHandles[I.Kernel],
+OMPBuilder.emitOffloadingEntry(KernelHandles[I.Kernel->getName()],
getDeviceSideName(cast(I.D)), 0,
DeviceVarFlags::OffloadGlobalEntry, Section);
 
@@ -1193,12 +1194,12 @@
 
 llvm::GlobalValue *CGNVCUDARuntime::getKernelHandle(llvm::Function *F,
 GlobalDecl GD) {
-  auto Loc = KernelHandles.find(F);
+  auto Loc = KernelHandles.find(F->getName());
   if (Loc != KernelHandles.end())
 return Loc->second;
 
   if (!CGM.getLangOpts().HIP) {
-KernelHandles[F] = F;
+KernelHandles[F->getName()] = F;
 KernelStubs[F] = F;
 return F;
   }
@@ -1212,7 +1213,7 @@
   Var->setDSOLocal(F->isDSOLocal());
   

[PATCH] D140663: CUDA/HIP: Use kernel name to map to symbol

2023-01-19 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu:22
+// CHECK:  define internal void @__hip_register_globals(ptr %0) {
+// CHECK:%1 = call i32 @__hipRegisterFunction(ptr %0, ptr 
@_Z4kern7TempValIjE, ptr @0, ptr @0, i32 -1, ptr null, ptr null, ptr null, ptr 
null, ptr null)
+

pls also check @0, which is the device side kernel name.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140663/new/

https://reviews.llvm.org/D140663

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140663: CUDA/HIP: Use kernel name to map to symbol

2023-01-19 Thread Daniele Castagna via Phabricator via cfe-commits
dcastagna updated this revision to Diff 490559.
dcastagna added a comment.

Address yaxunl comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140663/new/

https://reviews.llvm.org/D140663

Files:
  clang/lib/CodeGen/CGCUDANV.cpp
  clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu

Index: clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu
@@ -0,0 +1,26 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -x hip %s -o - \
+// RUN: | FileCheck %s
+
+#define __global__ __attribute__((global))
+// CHECK: @_Z4kern7TempValIjE = constant ptr @_Z19__device_stub__kern7TempValIjE, align 8
+template 
+struct TempVal {
+  type value;
+};
+
+__global__ void kern(TempVal in_val);
+
+int main(int argc, char ** argv) {
+  auto* fptr = &(kern);
+// CHECK:   store ptr @_Z4kern7TempValIjE, ptr %fptr, align 8
+  return 0;
+}
+// CHECK:  define dso_local void @_Z19__device_stub__kern7TempValIjE(i32 %in_val.coerce) #1 {
+// CHECK:  %2 = call i32 @hipLaunchByPtr(ptr @_Z4kern7TempValIjE)
+
+// CHECK:  define internal void @__hip_register_globals(ptr %0) {
+// CHECK:%1 = call i32 @__hipRegisterFunction(ptr %0, ptr @_Z4kern7TempValIjE, ptr @0, ptr @0, i32 -1, ptr null, ptr null, ptr null, ptr null, ptr null)
+
+__global__ void kern(TempVal in_val) {
+}
+
Index: clang/lib/CodeGen/CGCUDANV.cpp
===
--- clang/lib/CodeGen/CGCUDANV.cpp
+++ clang/lib/CodeGen/CGCUDANV.cpp
@@ -49,10 +49,10 @@
 const Decl *D;
   };
   llvm::SmallVector EmittedKernels;
-  // Map a device stub function to a symbol for identifying kernel in host code.
+  // Map a kernel mangled name to a symbol for identifying kernel in host code
   // For CUDA, the symbol for identifying the kernel is the same as the device
   // stub function. For HIP, they are different.
-  llvm::DenseMap KernelHandles;
+  llvm::DenseMap KernelHandles;
   // Map a kernel handle to the kernel stub.
   llvm::DenseMap KernelStubs;
   struct VarInfo {
@@ -310,7 +310,8 @@
 void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction ,
  FunctionArgList ) {
   EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});
-  if (auto *GV = dyn_cast(KernelHandles[CGF.CurFn])) {
+  if (auto *GV =
+  dyn_cast(KernelHandles[CGF.CurFn->getName()])) {
 GV->setLinkage(CGF.CurFn->getLinkage());
 GV->setInitializer(CGF.CurFn);
   }
@@ -400,8 +401,8 @@
ShmemSize.getPointer(), Stream.getPointer()});
 
   // Emit the call to cudaLaunch
-  llvm::Value *Kernel =
-  CGF.Builder.CreatePointerCast(KernelHandles[CGF.CurFn], VoidPtrTy);
+  llvm::Value *Kernel = CGF.Builder.CreatePointerCast(
+  KernelHandles[CGF.CurFn->getName()], VoidPtrTy);
   CallArgList LaunchKernelArgs;
   LaunchKernelArgs.add(RValue::get(Kernel),
cudaLaunchKernelFD->getParamDecl(0)->getType());
@@ -456,8 +457,8 @@
 
   // Emit the call to cudaLaunch
   llvm::FunctionCallee cudaLaunchFn = getLaunchFn();
-  llvm::Value *Arg =
-  CGF.Builder.CreatePointerCast(KernelHandles[CGF.CurFn], CharPtrTy);
+  llvm::Value *Arg = CGF.Builder.CreatePointerCast(
+  KernelHandles[CGF.CurFn->getName()], CharPtrTy);
   CGF.EmitRuntimeCallOrInvoke(cudaLaunchFn, Arg);
   CGF.EmitBranch(EndBlock);
 
@@ -551,7 +552,7 @@
 llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
 llvm::Value *Args[] = {
 ,
-Builder.CreateBitCast(KernelHandles[I.Kernel], VoidPtrTy),
+Builder.CreateBitCast(KernelHandles[I.Kernel->getName()], VoidPtrTy),
 KernelName,
 KernelName,
 llvm::ConstantInt::get(IntTy, -1),
@@ -1130,7 +1131,7 @@
   StringRef Section = CGM.getLangOpts().HIP ? "hip_offloading_entries"
 : "cuda_offloading_entries";
   for (KernelInfo  : EmittedKernels)
-OMPBuilder.emitOffloadingEntry(KernelHandles[I.Kernel],
+OMPBuilder.emitOffloadingEntry(KernelHandles[I.Kernel->getName()],
getDeviceSideName(cast(I.D)), 0,
DeviceVarFlags::OffloadGlobalEntry, Section);
 
@@ -1193,12 +1194,12 @@
 
 llvm::GlobalValue *CGNVCUDARuntime::getKernelHandle(llvm::Function *F,
 GlobalDecl GD) {
-  auto Loc = KernelHandles.find(F);
+  auto Loc = KernelHandles.find(F->getName());
   if (Loc != KernelHandles.end())
 return Loc->second;
 
   if (!CGM.getLangOpts().HIP) {
-KernelHandles[F] = F;
+KernelHandles[F->getName()] = F;
 KernelStubs[F] = F;
 return F;
   }
@@ -1212,7 +1213,7 @@
   Var->setDSOLocal(F->isDSOLocal());
   Var->setVisibility(F->getVisibility());
   CGM.maybeSetTrivialComdat(*GD.getDecl(), *Var);
-  KernelHandles[F] = Var;
+  

[PATCH] D140663: CUDA/HIP: Use kernel name to map to symbol

2023-01-10 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu:2
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -x hip %s -o - \
+// RUN: | FileCheck %s
+

need to check `_Z19__device_stub__kern7TempValIjE` generates the correct call 
of hipLaunchKernel using the correct handle.

also need to check hipRegisterFunction uses the correct function name and 
handle.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140663/new/

https://reviews.llvm.org/D140663

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140663: CUDA/HIP: Use kernel name to map to symbol

2022-12-25 Thread Daniele Castagna via Phabricator via cfe-commits
dcastagna created this revision.
Herald added subscribers: mattd, yaxunl.
Herald added a project: All.
dcastagna requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Currently CGCUDANV uses an llvm::Function as a key to map kernels to a
symbol in host code.  HIP adds one level of indirection and uses the
llvm::Function to map to a global variable that will be initialized to
the kernel stub ptr.

Unfortunately there is no garantee that the llvm::Function created
by GetOrCreateLLVMFunction will be the same.  In fact, the first
time we encounter GetOrCrateLLVMFunction for a kernel, the type
might not be completed yet, and the type of llvm::Function will be
a generic {}, since the complete type is not required to get a symbol
to a function.  In this case we end up creating two global variables,
one for the llvm::Function with the incomplete type and one for the
function with the complete type. The first global variable will be
declared by not defined, resulting in a linking error.

This change uses the mangled name of the llvm::Function as key in the
KernelHandles map, in this way the same llvm::Function will be
associated to the same kernel handle even if they types are different.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D140663

Files:
  clang/lib/CodeGen/CGCUDANV.cpp
  clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu

Index: clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDA/incomplete-func-ptr-type.cu
@@ -0,0 +1,21 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -x hip %s -o - \
+// RUN: | FileCheck %s
+
+#define __global__ __attribute__((global))
+// CHECK: @_Z4kern7TempValIjE = constant ptr @_Z19__device_stub__kern7TempValIjE, align 8
+template 
+struct TempVal {
+  type value;
+};
+
+__global__ void kern(TempVal in_val);
+
+int main(int argc, char ** argv) {
+  auto* fptr = &(kern);
+// CHECK:   store ptr @_Z4kern7TempValIjE, ptr %fptr, align 8
+  return 0;
+}
+
+__global__ void kern(TempVal in_val) {
+}
+
Index: clang/lib/CodeGen/CGCUDANV.cpp
===
--- clang/lib/CodeGen/CGCUDANV.cpp
+++ clang/lib/CodeGen/CGCUDANV.cpp
@@ -49,10 +49,10 @@
 const Decl *D;
   };
   llvm::SmallVector EmittedKernels;
-  // Map a device stub function to a symbol for identifying kernel in host code.
+  // Map a kernel mangled name to a symbol for identifying kernel in host code
   // For CUDA, the symbol for identifying the kernel is the same as the device
   // stub function. For HIP, they are different.
-  llvm::DenseMap KernelHandles;
+  llvm::DenseMap KernelHandles;
   // Map a kernel handle to the kernel stub.
   llvm::DenseMap KernelStubs;
   struct VarInfo {
@@ -310,7 +310,8 @@
 void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction ,
  FunctionArgList ) {
   EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});
-  if (auto *GV = dyn_cast(KernelHandles[CGF.CurFn])) {
+  if (auto *GV =
+  dyn_cast(KernelHandles[CGF.CurFn->getName()])) {
 GV->setLinkage(CGF.CurFn->getLinkage());
 GV->setInitializer(CGF.CurFn);
   }
@@ -400,8 +401,8 @@
ShmemSize.getPointer(), Stream.getPointer()});
 
   // Emit the call to cudaLaunch
-  llvm::Value *Kernel =
-  CGF.Builder.CreatePointerCast(KernelHandles[CGF.CurFn], VoidPtrTy);
+  llvm::Value *Kernel = CGF.Builder.CreatePointerCast(
+  KernelHandles[CGF.CurFn->getName()], VoidPtrTy);
   CallArgList LaunchKernelArgs;
   LaunchKernelArgs.add(RValue::get(Kernel),
cudaLaunchKernelFD->getParamDecl(0)->getType());
@@ -456,8 +457,8 @@
 
   // Emit the call to cudaLaunch
   llvm::FunctionCallee cudaLaunchFn = getLaunchFn();
-  llvm::Value *Arg =
-  CGF.Builder.CreatePointerCast(KernelHandles[CGF.CurFn], CharPtrTy);
+  llvm::Value *Arg = CGF.Builder.CreatePointerCast(
+  KernelHandles[CGF.CurFn->getName()], CharPtrTy);
   CGF.EmitRuntimeCallOrInvoke(cudaLaunchFn, Arg);
   CGF.EmitBranch(EndBlock);
 
@@ -551,7 +552,7 @@
 llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
 llvm::Value *Args[] = {
 ,
-Builder.CreateBitCast(KernelHandles[I.Kernel], VoidPtrTy),
+Builder.CreateBitCast(KernelHandles[I.Kernel->getName()], VoidPtrTy),
 KernelName,
 KernelName,
 llvm::ConstantInt::get(IntTy, -1),
@@ -1130,7 +1131,7 @@
   StringRef Section = CGM.getLangOpts().HIP ? "hip_offloading_entries"
 : "cuda_offloading_entries";
   for (KernelInfo  : EmittedKernels)
-OMPBuilder.emitOffloadingEntry(KernelHandles[I.Kernel],
+OMPBuilder.emitOffloadingEntry(KernelHandles[I.Kernel->getName()],
getDeviceSideName(cast(I.D)), 0,