[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
https://github.com/yxsamliu closed https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
https://github.com/yxsamliu updated https://github.com/llvm/llvm-project/pull/81700 >From 1e9c29c4794ed6d60cf54c902cd0f854afd1aace Mon Sep 17 00:00:00 2001 From: "Yaxun (Sam) Liu" Date: Tue, 13 Feb 2024 10:00:21 -0500 Subject: [PATCH] [HIP] Allow partial linking for `-fgpu-rdc` `-fgpu-rdc` mode allows device functions call device functions in different TU. However, currently all device objects have to be linked together since only one fat binary is supported. This is time consuming for AMDGPU backend since it only supports LTO. There are use cases that objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, whereas partially link the host code with `ld -r` or generate a static library by using the `-emit-static-lib` option of clang. This avoids linking all device code together, therefore decreases the linking time for `-fgpu-rdc`. Previously, clang emits an external symbol `__hip_fatbin` for all objects for `-fgpu-rdc`. With this patch, clang emits an unique external symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a group of objects are linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary. One executable or shared library can have multiple fat binaries. Device linking is done for undefined fab binary symbols only to avoid repeated linking. `__hip_gpubin_handle` is also uniquefied and merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging and tooling. Fixes: https://github.com/llvm/llvm-project/issues/77018 --- clang/lib/CodeGen/CGCUDANV.cpp| 22 +- clang/lib/CodeGen/CodeGenModule.cpp | 10 +- clang/lib/Driver/OffloadBundler.cpp | 40 ++- clang/lib/Driver/ToolChains/HIPUtility.cpp| 258 +- clang/test/CMakeLists.txt | 1 + clang/test/CodeGenCUDA/device-stub.cu | 10 +- .../test/CodeGenCUDA/host-used-device-var.cu | 5 +- clang/test/Driver/Inputs/hip.h| 25 ++ clang/test/Driver/clang-offload-bundler.c | 13 +- clang/test/Driver/hip-partial-link.hip| 97 +++ clang/test/Driver/hip-toolchain-rdc.hip | 38 ++- 11 files changed, 469 insertions(+), 50 deletions(-) create mode 100644 clang/test/Driver/Inputs/hip.h create mode 100644 clang/test/Driver/hip-partial-link.hip diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp index 5b43272bfa62f4..49f93451db7bbb 100644 --- a/clang/lib/CodeGen/CGCUDANV.cpp +++ b/clang/lib/CodeGen/CGCUDANV.cpp @@ -760,10 +760,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // to contain the fat binary but will be populated somewhere else, // e.g. by lld through link script. FatBinStr = new llvm::GlobalVariable( -CGM.getModule(), CGM.Int8Ty, -/*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr, -"__hip_fatbin", nullptr, -llvm::GlobalVariable::NotThreadLocal); + CGM.getModule(), CGM.Int8Ty, + /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr, + "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr, + llvm::GlobalVariable::NotThreadLocal); cast(FatBinStr)->setSection(FatbinConstantName); } @@ -816,8 +816,8 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // thread safety of the loaded program. Therefore we can assume sequential // execution of constructor functions here. if (IsHIP) { -auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage : -llvm::GlobalValue::LinkOnceAnyLinkage; +auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage + : llvm::GlobalValue::ExternalLinkage; llvm::BasicBlock *IfBlock = llvm::BasicBlock::Create(Context, "if", ModuleCtorFunc); llvm::BasicBlock *ExitBlock = @@ -826,11 +826,11 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // of HIP ABI. GpuBinaryHandle = new llvm::GlobalVariable( TheModule, PtrTy, /*isConstant=*/false, Linkage, -/*Initializer=*/llvm::ConstantPointerNull::get(PtrTy), -"__hip_gpubin_handle"); -if (Linkage == llvm::GlobalValue::LinkOnceAnyLinkage) - GpuBinaryHandle->setComdat( - CGM.getModule().getOrInsertComdat(GpuBinaryHandle->getName())); +/*Initializer=*/ +CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr, +CudaGpuBinary +? "__hip_gpubin_handle" +: "__hip_gpubin_handle_" + CGM.getContext().getCUIDHash()); GpuBinaryHandle->setAlignment(CGM.getPointerAlign().getAsAlign()); // Prevent the weak symbol in different shared libraries being merged. if (Linkage != llvm::GlobalValue::
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
yxsamliu wrote: > This makes sense overall, though it's very complicated. Generally we just > need to make sure these things are private to one group of files. There's a > lot more to parse here compared to the `linker-wrapper`. > > Do any of these tests check when called with `-r`? I'm assuming that's the > difference here when we decide whether or not to ghoup files by cuid. Added a test for `-r` https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
https://github.com/yxsamliu updated https://github.com/llvm/llvm-project/pull/81700 >From d84645f035609e7ece76c1f2eb06637826b7b22a Mon Sep 17 00:00:00 2001 From: "Yaxun (Sam) Liu" Date: Tue, 13 Feb 2024 10:00:21 -0500 Subject: [PATCH] [HIP] Allow partial linking for `-fgpu-rdc` `-fgpu-rdc` mode allows device functions call device functions in different TU. However, currently all device objects have to be linked together since only one fat binary is supported. This is time consuming for AMDGPU backend since it only supports LTO. There are use cases that objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, whereas partially link the host code with `ld -r` or generate a static library by using the `-emit-static-lib` option of clang. This avoids linking all device code together, therefore decreases the linking time for `-fgpu-rdc`. Previously, clang emits an external symbol `__hip_fatbin` for all objects for `-fgpu-rdc`. With this patch, clang emits an unique external symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a group of objects are linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary. One executable or shared library can have multiple fat binaries. Device linking is done for undefined fab binary symbols only to avoid repeated linking. `__hip_gpubin_handle` is also uniquefied and merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging and tooling. Fixes: https://github.com/llvm/llvm-project/issues/77018 --- clang/lib/CodeGen/CGCUDANV.cpp| 22 +- clang/lib/CodeGen/CodeGenModule.cpp | 10 +- clang/lib/Driver/OffloadBundler.cpp | 40 ++- clang/lib/Driver/ToolChains/HIPUtility.cpp| 258 +- clang/test/CodeGenCUDA/device-stub.cu | 10 +- .../test/CodeGenCUDA/host-used-device-var.cu | 5 +- clang/test/Driver/Inputs/hip.h| 25 ++ clang/test/Driver/clang-offload-bundler.c | 4 +- clang/test/Driver/hip-partial-link.hip| 97 +++ clang/test/Driver/hip-toolchain-rdc.hip | 38 ++- 10 files changed, 461 insertions(+), 48 deletions(-) create mode 100644 clang/test/Driver/Inputs/hip.h create mode 100644 clang/test/Driver/hip-partial-link.hip diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp index 5b43272bfa62f4..49f93451db7bbb 100644 --- a/clang/lib/CodeGen/CGCUDANV.cpp +++ b/clang/lib/CodeGen/CGCUDANV.cpp @@ -760,10 +760,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // to contain the fat binary but will be populated somewhere else, // e.g. by lld through link script. FatBinStr = new llvm::GlobalVariable( -CGM.getModule(), CGM.Int8Ty, -/*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr, -"__hip_fatbin", nullptr, -llvm::GlobalVariable::NotThreadLocal); + CGM.getModule(), CGM.Int8Ty, + /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr, + "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr, + llvm::GlobalVariable::NotThreadLocal); cast(FatBinStr)->setSection(FatbinConstantName); } @@ -816,8 +816,8 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // thread safety of the loaded program. Therefore we can assume sequential // execution of constructor functions here. if (IsHIP) { -auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage : -llvm::GlobalValue::LinkOnceAnyLinkage; +auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage + : llvm::GlobalValue::ExternalLinkage; llvm::BasicBlock *IfBlock = llvm::BasicBlock::Create(Context, "if", ModuleCtorFunc); llvm::BasicBlock *ExitBlock = @@ -826,11 +826,11 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // of HIP ABI. GpuBinaryHandle = new llvm::GlobalVariable( TheModule, PtrTy, /*isConstant=*/false, Linkage, -/*Initializer=*/llvm::ConstantPointerNull::get(PtrTy), -"__hip_gpubin_handle"); -if (Linkage == llvm::GlobalValue::LinkOnceAnyLinkage) - GpuBinaryHandle->setComdat( - CGM.getModule().getOrInsertComdat(GpuBinaryHandle->getName())); +/*Initializer=*/ +CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr, +CudaGpuBinary +? "__hip_gpubin_handle" +: "__hip_gpubin_handle_" + CGM.getContext().getCUIDHash()); GpuBinaryHandle->setAlignment(CGM.getPointerAlign().getAsAlign()); // Prevent the weak symbol in different shared libraries being merged. if (Linkage != llvm::GlobalValue::InternalLinkage) diff --git a/clang/lib/CodeGen/CodeGen
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple &T, : T.normalize(); } +// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all +// input object or archive files. +class HIPUndefinedFatBinSymbols { +public: + HIPUndefinedFatBinSymbols(const Compilation &C) + : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID( + DiagnosticsEngine::Error, + "Error collecting HIP undefined fatbin symbols: %0")), +Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)), +Verbose(C.getArgs().hasArg(options::OPT_v)) { +populateSymbols(); +if (Verbose) { + for (auto Name : FatBinSymbols) +llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n"; + for (auto Name : GPUBinHandleSymbols) +llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name + << "\n"; +} + } + + const std::set &getFatBinSymbols() const { +return FatBinSymbols; + } + + const std::set &getGPUBinHandleSymbols() const { +return GPUBinHandleSymbols; + } + +private: + const Compilation &C; + unsigned DiagID; + bool Quiet; + bool Verbose; + std::set FatBinSymbols; + std::set GPUBinHandleSymbols; + const std::string FatBinPrefix = "__hip_fatbin"; + const std::string GPUBinHandlePrefix = "__hip_gpubin_handle"; + + void populateSymbols() { +std::deque WorkList; +std::set Visited; + +for (const auto &Action : C.getActions()) { + WorkList.push_back(Action); +} + +while (!WorkList.empty()) { + const Action *CurrentAction = WorkList.front(); + WorkList.pop_front(); + + if (!CurrentAction || !Visited.insert(CurrentAction).second) +continue; + + if (const auto *IA = dyn_cast(CurrentAction)) { +std::string ID = IA->getId().str(); +if (!ID.empty()) { + ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true); + FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str()); + GPUBinHandleSymbols.insert( + Twine(GPUBinHandlePrefix + "_" + ID).str()); + continue; +} +const char *Filename = IA->getInputArg().getValue(); +auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename); +// Input action could be options to linker, therefore ignore it +// if cannot read it. +if (!BufferOrErr) + continue; + +processInput(BufferOrErr.get()->getMemBufferRef()); + } else +WorkList.insert(WorkList.end(), CurrentAction->getInputs().begin(), +CurrentAction->getInputs().end()); +} + } + + void processInput(const llvm::MemoryBufferRef &Buffer) { +// Try processing as object file first. +auto ObjFileOrErr = llvm::object::ObjectFile::createObjectFile(Buffer); +if (ObjFileOrErr) { + processSymbols(**ObjFileOrErr); + return; +} + +// Then try processing as archive files. +llvm::consumeError(ObjFileOrErr.takeError()); +auto ArchiveOrErr = llvm::object::Archive::create(Buffer); +if (ArchiveOrErr) { + llvm::Error Err = llvm::Error::success(); + llvm::object::Archive &Archive = *ArchiveOrErr.get(); + for (auto &Child : Archive.children(Err)) { +auto ChildBufOrErr = Child.getMemoryBufferRef(); +if (ChildBufOrErr) + processInput(*ChildBufOrErr); +else + errorHandler(ChildBufOrErr.takeError()); + } + + if (Err) +errorHandler(std::move(Err)); + return; +} + +// Ignore other files. +llvm::consumeError(ArchiveOrErr.takeError()); + } + void processSymbols(const llvm::object::ObjectFile &Obj) { +for (const auto &Symbol : Obj.symbols()) { + auto FlagOrErr = Symbol.getFlags(); + if (!FlagOrErr) { +errorHandler(FlagOrErr.takeError()); +continue; + } + + // Filter only undefined symbols + if (!(FlagOrErr.get() & llvm::object::SymbolRef::SF_Undefined)) { yxsamliu wrote: will do https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple &T, : T.normalize(); } +// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all +// input object or archive files. +class HIPUndefinedFatBinSymbols { +public: + HIPUndefinedFatBinSymbols(const Compilation &C) + : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID( + DiagnosticsEngine::Error, + "Error collecting HIP undefined fatbin symbols: %0")), +Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)), +Verbose(C.getArgs().hasArg(options::OPT_v)) { +populateSymbols(); +if (Verbose) { + for (auto Name : FatBinSymbols) +llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n"; + for (auto Name : GPUBinHandleSymbols) +llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name + << "\n"; +} + } + + const std::set &getFatBinSymbols() const { +return FatBinSymbols; + } + + const std::set &getGPUBinHandleSymbols() const { +return GPUBinHandleSymbols; + } + +private: + const Compilation &C; + unsigned DiagID; + bool Quiet; + bool Verbose; + std::set FatBinSymbols; + std::set GPUBinHandleSymbols; + const std::string FatBinPrefix = "__hip_fatbin"; + const std::string GPUBinHandlePrefix = "__hip_gpubin_handle"; + + void populateSymbols() { +std::deque WorkList; +std::set Visited; + +for (const auto &Action : C.getActions()) { + WorkList.push_back(Action); +} + +while (!WorkList.empty()) { + const Action *CurrentAction = WorkList.front(); + WorkList.pop_front(); + + if (!CurrentAction || !Visited.insert(CurrentAction).second) +continue; + + if (const auto *IA = dyn_cast(CurrentAction)) { +std::string ID = IA->getId().str(); +if (!ID.empty()) { + ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true); + FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str()); + GPUBinHandleSymbols.insert( + Twine(GPUBinHandlePrefix + "_" + ID).str()); + continue; +} +const char *Filename = IA->getInputArg().getValue(); +auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename); +// Input action could be options to linker, therefore ignore it +// if cannot read it. +if (!BufferOrErr) + continue; + +processInput(BufferOrErr.get()->getMemBufferRef()); + } else +WorkList.insert(WorkList.end(), CurrentAction->getInputs().begin(), +CurrentAction->getInputs().end()); +} + } + + void processInput(const llvm::MemoryBufferRef &Buffer) { +// Try processing as object file first. +auto ObjFileOrErr = llvm::object::ObjectFile::createObjectFile(Buffer); +if (ObjFileOrErr) { + processSymbols(**ObjFileOrErr); + return; +} + +// Then try processing as archive files. +llvm::consumeError(ObjFileOrErr.takeError()); +auto ArchiveOrErr = llvm::object::Archive::create(Buffer); +if (ArchiveOrErr) { + llvm::Error Err = llvm::Error::success(); + llvm::object::Archive &Archive = *ArchiveOrErr.get(); + for (auto &Child : Archive.children(Err)) { +auto ChildBufOrErr = Child.getMemoryBufferRef(); +if (ChildBufOrErr) + processInput(*ChildBufOrErr); +else + errorHandler(ChildBufOrErr.takeError()); + } + + if (Err) +errorHandler(std::move(Err)); + return; +} + +// Ignore other files. +llvm::consumeError(ArchiveOrErr.takeError()); + } + void processSymbols(const llvm::object::ObjectFile &Obj) { yxsamliu wrote: will do https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple &T, : T.normalize(); } +// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all +// input object or archive files. +class HIPUndefinedFatBinSymbols { +public: + HIPUndefinedFatBinSymbols(const Compilation &C) + : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID( + DiagnosticsEngine::Error, + "Error collecting HIP undefined fatbin symbols: %0")), +Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)), +Verbose(C.getArgs().hasArg(options::OPT_v)) { +populateSymbols(); +if (Verbose) { + for (auto Name : FatBinSymbols) +llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n"; + for (auto Name : GPUBinHandleSymbols) +llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name + << "\n"; +} + } + + const std::set &getFatBinSymbols() const { +return FatBinSymbols; + } + + const std::set &getGPUBinHandleSymbols() const { +return GPUBinHandleSymbols; + } + +private: + const Compilation &C; + unsigned DiagID; + bool Quiet; + bool Verbose; + std::set FatBinSymbols; + std::set GPUBinHandleSymbols; + const std::string FatBinPrefix = "__hip_fatbin"; + const std::string GPUBinHandlePrefix = "__hip_gpubin_handle"; + + void populateSymbols() { +std::deque WorkList; +std::set Visited; + +for (const auto &Action : C.getActions()) { + WorkList.push_back(Action); +} + +while (!WorkList.empty()) { + const Action *CurrentAction = WorkList.front(); + WorkList.pop_front(); + + if (!CurrentAction || !Visited.insert(CurrentAction).second) +continue; + + if (const auto *IA = dyn_cast(CurrentAction)) { +std::string ID = IA->getId().str(); +if (!ID.empty()) { + ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true); + FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str()); + GPUBinHandleSymbols.insert( + Twine(GPUBinHandlePrefix + "_" + ID).str()); + continue; +} +const char *Filename = IA->getInputArg().getValue(); +auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename); +// Input action could be options to linker, therefore ignore it +// if cannot read it. yxsamliu wrote: If the argument is an input file and it fails to be read, the error will be captured by the linker and error will be emitted there. will fix the comment. https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
https://github.com/jhuber6 commented: This makes sense overall, though it's very complicated. Generally we just need to make sure these things are private to one group of files. There's a lot more to parse here compared to the `linker-wrapper`. Do any of these tests check when called with `-r`? I'm assuming that's the difference here when we decide whether or not to ghoup files by cuid. https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
https://github.com/Artem-B approved this pull request. Overall LGTM. Please wait for @jhuber6's to double check the partial linking mechanics details. https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple &T, : T.normalize(); } +// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all +// input object or archive files. +class HIPUndefinedFatBinSymbols { +public: + HIPUndefinedFatBinSymbols(const Compilation &C) + : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID( + DiagnosticsEngine::Error, + "Error collecting HIP undefined fatbin symbols: %0")), +Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)), +Verbose(C.getArgs().hasArg(options::OPT_v)) { +populateSymbols(); +if (Verbose) { + for (auto Name : FatBinSymbols) +llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n"; + for (auto Name : GPUBinHandleSymbols) +llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name + << "\n"; +} + } + + const std::set &getFatBinSymbols() const { +return FatBinSymbols; + } + + const std::set &getGPUBinHandleSymbols() const { +return GPUBinHandleSymbols; + } + +private: + const Compilation &C; + unsigned DiagID; + bool Quiet; + bool Verbose; + std::set FatBinSymbols; + std::set GPUBinHandleSymbols; + const std::string FatBinPrefix = "__hip_fatbin"; + const std::string GPUBinHandlePrefix = "__hip_gpubin_handle"; + + void populateSymbols() { +std::deque WorkList; +std::set Visited; + +for (const auto &Action : C.getActions()) { + WorkList.push_back(Action); +} + +while (!WorkList.empty()) { + const Action *CurrentAction = WorkList.front(); + WorkList.pop_front(); + + if (!CurrentAction || !Visited.insert(CurrentAction).second) +continue; + + if (const auto *IA = dyn_cast(CurrentAction)) { +std::string ID = IA->getId().str(); +if (!ID.empty()) { + ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true); + FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str()); + GPUBinHandleSymbols.insert( + Twine(GPUBinHandlePrefix + "_" + ID).str()); + continue; +} +const char *Filename = IA->getInputArg().getValue(); +auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename); +// Input action could be options to linker, therefore ignore it +// if cannot read it. +if (!BufferOrErr) + continue; + +processInput(BufferOrErr.get()->getMemBufferRef()); + } else +WorkList.insert(WorkList.end(), CurrentAction->getInputs().begin(), +CurrentAction->getInputs().end()); +} + } + + void processInput(const llvm::MemoryBufferRef &Buffer) { +// Try processing as object file first. +auto ObjFileOrErr = llvm::object::ObjectFile::createObjectFile(Buffer); +if (ObjFileOrErr) { + processSymbols(**ObjFileOrErr); + return; +} + +// Then try processing as archive files. +llvm::consumeError(ObjFileOrErr.takeError()); +auto ArchiveOrErr = llvm::object::Archive::create(Buffer); +if (ArchiveOrErr) { + llvm::Error Err = llvm::Error::success(); + llvm::object::Archive &Archive = *ArchiveOrErr.get(); + for (auto &Child : Archive.children(Err)) { +auto ChildBufOrErr = Child.getMemoryBufferRef(); +if (ChildBufOrErr) + processInput(*ChildBufOrErr); +else + errorHandler(ChildBufOrErr.takeError()); + } + + if (Err) +errorHandler(std::move(Err)); + return; +} + +// Ignore other files. +llvm::consumeError(ArchiveOrErr.takeError()); + } + void processSymbols(const llvm::object::ObjectFile &Obj) { +for (const auto &Symbol : Obj.symbols()) { + auto FlagOrErr = Symbol.getFlags(); + if (!FlagOrErr) { +errorHandler(FlagOrErr.takeError()); +continue; + } + + // Filter only undefined symbols + if (!(FlagOrErr.get() & llvm::object::SymbolRef::SF_Undefined)) { Artem-B wrote: style nit: remove `{}` around single-statement body. Applies here and in a handful of other places throughout the patch. https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple &T, : T.normalize(); } +// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all +// input object or archive files. +class HIPUndefinedFatBinSymbols { +public: + HIPUndefinedFatBinSymbols(const Compilation &C) + : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID( + DiagnosticsEngine::Error, + "Error collecting HIP undefined fatbin symbols: %0")), +Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)), +Verbose(C.getArgs().hasArg(options::OPT_v)) { +populateSymbols(); +if (Verbose) { + for (auto Name : FatBinSymbols) +llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n"; + for (auto Name : GPUBinHandleSymbols) +llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name + << "\n"; +} + } + + const std::set &getFatBinSymbols() const { +return FatBinSymbols; + } + + const std::set &getGPUBinHandleSymbols() const { +return GPUBinHandleSymbols; + } + +private: + const Compilation &C; + unsigned DiagID; + bool Quiet; + bool Verbose; + std::set FatBinSymbols; + std::set GPUBinHandleSymbols; + const std::string FatBinPrefix = "__hip_fatbin"; + const std::string GPUBinHandlePrefix = "__hip_gpubin_handle"; + + void populateSymbols() { +std::deque WorkList; +std::set Visited; + +for (const auto &Action : C.getActions()) { + WorkList.push_back(Action); +} + +while (!WorkList.empty()) { + const Action *CurrentAction = WorkList.front(); + WorkList.pop_front(); + + if (!CurrentAction || !Visited.insert(CurrentAction).second) +continue; + + if (const auto *IA = dyn_cast(CurrentAction)) { +std::string ID = IA->getId().str(); +if (!ID.empty()) { + ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true); + FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str()); + GPUBinHandleSymbols.insert( + Twine(GPUBinHandlePrefix + "_" + ID).str()); + continue; +} +const char *Filename = IA->getInputArg().getValue(); +auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename); +// Input action could be options to linker, therefore ignore it +// if cannot read it. Artem-B wrote: Comment could use some editing. `therefore, ignore an error if we fail to read the file`. This makes me ask -- what if the argument *is* an input file, and we do fail to read it. How do we tell apart the linker options from the input file? Relying on a failure to read it does not seem to be a good way to handle it. https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple &T, : T.normalize(); } +// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all +// input object or archive files. +class HIPUndefinedFatBinSymbols { +public: + HIPUndefinedFatBinSymbols(const Compilation &C) + : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID( + DiagnosticsEngine::Error, + "Error collecting HIP undefined fatbin symbols: %0")), +Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)), +Verbose(C.getArgs().hasArg(options::OPT_v)) { +populateSymbols(); +if (Verbose) { + for (auto Name : FatBinSymbols) +llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n"; + for (auto Name : GPUBinHandleSymbols) +llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name + << "\n"; +} + } + + const std::set &getFatBinSymbols() const { +return FatBinSymbols; + } + + const std::set &getGPUBinHandleSymbols() const { +return GPUBinHandleSymbols; + } + +private: + const Compilation &C; + unsigned DiagID; + bool Quiet; + bool Verbose; + std::set FatBinSymbols; + std::set GPUBinHandleSymbols; + const std::string FatBinPrefix = "__hip_fatbin"; + const std::string GPUBinHandlePrefix = "__hip_gpubin_handle"; + + void populateSymbols() { +std::deque WorkList; +std::set Visited; + +for (const auto &Action : C.getActions()) { + WorkList.push_back(Action); +} + +while (!WorkList.empty()) { + const Action *CurrentAction = WorkList.front(); + WorkList.pop_front(); + + if (!CurrentAction || !Visited.insert(CurrentAction).second) +continue; + + if (const auto *IA = dyn_cast(CurrentAction)) { +std::string ID = IA->getId().str(); +if (!ID.empty()) { + ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true); + FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str()); + GPUBinHandleSymbols.insert( + Twine(GPUBinHandlePrefix + "_" + ID).str()); + continue; +} +const char *Filename = IA->getInputArg().getValue(); +auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename); +// Input action could be options to linker, therefore ignore it +// if cannot read it. +if (!BufferOrErr) + continue; + +processInput(BufferOrErr.get()->getMemBufferRef()); + } else +WorkList.insert(WorkList.end(), CurrentAction->getInputs().begin(), +CurrentAction->getInputs().end()); +} + } + + void processInput(const llvm::MemoryBufferRef &Buffer) { +// Try processing as object file first. +auto ObjFileOrErr = llvm::object::ObjectFile::createObjectFile(Buffer); +if (ObjFileOrErr) { + processSymbols(**ObjFileOrErr); + return; +} + +// Then try processing as archive files. +llvm::consumeError(ObjFileOrErr.takeError()); +auto ArchiveOrErr = llvm::object::Archive::create(Buffer); +if (ArchiveOrErr) { + llvm::Error Err = llvm::Error::success(); + llvm::object::Archive &Archive = *ArchiveOrErr.get(); + for (auto &Child : Archive.children(Err)) { +auto ChildBufOrErr = Child.getMemoryBufferRef(); +if (ChildBufOrErr) + processInput(*ChildBufOrErr); +else + errorHandler(ChildBufOrErr.takeError()); + } + + if (Err) +errorHandler(std::move(Err)); + return; +} + +// Ignore other files. +llvm::consumeError(ArchiveOrErr.takeError()); + } + void processSymbols(const llvm::object::ObjectFile &Obj) { Artem-B wrote: Nit -- add an empty line to separate functions. https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
https://github.com/yxsamliu updated https://github.com/llvm/llvm-project/pull/81700 >From 6006975bbff9fc0f6fb9b8e24a52d4963ceb774c Mon Sep 17 00:00:00 2001 From: "Yaxun (Sam) Liu" Date: Tue, 13 Feb 2024 10:00:21 -0500 Subject: [PATCH] [HIP] Allow partial linking for `-fgpu-rdc` `-fgpu-rdc` mode allows device functions call device functions in different TU. However, currently all device objects have to be linked together since only one fat binary is supported. This is time consuming for AMDGPU backend since it only supports LTO. There are use cases that objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, whereas partially link the host code with `ld -r` or generate a static library by using the `-emit-static-lib` option of clang. This avoids linking all device code together, therefore decreases the linking time for `-fgpu-rdc`. Previously, clang emits an external symbol `__hip_fatbin` for all objects for `-fgpu-rdc`. With this patch, clang emits an unique external symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a group of objects are linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary. One executable or shared library can have multiple fat binaries. Device linking is done for undefined fab binary symbols only to avoid repeated linking. `__hip_gpubin_handle` is also uniquefied and merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging and tooling. Fixes: https://github.com/llvm/llvm-project/issues/77018 Change-Id: Ia16ac3ddb05b66e6149288aacb0ba4a80120ad8c --- clang/lib/CodeGen/CGCUDANV.cpp| 22 +- clang/lib/CodeGen/CodeGenModule.cpp | 10 +- clang/lib/Driver/ToolChains/HIPUtility.cpp| 237 +- clang/test/CodeGenCUDA/device-stub.cu | 10 +- .../test/CodeGenCUDA/host-used-device-var.cu | 5 +- clang/test/Driver/hip-toolchain-rdc.hip | 38 ++- 6 files changed, 278 insertions(+), 44 deletions(-) diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp index 5b43272bfa62f4..49f93451db7bbb 100644 --- a/clang/lib/CodeGen/CGCUDANV.cpp +++ b/clang/lib/CodeGen/CGCUDANV.cpp @@ -760,10 +760,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // to contain the fat binary but will be populated somewhere else, // e.g. by lld through link script. FatBinStr = new llvm::GlobalVariable( -CGM.getModule(), CGM.Int8Ty, -/*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr, -"__hip_fatbin", nullptr, -llvm::GlobalVariable::NotThreadLocal); + CGM.getModule(), CGM.Int8Ty, + /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr, + "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr, + llvm::GlobalVariable::NotThreadLocal); cast(FatBinStr)->setSection(FatbinConstantName); } @@ -816,8 +816,8 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // thread safety of the loaded program. Therefore we can assume sequential // execution of constructor functions here. if (IsHIP) { -auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage : -llvm::GlobalValue::LinkOnceAnyLinkage; +auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage + : llvm::GlobalValue::ExternalLinkage; llvm::BasicBlock *IfBlock = llvm::BasicBlock::Create(Context, "if", ModuleCtorFunc); llvm::BasicBlock *ExitBlock = @@ -826,11 +826,11 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // of HIP ABI. GpuBinaryHandle = new llvm::GlobalVariable( TheModule, PtrTy, /*isConstant=*/false, Linkage, -/*Initializer=*/llvm::ConstantPointerNull::get(PtrTy), -"__hip_gpubin_handle"); -if (Linkage == llvm::GlobalValue::LinkOnceAnyLinkage) - GpuBinaryHandle->setComdat( - CGM.getModule().getOrInsertComdat(GpuBinaryHandle->getName())); +/*Initializer=*/ +CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr, +CudaGpuBinary +? "__hip_gpubin_handle" +: "__hip_gpubin_handle_" + CGM.getContext().getCUIDHash()); GpuBinaryHandle->setAlignment(CGM.getPointerAlign().getAsAlign()); // Prevent the weak symbol in different shared libraries being merged. if (Linkage != llvm::GlobalValue::InternalLinkage) diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index c984260b082cd1..218066bced6c19 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -919,7 +919,15 @@ void CodeGenModule::Release() { llvm::ConstantArray::get(ATy, UsedArray), "__cla
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
github-actions[bot] wrote: :warning: C/C++ code formatter, clang-format found issues in your code. :warning: You can test this locally with the following command: ``bash git-clang-format --diff ec5f4a4bc6f27b044bc73668414ecefe9690d283 318117089831345caa13d8b4eeea23d0aa2c8588 -- clang/lib/CodeGen/CGCUDANV.cpp clang/lib/CodeGen/CodeGenModule.cpp clang/lib/Driver/ToolChains/HIPUtility.cpp `` View the diff from clang-format here. ``diff diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp index 7d23f94473..49f93451db 100644 --- a/clang/lib/CodeGen/CGCUDANV.cpp +++ b/clang/lib/CodeGen/CGCUDANV.cpp @@ -827,7 +827,7 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { GpuBinaryHandle = new llvm::GlobalVariable( TheModule, PtrTy, /*isConstant=*/false, Linkage, /*Initializer=*/ -CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr, +CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr, CudaGpuBinary ? "__hip_gpubin_handle" : "__hip_gpubin_handle_" + CGM.getContext().getCUIDHash()); diff --git a/clang/lib/Driver/ToolChains/HIPUtility.cpp b/clang/lib/Driver/ToolChains/HIPUtility.cpp index 4bd6926ec6..67519a8a54 100644 --- a/clang/lib/Driver/ToolChains/HIPUtility.cpp +++ b/clang/lib/Driver/ToolChains/HIPUtility.cpp @@ -56,7 +56,7 @@ public: DiagnosticsEngine::Error, "Error collecting HIP undefined fatbin symbols: %0")), Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)), -Verbose(C.getArgs().hasArg(options::OPT_v)){ +Verbose(C.getArgs().hasArg(options::OPT_v)) { populateSymbols(); if (Verbose) { for (auto Name : FatBinSymbols) @@ -284,17 +284,12 @@ void HIP::constructGenerateObjFileFromHIPFatBinary( HIPUndefinedFatBinSymbols Symbols(C); std::string PrimaryHipFatbinSymbol; - std::string - PrimaryGpuBinHandleSymbol; - bool FoundPrimaryHipFatbinSymbol = - false; - bool FoundPrimaryGpuBinHandleSymbol = - false; - - std::vector - AliasHipFatbinSymbols; - std::vector - AliasGpuBinHandleSymbols; + std::string PrimaryGpuBinHandleSymbol; + bool FoundPrimaryHipFatbinSymbol = false; + bool FoundPrimaryGpuBinHandleSymbol = false; + + std::vector AliasHipFatbinSymbols; + std::vector AliasGpuBinHandleSymbols; // Iterate through symbols to find the primary ones and collect others for // aliasing `` https://github.com/llvm/llvm-project/pull/81700 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Yaxun (Sam) Liu (yxsamliu) Changes `-fgpu-rdc` mode allows device functions call device functions in different TU. However, currently all device objects have to be linked together since only one fat binary is supported. This is time consuming for AMDGPU backend since it only supports LTO. There are use cases that objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, whereas partially link the host code with `ld -r` or generate a static library by using the `-emit-static-lib` option of clang. This avoids linking all device code together, therefore decreases the linking time for `-fgpu-rdc`. Previously, clang emits an external symbol `__hip_fatbin` for all objects for `-fgpu-rdc`. With this patch, clang emits an unique external symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a group of objects are linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary. One executable or shared library can have multiple fat binaries. Device linking is done for undefined fab binary symbols only to avoid repeated linking. `__hip_gpubin_handle` is also uniquefied and merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging and tooling. Fixes: https://github.com/llvm/llvm-project/issues/77018 --- Full diff: https://github.com/llvm/llvm-project/pull/81700.diff 6 Files Affected: - (modified) clang/lib/CodeGen/CGCUDANV.cpp (+11-11) - (modified) clang/lib/CodeGen/CodeGenModule.cpp (+9-1) - (modified) clang/lib/Driver/ToolChains/HIPUtility.cpp (+229-13) - (modified) clang/test/CodeGenCUDA/device-stub.cu (+4-6) - (modified) clang/test/CodeGenCUDA/host-used-device-var.cu (+3-2) - (modified) clang/test/Driver/hip-toolchain-rdc.hip (+27-11) ``diff diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp index 5b43272bfa62f4..7d23f944732dbf 100644 --- a/clang/lib/CodeGen/CGCUDANV.cpp +++ b/clang/lib/CodeGen/CGCUDANV.cpp @@ -760,10 +760,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // to contain the fat binary but will be populated somewhere else, // e.g. by lld through link script. FatBinStr = new llvm::GlobalVariable( -CGM.getModule(), CGM.Int8Ty, -/*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr, -"__hip_fatbin", nullptr, -llvm::GlobalVariable::NotThreadLocal); + CGM.getModule(), CGM.Int8Ty, + /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr, + "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr, + llvm::GlobalVariable::NotThreadLocal); cast(FatBinStr)->setSection(FatbinConstantName); } @@ -816,8 +816,8 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // thread safety of the loaded program. Therefore we can assume sequential // execution of constructor functions here. if (IsHIP) { -auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage : -llvm::GlobalValue::LinkOnceAnyLinkage; +auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage + : llvm::GlobalValue::ExternalLinkage; llvm::BasicBlock *IfBlock = llvm::BasicBlock::Create(Context, "if", ModuleCtorFunc); llvm::BasicBlock *ExitBlock = @@ -826,11 +826,11 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // of HIP ABI. GpuBinaryHandle = new llvm::GlobalVariable( TheModule, PtrTy, /*isConstant=*/false, Linkage, -/*Initializer=*/llvm::ConstantPointerNull::get(PtrTy), -"__hip_gpubin_handle"); -if (Linkage == llvm::GlobalValue::LinkOnceAnyLinkage) - GpuBinaryHandle->setComdat( - CGM.getModule().getOrInsertComdat(GpuBinaryHandle->getName())); +/*Initializer=*/ +CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr, +CudaGpuBinary +? "__hip_gpubin_handle" +: "__hip_gpubin_handle_" + CGM.getContext().getCUIDHash()); GpuBinaryHandle->setAlignment(CGM.getPointerAlign().getAsAlign()); // Prevent the weak symbol in different shared libraries being merged. if (Linkage != llvm::GlobalValue::InternalLinkage) diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index c984260b082cd1..218066bced6c19 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -919,7 +919,15 @@ void CodeGenModule::Release() { llvm::ConstantArray::get(ATy, UsedArray), "__clang_gpu_used_external"); addCompilerUsedGlobal(GV); } - + if (LangOpts.HIP) { +// Emit a unique ID so that host and device binaries from the same
[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)
https://github.com/yxsamliu created https://github.com/llvm/llvm-project/pull/81700 `-fgpu-rdc` mode allows device functions call device functions in different TU. However, currently all device objects have to be linked together since only one fat binary is supported. This is time consuming for AMDGPU backend since it only supports LTO. There are use cases that objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, whereas partially link the host code with `ld -r` or generate a static library by using the `-emit-static-lib` option of clang. This avoids linking all device code together, therefore decreases the linking time for `-fgpu-rdc`. Previously, clang emits an external symbol `__hip_fatbin` for all objects for `-fgpu-rdc`. With this patch, clang emits an unique external symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a group of objects are linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary. One executable or shared library can have multiple fat binaries. Device linking is done for undefined fab binary symbols only to avoid repeated linking. `__hip_gpubin_handle` is also uniquefied and merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging and tooling. Fixes: https://github.com/llvm/llvm-project/issues/77018 >From 318117089831345caa13d8b4eeea23d0aa2c8588 Mon Sep 17 00:00:00 2001 From: "Yaxun (Sam) Liu" Date: Tue, 13 Feb 2024 10:00:21 -0500 Subject: [PATCH] [HIP] Allow partial linking for `-fgpu-rdc` `-fgpu-rdc` mode allows device functions call device functions in different TU. However, currently all device objects have to be linked together since only one fat binary is supported. This is time consuming for AMDGPU backend since it only supports LTO. There are use cases that objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, whereas partially link the host code with `ld -r` or generate a static library by using the `-emit-static-lib` option of clang. This avoids linking all device code together, therefore decreases the linking time for `-fgpu-rdc`. Previously, clang emits an external symbol `__hip_fatbin` for all objects for `-fgpu-rdc`. With this patch, clang emits an unique external symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a group of objects are linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary. One executable or shared library can have multiple fat binaries. Device linking is done for undefined fab binary symbols only to avoid repeated linking. `__hip_gpubin_handle` is also uniquefied and merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging and tooling. Fixes: https://github.com/llvm/llvm-project/issues/77018 --- clang/lib/CodeGen/CGCUDANV.cpp| 22 +- clang/lib/CodeGen/CodeGenModule.cpp | 10 +- clang/lib/Driver/ToolChains/HIPUtility.cpp| 242 +- clang/test/CodeGenCUDA/device-stub.cu | 10 +- .../test/CodeGenCUDA/host-used-device-var.cu | 5 +- clang/test/Driver/hip-toolchain-rdc.hip | 38 ++- 6 files changed, 283 insertions(+), 44 deletions(-) diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp index 5b43272bfa62f4..7d23f944732dbf 100644 --- a/clang/lib/CodeGen/CGCUDANV.cpp +++ b/clang/lib/CodeGen/CGCUDANV.cpp @@ -760,10 +760,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // to contain the fat binary but will be populated somewhere else, // e.g. by lld through link script. FatBinStr = new llvm::GlobalVariable( -CGM.getModule(), CGM.Int8Ty, -/*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr, -"__hip_fatbin", nullptr, -llvm::GlobalVariable::NotThreadLocal); + CGM.getModule(), CGM.Int8Ty, + /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr, + "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr, + llvm::GlobalVariable::NotThreadLocal); cast(FatBinStr)->setSection(FatbinConstantName); } @@ -816,8 +816,8 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { // thread safety of the loaded program. Therefore we can assume sequential // execution of constructor functions here. if (IsHIP) { -auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage : -llvm::GlobalValue::LinkOnceAnyLinkage; +auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage +