[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-11-08 Thread Deepak Eachempati via Phabricator via cfe-commits
dreachem added subscribers: dhruvachak, dreachem.
dreachem added a comment.
Herald added a subscriber: MaskRay.
Herald added a project: All.

@jdoerfert @tianshilei1992 @atmnpatel @dhruvachak

Is the target to get this merged in for LLVM 16? Does the VGPU implementation 
provide a way to support OMPT callbacks for various constructs (parallel, 
worksharing, barriers, etc.)?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113359/new/

https://reviews.llvm.org/D113359

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-02-03 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added a comment.

We can merge the runtime first and build it in isolation, then the 
libomptarget host runtime, then the clang changes.

Also make sure to adjust the commit messages.




[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-02-03 Thread Shilei Tian via Phabricator via cfe-commits
tianshilei1992 added a comment.

Not sure if it's good to merge such a large patch. We could potentially split 
it into three independent patches: the toolchain, the device runtime, and the 
OpenMPOpt pass that supports expansion of shared variables (which for some 
reason is not included in this patch, even though it is a very important 
component: without it the backend will complain).




[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-02-03 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert accepted this revision.
jdoerfert added a comment.
This revision is now accepted and ready to land.

LG, with some things to address before the merge though.

Didn't we have a pass to expand shared memory (and such)?




Comment at: clang/lib/Basic/TargetInfo.cpp:155
+
+  if (Triple.getVendor() == llvm::Triple::OpenMP_VGPU)
+AddrSpaceMap = ::omp::OpenMPVGPUAddrSpaceMap;

use isOpenMPVGPU



Comment at: clang/lib/Basic/Targets/X86.h:395
+return llvm::omp::VirtualGpuGridValues;
+  }
 };

Do we need the changes in this file at all? I couldn't see why.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:1125
+llvm::GlobalValue::LinkageTypes Linkage) {
+  if (CGM.getTarget().getTriple().getVendor() == llvm::Triple::OpenMP_VGPU)
+return CGOpenMPRuntime::createOffloadEntry(ID, Addr, Size, Flags, Linkage);

isOpenMPVGPU



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:252
   default:
-if (LangOpts.OpenMPSimd)
+if (getTriple().getVendor() == llvm::Triple::OpenMP_VGPU)
+  OpenMPRuntime.reset(new CGOpenMPRuntimeGPU(*this));

isOpenMPVGPU



Comment at: clang/lib/Driver/ToolChains/Gnu.cpp:3076
+
+  if (getTriple().getVendor() == llvm::Triple::OpenMP_VGPU) {
+std::string BitcodeSuffix = getTripleString() + "-openmp_vgpu";

isOpenMPVGPU



Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:323
+constexpr uint32_t UNSET = 0;
+constexpr uint32_t SET = 1;
+

Remove these. Also the TODO below (copied from somewhere)



Comment at: openmp/libomptarget/plugins/vgpu/src/ThreadEnvironmentImpl.cpp:85
+ CTAEnvironmentTy *CTAE)
+: ThreadIdInWarp(Idx++ % WE->getNumThreads()),
+  ThreadIdInBlock(WE->getWarpId() * WE->getNumThreads() + ThreadIdInWarp),

This is racy, I think. Can we use atomic_add for all these Idx updates or pass 
the Id from the outside?
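
The race comes from several threads running `Idx++` on a shared counter without synchronization. A minimal sketch of the two options mentioned above, assuming hypothetical names (`claimIndex`, `ThreadEnv`, `claimAll`) that are not taken from the patch:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Option 1: claim an index from a shared counter with an atomic
// read-modify-write, so no two threads can observe the same value.
inline uint32_t claimIndex(std::atomic<uint32_t> &Counter, uint32_t Limit) {
  return Counter.fetch_add(1, std::memory_order_relaxed) % Limit;
}

// Option 2: the creator passes the index in, so the thread environment
// never touches shared mutable state. ThreadEnv is a hypothetical type.
struct ThreadEnv {
  uint32_t ThreadIdInWarp;
  explicit ThreadEnv(uint32_t IdFromOutside) : ThreadIdInWarp(IdFromOutside) {}
};

// Spawns NumThreads threads that each claim one index via option 1 and
// returns the claimed indices, one slot per thread.
inline std::vector<uint32_t> claimAll(uint32_t NumThreads) {
  std::atomic<uint32_t> Counter{0};
  std::vector<uint32_t> Claimed(NumThreads);
  std::vector<std::thread> Threads;
  for (uint32_t T = 0; T < NumThreads; ++T)
    Threads.emplace_back([&Counter, &Claimed, T, NumThreads] {
      Claimed[T] = claimIndex(Counter, NumThreads);
    });
  for (std::thread &Th : Threads)
    Th.join();
  return Claimed;
}
```

Option 2 is usually the simpler one when the spawning loop already knows the thread index, since it removes the shared counter entirely.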



Comment at: openmp/libomptarget/plugins/vgpu/src/ThreadEnvironmentImpl.h:118
+
+  // FIXME: This is wrong
+  LaneMaskTy getActiveMask() const;

At least add more information about what the problem is and what the potential solutions are.



Comment at: openmp/libomptarget/plugins/vgpu/src/rtl.cpp:271
+ ThreadIdx++) {
+  Threads.emplace_back([this, GlobalThreadIdx, CTAEnv, WarpEnv]() {
+ThreadEnvironment = new ThreadEnvironmentTy(WarpEnv, CTAEnv);

Move the lambda into a helper function. Indentation of 12 is too much.
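
As a rough illustration of that refactoring, the thread body can live in a named function that `std::thread` calls directly, which keeps the spawning loops shallow. The names `runThread` and `launch` and the work done per thread are assumptions for this sketch, not the plugin's code:

```cpp
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-thread body; in the patch this would hold what the
// inline lambda currently does.
static void runThread(uint32_t WarpIdx, uint32_t ThreadIdx, uint32_t WarpSize,
                      std::vector<uint32_t> &Out) {
  uint32_t GlobalIdx = WarpIdx * WarpSize + ThreadIdx;
  Out[GlobalIdx] = GlobalIdx; // each thread writes only its own slot
}

// The spawning loops stay short because the body is out of line.
static std::vector<uint32_t> launch(uint32_t NumWarps, uint32_t WarpSize) {
  std::vector<uint32_t> Out(NumWarps * WarpSize);
  std::vector<std::thread> Threads;
  for (uint32_t W = 0; W < NumWarps; ++W)
    for (uint32_t T = 0; T < WarpSize; ++T)
      Threads.emplace_back(runThread, W, T, WarpSize, std::ref(Out));
  for (std::thread &Th : Threads)
    Th.join();
  return Out;
}
```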



Comment at: openmp/libomptarget/plugins/vgpu/src/rtl.cpp:313
+  });
+  GlobalThreadIdx = (GlobalThreadIdx + 1) % NumThreads;
+}

When do we have more threads than NumThreads? 



Comment at: openmp/libomptarget/plugins/vgpu/src/rtl.cpp:554
+
+int32_t __tgt_rtl_data_delete(int32_t device_id, void *tgt_ptr) {
+  free(tgt_ptr);

If we need to wait for submit/retrieve, I'd assume we need to wait here too.




[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-02-02 Thread Atmn Patel via Phabricator via cfe-commits
atmnpatel added inline comments.



Comment at: openmp/libomptarget/test/CMakeLists.txt:23
+continue()
+  ENDIF()
   string(STRIP "${CURRENT_TARGET}" CURRENT_TARGET)

jdoerfert wrote:
> Is this to disable the tests? I'm not sure this is a good way to do it. For 
> one, can we check against -vgpu rather than x86? It should also be 
> openmp-vgpu or something similar, right?
Yep




[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-02-02 Thread Atmn Patel via Phabricator via cfe-commits
atmnpatel updated this revision to Diff 405407.
atmnpatel marked 7 inline comments as done.
atmnpatel added a comment.

updates


Files:
  clang/lib/Basic/TargetInfo.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Gnu.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  llvm/include/llvm/ADT/Triple.h
  llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h
  llvm/lib/Support/Triple.cpp
  openmp/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/include/ThreadEnvironment.h
  openmp/libomptarget/DeviceRTL/src/Debug.cpp
  openmp/libomptarget/DeviceRTL/src/Mapping.cpp
  openmp/libomptarget/DeviceRTL/src/Misc.cpp
  openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
  openmp/libomptarget/DeviceRTL/src/Utils.cpp
  openmp/libomptarget/plugins/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironment.cpp
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironment.h
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironmentImpl.cpp
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironmentImpl.h
  openmp/libomptarget/plugins/vgpu/src/rtl.cpp
  openmp/libomptarget/src/rtl.cpp
  openmp/libomptarget/test/CMakeLists.txt

Index: openmp/libomptarget/test/CMakeLists.txt
===
--- openmp/libomptarget/test/CMakeLists.txt
+++ openmp/libomptarget/test/CMakeLists.txt
@@ -18,6 +18,9 @@
 
 string(REGEX MATCHALL "([^\ ]+\ |[^\ ]+$)" SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}")
 foreach(CURRENT_TARGET IN LISTS SYSTEM_TARGETS)
+  IF ("${CURRENT_TARGET}" MATCHES "-vgpu")
+continue()
+  ENDIF()
   string(STRIP "${CURRENT_TARGET}" CURRENT_TARGET)
   add_openmp_testsuite(check-libomptarget-${CURRENT_TARGET}
 "Running libomptarget tests"
Index: openmp/libomptarget/src/rtl.cpp
===
--- openmp/libomptarget/src/rtl.cpp
+++ openmp/libomptarget/src/rtl.cpp
@@ -21,17 +21,22 @@
 #include 
 #include 
 
-// List of all plugins that can support offloading.
-static const char *RTLNames[] = {
-/* PowerPC target   */ "libomptarget.rtl.ppc64.so",
-/* x86_64 target*/ "libomptarget.rtl.x86_64.so",
-/* CUDA target  */ "libomptarget.rtl.cuda.so",
-/* AArch64 target   */ "libomptarget.rtl.aarch64.so",
-/* SX-Aurora VE target  */ "libomptarget.rtl.ve.so",
-/* AMDGPU target*/ "libomptarget.rtl.amdgpu.so",
-/* Remote target*/ "libomptarget.rtl.rpc.so",
+struct PluginInfoTy {
+  std::string Name;
+  bool IsHost;
 };
 
+// List of all plugins that can support offloading.
+static const PluginInfoTy Plugins[] = {
+/* PowerPC target   */ {"libomptarget.rtl.ppc64.so", true},
+/* x86_64 target*/ {"libomptarget.rtl.x86_64.so", true},
+/* CUDA target  */ {"libomptarget.rtl.cuda.so", false},
+/* AArch64 target   */ {"libomptarget.rtl.aarch64.so", true},
+/* SX-Aurora VE target  */ {"libomptarget.rtl.ve.so", false},
+/* AMDGPU target*/ {"libomptarget.rtl.amdgpu.so", false},
+/* Remote target*/ {"libomptarget.rtl.rpc.so", false},
+/* Virtual GPU target   */ {"libomptarget.rtl.vgpu.so", false}};
+
 PluginManager *PM;
 
 #if OMPTARGET_PROFILE_ENABLED
@@ -86,21 +91,37 @@
 return;
   }
 
+  // TODO: add ability to inspect image and decide automatically
+  bool UseVGPU = false;
+  if (auto *EnvFlag = std::getenv("LIBOMPTARGET_USE_VGPU"))
+UseVGPU = true;
+
   DP("Loading RTLs...\n");
 
   // Attempt to open all the plugins and, if they exist, check if the interface
   // is correct and if they are supporting any devices.
-  for (auto *Name : RTLNames) {
-DP("Loading library '%s'...\n", Name);
-void *dynlib_handle = dlopen(Name, RTLD_NOW);
+  for (auto &[Name, IsHost] : Plugins) {
+DP("Loading library '%s'...\n", Name.c_str());
+
+int Flags = RTLD_NOW;
+
+if (Name.compare("libomptarget.rtl.vgpu.so") == 0)
+  Flags |= RTLD_GLOBAL;
+
+if (UseVGPU && IsHost) {
+  DP("Skipping library '%s': VGPU was requested.\n", Name.c_str());
+  continue;
+}
+
+void *dynlib_handle = dlopen(Name.c_str(), Flags);
 
 if (!dynlib_handle) {
   // Library does not exist or cannot be found.
-  DP("Unable to load library '%s': %s!\n", Name, dlerror());
+  DP("Unable to load library '%s': %s!\n", Name.c_str(), dlerror());
   continue;
 }
 
-DP("Successfully loaded library '%s'!\n", Name);
+DP("Successfully loaded library '%s'!\n", Name.c_str());
 
 AllRTLs.emplace_back();
 
Index: openmp/libomptarget/plugins/vgpu/src/rtl.cpp
===
--- /dev/null
+++ 

[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-02-02 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added inline comments.



Comment at: clang/lib/Driver/ToolChains/Gnu.cpp:3082
+  if (getTriple().getVendor() == llvm::Triple::OpenMP_VGPU) {
+std::string BitcodeSuffix = "x86_64-vgpu";
+clang::driver::tools::addOpenMPDeviceRTL(getDriver(), DriverArgs, CC1Args,

tianshilei1992 wrote:
> Maybe `"x86_64-openmp_vgpu"` now?
not x86, right? triple contains the proper arch



Comment at: openmp/libomptarget/DeviceRTL/src/Mapping.cpp:29
+
+#include "ThreadEnvironment.h"
+

Move up to the beginning.



Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:291
+
+#include "ThreadEnvironment.h"
+namespace impl {

Move up.



Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:342
+  VGPUImpl::setLock((uint32_t *)Lock, UNSET, SET, OMP_SPIN,
+mapping::getBlockId(), atomicCAS);
+}

We should simply use omp locks. Either here, or maybe better, in VGPUImpl. So 
redirect all calls to there and use a proper lock. no OMP_SPIN and stuff



Comment at: openmp/libomptarget/DeviceRTL/src/Utils.cpp:118
+
+#include "ThreadEnvironment.h"
+namespace impl {

Move up



Comment at: openmp/libomptarget/DeviceRTL/src/Utils.cpp:127
+  return getThreadEnvironment()->shuffleDown(Var, Delta);
+}
+

Pass the mask, both times.



Comment at: openmp/libomptarget/plugins/vgpu/src/ThreadEnvironment.cpp:49
+  } // wait for 0 to be the read value
+}
+

see above.



Comment at: openmp/libomptarget/src/rtl.cpp:97
+  continue;
+}
+

Not only x86; also, let's not do strcmp. Extend RTLNames to be an array of 
structs with more elaborate information, e.g., an is-host flag. That said, I'm 
unsure whether not loading the plugin is the right way to avoid grabbing the 
image. Good enough for now.
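
A minimal sketch of that idea, assuming a hypothetical `PluginInfo` record and `selectPlugins` helper (the names are illustrative): each plugin carries a host flag, and the loader filters on the flag instead of comparing library names.

```cpp
#include <string>
#include <vector>

// Hypothetical record describing one plugin library.
struct PluginInfo {
  std::string Name;
  bool IsHost; // true if the plugin offloads to the host CPU
};

// Returns the names of the plugins that should be loaded. When the virtual
// GPU is requested, host plugins are skipped so they cannot claim the image.
inline std::vector<std::string>
selectPlugins(const std::vector<PluginInfo> &All, bool UseVGPU) {
  std::vector<std::string> Selected;
  for (const PluginInfo &P : All) {
    if (UseVGPU && P.IsHost)
      continue;
    Selected.push_back(P.Name);
  }
  return Selected;
}
```

With a table like this, adding a new host architecture only requires a new entry with `IsHost` set, rather than another string comparison in the loading loop.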



Comment at: openmp/libomptarget/test/CMakeLists.txt:23
+continue()
+  ENDIF()
   string(STRIP "${CURRENT_TARGET}" CURRENT_TARGET)

Is this to disable the tests? I'm not sure this is a good way to do it. For 
one, can we check against -vgpu rather than x86? It should also be openmp-vgpu 
or something similar, right?




[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-01-18 Thread Atmn Patel via Phabricator via cfe-commits
atmnpatel added inline comments.



Comment at: openmp/libomptarget/DeviceRTL/CMakeLists.txt:231
+
+compileDeviceRTLLibrary(x86_64 vgpu -target x86_64-vgpu -std=c++20 -stdlib=libc++ -I${devicertl_base_directory}/../plugins/vgpu/src)

tianshilei1992 wrote:
> It's not a good practice to specify include directories in CMake in this way. 
> Use `include_directories` instead.
can't quite do that here I think, afaik both `include_directories` and 
`target_include_directories` require that CMake builds the target, but we 
specify custom targets/build commands so they don't get pulled in




[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-01-18 Thread Atmn Patel via Phabricator via cfe-commits
atmnpatel updated this revision to Diff 401112.
atmnpatel marked 9 inline comments as done.
atmnpatel added a comment.

Addressed comments


Files:
  clang/lib/Basic/TargetInfo.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Gnu.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  llvm/include/llvm/ADT/Triple.h
  llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h
  llvm/lib/Support/Triple.cpp
  openmp/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/include/ThreadEnvironment.h
  openmp/libomptarget/DeviceRTL/src/Debug.cpp
  openmp/libomptarget/DeviceRTL/src/Mapping.cpp
  openmp/libomptarget/DeviceRTL/src/Misc.cpp
  openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
  openmp/libomptarget/DeviceRTL/src/Utils.cpp
  openmp/libomptarget/plugins/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironment.cpp
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironment.h
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironmentImpl.cpp
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironmentImpl.h
  openmp/libomptarget/plugins/vgpu/src/rtl.cpp
  openmp/libomptarget/src/rtl.cpp
  openmp/libomptarget/test/CMakeLists.txt

Index: openmp/libomptarget/test/CMakeLists.txt
===
--- openmp/libomptarget/test/CMakeLists.txt
+++ openmp/libomptarget/test/CMakeLists.txt
@@ -18,6 +18,9 @@
 
 string(REGEX MATCHALL "([^\ ]+\ |[^\ ]+$)" SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}")
 foreach(CURRENT_TARGET IN LISTS SYSTEM_TARGETS)
+  IF ("${CURRENT_TARGET}" MATCHES "x86_64-vgpu")
+continue()
+  ENDIF()
   string(STRIP "${CURRENT_TARGET}" CURRENT_TARGET)
   add_openmp_testsuite(check-libomptarget-${CURRENT_TARGET}
 "Running libomptarget tests"
Index: openmp/libomptarget/src/rtl.cpp
===
--- openmp/libomptarget/src/rtl.cpp
+++ openmp/libomptarget/src/rtl.cpp
@@ -30,6 +30,7 @@
 /* SX-Aurora VE target  */ "libomptarget.rtl.ve.so",
 /* AMDGPU target*/ "libomptarget.rtl.amdgpu.so",
 /* Remote target*/ "libomptarget.rtl.rpc.so",
+/* Virtual GPU target   */ "libomptarget.rtl.vgpu.so",
 };
 
 PluginManager *PM;
@@ -73,13 +74,29 @@
 return;
   }
 
+  // TODO: add ability to inspect image and decide automatically
+  bool UseVGPU = false;
+  if (auto *EnvFlag = std::getenv("LIBOMPTARGET_USE_VGPU"))
+UseVGPU = true;
+
   DP("Loading RTLs...\n");
 
   // Attempt to open all the plugins and, if they exist, check if the interface
   // is correct and if they are supporting any devices.
   for (auto *Name : RTLNames) {
 DP("Loading library '%s'...\n", Name);
-void *dynlib_handle = dlopen(Name, RTLD_NOW);
+
+int Flags = RTLD_NOW;
+
+if (strcmp(Name, "libomptarget.rtl.vgpu.so") == 0)
+  Flags |= RTLD_GLOBAL;
+
+if (UseVGPU && (strcmp(Name, "libomptarget.rtl.x86_64.so") == 0)) {
+  DP("Skipping library '%s': VGPU was requested.\n", Name);
+  continue;
+}
+
+void *dynlib_handle = dlopen(Name, Flags);
 
 if (!dynlib_handle) {
   // Library does not exist or cannot be found.
Index: openmp/libomptarget/plugins/vgpu/src/rtl.cpp
===
--- /dev/null
+++ openmp/libomptarget/plugins/vgpu/src/rtl.cpp
@@ -0,0 +1,615 @@
+//===--RTLs/vgpu/src/rtl.cpp - Target RTLs Implementation - C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// RTL for virtual (x86) GPU
+//
+//===--===//
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "Debug.h"
+#include "ThreadEnvironment.h"
+#include "ThreadEnvironmentImpl.h"
+#include "omptarget.h"
+#include "omptargetplugin.h"
+
+#ifndef TARGET_NAME
+#define TARGET_NAME Generic ELF - 64bit
+#endif
+#define DEBUG_PREFIX "TARGET " GETNAME(TARGET_NAME) " RTL"
+
+#ifndef TARGET_ELF_ID
+#define TARGET_ELF_ID 0
+#endif
+
+#include "elf_common.h"
+
+#define OFFLOADSECTIONNAME "omp_offloading_entries"
+
+#define DEBUG false
+
+struct FFICallTy {
+  ffi_cif CIF;
+  std::vector ArgsTypes;
+  std::vector Args;
+  std::vector Ptrs;
+  void (*Entry)(void);
+
+  FFICallTy(int32_t ArgNum, void **TgtArgs, ptrdiff_t *TgtOffsets,
+void 

[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-01-18 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added inline comments.



Comment at: openmp/libomptarget/DeviceRTL/src/Kernel.cpp:127
 
+#pragma omp begin declare variant match(                                       \
+device = {kind(cpu)}, implementation = {extension(match_any)})

tianshilei1992 wrote:
> Is this code here unintentional? We don't need to specialize this function 
> for vgpu IIRC.
we might be able to avoid it if we move the synchronize::threads "effect" into 
the VGPU instead.




[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-01-12 Thread Shilei Tian via Phabricator via cfe-commits
tianshilei1992 added inline comments.



Comment at: openmp/libomptarget/DeviceRTL/CMakeLists.txt:231
+
+compileDeviceRTLLibrary(x86_64 vgpu -target x86_64-vgpu -std=c++20 -stdlib=libc++ -I${devicertl_base_directory}/../plugins/vgpu/src)

It's not a good practice to specify include directories in CMake in this way. 
Use `include_directories` instead.



Comment at: openmp/libomptarget/DeviceRTL/src/Kernel.cpp:127
 
+#pragma omp begin declare variant match(                                       \
+device = {kind(cpu)}, implementation = {extension(match_any)})

Is this code here unintentional? We don't need to specialize this function 
for vgpu IIRC.




[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-01-10 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added inline comments.



Comment at: llvm/lib/Support/Triple.cpp:512
+  .Case("oe", Triple::OpenEmbedded)
+  .Case("vgpu", Triple::OpenMP_VGPU)
+  .Default(Triple::UnknownVendor);





Comment at: openmp/libomptarget/DeviceRTL/src/Debug.cpp:53
+#pragma omp begin declare variant match(                                       \
+device = {kind(cpu)}, implementation = {extension(match_any)})
+int32_t vprintf(const char *, void *);





Comment at: openmp/libomptarget/DeviceRTL/src/Kernel.cpp:128
+#pragma omp begin declare variant match(                                       \
+device = {kind(cpu)}, implementation = {extension(match_any)})
+void __kmpc_target_deinit(IdentTy *Ident, int8_t Mode, bool) {





Comment at: openmp/libomptarget/DeviceRTL/src/Mapping.cpp:28
+#pragma omp begin declare variant match(                                       \
+device = {kind(cpu)}, implementation = {extension(match_any)})
+





Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:290
+#pragma omp begin declare variant match(                                       \
+device = {kind(cpu)}, implementation = {extension(match_any)})
+





Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:314
+// Simply call fenceKernel because there is no need to sync with host
+void fenceSystem(int) { fenceKernel(0); }
+

Pass the memory order, also rename the arguments to match the coding convention.



Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:317
+void syncWarp(__kmpc_impl_lanemask_t Mask) {
+  getThreadEnvironment()->syncWarp();
+}

Pass the mask



Comment at: openmp/libomptarget/DeviceRTL/src/Utils.cpp:56
+#pragma omp begin declare variant match(                                       \
+device = {kind(cpu)}, implementation = {extension(match_any)})
+





Comment at: openmp/libomptarget/DeviceRTL/src/Utils.cpp:68
+
+#pragma omp end declare variant
+

Can't we merge this with AMDGPU?



Comment at: openmp/libomptarget/DeviceRTL/src/Utils.cpp:138
+#pragma omp begin declare variant match(                                       \
+device = {kind(cpu)}, implementation = {extension(match_any)})
+





Comment at: openmp/libomptarget/plugins/vgpu/src/rtl.cpp:303
+TeamIdx += NumCTAs;
+  }
+

Can we split this up and create some helper functions maybe?



Comment at: openmp/libomptarget/src/rtl.cpp:34
+/* Virtual GPU target   */ "libomptarget.rtl.vgpu.so",
 };
 

Introduce an environment variable, if it is set, X86 target should skip the 
image.
Also, add a TODO such that we later look into the image and inspect it to 
decide automatically.
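
One way such a switch could be read is sketched below. The helper name `envFlagEnabled` and the rule that an empty value or `0` counts as "off" are assumptions of this sketch; a plain "is the variable set at all" check would also satisfy the request.

```cpp
#include <cstdlib>
#include <cstring>

// Returns true when the environment variable Name is set to anything other
// than the empty string or "0". The "0 means off" rule is an assumption.
inline bool envFlagEnabled(const char *Name) {
  const char *Value = std::getenv(Name);
  return Value && *Value != '\0' && std::strcmp(Value, "0") != 0;
}
```

The plugin loader could then call this once before iterating over the plugin list and skip the x86 plugin when the flag is on.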



Comment at: openmp/libomptarget/test/lit.cfg:189
 config.substitutions.append(("%libomptarget-compile-and-run-" + \
 libomptarget_target, \
 "echo ignored-command"))

Leftovers.




[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2022-01-08 Thread Atmn Patel via Phabricator via cfe-commits
atmnpatel updated this revision to Diff 398370.
atmnpatel added a comment.

- Fixed lifetime issue around ffi_call
- Addressed comments

The existing x86 plugin uses ffi, so this one does as well, but there is no 
explicit benefit in doing so. Is it worth keeping?


Files:
  clang/lib/Basic/TargetInfo.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Gnu.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  llvm/include/llvm/ADT/Triple.h
  llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h
  llvm/lib/Support/Triple.cpp
  openmp/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/include/ThreadEnvironment.h
  openmp/libomptarget/DeviceRTL/src/Debug.cpp
  openmp/libomptarget/DeviceRTL/src/Kernel.cpp
  openmp/libomptarget/DeviceRTL/src/Mapping.cpp
  openmp/libomptarget/DeviceRTL/src/Misc.cpp
  openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
  openmp/libomptarget/DeviceRTL/src/Utils.cpp
  openmp/libomptarget/plugins/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironment.cpp
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironment.h
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironmentImpl.h
  openmp/libomptarget/plugins/vgpu/src/rtl.cpp
  openmp/libomptarget/src/rtl.cpp
  openmp/libomptarget/test/lit.cfg

Index: openmp/libomptarget/test/lit.cfg
===
--- openmp/libomptarget/test/lit.cfg
+++ openmp/libomptarget/test/lit.cfg
@@ -114,9 +114,11 @@
 
 # Scan all the valid targets.
 for libomptarget_target in config.libomptarget_all_targets:
+print("Checking {}".format(libomptarget_target))
 # Is this target in the current system? If so create a compile, run and test
 # command. Otherwise create command that return false.
 if libomptarget_target == config.libomptarget_current_target:
+print("First")
 config.substitutions.append(("%libomptarget-compilexx-run-and-check-generic", 
 "%libomptarget-compilexx-run-and-check-" + libomptarget_target))
 config.substitutions.append(("%libomptarget-compile-run-and-check-generic",
@@ -176,6 +178,7 @@
 config.substitutions.append(("%fcheck-" + libomptarget_target, \
 config.libomptarget_filecheck + " %s"))
 else:
+print("Second")
 config.substitutions.append(("%libomptarget-compile-run-and-check-" + \
 libomptarget_target, \
 "echo ignored-command"))
Index: openmp/libomptarget/src/rtl.cpp
===
--- openmp/libomptarget/src/rtl.cpp
+++ openmp/libomptarget/src/rtl.cpp
@@ -24,12 +24,13 @@
 // List of all plugins that can support offloading.
 static const char *RTLNames[] = {
 /* PowerPC target   */ "libomptarget.rtl.ppc64.so",
-/* x86_64 target*/ "libomptarget.rtl.x86_64.so",
+/* x86_64 target "libomptarget.rtl.x86_64.so", */
 /* CUDA target  */ "libomptarget.rtl.cuda.so",
 /* AArch64 target   */ "libomptarget.rtl.aarch64.so",
 /* SX-Aurora VE target  */ "libomptarget.rtl.ve.so",
 /* AMDGPU target*/ "libomptarget.rtl.amdgpu.so",
 /* Remote target*/ "libomptarget.rtl.rpc.so",
+/* Virtual GPU target   */ "libomptarget.rtl.vgpu.so",
 };
 
 PluginManager *PM;
@@ -79,7 +80,13 @@
   // is correct and if they are supporting any devices.
   for (auto *Name : RTLNames) {
 DP("Loading library '%s'...\n", Name);
-void *dynlib_handle = dlopen(Name, RTLD_NOW);
+
+int Flags = RTLD_NOW;
+
+if (strcmp(Name, "libomptarget.rtl.vgpu.so") == 0)
+  Flags |= RTLD_GLOBAL;
+
+void *dynlib_handle = dlopen(Name, Flags);
 
 if (!dynlib_handle) {
   // Library does not exist or cannot be found.
Index: openmp/libomptarget/plugins/vgpu/src/rtl.cpp
===
--- /dev/null
+++ openmp/libomptarget/plugins/vgpu/src/rtl.cpp
@@ -0,0 +1,609 @@
+//===--RTLs/vgpu/src/rtl.cpp - Target RTLs Implementation - C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// RTL for virtual (x86) GPU
+//
+//===--===//
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "Debug.h"
+#include "ThreadEnvironment.h"

[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2021-11-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I can't see it in the diff - does the cmake somewhere enable the existing tests 
on this new target?

A bit surprised to see ffi involved, are we thinking of spawning a separate 
process for the target?




Comment at: clang/lib/Basic/Targets/X86.h:49
 
+static const unsigned X86VGPUAddrSpaceMap[] = {
+0,   // Default

It's not clear to me why this is x86 specific. Being able to run our tests on 
power / arm etc. seems like an advantage. It would also mean we avoid adding 
openmp stuff to the x86-specific files. Maybe call it OpenMPVGPUAddrSpaceMap and 
put it in one of the openmp source files?



Comment at: clang/lib/Frontend/CompilerInvocation.cpp:3988
+(T.isNVPTX() || T.isAMDGCN() ||
+ T.getVendor() == llvm::Triple::OpenMP_VGPU) &&
 Args.hasArg(options::OPT_fopenmp_cuda_mode);

Add an isOpenmpVGPU function?
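
A sketch of what such a predicate might look like, using a deliberately simplified, hypothetical stand-in for `llvm::Triple` (the real class has many more members, and the spelling `isOpenMPVGPU` is borrowed from later comments in this thread):

```cpp
// Hypothetical, reduced stand-in for llvm::Triple; only what the sketch needs.
class Triple {
public:
  enum VendorType { UnknownVendor, OpenMP_VGPU };

  explicit Triple(VendorType V) : Vendor(V) {}

  VendorType getVendor() const { return Vendor; }

  // Callers ask one question instead of repeating the enum comparison.
  bool isOpenMPVGPU() const { return getVendor() == OpenMP_VGPU; }

private:
  VendorType Vendor;
};
```

Call sites such as the one quoted above would then read `T.isNVPTX() || T.isAMDGCN() || T.isOpenMPVGPU()`, matching the existing predicates.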



Comment at: openmp/libomptarget/DeviceRTL/CMakeLists.txt:135
  -I${devicertl_base_directory}/../include
+ -I${devicertl_base_directory}/../plugins/vgpu/src
  ${LIBOMPTARGET_LLVM_INCLUDE_DIRS_DEVICERTL}

Should only add this include to the vgpu, not all the plugins. May be able to 
use relative include paths to drop it entirely.




[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2021-11-11 Thread Shilei Tian via Phabricator via cfe-commits
tianshilei1992 added inline comments.



Comment at: clang/lib/Driver/ToolChains/Gnu.cpp:3082
+  if (getTriple().getVendor() == llvm::Triple::OpenMP_VGPU) {
+std::string BitcodeSuffix = "x86_64-vgpu";
+clang::driver::tools::addOpenMPDeviceRTL(getDriver(), DriverArgs, CC1Args,

Maybe `"x86_64-openmp_vgpu"` now?



Comment at: llvm/lib/Support/Triple.cpp:189
+  case OpenMP_VGPU:
+return "vgpu";
   }

`"openmp_vgpu"`?




[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2021-11-10 Thread Atmn Patel via Phabricator via cfe-commits
atmnpatel updated this revision to Diff 386426.
atmnpatel added a comment.

small nit fix


Files:
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Gnu.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  llvm/include/llvm/ADT/Triple.h
  llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h
  llvm/lib/Support/Triple.cpp
  openmp/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/src/Debug.cpp
  openmp/libomptarget/DeviceRTL/src/Kernel.cpp
  openmp/libomptarget/DeviceRTL/src/Mapping.cpp
  openmp/libomptarget/DeviceRTL/src/Misc.cpp
  openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
  openmp/libomptarget/DeviceRTL/src/Utils.cpp
  openmp/libomptarget/plugins/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironment.cpp
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironment.h
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironmentImpl.h
  openmp/libomptarget/plugins/vgpu/src/rtl.cpp
  openmp/libomptarget/src/rtl.cpp

Index: openmp/libomptarget/src/rtl.cpp
===
--- openmp/libomptarget/src/rtl.cpp
+++ openmp/libomptarget/src/rtl.cpp
@@ -30,6 +30,7 @@
 /* SX-Aurora VE target  */ "libomptarget.rtl.ve.so",
 /* AMDGPU target*/ "libomptarget.rtl.amdgpu.so",
 /* Remote target*/ "libomptarget.rtl.rpc.so",
+/* Virtual GPU target   */ "libomptarget.rtl.vgpu.so",
 };
 
 PluginManager *PM;
@@ -79,7 +80,13 @@
   // is correct and if they are supporting any devices.
   for (auto *Name : RTLNames) {
 DP("Loading library '%s'...\n", Name);
-void *dynlib_handle = dlopen(Name, RTLD_NOW);
+
+int Flags = RTLD_NOW;
+
+if (strcmp(Name, "libomptarget.rtl.vgpu.so") == 0)
+  Flags |= RTLD_GLOBAL;
+
+void *dynlib_handle = dlopen(Name, Flags);
 
 if (!dynlib_handle) {
   // Library does not exist or cannot be found.
Index: openmp/libomptarget/plugins/vgpu/src/rtl.cpp
===
--- /dev/null
+++ openmp/libomptarget/plugins/vgpu/src/rtl.cpp
@@ -0,0 +1,623 @@
+//===--RTLs/vgpu/src/rtl.cpp - Target RTLs Implementation - C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// RTL for virtual (x86) GPU
+//
+//===--===//
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "Debug.h"
+#include "ThreadEnvironment.h"
+#include "ThreadEnvironmentImpl.h"
+#include "omptarget.h"
+#include "omptargetplugin.h"
+
+#ifndef TARGET_NAME
+#define TARGET_NAME Generic ELF - 64bit
+#endif
+#define DEBUG_PREFIX "TARGET " GETNAME(TARGET_NAME) " RTL"
+
+#ifndef TARGET_ELF_ID
+#define TARGET_ELF_ID 0
+#endif
+
+#include "elf_common.h"
+
+#define NUMBER_OF_DEVICES 1
+#define OFFLOADSECTIONNAME "omp_offloading_entries"
+
+#define DEBUG false
+
+/// Array of Dynamic libraries loaded for this target.
+struct DynLibTy {
+  char *FileName;
+  void *Handle;
+};
+
+/// Keep entries table per device.
+struct FuncOrGblEntryTy {
+  __tgt_target_table Table;
+};
+
+thread_local ThreadEnvironmentTy *ThreadEnvironment;
+
+/// Class containing all the device information.
+class RTLDeviceInfoTy {
+  std::vector<std::list<FuncOrGblEntryTy>> FuncGblEntries;
+
+public:
+  std::list<DynLibTy> DynLibs;
+
+  // Record entry point associated with device.
+  void createOffloadTable(int32_t device_id, __tgt_offload_entry *begin,
+  __tgt_offload_entry *end) {
+assert(device_id < (int32_t)FuncGblEntries.size() &&
+   "Unexpected device id!");
+FuncGblEntries[device_id].emplace_back();
+FuncOrGblEntryTy &E = FuncGblEntries[device_id].back();
+
+E.Table.EntriesBegin = begin;
+E.Table.EntriesEnd = end;
+  }
+
+  // Return true if the entry is associated with device.
+  bool findOffloadEntry(int32_t device_id, void *addr) {
+assert(device_id < (int32_t)FuncGblEntries.size() &&
+   "Unexpected device id!");
+FuncOrGblEntryTy &E = FuncGblEntries[device_id].back();
+
+for (__tgt_offload_entry *i = E.Table.EntriesBegin, *e = E.Table.EntriesEnd;
+ i < e; ++i) {
+  if (i->addr == addr)
+return true;
+}
+
+return false;
+  }
+
+  // Return the pointer to the target entries table.
+  __tgt_target_table *getOffloadEntriesTable(int32_t 

[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2021-11-10 Thread Atmn Patel via Phabricator via cfe-commits
atmnpatel updated this revision to Diff 386425.
atmnpatel added a comment.

I removed the shared var opt - might be best to keep this in a separate patch 
@tianshilei1992. Also addressed comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113359/new/

https://reviews.llvm.org/D113359

Files:
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Gnu.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  llvm/include/llvm/ADT/Triple.h
  llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h
  llvm/lib/Support/Triple.cpp
  openmp/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/src/Debug.cpp
  openmp/libomptarget/DeviceRTL/src/Kernel.cpp
  openmp/libomptarget/DeviceRTL/src/Mapping.cpp
  openmp/libomptarget/DeviceRTL/src/Misc.cpp
  openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
  openmp/libomptarget/DeviceRTL/src/Utils.cpp
  openmp/libomptarget/plugins/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironment.cpp
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironment.h
  openmp/libomptarget/plugins/vgpu/src/ThreadEnvironmentImpl.h
  openmp/libomptarget/plugins/vgpu/src/rtl.cpp
  openmp/libomptarget/src/rtl.cpp

Index: openmp/libomptarget/src/rtl.cpp
===
--- openmp/libomptarget/src/rtl.cpp
+++ openmp/libomptarget/src/rtl.cpp
@@ -30,6 +30,7 @@
 /* SX-Aurora VE target  */ "libomptarget.rtl.ve.so",
 /* AMDGPU target*/ "libomptarget.rtl.amdgpu.so",
 /* Remote target*/ "libomptarget.rtl.rpc.so",
+/* Virtual GPU target   */ "libomptarget.rtl.vgpu.so",
 };
 
 PluginManager *PM;
@@ -79,7 +80,7 @@
   // is correct and if they are supporting any devices.
   for (auto *Name : RTLNames) {
 DP("Loading library '%s'...\n", Name);
-void *dynlib_handle = dlopen(Name, RTLD_NOW);
+void *dynlib_handle = dlopen(Name, RTLD_NOW | RTLD_GLOBAL);
 
 if (!dynlib_handle) {
   // Library does not exist or cannot be found.
Index: openmp/libomptarget/plugins/vgpu/src/rtl.cpp
===
--- /dev/null
+++ openmp/libomptarget/plugins/vgpu/src/rtl.cpp
@@ -0,0 +1,623 @@
+//===--RTLs/vgpu/src/rtl.cpp - Target RTLs Implementation - C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// RTL for virtual (x86) GPU
+//
+//===--===//
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "Debug.h"
+#include "ThreadEnvironment.h"
+#include "ThreadEnvironmentImpl.h"
+#include "omptarget.h"
+#include "omptargetplugin.h"
+
+#ifndef TARGET_NAME
+#define TARGET_NAME Generic ELF - 64bit
+#endif
+#define DEBUG_PREFIX "TARGET " GETNAME(TARGET_NAME) " RTL"
+
+#ifndef TARGET_ELF_ID
+#define TARGET_ELF_ID 0
+#endif
+
+#include "elf_common.h"
+
+#define NUMBER_OF_DEVICES 1
+#define OFFLOADSECTIONNAME "omp_offloading_entries"
+
+#define DEBUG false
+
+/// Array of Dynamic libraries loaded for this target.
+struct DynLibTy {
+  char *FileName;
+  void *Handle;
+};
+
+/// Keep entries table per device.
+struct FuncOrGblEntryTy {
+  __tgt_target_table Table;
+};
+
+thread_local ThreadEnvironmentTy *ThreadEnvironment;
+
+/// Class containing all the device information.
+class RTLDeviceInfoTy {
+  std::vector<std::list<FuncOrGblEntryTy>> FuncGblEntries;
+
+public:
+  std::list<DynLibTy> DynLibs;
+
+  // Record entry point associated with device.
+  void createOffloadTable(int32_t device_id, __tgt_offload_entry *begin,
+  __tgt_offload_entry *end) {
+assert(device_id < (int32_t)FuncGblEntries.size() &&
+   "Unexpected device id!");
+FuncGblEntries[device_id].emplace_back();
+FuncOrGblEntryTy &E = FuncGblEntries[device_id].back();
+
+E.Table.EntriesBegin = begin;
+E.Table.EntriesEnd = end;
+  }
+
+  // Return true if the entry is associated with device.
+  bool findOffloadEntry(int32_t device_id, void *addr) {
+assert(device_id < (int32_t)FuncGblEntries.size() &&
+   "Unexpected device id!");
+FuncOrGblEntryTy &E = FuncGblEntries[device_id].back();
+
+for (__tgt_offload_entry *i = E.Table.EntriesBegin, *e = E.Table.EntriesEnd;
+ i < e; ++i) {
+  if (i->addr == addr)
+return true;
+}
+
+return false;
+  }
+
+  // Return the pointer to the target entries table.
+  __tgt_target_table *getOffloadEntriesTable(int32_t 

[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2021-11-08 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeVirtualGPU.cpp:54
+  CGOpenMPRuntime::createOffloadEntry(ID, Addr, Size, Flags, Linkage);
+}

We should be able to get rid of this file (and the CUDA/HIP versions). Now
might be the right time, as a precommit.



Comment at: llvm/include/llvm/ADT/Triple.h:166
+VGPU,
+LastVendorType = VGPU
   };

Let's call it OpenMP_VGPU or something like that to make it clear.



Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:2177
+}
+
 /// Abstract Attribute for tracking ICV values.

@tianshilei1992 This needs a test.



Comment at: openmp/libomptarget/DeviceRTL/src/Kernel.cpp:107
+  synchronize::threads();
+
   // Signal the workers to exit the state machine and exit the kernel.

I don't think we should do this. Instead, the plugin should signal as threads 
finish the kernel.
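
One possible reading of this suggestion, sketched on the host side: the plugin
keeps a countdown that each emulated GPU thread decrements when it returns
from the kernel, and the launcher blocks until the count reaches zero. The
names `KernelCompletion` and `launchAndWait` are hypothetical and do not
appear in the patch; this is only a sketch under that assumption.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the patch): counts down as emulated GPU
// threads leave the kernel and wakes the waiter when all of them are done.
class KernelCompletion {
  std::mutex Mtx;
  std::condition_variable CV;
  uint32_t Remaining;

public:
  explicit KernelCompletion(uint32_t NumThreads) : Remaining(NumThreads) {}

  // Called by each worker once its kernel invocation has returned.
  void threadFinished() {
    std::lock_guard<std::mutex> Lock(Mtx);
    assert(Remaining > 0 && "more completion signals than threads");
    if (--Remaining == 0)
      CV.notify_all();
  }

  // Called by the plugin; returns once every worker has signalled.
  void wait() {
    std::unique_lock<std::mutex> Lock(Mtx);
    CV.wait(Lock, [this] { return Remaining == 0; });
  }
};

// Hypothetical launcher: runs Kernel on NumThreads host threads and returns
// the number of threads that signalled completion.
uint32_t launchAndWait(uint32_t NumThreads, void (*Kernel)(uint32_t)) {
  KernelCompletion Done(NumThreads);
  std::atomic<uint32_t> Finished{0};
  std::vector<std::thread> Workers;
  for (uint32_t Id = 0; Id < NumThreads; ++Id)
    Workers.emplace_back([&, Id] {
      Kernel(Id);
      ++Finished;
      Done.threadFinished();
    });
  Done.wait(); // No device-side barrier is needed for the launcher to proceed.
  for (std::thread &W : Workers)
    W.join();
  return Finished.load();
}
```

With this shape the device runtime would not need the extra
`synchronize::threads()` before signalling the workers, since completion is
observed by the plugin rather than by the kernel itself.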



Comment at: openmp/libomptarget/DeviceRTL/src/Mapping.cpp:171
+#pragma omp begin declare variant match(                                      \
+device = {arch(x86, x86_64)}, implementation = {extension(match_any)})
+

We probably should use kind(CPU) or something instead. Nothing x86 specific 
about it I think.



Comment at: openmp/libomptarget/include/DeviceEnvironment.h:83
+
+ThreadEnvironmentTy *getThreadEnvironment(void);
+

This should go into a new file (ThreadEnvironment)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113359/new/

https://reviews.llvm.org/D113359



[PATCH] D113359: [Libomptarget][WIP] Introduce VGPU Plugin

2021-11-06 Thread Atmn Patel via Phabricator via cfe-commits
atmnpatel created this revision.
atmnpatel added reviewers: jdoerfert, tianshilei1992, JonChesterfield.
Herald added subscribers: ormris, dexonsmith, pengfei, hiraditya, mgorny.
atmnpatel requested review of this revision.
Herald added subscribers: llvm-commits, openmp-commits, cfe-commits, sstefan1.
Herald added projects: clang, OpenMP, LLVM.

This patch introduces a virtual GPU (x86) plugin that emulates the GPU
environment on the host. It reuses the same execution model, compilation
paths, and runtimes as a physical GPU. The number of threads, warps, and
CTAs are set through the environment variables
`VGPU_{NUM_THREADS,NUM_WARPS,WARPS_PER_CTA}` respectively.
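
As a rough illustration of how a plugin could pick up these settings, the
sketch below reads the three variables with `std::getenv`. The `VGPUConfig`
struct, the `readEnvOrDefault` and `readVGPUConfig` helpers, and the default
values are assumptions made for this example; they are not taken from the
patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical container for the launch geometry; not a type from the patch.
struct VGPUConfig {
  int32_t NumThreads;
  int32_t NumWarps;
  int32_t WarpsPerCTA;
};

// Parse a positive integer from the environment. Falls back to Default when
// the variable is unset, empty, malformed, non-positive, or out of range.
static int32_t readEnvOrDefault(const char *Name, int32_t Default) {
  assert(Name && "variable name must not be null");
  const char *Str = std::getenv(Name);
  if (!Str || *Str == '\0')
    return Default;
  char *End = nullptr;
  long Value = std::strtol(Str, &End, 10);
  if (*End != '\0' || Value <= 0 || Value > INT32_MAX)
    return Default;
  return static_cast<int32_t>(Value);
}

// The defaults below are placeholders chosen for illustration only.
VGPUConfig readVGPUConfig() {
  return {readEnvOrDefault("VGPU_NUM_THREADS", 32),
          readEnvOrDefault("VGPU_NUM_WARPS", 1),
          readEnvOrDefault("VGPU_WARPS_PER_CTA", 1)};
}
```

Falling back to a default on malformed input keeps a mistyped variable from
aborting the run; a real plugin might prefer to report the error instead.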

Known Bugs (hence WIP):

- In the rebase from LLVM 12, larger applications started segfaulting. Small 
programs still work with this patch.
- The virtual GPU should be able to execute kernels asynchronously using the 
streams - but there is an unknown lifetime issue around the `ffi_call` that 
prevents the removal of the await after the `scheduleAsync` call.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D113359

Files:
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/CGOpenMPRuntimeVirtualGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeVirtualGPU.h
  clang/lib/CodeGen/CMakeLists.txt
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Gnu.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  llvm/include/llvm/ADT/Triple.h
  llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Support/Triple.cpp
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/utils/gn/secondary/clang/lib/CodeGen/BUILD.gn
  openmp/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/CMakeLists.txt
  openmp/libomptarget/DeviceRTL/include/Interface.h
  openmp/libomptarget/DeviceRTL/src/Kernel.cpp
  openmp/libomptarget/DeviceRTL/src/Mapping.cpp
  openmp/libomptarget/DeviceRTL/src/Misc.cpp
  openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
  openmp/libomptarget/DeviceRTL/src/Utils.cpp
  openmp/libomptarget/include/DeviceEnvironment.h
  openmp/libomptarget/plugins/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/CMakeLists.txt
  openmp/libomptarget/plugins/vgpu/src/DeviceEnvironment.cpp
  openmp/libomptarget/plugins/vgpu/src/DeviceEnvironmentImpl.h
  openmp/libomptarget/plugins/vgpu/src/rtl.cpp
  openmp/libomptarget/src/rtl.cpp

Index: openmp/libomptarget/src/rtl.cpp
===
--- openmp/libomptarget/src/rtl.cpp
+++ openmp/libomptarget/src/rtl.cpp
@@ -34,6 +34,7 @@
 /* SX-Aurora VE target  */ "libomptarget.rtl.ve.so",
 /* AMDGPU target*/ "libomptarget.rtl.amdgpu.so",
 /* Remote target*/ "libomptarget.rtl.rpc.so",
+/* Virtual GPU target   */ "libomptarget.rtl.vgpu.so",
 };
 
 PluginManager *PM;
@@ -83,7 +84,7 @@
   // is correct and if they are supporting any devices.
   for (auto *Name : RTLNames) {
 DP("Loading library '%s'...\n", Name);
-void *dynlib_handle = dlopen(Name, RTLD_NOW);
+void *dynlib_handle = dlopen(Name, RTLD_NOW | RTLD_GLOBAL);
 
 if (!dynlib_handle) {
   // Library does not exist or cannot be found.
Index: openmp/libomptarget/plugins/vgpu/src/rtl.cpp
===
--- /dev/null
+++ openmp/libomptarget/plugins/vgpu/src/rtl.cpp
@@ -0,0 +1,623 @@
+//===--RTLs/vgpu/src/rtl.cpp - Target RTLs Implementation - C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// RTL for virtual (x86) GPU
+//
+//===--===//
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "Debug.h"
+#include "DeviceEnvironment.h"
+#include "DeviceEnvironmentImpl.h"
+#include "omptarget.h"
+#include "omptargetplugin.h"
+
+#ifndef TARGET_NAME
+#define TARGET_NAME Generic ELF - 64bit
+#endif
+#define DEBUG_PREFIX "TARGET " GETNAME(TARGET_NAME) " RTL"
+
+#ifndef TARGET_ELF_ID
+#define TARGET_ELF_ID 0
+#endif
+
+#include "elf_common.h"
+
+#define NUMBER_OF_DEVICES 1
+#define OFFLOADSECTIONNAME "omp_offloading_entries"
+
+#define DEBUG false
+
+/// Array of Dynamic libraries loaded for this target.
+struct DynLibTy {
+  char *FileName;
+  void *Handle;
+};
+
+/// Keep entries table per device.
+struct FuncOrGblEntryTy {
+  __tgt_target_table Table;
+};
+
+thread_local ThreadEnvironmentTy *ThreadEnvironment;
+
+/// Class containing all the device information.
+class RTLDeviceInfoTy {
+  std::vector<std::list<FuncOrGblEntryTy>> FuncGblEntries;
+
+public:
+  std::list<DynLibTy> DynLibs;
+
+  // Record