[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)

2024-05-03 Thread Jan Patrick Lehr via cfe-commits


@@ -111,6 +156,71 @@ void __xray_init() XRAY_NEVER_INSTRUMENT {
 #endif
 }
 
+// Default visibility is hidden, so we have to explicitly make it visible to
+// DSO.
+SANITIZER_INTERFACE_ATTRIBUTE int32_t __xray_register_dso(
+const XRaySledEntry *SledsBegin, const XRaySledEntry *SledsEnd,
+const XRayFunctionSledIndex *FnIndexBegin,
+const XRayFunctionSledIndex *FnIndexEnd,
+XRayTrampolines Trampolines) XRAY_NEVER_INSTRUMENT {
+  // Make sure XRay has been initialized in the main executable.
+  __xray_init();
+
+  if (__xray_num_objects() == 0) {
+if (Verbosity())
+  Report("No XRay instrumentation map in main executable. Not initializing 
"
+ "XRay for DSO.\n");
+return -1;
+  }
+
+  // Register sleds in global map.
+  int ObjId = __xray_register_sleds(SledsBegin, SledsEnd, FnIndexBegin,
+ FnIndexEnd, true, Trampolines);
+
+#ifndef XRAY_NO_PREINIT
+  if (ObjId >= 0 && flags()->patch_premain)
+__xray_patch_object(ObjId);
+#endif
+
+  return ObjId;
+}
+
+SANITIZER_INTERFACE_ATTRIBUTE bool
+__xray_deregister_dso(int32_t ObjId) XRAY_NEVER_INSTRUMENT {
+  // Make sure XRay has been initialized in the main executable.
+  __xray_init();
+
+  if (ObjId <= 0 || ObjId >= __xray_num_objects()) {
+if (Verbosity())
+  Report("Can't deregister object with ID %d: ID is invalid.\n", ObjId);
+return false;
+  }
+
+  {
+SpinMutexLock Guard();
+auto  = XRayInstrMaps[ObjId];
+if (!Entry.FromDSO) {
+  if (Verbosity())
+Report("Can't deregister object with ID %d: object does not correspond 
"
+   "to a shared library.\n",
+   ObjId);
+  return false;
+}
+if (!Entry.Loaded) {

jplehr wrote:

No curly here

https://github.com/llvm/llvm-project/pull/90959
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)

2024-05-03 Thread Jan Patrick Lehr via cfe-commits


@@ -80,29 +140,14 @@ void __xray_init() XRAY_NEVER_INSTRUMENT {
 return;
   }
 
-  {
-SpinMutexLock Guard();
-XRayInstrMap.Sleds = __start_xray_instr_map;
-XRayInstrMap.Entries = __stop_xray_instr_map - __start_xray_instr_map;
-if (__start_xray_fn_idx != nullptr) {
-  XRayInstrMap.SledsIndex = __start_xray_fn_idx;
-  XRayInstrMap.Functions = __stop_xray_fn_idx - __start_xray_fn_idx;
-} else {
-  size_t CountFunctions = 0;
-  uint64_t LastFnAddr = 0;
-
-  for (std::size_t I = 0; I < XRayInstrMap.Entries; I++) {
-const auto  = XRayInstrMap.Sleds[I];
-const auto Function = Sled.function();
-if (Function != LastFnAddr) {
-  CountFunctions++;
-  LastFnAddr = Function;
-}
-  }
+  atomic_store(, 0, memory_order_release);
+
+  // Pre-allocation takes up approx. 5kB for XRayMaxObjects=64.
+  XRayInstrMaps = allocateBuffer(XRayMaxObjects);
+
+  __xray_register_sleds(__start_xray_instr_map, __stop_xray_instr_map,

jplehr wrote:

Why is the return value not checked?

https://github.com/llvm/llvm-project/pull/90959
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)

2024-05-03 Thread Jan Patrick Lehr via cfe-commits


@@ -50,14 +52,72 @@ atomic_uint8_t XRayInitialized{0};
 
 // This should always be updated before XRayInitialized is updated.
 SpinMutex XRayInstrMapMutex;
-XRaySledMap XRayInstrMap;
+// XRaySledMap XRayInstrMap;
+//  Contains maps for the main executable as well as DSOs.
+// std::vector XRayInstrMaps;
+XRaySledMap *XRayInstrMaps;
+atomic_uint32_t XRayNumObjects;
 
 // Global flag to determine whether the flags have been initialized.
 atomic_uint8_t XRayFlagsInitialized{0};
 
 // A mutex to allow only one thread to initialize the XRay data structures.
 SpinMutex XRayInitMutex;
 
+int32_t
+__xray_register_sleds(const XRaySledEntry *SledsBegin,
+  const XRaySledEntry *SledsEnd,
+  const XRayFunctionSledIndex *FnIndexBegin,
+  const XRayFunctionSledIndex *FnIndexEnd, bool FromDSO,
+  XRayTrampolines Trampolines) XRAY_NEVER_INSTRUMENT {
+  if (!SledsBegin || !SledsEnd) {

jplehr wrote:

No curly here

https://github.com/llvm/llvm-project/pull/90959
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)

2024-05-03 Thread Jan Patrick Lehr via cfe-commits


@@ -50,14 +52,72 @@ atomic_uint8_t XRayInitialized{0};
 
 // This should always be updated before XRayInitialized is updated.
 SpinMutex XRayInstrMapMutex;
-XRaySledMap XRayInstrMap;
+// XRaySledMap XRayInstrMap;
+//  Contains maps for the main executable as well as DSOs.
+// std::vector XRayInstrMaps;
+XRaySledMap *XRayInstrMaps;
+atomic_uint32_t XRayNumObjects;
 
 // Global flag to determine whether the flags have been initialized.
 atomic_uint8_t XRayFlagsInitialized{0};
 
 // A mutex to allow only one thread to initialize the XRay data structures.
 SpinMutex XRayInitMutex;
 
+int32_t
+__xray_register_sleds(const XRaySledEntry *SledsBegin,
+  const XRaySledEntry *SledsEnd,
+  const XRayFunctionSledIndex *FnIndexBegin,
+  const XRayFunctionSledIndex *FnIndexEnd, bool FromDSO,
+  XRayTrampolines Trampolines) XRAY_NEVER_INSTRUMENT {
+  if (!SledsBegin || !SledsEnd) {
+return -1;
+  }
+  XRaySledMap SledMap;
+  SledMap.FromDSO = FromDSO;
+  SledMap.Loaded = true;
+  SledMap.Trampolines = Trampolines;
+  SledMap.Sleds = SledsBegin;
+  SledMap.Entries = SledsEnd - SledsBegin;
+  if (FnIndexBegin != nullptr) {
+SledMap.SledsIndex = FnIndexBegin;
+SledMap.Functions = FnIndexEnd - FnIndexBegin;
+  } else {
+size_t CountFunctions = 0;
+uint64_t LastFnAddr = 0;
+
+for (std::size_t I = 0; I < SledMap.Entries; I++) {
+  const auto  = SledMap.Sleds[I];
+  const auto Function = Sled.function();
+  if (Function != LastFnAddr) {
+CountFunctions++;
+LastFnAddr = Function;
+  }
+}
+
+SledMap.Functions = CountFunctions;
+  }
+  if (SledMap.Functions >= XRayMaxFunctions) {
+Report("Too many functions! Maximum is %ld\n", XRayMaxFunctions);
+return -1;
+  }
+
+  if (Verbosity()) {

jplehr wrote:

No curly here

https://github.com/llvm/llvm-project/pull/90959
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)

2024-05-03 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr commented:

Thank you for contributing this patch. I am by no means an expert in this area, 
but would like to see xray get support for shared libraries.

I did an initial pass to simply look for easy things and left a few comments.

https://github.com/llvm/llvm-project/pull/90959
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)

2024-05-03 Thread Jan Patrick Lehr via cfe-commits


@@ -111,6 +156,71 @@ void __xray_init() XRAY_NEVER_INSTRUMENT {
 #endif
 }
 
+// Default visibility is hidden, so we have to explicitly make it visible to
+// DSO.
+SANITIZER_INTERFACE_ATTRIBUTE int32_t __xray_register_dso(
+const XRaySledEntry *SledsBegin, const XRaySledEntry *SledsEnd,
+const XRayFunctionSledIndex *FnIndexBegin,
+const XRayFunctionSledIndex *FnIndexEnd,
+XRayTrampolines Trampolines) XRAY_NEVER_INSTRUMENT {
+  // Make sure XRay has been initialized in the main executable.
+  __xray_init();
+
+  if (__xray_num_objects() == 0) {
+if (Verbosity())
+  Report("No XRay instrumentation map in main executable. Not initializing 
"
+ "XRay for DSO.\n");
+return -1;
+  }
+
+  // Register sleds in global map.
+  int ObjId = __xray_register_sleds(SledsBegin, SledsEnd, FnIndexBegin,
+ FnIndexEnd, true, Trampolines);
+
+#ifndef XRAY_NO_PREINIT
+  if (ObjId >= 0 && flags()->patch_premain)
+__xray_patch_object(ObjId);
+#endif
+
+  return ObjId;
+}
+
+SANITIZER_INTERFACE_ATTRIBUTE bool
+__xray_deregister_dso(int32_t ObjId) XRAY_NEVER_INSTRUMENT {
+  // Make sure XRay has been initialized in the main executable.
+  __xray_init();
+
+  if (ObjId <= 0 || ObjId >= __xray_num_objects()) {
+if (Verbosity())
+  Report("Can't deregister object with ID %d: ID is invalid.\n", ObjId);
+return false;
+  }
+
+  {
+SpinMutexLock Guard();
+auto  = XRayInstrMaps[ObjId];
+if (!Entry.FromDSO) {
+  if (Verbosity())
+Report("Can't deregister object with ID %d: object does not correspond 
"
+   "to a shared library.\n",
+   ObjId);
+  return false;
+}
+if (!Entry.Loaded) {
+  if (Verbosity())
+Report("Can't deregister object with ID %d: object is not loaded.\n",
+   ObjId);
+}
+// This is all we have to do here.

jplehr wrote:

I don't think this comment is helpful. Maybe better: explain why.

https://github.com/llvm/llvm-project/pull/90959
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)

2024-05-03 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/90959
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [Offload][NFC] Remove `omp_` prefix from offloading entries (PR #88071)

2024-04-09 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/88071
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [Libomp] Place generated OpenMP headers into build resource directory (PR #88007)

2024-04-08 Thread Jan Patrick Lehr via cfe-commits


@@ -16,4 +16,12 @@ typedef unsigned __INTPTR_TYPE__ uintptr_t;
 #error Every target should have __INTPTR_TYPE__
 #endif
 
+#ifdef __INTPTR_MAX__
+#define  INTPTR_MAX__INTPTR_MAX__
+#endif
+
+#ifdef __UINTPTR_MAX__
+#define UINTPTR_MAX   __UINTPTR_MAX__
+#endif
+

jplehr wrote:

Are these changes required to make this work?

https://github.com/llvm/llvm-project/pull/88007
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [Libomp] Place generated OpenMP headers into build resource directory (PR #88007)

2024-04-08 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr commented:

The changes seem reasonable to me.

https://github.com/llvm/llvm-project/pull/88007
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [Libomp] Place generated OpenMP headers into build resource directory (PR #88007)

2024-04-08 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/88007
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-repl] Factor out CreateJITBuilder() and allow specialization in derived classes (PR #84461)

2024-03-25 Thread Jan Patrick Lehr via cfe-commits
Stefan =?utf-8?q?Gr=C3=A4nitz?= 
Message-ID:
In-Reply-To: 


jplehr wrote:

I see. Thanks for the explanation and the fix! Bot is back to green.

https://github.com/llvm/llvm-project/pull/84461
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-repl] Factor out CreateJITBuilder() and allow specialization in derived classes (PR #84461)

2024-03-25 Thread Jan Patrick Lehr via cfe-commits
Stefan =?utf-8?q?Gränitz?= 
Message-ID:
In-Reply-To: 


jplehr wrote:

Thanks! I am quite unfamiliar with that part of the code base and wonder if the 
symbol needs to just exist somewhere.
The other thing used there (`InitializeNativeTargetAsmPrinter`) is declared in 
`TargetSelect.h`. So, does it need to be a two parts fix: one to declare the 
symbol and then the magic to do the right thing at runtime. If what you linked 
does both, even better. :)

https://github.com/llvm/llvm-project/pull/84461
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-repl] Factor out CreateJITBuilder() and allow specialization in derived classes (PR #84461)

2024-03-25 Thread Jan Patrick Lehr via cfe-commits
Stefan =?utf-8?q?Gränitz?= 
Message-ID:
In-Reply-To: 


jplehr wrote:

Hi,
I think this one broke one of our buildbots: 
https://lab.llvm.org/buildbot/#/builders/259/builds/1769 
I'm happy to help looking into it.

https://github.com/llvm/llvm-project/pull/84461
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Append target search paths for direct offloading compilation (PR #82699)

2024-02-23 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr approved this pull request.

LG

https://github.com/llvm/llvm-project/pull/82699
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-19 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr commented:

I looked at the changes and from the little I understand CMake they seem ok.
I added one nit.

Maybe @saiislam can have a look as well.

https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-19 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-19 Thread Jan Patrick Lehr via cfe-commits


@@ -102,6 +80,121 @@ function(add_entrypoint_library target_name)
 list(APPEND all_deps ${entrypoint_target})
   endforeach(dep)
   list(REMOVE_DUPLICATES all_deps)
+  set(${result} ${all_deps} PARENT_SCOPE)
+endfunction()
+
+# A rule to build a library from a collection of entrypoint objects and bundle
+# it into a GPU fatbinary. Usage is the same as 'add_entrypoint_library'.
+# Usage:
+# add_gpu_entrypoint_library(
+#   DEPENDS 
+# )
+function(add_gpu_entrypoint_library target_name)
+  cmake_parse_arguments(
+"ENTRYPOINT_LIBRARY"
+"" # No optional arguments
+"" # No single value arguments
+"DEPENDS" # Multi-value arguments
+${ARGN}
+  )
+  if(NOT ENTRYPOINT_LIBRARY_DEPENDS)
+message(FATAL_ERROR "'add_entrypoint_library' target requires a DEPENDS 
list "
+"of 'add_entrypoint_object' targets.")
+  endif()
+
+  get_fq_deps_list(fq_deps_list ${ENTRYPOINT_LIBRARY_DEPENDS})
+  get_all_object_file_deps(all_deps "${fq_deps_list}")
+
+  # The GPU 'libc' needs to be exported in a format that can be linked with
+  # offloading langauges like OpenMP or CUDA. This wraps every GPU object into 
a
+  # fat binary and adds them to a static library.
+  set(objects "")
+  foreach(dep IN LISTS all_deps)
+set(object 
$<$,${dep}>:$>)
+string(FIND ${dep} "." last_dot_loc REVERSE)
+math(EXPR name_loc "${last_dot_loc} + 1")
+string(SUBSTRING ${dep} ${name_loc} -1 name)
+if(LIBC_TARGET_ARCHITECTURE_IS_NVPTX)
+  set(prefix 
--image=arch=generic,triple=nvptx64-nvidia-cuda,feature=+ptx63)
+else()

jplehr wrote:

Other places do `elseif(LIBC_TARGET_ARCHITECTURE_IS_AMDGPU)`. Maybe here as 
well for consistency?

https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [clang] [llvm] [XCOFF] Add compiler version to an auxiliary symbol table entry (PR #80162)

2024-02-06 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

This introduced a spelling mistake that broke some builds.
https://lab.llvm.org/buildbot/#/builders/193/builds/46220

https://github.com/llvm/llvm-project/pull/80162
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][CMake] Support perf, LBR, and Instrument CLANG_BOLT options (PR #69133)

2024-01-28 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

Hi @aaupov I think this did break the AMD Hip build bot (another annotated 
builder).
Are you looking into the potential issue?

https://github.com/llvm/llvm-project/pull/69133
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-22 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr closed https://github.com/llvm/llvm-project/pull/76571
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-20 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr updated 
https://github.com/llvm/llvm-project/pull/76571

>From a6c437a52674613b90c451c2ed4105265f420a32 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Fri, 29 Dec 2023 04:32:24 -0500
Subject: [PATCH] [OpenMP][USM] Introduces -fopenmp-force-usm flag

This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.

The option does not have a -fno-* friend since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, it can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence, no -fno- version.

This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode.
Which I think is the only difference observable in code generation.

This runtime test checks for the (non-)occurence of data movement between host
and device. It does one run without the flag and one with the flag to
also see that both versions behave as expected. In the case w/o the new
flag data movement between host and device is expected. In the case with
the flag such data movement should not be present / reported.
---
 clang/include/clang/Basic/LangOptions.def |  1 +
 clang/include/clang/Driver/Options.td |  4 +
 clang/lib/CodeGen/CGOpenMPRuntime.cpp |  7 ++
 clang/lib/Driver/ToolChains/Clang.cpp |  2 +
 clang/test/OpenMP/force-usm.c | 79 +++
 openmp/libomptarget/test/lit.cfg  |  8 ++
 .../test/offloading/force-usm.cpp | 59 ++
 7 files changed, 160 insertions(+)
 create mode 100644 clang/test/OpenMP/force-usm.c
 create mode 100644 openmp/libomptarget/test/offloading/force-usm.cpp

diff --git a/clang/include/clang/Basic/LangOptions.def 
b/clang/include/clang/Basic/LangOptions.def
index 21abc346cf17ac..81cf2ad9498a7f 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription  , 1, 0, "Assume distributed 
loops do not have mo
 LANGOPT(OpenMPNoThreadState  , 1, 0, "Assume that no thread in a parallel 
region will modify an ICV.")
 LANGOPT(OpenMPNoNestedParallelism  , 1, 0, "Assume that no thread in a 
parallel region will encounter a parallel region")
 LANGOPT(OpenMPOffloadMandatory  , 1, 0, "Assert that offloading is mandatory 
and do not create a host fallback.")
+LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode 
via compiler.")
 LANGOPT(NoGPULib  , 1, 0, "Indicate a build without the standard GPU 
libraries.")
 LANGOPT(RenderScript  , 1, 0, "RenderScript")
 
diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..28290da438c62d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], 
"fopenmp-offload-mandatory">, Group<
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Do not create a host fallback if offloading to the device fails.">,
   MarshallingInfoFlag>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Force behvaior as if the user specified pragma omp requires 
unified_shared_memory.">,
+  MarshallingInfoFlag>;
 def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group,
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>,
   HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies 
-foffload-lto=full">;
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index ea6645a39e8321..4855e7410a015a 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule )
  ? CGM.getLangOpts().OMPHostIRFile
  : StringRef{});
   OMPBuilder.setConfig(Config);
+
+  // The user forces the compiler to behave as if omp requires
+  // unified_shared_memory was given.
+  if (CGM.getLangOpts().OpenMPForceUSM) {
+HasRequiresUnifiedSharedMemory = true;
+OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true);
+  }
 }
 
 void CGOpenMPRuntime::clear() {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index acfa119805068d..ffc24201ab2e0b 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation , const JobAction 
,
 CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism");
   if 

[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-19 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

I just realized that I need to update the clang lit tests, so this is *not 
ready to land*, but I don't see a button to indicate that.

https://github.com/llvm/llvm-project/pull/76571
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-19 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr updated 
https://github.com/llvm/llvm-project/pull/76571

>From 41b227e2c84b3c7eeedb6a9ebf559bec2c34aec3 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Fri, 29 Dec 2023 04:32:24 -0500
Subject: [PATCH] [OpenMP][USM] Introduces -fopenmp-force-usm flag

This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.

The option does not have a -fno-* friend since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, it can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence, no -fno- version.

This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode.
Which I think is the only difference observable in code generation.

This runtime test checks for the (non-)occurence of data movement between host
and device. It does one run without the flag and one with the flag to
also see that both versions behave as expected. In the case w/o the new
flag data movement between host and device is expected. In the case with
the flag such data movement should not be present / reported.
---
 clang/include/clang/Basic/LangOptions.def |  1 +
 clang/include/clang/Driver/Options.td |  4 +
 clang/lib/CodeGen/CGOpenMPRuntime.cpp |  7 ++
 clang/lib/Driver/ToolChains/Clang.cpp |  2 +
 clang/test/OpenMP/force-usm.c | 74 +++
 openmp/libomptarget/test/lit.cfg  |  8 ++
 .../test/offloading/force-usm.cpp | 59 +++
 7 files changed, 155 insertions(+)
 create mode 100644 clang/test/OpenMP/force-usm.c
 create mode 100644 openmp/libomptarget/test/offloading/force-usm.cpp

diff --git a/clang/include/clang/Basic/LangOptions.def 
b/clang/include/clang/Basic/LangOptions.def
index 21abc346cf17ac..81cf2ad9498a7f 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription  , 1, 0, "Assume distributed 
loops do not have mo
 LANGOPT(OpenMPNoThreadState  , 1, 0, "Assume that no thread in a parallel 
region will modify an ICV.")
 LANGOPT(OpenMPNoNestedParallelism  , 1, 0, "Assume that no thread in a 
parallel region will encounter a parallel region")
 LANGOPT(OpenMPOffloadMandatory  , 1, 0, "Assert that offloading is mandatory 
and do not create a host fallback.")
+LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode 
via compiler.")
 LANGOPT(NoGPULib  , 1, 0, "Indicate a build without the standard GPU 
libraries.")
 LANGOPT(RenderScript  , 1, 0, "RenderScript")
 
diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..28290da438c62d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], 
"fopenmp-offload-mandatory">, Group<
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Do not create a host fallback if offloading to the device fails.">,
   MarshallingInfoFlag>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Force behvaior as if the user specified pragma omp requires 
unified_shared_memory.">,
+  MarshallingInfoFlag>;
 def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group,
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>,
   HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies 
-foffload-lto=full">;
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index ea6645a39e8321..4855e7410a015a 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule )
  ? CGM.getLangOpts().OMPHostIRFile
  : StringRef{});
   OMPBuilder.setConfig(Config);
+
+  // The user forces the compiler to behave as if omp requires
+  // unified_shared_memory was given.
+  if (CGM.getLangOpts().OpenMPForceUSM) {
+HasRequiresUnifiedSharedMemory = true;
+OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true);
+  }
 }
 
 void CGOpenMPRuntime::clear() {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index acfa119805068d..ffc24201ab2e0b 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation , const JobAction 
,
 CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism");
   if 

[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-19 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

> Automatic zero-copy doesn't work on some of the bbot's. I will have to land 
> this once the lit test harness extension in #77851 re-lands.

Having your work landed would be very helpful indeed.

https://github.com/llvm/llvm-project/pull/76571
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-19 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

@carlobertolli can you have another look at the runtime test I added to see if 
that addresses your feedback?

https://github.com/llvm/llvm-project/pull/76571
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-18 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr updated 
https://github.com/llvm/llvm-project/pull/76571

>From bf25a538e7c020efde557b595eba64b804cbb817 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Fri, 29 Dec 2023 04:32:24 -0500
Subject: [PATCH 1/5] [OpenMP][USM] Introduces -fopenmp-force-usm flag

This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.

The option does not have a -fno-* friend since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, it can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence, no -fno- version.
---
 clang/include/clang/Basic/LangOptions.def | 1 +
 clang/include/clang/Driver/Options.td | 4 
 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 7 +++
 clang/lib/Driver/ToolChains/Clang.cpp | 2 ++
 4 files changed, 14 insertions(+)

diff --git a/clang/include/clang/Basic/LangOptions.def 
b/clang/include/clang/Basic/LangOptions.def
index 21abc346cf17ac..81cf2ad9498a7f 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription  , 1, 0, "Assume distributed 
loops do not have mo
 LANGOPT(OpenMPNoThreadState  , 1, 0, "Assume that no thread in a parallel 
region will modify an ICV.")
 LANGOPT(OpenMPNoNestedParallelism  , 1, 0, "Assume that no thread in a 
parallel region will encounter a parallel region")
 LANGOPT(OpenMPOffloadMandatory  , 1, 0, "Assert that offloading is mandatory 
and do not create a host fallback.")
+LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode 
via compiler.")
 LANGOPT(NoGPULib  , 1, 0, "Indicate a build without the standard GPU 
libraries.")
 LANGOPT(RenderScript  , 1, 0, "RenderScript")
 
diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..28290da438c62d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], 
"fopenmp-offload-mandatory">, Group<
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Do not create a host fallback if offloading to the device fails.">,
   MarshallingInfoFlag>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Force behvaior as if the user specified pragma omp requires 
unified_shared_memory.">,
+  MarshallingInfoFlag>;
 def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group,
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>,
   HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies 
-foffload-lto=full">;
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index ea6645a39e8321..4855e7410a015a 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule )
  ? CGM.getLangOpts().OMPHostIRFile
  : StringRef{});
   OMPBuilder.setConfig(Config);
+
+  // The user forces the compiler to behave as if omp requires
+  // unified_shared_memory was given.
+  if (CGM.getLangOpts().OpenMPForceUSM) {
+HasRequiresUnifiedSharedMemory = true;
+OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true);
+  }
 }
 
 void CGOpenMPRuntime::clear() {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index acfa119805068d..ffc24201ab2e0b 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation , const JobAction 
,
 CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism");
   if (Args.hasArg(options::OPT_fopenmp_offload_mandatory))
 CmdArgs.push_back("-fopenmp-offload-mandatory");
+  if (Args.hasArg(options::OPT_fopenmp_force_usm))
+CmdArgs.push_back("-fopenmp-force-usm");
   break;
 default:
   // By default, if Clang doesn't know how to generate useful OpenMP code

>From 11ad5633889870d897bfc4e77bc41b569e5ce539 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Wed, 12 Jul 2023 05:04:41 -0400
Subject: [PATCH 2/5] [OpenMP][USM] Adds test for -fopenmp-force-usm flag

This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode.

Marked as XFAIL to first land test and then enable in subsequent patch.
---
 clang/test/OpenMP/force-usm.c | 74 +++
 1 file changed, 74 insertions(+)
 create mode 100644 clang/test/OpenMP/force-usm.c

diff --git 

[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-18 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr updated 
https://github.com/llvm/llvm-project/pull/76571

>From bf25a538e7c020efde557b595eba64b804cbb817 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Fri, 29 Dec 2023 04:32:24 -0500
Subject: [PATCH 1/4] [OpenMP][USM] Introduces -fopenmp-force-usm flag

This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.

The option does not have a -fno-* friend since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, it can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence, no -fno- version.
---
 clang/include/clang/Basic/LangOptions.def | 1 +
 clang/include/clang/Driver/Options.td | 4 
 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 7 +++
 clang/lib/Driver/ToolChains/Clang.cpp | 2 ++
 4 files changed, 14 insertions(+)

diff --git a/clang/include/clang/Basic/LangOptions.def 
b/clang/include/clang/Basic/LangOptions.def
index 21abc346cf17ac3..81cf2ad9498a7f9 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription  , 1, 0, "Assume distributed 
loops do not have mo
 LANGOPT(OpenMPNoThreadState  , 1, 0, "Assume that no thread in a parallel 
region will modify an ICV.")
 LANGOPT(OpenMPNoNestedParallelism  , 1, 0, "Assume that no thread in a 
parallel region will encounter a parallel region")
 LANGOPT(OpenMPOffloadMandatory  , 1, 0, "Assert that offloading is mandatory 
and do not create a host fallback.")
+LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode 
via compiler.")
 LANGOPT(NoGPULib  , 1, 0, "Indicate a build without the standard GPU 
libraries.")
 LANGOPT(RenderScript  , 1, 0, "RenderScript")
 
diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 2b93ddf033499cc..28290da438c62db 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], 
"fopenmp-offload-mandatory">, Group<
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Do not create a host fallback if offloading to the device fails.">,
   MarshallingInfoFlag>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Force behvaior as if the user specified pragma omp requires 
unified_shared_memory.">,
+  MarshallingInfoFlag>;
 def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group,
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>,
   HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies 
-foffload-lto=full">;
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index ea6645a39e83218..4855e7410a015aa 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule )
  ? CGM.getLangOpts().OMPHostIRFile
  : StringRef{});
   OMPBuilder.setConfig(Config);
+
+  // The user forces the compiler to behave as if omp requires
+  // unified_shared_memory was given.
+  if (CGM.getLangOpts().OpenMPForceUSM) {
+HasRequiresUnifiedSharedMemory = true;
+OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true);
+  }
 }
 
 void CGOpenMPRuntime::clear() {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index acfa119805068d2..ffc24201ab2e0b5 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation , const JobAction 
,
 CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism");
   if (Args.hasArg(options::OPT_fopenmp_offload_mandatory))
 CmdArgs.push_back("-fopenmp-offload-mandatory");
+  if (Args.hasArg(options::OPT_fopenmp_force_usm))
+CmdArgs.push_back("-fopenmp-force-usm");
   break;
 default:
   // By default, if Clang doesn't know how to generate useful OpenMP code

>From 11ad5633889870d897bfc4e77bc41b569e5ce539 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Wed, 12 Jul 2023 05:04:41 -0400
Subject: [PATCH 2/4] [OpenMP][USM] Adds test for -fopenmp-force-usm flag

This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode.

Marked as XFAIL to first land test and then enable in subsequent patch.
---
 clang/test/OpenMP/force-usm.c | 74 +++
 1 file changed, 74 insertions(+)
 create mode 100644 clang/test/OpenMP/force-usm.c

diff --git 

[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)

2024-01-18 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr closed https://github.com/llvm/llvm-project/pull/75467
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)

2024-01-18 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

Closing this. Test is now part of feature-PR.

https://github.com/llvm/llvm-project/pull/75467
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-18 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

While I add some documentation, I'd appreciate feedback especially on the lit 
side of things. I would very much like to rename the pretty happy tripple-X 
workaround for substitution debugging into something sane.

https://github.com/llvm/llvm-project/pull/76571
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-18 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr updated 
https://github.com/llvm/llvm-project/pull/76571

>From bf25a538e7c020efde557b595eba64b804cbb817 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Fri, 29 Dec 2023 04:32:24 -0500
Subject: [PATCH 1/3] [OpenMP][USM] Introduces -fopenmp-force-usm flag

This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.

The option does not have a -fno-* friend since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, it can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence, no -fno- version.
---
 clang/include/clang/Basic/LangOptions.def | 1 +
 clang/include/clang/Driver/Options.td | 4 
 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 7 +++
 clang/lib/Driver/ToolChains/Clang.cpp | 2 ++
 4 files changed, 14 insertions(+)

diff --git a/clang/include/clang/Basic/LangOptions.def 
b/clang/include/clang/Basic/LangOptions.def
index 21abc346cf17ac..81cf2ad9498a7f 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription  , 1, 0, "Assume distributed 
loops do not have mo
 LANGOPT(OpenMPNoThreadState  , 1, 0, "Assume that no thread in a parallel 
region will modify an ICV.")
 LANGOPT(OpenMPNoNestedParallelism  , 1, 0, "Assume that no thread in a 
parallel region will encounter a parallel region")
 LANGOPT(OpenMPOffloadMandatory  , 1, 0, "Assert that offloading is mandatory 
and do not create a host fallback.")
+LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode 
via compiler.")
 LANGOPT(NoGPULib  , 1, 0, "Indicate a build without the standard GPU 
libraries.")
 LANGOPT(RenderScript  , 1, 0, "RenderScript")
 
diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..28290da438c62d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], 
"fopenmp-offload-mandatory">, Group<
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Do not create a host fallback if offloading to the device fails.">,
   MarshallingInfoFlag>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Force behvaior as if the user specified pragma omp requires 
unified_shared_memory.">,
+  MarshallingInfoFlag>;
 def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group,
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>,
   HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies 
-foffload-lto=full">;
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index ea6645a39e8321..4855e7410a015a 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule )
  ? CGM.getLangOpts().OMPHostIRFile
  : StringRef{});
   OMPBuilder.setConfig(Config);
+
+  // The user forces the compiler to behave as if omp requires
+  // unified_shared_memory was given.
+  if (CGM.getLangOpts().OpenMPForceUSM) {
+HasRequiresUnifiedSharedMemory = true;
+OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true);
+  }
 }
 
 void CGOpenMPRuntime::clear() {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index acfa119805068d..ffc24201ab2e0b 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation , const JobAction 
,
 CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism");
   if (Args.hasArg(options::OPT_fopenmp_offload_mandatory))
 CmdArgs.push_back("-fopenmp-offload-mandatory");
+  if (Args.hasArg(options::OPT_fopenmp_force_usm))
+CmdArgs.push_back("-fopenmp-force-usm");
   break;
 default:
   // By default, if Clang doesn't know how to generate useful OpenMP code

>From 11ad5633889870d897bfc4e77bc41b569e5ce539 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Wed, 12 Jul 2023 05:04:41 -0400
Subject: [PATCH 2/3] [OpenMP][USM] Adds test for -fopenmp-force-usm flag

This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode.

Marked as XFAIL to first land test and then enable in subsequent patch.
---
 clang/test/OpenMP/force-usm.c | 74 +++
 1 file changed, 74 insertions(+)
 create mode 100644 clang/test/OpenMP/force-usm.c

diff --git 

[flang] [mlir] [clang] [Flang][OpenMP][MLIR] Add support for -nogpulib option (PR #71045)

2024-01-08 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

We have the buildbot now up in staging: 
https://lab.llvm.org/staging/#/builders/188

https://github.com/llvm/llvm-project/pull/71045
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)

2024-01-04 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

The IR is impacted for the global that is in that test case. Lines ~46-50 
(first IR section) vs line ~68 (second IR section). The remaining code is 
indeed the same.
The way that this test is executed is out of date however, given that I have 
reimplemented the flag.
My plan is to keep this test and add a few runtime tests as well, since we 
should be able to validate that we do not see data transfers.

https://github.com/llvm/llvm-project/pull/75467
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2023-12-29 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

> > Is the approach taken in this approach acceptable as opposed to the header 
> > solution I put up earlier?
> 
> Yes, it's pretty much exactly what I had in mind from my suggestion in the 
> last PR. Thanks.

Perfect. I'll go ahead and add lit and runtime tests.

https://github.com/llvm/llvm-project/pull/76571
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2023-12-29 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

Is the approach taken in this approach acceptable as opposed to the header 
solution I put up earlier?

https://github.com/llvm/llvm-project/pull/76571
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-29 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr closed https://github.com/llvm/llvm-project/pull/75468
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2023-12-29 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr created 
https://github.com/llvm/llvm-project/pull/76571

This flag forces the compiler to generate code for OpenMP target regions as if 
the user specified the #pragma omp requires unified_shared_memory in each 
source file.

The option does not have a -fno-* friend since OpenMP requires the 
unified_shared_memory clause to be present in all source files. Since this flag 
does no harm if the clause is present, it can be used in conjunction. My 
understanding is that USM should not be turned off selectively, hence, no -fno- 
version.

In favor of https://github.com/llvm/llvm-project/pull/75468, sorry for the 
noise and confusion.

>From bf25a538e7c020efde557b595eba64b804cbb817 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Fri, 29 Dec 2023 04:32:24 -0500
Subject: [PATCH] [OpenMP][USM] Introduces -fopenmp-force-usm flag

This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.

The option does not have a -fno-* friend since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, it can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence, no -fno- version.
---
 clang/include/clang/Basic/LangOptions.def | 1 +
 clang/include/clang/Driver/Options.td | 4 
 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 7 +++
 clang/lib/Driver/ToolChains/Clang.cpp | 2 ++
 4 files changed, 14 insertions(+)

diff --git a/clang/include/clang/Basic/LangOptions.def 
b/clang/include/clang/Basic/LangOptions.def
index 21abc346cf17ac..81cf2ad9498a7f 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription  , 1, 0, "Assume distributed 
loops do not have mo
 LANGOPT(OpenMPNoThreadState  , 1, 0, "Assume that no thread in a parallel 
region will modify an ICV.")
 LANGOPT(OpenMPNoNestedParallelism  , 1, 0, "Assume that no thread in a 
parallel region will encounter a parallel region")
 LANGOPT(OpenMPOffloadMandatory  , 1, 0, "Assert that offloading is mandatory 
and do not create a host fallback.")
+LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode 
via compiler.")
 LANGOPT(NoGPULib  , 1, 0, "Indicate a build without the standard GPU 
libraries.")
 LANGOPT(RenderScript  , 1, 0, "RenderScript")
 
diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..28290da438c62d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], 
"fopenmp-offload-mandatory">, Group<
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Do not create a host fallback if offloading to the device fails.">,
   MarshallingInfoFlag>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Force behvaior as if the user specified pragma omp requires 
unified_shared_memory.">,
+  MarshallingInfoFlag>;
 def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group,
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>,
   HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies 
-foffload-lto=full">;
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index ea6645a39e8321..4855e7410a015a 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule )
  ? CGM.getLangOpts().OMPHostIRFile
  : StringRef{});
   OMPBuilder.setConfig(Config);
+
+  // The user forces the compiler to behave as if omp requires
+  // unified_shared_memory was given.
+  if (CGM.getLangOpts().OpenMPForceUSM) {
+HasRequiresUnifiedSharedMemory = true;
+OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true);
+  }
 }
 
 void CGOpenMPRuntime::clear() {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index acfa119805068d..ffc24201ab2e0b 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation , const JobAction 
,
 CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism");
   if (Args.hasArg(options::OPT_fopenmp_offload_mandatory))
 CmdArgs.push_back("-fopenmp-offload-mandatory");
+  if (Args.hasArg(options::OPT_fopenmp_force_usm))
+CmdArgs.push_back("-fopenmp-force-usm");
   break;
 default:
   // By default, if Clang doesn't know how to generate useful OpenMP code


[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-29 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

Hmm.. I guess I screwed something up with git and the history.

https://github.com/llvm/llvm-project/pull/75468
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)

2023-12-29 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

I updated the feature PR (#75468) with a different solution. Will update the 
test after feedback if the route I took in the other PR is seen as OK.

https://github.com/llvm/llvm-project/pull/75467
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-29 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr updated 
https://github.com/llvm/llvm-project/pull/75468

>From 8f381c760fca8a4abd7550c492ff22fa8972933a Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Thu, 6 Jul 2023 16:47:21 -0400
Subject: [PATCH 1/3] [OpenMP] Introduce -fopenmp-force-usm flag

The new flag implements logic to include #pragma omp requires
unified_shared_memory in every translation unit.
This enables a straightforward way to enable USM for an application
without the need to modify sources.
---
 clang/include/clang/Driver/Options.td|  2 ++
 clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp | 16 
 clang/lib/Headers/CMakeLists.txt |  1 +
 .../lib/Headers/openmp_wrappers/usm/force_usm.h  |  6 ++
 4 files changed, 25 insertions(+)
 create mode 100644 clang/lib/Headers/openmp_wrappers/usm/force_usm.h

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..e33bc7d1b10d71 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], 
"fopenmp-cuda-blocks-per-sm=">
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
 def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], 
"fopenmp-cuda-teams-reduction-recs-num=">, Group,
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
 
 
//===--===//
 // Shared cc1 + fc1 OpenMP Target Options
diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp 
b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index b012b7cb729378..a077f2f06d7728 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -129,6 +129,22 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList 
) const {
 void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
 const ArgList , ArgStringList ) const {
   HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
+
+  CC1Args.push_back("-internal-isystem");
+  SmallString<128> P(HostTC.getDriver().ResourceDir);
+  llvm::sys::path::append(P, "include/cuda_wrappers");
+  CC1Args.push_back(DriverArgs.MakeArgString(P));
+
+  // Force USM mode will forcefully include #pragma omp requires
+  // unified_shared_memory via the force_usm header
+  // XXX This may result in a compilation error if the source
+  // file already includes that pragma.
+  if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) {
+CC1Args.push_back("-include");
+CC1Args.push_back(
+DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir +
+ "/include/openmp_wrappers/force_usm.h"));
+  }
 }
 
 void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList ,
diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt
index 735e4e4e3be89b..ed491779abcd00 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -320,6 +320,7 @@ set(openmp_wrapper_files
   openmp_wrappers/__clang_openmp_device_functions.h
   openmp_wrappers/complex_cmath.h
   openmp_wrappers/new
+  openmp_wrappers/usm/force_usm.h
 )
 
 set(llvm_libc_wrapper_files
diff --git a/clang/lib/Headers/openmp_wrappers/usm/force_usm.h 
b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h
new file mode 100644
index 00..15c394e27ce9c2
--- /dev/null
+++ b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h
@@ -0,0 +1,6 @@
+#ifndef __CLANG_FORCE_OPENMP_USM
+#define __CLANG_FORCE_OPENMP_USM
+
+#pragma omp requires unified_shared_memory
+
+#endif

>From 4d5a1f670b3bdd5b183515e347610414cb12cb90 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Fri, 29 Dec 2023 04:33:19 -0500
Subject: [PATCH 2/3] Revert "[OpenMP] Introduce -fopenmp-force-usm flag"

This reverts commit 4ecd07d786a5a994b33b9177d4e21d839bfe3fc9.

To test the other solution.
---
 clang/include/clang/Driver/Options.td|  2 --
 clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp | 16 
 clang/lib/Headers/CMakeLists.txt |  1 -
 .../lib/Headers/openmp_wrappers/usm/force_usm.h  |  6 --
 4 files changed, 25 deletions(-)
 delete mode 100644 clang/lib/Headers/openmp_wrappers/usm/force_usm.h

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index e33bc7d1b10d71..2b93ddf033499c 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3381,8 +3381,6 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], 
"fopenmp-cuda-blocks-per-sm=">
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
 def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], 
"fopenmp-cuda-teams-reduction-recs-num=">, Group,
   

[openmp] [clang] [Clang][OpenMP] Fix mapping of structs to device (PR #75642)

2023-12-18 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

It appears that this patch made the buildbot unhappy 
(https://lab.llvm.org/buildbot/#/builders/193/builds/43948).

Let me know if you need help with this.

https://github.com/llvm/llvm-project/pull/75642
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-15 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr updated 
https://github.com/llvm/llvm-project/pull/75468

>From 4ecd07d786a5a994b33b9177d4e21d839bfe3fc9 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Thu, 6 Jul 2023 16:47:21 -0400
Subject: [PATCH] [OpenMP] Introduce -fopenmp-force-usm flag

The new flag implements logic to include #pragma omp requires
unified_shared_memory in every translation unit.
This enables a straightforward way to enable USM for an application
without the need to modify sources.
---
 clang/include/clang/Driver/Options.td|  2 ++
 clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp | 16 
 clang/lib/Headers/CMakeLists.txt |  1 +
 .../lib/Headers/openmp_wrappers/usm/force_usm.h  |  6 ++
 4 files changed, 25 insertions(+)
 create mode 100644 clang/lib/Headers/openmp_wrappers/usm/force_usm.h

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 1b02087425b751..73325d5620cc10 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], 
"fopenmp-cuda-blocks-per-sm=">
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
 def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], 
"fopenmp-cuda-teams-reduction-recs-num=">, Group,
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
 
 
//===--===//
 // Shared cc1 + fc1 OpenMP Target Options
diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp 
b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index b012b7cb729378..a077f2f06d7728 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -129,6 +129,22 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList 
) const {
 void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
 const ArgList , ArgStringList ) const {
   HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
+
+  CC1Args.push_back("-internal-isystem");
+  SmallString<128> P(HostTC.getDriver().ResourceDir);
+  llvm::sys::path::append(P, "include/cuda_wrappers");
+  CC1Args.push_back(DriverArgs.MakeArgString(P));
+
+  // Force USM mode will forcefully include #pragma omp requires
+  // unified_shared_memory via the force_usm header
+  // XXX This may result in a compilation error if the source
+  // file already includes that pragma.
+  if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) {
+CC1Args.push_back("-include");
+CC1Args.push_back(
+DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir +
+ "/include/openmp_wrappers/force_usm.h"));
+  }
 }
 
 void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList ,
diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt
index f8fdd402777e48..aac232fa8b4405 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -319,6 +319,7 @@ set(openmp_wrapper_files
   openmp_wrappers/__clang_openmp_device_functions.h
   openmp_wrappers/complex_cmath.h
   openmp_wrappers/new
+  openmp_wrappers/usm/force_usm.h
 )
 
 set(llvm_libc_wrapper_files
diff --git a/clang/lib/Headers/openmp_wrappers/usm/force_usm.h 
b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h
new file mode 100644
index 00..15c394e27ce9c2
--- /dev/null
+++ b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h
@@ -0,0 +1,6 @@
+#ifndef __CLANG_FORCE_OPENMP_USM
+#define __CLANG_FORCE_OPENMP_USM
+
+#pragma omp requires unified_shared_memory
+
+#endif

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-14 Thread Jan Patrick Lehr via cfe-commits


@@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], 
"fopenmp-cuda-blocks-per-sm=">
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
 def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], 
"fopenmp-cuda-teams-reduction-recs-num=">, Group,
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[CC1Option]>;

jplehr wrote:

With the intent to remove the USM behavior from a codebase that has the 
requires pragma, by basically just ignoring it?

https://github.com/llvm/llvm-project/pull/75468
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)

2023-12-14 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr updated 
https://github.com/llvm/llvm-project/pull/75467

>From d3d073d7f57f2a5d06cd8c1de8c1503034af3b6b Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Wed, 12 Jul 2023 05:04:41 -0400
Subject: [PATCH] [OpenMP][USM] Adds test for -fopenmp-force-usm flag

This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode.

Marked as XFAIL to first land test and then enable in subsequent patch.
---
 clang/test/OpenMP/force-usm.c | 74 +++
 1 file changed, 74 insertions(+)
 create mode 100644 clang/test/OpenMP/force-usm.c

diff --git a/clang/test/OpenMP/force-usm.c b/clang/test/OpenMP/force-usm.c
new file mode 100644
index 00..f04d499e5f71c2
--- /dev/null
+++ b/clang/test/OpenMP/force-usm.c
@@ -0,0 +1,74 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --include-generated-funcs --replace-value-regex 
"__omp_offloading_[0-9a-z]+_[0-9a-z]+" "pl_cond[.].+[.|,]" 
--prefix-filecheck-ir-name _ --version 3
+// REQUIRES: amdgpu-registered-target
+// XFAIL: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown 
-fopenmp-targets=amdgcn-amd-amdhsa -include 
%S/../../lib/Headers/openmp_wrappers/usm/force_usm.h -emit-llvm-bc %s -o 
%t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -include 
%S/../../lib/Headers/openmp_wrappers/usm/force_usm.h -fopenmp-is-device 
-fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck 
-check-prefix=CHECK-USM %s
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device 
-fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck 
-check-prefix=CHECK-DEFAULT %s
+// expected-no-diagnostics
+
+extern "C" void *malloc(unsigned int b);
+
+int GI;
+#pragma omp declare target
+int *pGI;
+#pragma omp end declare target
+
+int main(void) {
+
+  GI = 0;
+
+  pGI = (int *) malloc(sizeof(int));
+  *pGI = 42;
+
+#pragma omp target map(pGI[:1], GI)
+  {
+GI = 1;
+*pGI = 2;
+  }
+
+  return 0;
+}
+
+// CHECK-USM-LABEL: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l25
+// CHECK-USM-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[GI:%.*]]) 
#[[ATTR0:[0-9]+]] {
+// CHECK-USM-NEXT:  entry:
+// CHECK-USM-NEXT:[[GI_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-USM-NEXT:[[GI_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) 
[[GI_ADDR]] to ptr
+// CHECK-USM-NEXT:store ptr [[GI]], ptr [[GI_ADDR_ASCAST]], align 8
+// CHECK-USM-NEXT:[[TMP0:%.*]] = load ptr, ptr [[GI_ADDR_ASCAST]], align 8
+// CHECK-USM-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr 
addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-USM-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-USM-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], 
label [[WORKER_EXIT:%.*]]
+// CHECK-USM:   user_code.entry:
+// CHECK-USM-NEXT:store i32 1, ptr [[TMP0]], align 4
+// CHECK-USM-NEXT:[[TMP2:%.*]] = load ptr, ptr @pGI_decl_tgt_ref_ptr, 
align 8
+// CHECK-USM-NEXT:[[TMP3:%.*]] = load ptr, ptr [[TMP2]], align 8
+// CHECK-USM-NEXT:store i32 2, ptr [[TMP3]], align 4
+// CHECK-USM-NEXT:call void @__kmpc_target_deinit(ptr addrspacecast (ptr 
addrspace(1) @[[GLOB1]] to ptr), i8 1)
+// CHECK-USM-NEXT:ret void
+// CHECK-USM:   worker.exit:
+// CHECK-USM-NEXT:ret void
+//
+//
+// CHECK-DEFAULT-LABEL: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l25
+// CHECK-DEFAULT-SAME: (ptr noundef nonnull align 4 dereferenceable(4) 
[[GI:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-DEFAULT-NEXT:  entry:
+// CHECK-DEFAULT-NEXT:[[GI_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-DEFAULT-NEXT:[[GI_ADDR_ASCAST:%.*]] = addrspacecast ptr 
addrspace(5) [[GI_ADDR]] to ptr
+// CHECK-DEFAULT-NEXT:store ptr [[GI]], ptr [[GI_ADDR_ASCAST]], align 8
+// CHECK-DEFAULT-NEXT:[[TMP0:%.*]] = load ptr, ptr [[GI_ADDR_ASCAST]], 
align 8
+// CHECK-DEFAULT-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr 
addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-DEFAULT-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-DEFAULT-NEXT:br i1 [[EXEC_USER_CODE]], label 
[[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK-DEFAULT:   user_code.entry:
+// CHECK-DEFAULT-NEXT:store i32 1, ptr [[TMP0]], align 4
+// CHECK-DEFAULT-NEXT:[[TMP2:%.*]] = load ptr, ptr addrspacecast (ptr 
addrspace(1) @pGI to ptr), align 8
+// 

[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-14 Thread Jan Patrick Lehr via cfe-commits


@@ -129,6 +129,22 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList 
) const {
 void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
 const ArgList , ArgStringList ) const {
   HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
+
+  CC1Args.push_back("-internal-isystem");
+  SmallString<128> P(HostTC.getDriver().ResourceDir);
+  llvm::sys::path::append(P, "include/cuda_wrappers");
+  CC1Args.push_back(DriverArgs.MakeArgString(P));
+
+  // Force USM mode will forcefully include #pragma omp requires
+  // unified_shared_memory via the force_usm header
+  // XXX This may result in a compilation error if the source
+  // file already includes that pragma.
+  if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) {
+CC1Args.push_back("-include");
+CC1Args.push_back(
+DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir +
+ "/include/openmp_wrappers/force_usm.h"));

jplehr wrote:

I'm happy to change that to something more reasonable, if you can point out 
where to look for inspiration on how to do it properly.

https://github.com/llvm/llvm-project/pull/75468
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-14 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr updated 
https://github.com/llvm/llvm-project/pull/75468

>From 9809ba1ec31cb1a4a066f709ae8bd3e965e1 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Thu, 6 Jul 2023 16:47:21 -0400
Subject: [PATCH] [OpenMP] Introduce -fopenmp-force-usm flag

The new flag implements logic to include #pragma omp requires
unified_shared_memory in every translation unit.
This enables a straightforward way to enable USM for an application
without the need to modify sources.
---
 clang/include/clang/Driver/Options.td|  2 ++
 clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp | 16 
 clang/lib/Headers/CMakeLists.txt |  1 +
 .../lib/Headers/openmp_wrappers/usm/force_usm.h  |  6 ++
 4 files changed, 25 insertions(+)
 create mode 100644 clang/lib/Headers/openmp_wrappers/usm/force_usm.h

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 1b02087425b751..b9cd3043a13a9a 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], 
"fopenmp-cuda-blocks-per-sm=">
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
 def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], 
"fopenmp-cuda-teams-reduction-recs-num=">, Group,
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[CC1Option]>;
 
 
//===--===//
 // Shared cc1 + fc1 OpenMP Target Options
diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp 
b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index b012b7cb729378..a077f2f06d7728 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -129,6 +129,22 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList 
) const {
 void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
 const ArgList , ArgStringList ) const {
   HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
+
+  CC1Args.push_back("-internal-isystem");
+  SmallString<128> P(HostTC.getDriver().ResourceDir);
+  llvm::sys::path::append(P, "include/cuda_wrappers");
+  CC1Args.push_back(DriverArgs.MakeArgString(P));
+
+  // Force USM mode will forcefully include #pragma omp requires
+  // unified_shared_memory via the force_usm header
+  // XXX This may result in a compilation error if the source
+  // file already includes that pragma.
+  if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) {
+CC1Args.push_back("-include");
+CC1Args.push_back(
+DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir +
+ "/include/openmp_wrappers/force_usm.h"));
+  }
 }
 
 void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList ,
diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt
index f8fdd402777e48..aac232fa8b4405 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -319,6 +319,7 @@ set(openmp_wrapper_files
   openmp_wrappers/__clang_openmp_device_functions.h
   openmp_wrappers/complex_cmath.h
   openmp_wrappers/new
+  openmp_wrappers/usm/force_usm.h
 )
 
 set(llvm_libc_wrapper_files
diff --git a/clang/lib/Headers/openmp_wrappers/usm/force_usm.h 
b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h
new file mode 100644
index 00..15c394e27ce9c2
--- /dev/null
+++ b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h
@@ -0,0 +1,6 @@
+#ifndef __CLANG_FORCE_OPENMP_USM
+#define __CLANG_FORCE_OPENMP_USM
+
+#pragma omp requires unified_shared_memory
+
+#endif

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-14 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr created 
https://github.com/llvm/llvm-project/pull/75468

The new flag implements logic to include `#pragma omp requires 
unified_shared_memory` in every translation unit.
This enables a straightforward way to enable USM for an application without the 
need to modify sources.

This is the flag mentioned in https://github.com/llvm/llvm-project/pull/75467
Once the test landed, I'll rebase and enable the test with this patch.

>From bc912bf0a63e6d10b60655d26846731d961021f3 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Thu, 6 Jul 2023 16:47:21 -0400
Subject: [PATCH] [OpenMP] Introduce -fopenmp-force-usm flag

The new flag implements logic to include #pragma omp requires
unified_shared_memory in every translation unit.
This enables a straightforward way to enable USM for an application
without the need to modify sources.
---
 clang/include/clang/Driver/Options.td |  2 ++
 clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp  | 14 ++
 clang/lib/Headers/CMakeLists.txt  |  1 +
 clang/lib/Headers/openmp_wrappers/usm/force_usm.h |  6 ++
 4 files changed, 23 insertions(+)
 create mode 100644 clang/lib/Headers/openmp_wrappers/usm/force_usm.h

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 1b02087425b751..b9cd3043a13a9a 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], 
"fopenmp-cuda-blocks-per-sm=">
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
 def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], 
"fopenmp-cuda-teams-reduction-recs-num=">, Group,
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[CC1Option]>;
 
 
//===--===//
 // Shared cc1 + fc1 OpenMP Target Options
diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp 
b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index b012b7cb729378..2484a59085c276 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -129,6 +129,20 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList 
) const {
 void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
 const ArgList , ArgStringList ) const {
   HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
+
+  CC1Args.push_back("-internal-isystem");
+  SmallString<128> P(HostTC.getDriver().ResourceDir);
+  llvm::sys::path::append(P, "include/cuda_wrappers");
+  CC1Args.push_back(DriverArgs.MakeArgString(P));
+
+  // Force APU mode will focefully include #pragma omp requires
+  // unified_shared_memory via the force_usm header
+  if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) {
+CC1Args.push_back("-include");
+CC1Args.push_back(
+DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir +
+ "/include/openmp_wrappers/force_usm.h"));
+  }
 }
 
 void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList ,
diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt
index f8fdd402777e48..aac232fa8b4405 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -319,6 +319,7 @@ set(openmp_wrapper_files
   openmp_wrappers/__clang_openmp_device_functions.h
   openmp_wrappers/complex_cmath.h
   openmp_wrappers/new
+  openmp_wrappers/usm/force_usm.h
 )
 
 set(llvm_libc_wrapper_files
diff --git a/clang/lib/Headers/openmp_wrappers/usm/force_usm.h 
b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h
new file mode 100644
index 00..15c394e27ce9c2
--- /dev/null
+++ b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h
@@ -0,0 +1,6 @@
+#ifndef __CLANG_FORCE_OPENMP_USM
+#define __CLANG_FORCE_OPENMP_USM
+
+#pragma omp requires unified_shared_memory
+
+#endif

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)

2023-12-14 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr created 
https://github.com/llvm/llvm-project/pull/75467

This adds a basic test to check the correct generation of double indirect 
access to declare target globals in USM mode vs non-USM mode.
I am a bit unhappy with the way this test is set up, but could not find a 
better way to do it. Happy to improve that and add more tests then.

Marked as XFAIL to first land test and then enable in subsequent patch.

>From ea2a9191122c5659aac380803b381f763c816e07 Mon Sep 17 00:00:00 2001
From: JP Lehr 
Date: Wed, 12 Jul 2023 05:04:41 -0400
Subject: [PATCH] [OpenMP][USM] Adds test for -fopenmp-force-usm flag

This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode.

Marked as XFAIL to first land test and then enable in subsequent patch.
---
 clang/test/OpenMP/force-usm.c | 73 +++
 1 file changed, 73 insertions(+)
 create mode 100644 clang/test/OpenMP/force-usm.c

diff --git a/clang/test/OpenMP/force-usm.c b/clang/test/OpenMP/force-usm.c
new file mode 100644
index 00..222705322b8976
--- /dev/null
+++ b/clang/test/OpenMP/force-usm.c
@@ -0,0 +1,73 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --include-generated-funcs --replace-value-regex 
"__omp_offloading_[0-9a-z]+_[0-9a-z]+" "pl_cond[.].+[.|,]" 
--prefix-filecheck-ir-name _ --version 3
+// XFAIL: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown 
-fopenmp-targets=amdgcn-amd-amdhsa -include 
%S/../../lib/Headers/openmp_wrappers/usm/force_usm.h -emit-llvm-bc %s -o 
%t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -include 
%S/../../lib/Headers/openmp_wrappers/usm/force_usm.h -fopenmp-is-device 
-fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck 
-check-prefix=CHECK-USM %s
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device 
-fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck 
-check-prefix=CHECK-DEFAULT %s
+// expected-no-diagnostics
+
+extern "C" void *malloc(unsigned int b);
+
+int GI;
+#pragma omp declare target
+int *pGI;
+#pragma omp end declare target
+
+int main(void) {
+
+  GI = 0;
+
+  pGI = (int *) malloc(sizeof(int));
+  *pGI = 42;
+
+#pragma omp target map(pGI[:1], GI)
+  {
+GI = 1;
+*pGI = 2;
+  }
+
+  return 0;
+}
+
+// CHECK-USM-LABEL: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l25
+// CHECK-USM-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[GI:%.*]]) 
#[[ATTR0:[0-9]+]] {
+// CHECK-USM-NEXT:  entry:
+// CHECK-USM-NEXT:[[GI_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-USM-NEXT:[[GI_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) 
[[GI_ADDR]] to ptr
+// CHECK-USM-NEXT:store ptr [[GI]], ptr [[GI_ADDR_ASCAST]], align 8
+// CHECK-USM-NEXT:[[TMP0:%.*]] = load ptr, ptr [[GI_ADDR_ASCAST]], align 8
+// CHECK-USM-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr 
addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-USM-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-USM-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], 
label [[WORKER_EXIT:%.*]]
+// CHECK-USM:   user_code.entry:
+// CHECK-USM-NEXT:store i32 1, ptr [[TMP0]], align 4
+// CHECK-USM-NEXT:[[TMP2:%.*]] = load ptr, ptr @pGI_decl_tgt_ref_ptr, 
align 8
+// CHECK-USM-NEXT:[[TMP3:%.*]] = load ptr, ptr [[TMP2]], align 8
+// CHECK-USM-NEXT:store i32 2, ptr [[TMP3]], align 4
+// CHECK-USM-NEXT:call void @__kmpc_target_deinit(ptr addrspacecast (ptr 
addrspace(1) @[[GLOB1]] to ptr), i8 1)
+// CHECK-USM-NEXT:ret void
+// CHECK-USM:   worker.exit:
+// CHECK-USM-NEXT:ret void
+//
+//
+// CHECK-DEFAULT-LABEL: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l25
+// CHECK-DEFAULT-SAME: (ptr noundef nonnull align 4 dereferenceable(4) 
[[GI:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-DEFAULT-NEXT:  entry:
+// CHECK-DEFAULT-NEXT:[[GI_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-DEFAULT-NEXT:[[GI_ADDR_ASCAST:%.*]] = addrspacecast ptr 
addrspace(5) [[GI_ADDR]] to ptr
+// CHECK-DEFAULT-NEXT:store ptr [[GI]], ptr [[GI_ADDR_ASCAST]], align 8
+// CHECK-DEFAULT-NEXT:[[TMP0:%.*]] = load ptr, ptr [[GI_ADDR_ASCAST]], 
align 8
+// CHECK-DEFAULT-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr 
addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-DEFAULT-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-DEFAULT-NEXT:   

[clang] [openmp] [OpenMP] Fix runtime problem due to wrong map size. (PR #74692)

2023-12-07 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

The issue comes from the bot building without setting CMake option 
`-DLIBOMPTARGET_ENABLE_DEBUG=ON`. This makes the environment variable 
`LIBOMPTARGET_DEBUG=1` have no effect in the test, i.e., no output for 
FileCheck to test.

https://github.com/llvm/llvm-project/pull/74692
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP] Fix runtime problem due to wrong map size. (PR #74692)

2023-12-07 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

I'm looking into it locally

https://github.com/llvm/llvm-project/pull/74692
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [clang] [OpenMP] Fix runtime problem due to wrong map size. (PR #74692)

2023-12-07 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

It appears that this breaks the AMDGPU OpenMP Offload buildbot: 
https://lab.llvm.org/buildbot/#/builders/193/builds/43297

https://github.com/llvm/llvm-project/pull/74692
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [openmp] [mlir] [clang] [OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (PR #70401)

2023-11-30 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

This was brought up and discussed in the weekly meeting.

https://github.com/llvm/llvm-project/pull/70401
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits


@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, 
AMDGenericDeviceTy {
   using AMDGPUEventRef = AMDGPUResourceRef;
   using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy;
 
+  /// Common method to invoke a single threaded constructor or destructor
+  /// kernel by name.
+  Error callGlobalCtorDtorCommon(GenericPluginTy , DeviceImageTy ,
+ const char *Name) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'amdgpu-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy  = Plugin.getGlobalHandler();
+GlobalTy Global(Name, sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Error::success();

jplehr wrote:

That would certainly make it more obvious.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits


@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
   Error synchronize(__tgt_async_info *AsyncInfo);
   virtual Error synchronizeImpl(__tgt_async_info ) = 0;
 
+  /// Invokes any global constructors on the device if present and is required
+  /// by the target.
+  virtual Error callGlobalConstructors(GenericPluginTy ,
+   DeviceImageTy ) {
+return Error::success();

jplehr wrote:

Plugin::success()

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits


@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
   Error synchronize(__tgt_async_info *AsyncInfo);
   virtual Error synchronizeImpl(__tgt_async_info ) = 0;
 
+  /// Invokes any global constructors on the device if present and is required
+  /// by the target.
+  virtual Error callGlobalConstructors(GenericPluginTy ,
+   DeviceImageTy ) {
+return Error::success();
+  }
+
+  /// Invokes any global destructors on the device if present and is required
+  /// by the target.
+  virtual Error callGlobalDestructors(GenericPluginTy ,
+  DeviceImageTy ) {
+return Error::success();

jplehr wrote:

Plugin::success()

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr commented:

Thanks Joseph. Another two nits.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits


@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, 
AMDGenericDeviceTy {
   using AMDGPUEventRef = AMDGPUResourceRef;
   using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy;
 
+  /// Common method to invoke a single threaded constructor or destructor
+  /// kernel by name.
+  Error callGlobalCtorDtorCommon(GenericPluginTy , DeviceImageTy ,
+ const char *Name) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'amdgpu-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy  = Plugin.getGlobalHandler();
+GlobalTy Global(Name, sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Error::success();

jplehr wrote:

Is there a specific reason we do not return the error here, but instead consume 
and return success?

Also, I think this should be `Plugin::success()` to not deviate from what is 
used in the plugin.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits


@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, 
AMDGenericDeviceTy {
   using AMDGPUEventRef = AMDGPUResourceRef;
   using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy;
 
+  /// Common method to invoke a single threaded constructor or destructor
+  /// kernel by name.
+  Error callGlobalCtorDtorCommon(GenericPluginTy , DeviceImageTy ,
+ const char *Name) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'amdgpu-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy  = Plugin.getGlobalHandler();
+GlobalTy Global(Name, sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Error::success();
+}
+
+// Allocate and construct the AMDGPU kernel.
+GenericKernelTy *AMDGPUKernel = Plugin.allocate();
+if (!AMDGPUKernel)
+  return Plugin::error("Failed to allocate memory for AMDGPU kernel");
+
+new (AMDGPUKernel) AMDGPUKernelTy(Name);
+if (auto Err = AMDGPUKernel->initImpl(*this, Image))
+  return std::move(Err);
+
+auto *AsyncInfoPtr = Plugin.allocate<__tgt_async_info>();
+AsyncInfoWrapperTy AsyncInfoWrapper(*this, AsyncInfoPtr);
+
+if (auto Err = initAsyncInfoImpl(AsyncInfoWrapper))
+  return std::move(Err);
+
+KernelArgsTy KernelArgs = {};
+if (auto Err = AMDGPUKernel->launchImpl(*this, /*NumThread=*/1u,
+/*NumBlocks=*/1ul, KernelArgs,
+/*Args=*/nullptr, 
AsyncInfoWrapper))
+  return std::move(Err);
+
+if (auto Err = synchronize(AsyncInfoPtr))
+  return std::move(Err);
+Error Err = Error::success();

jplehr wrote:

Should this be `Plugin::success()` instead here as well?

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr commented:

I have only briefly looked at the NVPTX implementation.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Introduce the initial support for OpenMP kernel language (PR #66844)

2023-09-29 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

It seems that this broke the AMDGPU OpenMP buildbot 
https://lab.llvm.org/buildbot/#/builders/193/builds/39393
I saw that you have since pushed up one patch regarding pointer compares. Are 
you looking at the remaining test fails as well?

https://github.com/llvm/llvm-project/pull/66844
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (PR #66882)

2023-09-25 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

Sure, I'll look into it later today and get back to you.

https://github.com/llvm/llvm-project/pull/66882
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (PR #66882)

2023-09-25 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

It appears that this change made the AMDGPU OpenMP buildbot unhappy
https://lab.llvm.org/buildbot/#/builders/193/builds/39050

https://github.com/llvm/llvm-project/pull/66882
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][OpenMP] Clang adding the addrSpace according to DataLayout fix (PR #65483)

2023-09-12 Thread Jan Patrick Lehr via cfe-commits

jplehr wrote:

I believe this broke one of the AMDGPU OpenMP buildbots 
https://lab.llvm.org/staging/#/builders/247/builds/6351 

https://github.com/llvm/llvm-project/pull/65483
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits