[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)

2024-09-24 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> Thank you ever so much for the review @VyacheslavLevytskyy! I will create a 
> PR for the Translator as well, since there's some handling missing there; I 
> will refer to it here for future readers. Final check: are you OK with the 
> OpenCL changes @yxsamliu?

LGTM

https://github.com/llvm/llvm-project/pull/106429
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Use original file path for CUID (PR #107734)

2024-09-14 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu closed 
https://github.com/llvm/llvm-project/pull/107734
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][Clang][CodeGen] Handle hip bin symbols properly. (PR #107458)

2024-09-11 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.

LGTM. Thanks

https://github.com/llvm/llvm-project/pull/107458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][Clang][CodeGen] Handle hip bin symbols properly. (PR #107458)

2024-09-10 Thread Yaxun Liu via cfe-commits


@@ -905,10 +907,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() 
{
 GpuBinaryHandle = new llvm::GlobalVariable(
 TheModule, PtrTy, /*isConstant=*/false, Linkage,
 /*Initializer=*/
-CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr,

yxsamliu wrote:

or change it to:

`RelocatableDeviceCode ? nullptr : llvm::ConstantPointerNull::get(PtrTy)`

https://github.com/llvm/llvm-project/pull/107458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][Clang][CodeGen] Handle hip bin symbols properly. (PR #107458)

2024-09-10 Thread Yaxun Liu via cfe-commits


@@ -175,7 +175,6 @@ __device__ void device_use() {
 // HIP-SAME: section ".hipFatBinSegment"
 // * variable to save GPU binary handle after initialization
 // CUDANORDC: @__[[PREFIX]]_gpubin_handle = internal global ptr null
-// HIPNEF: @__[[PREFIX]]_gpubin_handle_{{[0-9a-f]+}} = external hidden global 
ptr, align 8

yxsamliu wrote:

should not remove the check line

https://github.com/llvm/llvm-project/pull/107458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][Clang][CodeGen] Handle hip bin symbols properly. (PR #107458)

2024-09-10 Thread Yaxun Liu via cfe-commits


@@ -30,8 +28,6 @@
 // RUN:   2>&1 | FileCheck -check-prefix=LD-R %s
 // LD-R: Found undefined HIP fatbin symbol: __hip_fatbin_[[ID1:[0-9a-f]+]]
 // LD-R: Found undefined HIP fatbin symbol: __hip_fatbin_[[ID2:[0-9a-f]+]]
-// LD-R: Found undefined HIP gpubin handle symbol: __hip_gpubin_handle_[[ID1]]

yxsamliu wrote:

these lines should not be changed. same as below

https://github.com/llvm/llvm-project/pull/107458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][Clang][CodeGen] Handle hip bin symbols properly. (PR #107458)

2024-09-10 Thread Yaxun Liu via cfe-commits


@@ -905,10 +907,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() 
{
 GpuBinaryHandle = new llvm::GlobalVariable(
 TheModule, PtrTy, /*isConstant=*/false, Linkage,
 /*Initializer=*/
-CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr,

yxsamliu wrote:

this line should not be changed

https://github.com/llvm/llvm-project/pull/107458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][HIP] Target-dependent overload resolution in declarators and specifiers (PR #103031)

2024-09-10 Thread Yaxun Liu via cfe-commits


@@ -0,0 +1,703 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only 
-verify=expected,onhost %s
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device 
-verify=expected,ondevice %s
+
+
+// Tests to ensure that functions with host and device overloads in that are
+// called outside of function bodies and variable initializers, e.g., in
+// template arguments are resolved with respect to the declaration to which 
they
+// belong.
+
+// Opaque types used for tests:
+struct DeviceTy {};
+struct HostTy {};
+struct HostDeviceTy {};
+struct TemplateTy {};
+
+struct TrueTy { static const bool value = true; };
+struct FalseTy { static const bool value = false; };
+
+// Select one of two types based on a boolean condition.
+template  struct select_type {};
+template  struct select_type { typedef T 
type; };
+template  struct select_type { typedef F 
type; };
+
+template  struct check : public select_type { };
+
+// Check if two types are the same.
+template struct is_same : public FalseTy { };
+template struct is_same : public TrueTy { };
+
+// A static assertion that fails at compile time if the expression E does not
+// have type T.
+#define ASSERT_HAS_TYPE(E, T) static_assert(is_same::value);
+
+
+// is_on_device() is true when called in a device context and false if called 
in a host context.
+__attribute__((host)) constexpr bool is_on_device(void) { return false; }
+__attribute__((device)) constexpr bool is_on_device(void) { return true; }
+
+
+// this type depends on whether it occurs in host or device code
+#define targetdep_t select_type::type
+
+// Defines and typedefs with different values in host and device compilation.
+#ifdef __CUDA_ARCH__
+#define CurrentTarget DEVICE
+typedef DeviceTy CurrentTargetTy;
+typedef DeviceTy TemplateIfHostTy;
+#else
+#define CurrentTarget HOST
+typedef HostTy CurrentTargetTy;
+typedef TemplateTy TemplateIfHostTy;
+#endif
+
+
+
+// targetdep_t in function declarations should depend on the target of the
+// declared function.
+__attribute__((device)) targetdep_t decl_ret_early_device(void);
+ASSERT_HAS_TYPE(decl_ret_early_device(), DeviceTy)
+
+__attribute__((host)) targetdep_t decl_ret_early_host(void);
+ASSERT_HAS_TYPE(decl_ret_early_host(), HostTy)
+
+__attribute__((host,device)) targetdep_t decl_ret_early_host_device(void);
+ASSERT_HAS_TYPE(decl_ret_early_host_device(), CurrentTargetTy)
+
+// If the function target is specified too late and can therefore not be
+// considered for overload resolution in targetdep_t, warn.
+targetdep_t __attribute__((device)) decl_ret_late_device(void); // 
expected-warning {{target attribute has been ignored for overload resolution}}
+ASSERT_HAS_TYPE(decl_ret_late_device(), HostTy)
+
+// No warning necessary if the ignored attribute doesn't change the result.
+targetdep_t __attribute__((host)) decl_ret_late_host(void);
+ASSERT_HAS_TYPE(decl_ret_late_host(), HostTy)
+
+targetdep_t __attribute__((host,device)) decl_ret_late_host_device(void); // 
expected-warning {{target attribute has been ignored for overload resolution}}
+ASSERT_HAS_TYPE(decl_ret_late_host_device(), HostTy)
+
+// An odd way of writing this, but it's possible.
+__attribute__((device)) targetdep_t __attribute__((host)) 
decl_ret_early_device_late_host(void); // expected-warning {{target attribute 
has been ignored for overload resolution}}
+ASSERT_HAS_TYPE(decl_ret_early_device_late_host(), DeviceTy)
+
+
+// The same for function definitions and parameter types:
+__attribute__((device)) targetdep_t ret_early_device(targetdep_t x) {
+  ASSERT_HAS_TYPE(ret_early_device({}), DeviceTy)
+  ASSERT_HAS_TYPE(x, DeviceTy)
+  return {};
+}
+
+__attribute__((host)) targetdep_t ret_early_host(targetdep_t x) {
+  ASSERT_HAS_TYPE(ret_early_host({}), HostTy)
+  ASSERT_HAS_TYPE(x, HostTy)
+  return {};
+}
+
+__attribute__((host, device)) targetdep_t ret_early_hostdevice(targetdep_t x) {
+  ASSERT_HAS_TYPE(ret_early_hostdevice({}), CurrentTargetTy)
+  ASSERT_HAS_TYPE(x, CurrentTargetTy)
+  return {};
+}
+
+// The parameter is still after the attribute, so it needs no warning.
+targetdep_t __attribute__((device)) // expected-warning {{target attribute has 
been ignored for overload resolution}}
+ret_late_device(targetdep_t x) {
+  ASSERT_HAS_TYPE(ret_late_device({}), HostTy)
+  ASSERT_HAS_TYPE(x, DeviceTy)
+  return {};
+}
+
+targetdep_t __attribute__((host, device)) // expected-warning {{target 
attribute has been ignored for overload resolution}}
+ret_late_hostdevice(targetdep_t x) {
+  ASSERT_HAS_TYPE(ret_late_hostdevice({}), HostTy)
+  ASSERT_HAS_TYPE(x, CurrentTargetTy)
+  return {};
+}
+
+targetdep_t __attribute__((host)) ret_late_host(targetdep_t x) {
+  ASSERT_HAS_TYPE(ret_late_host({}), HostTy)
+  ASSERT_HAS_TYPE(x, HostTy)
+  return {};
+}
+
+__attribute__((device)) targetdep_t __attribute__((host)) // expected-warning 
{{target attribute has been ignored for overload resolution}}
+ret_early_device_late_host(targetdep_t x) 

[clang] [NFC][AMDGPU][Driver] Move 'shouldSkipSanitizeOption' utility to AMDGPU. (PR #107997)

2024-09-10 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/107997
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA/HIP] propagate -cuid to a host-only compilation. (PR #107483)

2024-09-07 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.

LGTM. Thanks

https://github.com/llvm/llvm-project/pull/107483
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Use original file path for CUID (PR #107734)

2024-09-07 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu created 
https://github.com/llvm/llvm-project/pull/107734

to avoid being nondeterministic due to random path in distributed build.

>From 725953ccbdb1f57eaac234cf5729f64a9fdbce13 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Sat, 7 Sep 2024 20:32:48 -0400
Subject: [PATCH] [HIP] Use original file path for CUID

to avoid being nondeterministic due to random path
in distributed build.
---
 clang/lib/Driver/Driver.cpp |  5 +
 clang/test/Driver/hip-cuid-hash.hip | 12 +++-
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 5b3783e20eabba..6a25ca4de137b6 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -3040,10 +3040,7 @@ class OffloadingActionBuilder final {
   else if (UseCUID == CUID_Hash) {
 llvm::MD5 Hasher;
 llvm::MD5::MD5Result Hash;
-SmallString<256> RealPath;
-llvm::sys::fs::real_path(IA->getInputArg().getValue(), RealPath,
- /*expand_tilde=*/true);
-Hasher.update(RealPath);
+Hasher.update(IA->getInputArg().getValue());
 for (auto *A : Args) {
   if (A->getOption().matches(options::OPT_INPUT))
 continue;
diff --git a/clang/test/Driver/hip-cuid-hash.hip 
b/clang/test/Driver/hip-cuid-hash.hip
index 103a1cbf26d50a..6987a98470c6c7 100644
--- a/clang/test/Driver/hip-cuid-hash.hip
+++ b/clang/test/Driver/hip-cuid-hash.hip
@@ -1,13 +1,15 @@
 // Check CUID generated by hash.
 // The same CUID is generated for the same file with the same options.
 
+// RUN: cd %S
+
 // RUN: %clang -### -x hip --target=x86_64-unknown-linux-gnu 
--no-offload-new-driver \
 // RUN:   --offload-arch=gfx906 -c -nogpuinc -nogpulib -fuse-cuid=hash \
-// RUN:   %S/Inputs/hip_multiple_inputs/a.cu >%t.out 2>&1
+// RUN:   Inputs/hip_multiple_inputs/a.cu >%t.out 2>&1
 
 // RUN: %clang -### -x hip --target=x86_64-unknown-linux-gnu 
--no-offload-new-driver \
 // RUN:   --offload-arch=gfx906 -c -nogpuinc -nogpulib -fuse-cuid=hash \
-// RUN:   %S/Inputs/hip_multiple_inputs/a.cu >>%t.out 2>&1
+// RUN:   Inputs/hip_multiple_inputs/a.cu >>%t.out 2>&1
 
 // RUN: FileCheck %s -check-prefixes=SAME -input-file %t.out
 
@@ -16,15 +18,15 @@
 
 // RUN: %clang -### -x hip --target=x86_64-unknown-linux-gnu -DX=1 
--no-offload-new-driver \
 // RUN:   --offload-arch=gfx906 -c -nogpuinc -nogpulib -fuse-cuid=hash \
-// RUN:   %S/Inputs/hip_multiple_inputs/a.cu >%t.out 2>&1
+// RUN:   Inputs/hip_multiple_inputs/a.cu >%t.out 2>&1
 
 // RUN: %clang -### -x hip --target=x86_64-unknown-linux-gnu -DX=2 
--no-offload-new-driver \
 // RUN:   --offload-arch=gfx906 -c -nogpuinc -nogpulib -fuse-cuid=hash \
-// RUN:   %S/Inputs/../Inputs/hip_multiple_inputs/a.cu >>%t.out 2>&1
+// RUN:   Inputs/../Inputs/hip_multiple_inputs/a.cu >>%t.out 2>&1
 
 // RUN: FileCheck %s -check-prefixes=DIFF -input-file %t.out
 
-// SAME: "-cc1"{{.*}} "-target-cpu" "gfx906" {{.*}}"-cuid=[[CUID:[0-9a-f]+]]"
+// SAME: "-cc1"{{.*}} "-target-cpu" "gfx906" 
{{.*}}"-cuid=[[CUID:3c08c1ef86ef439d]]"
 // SAME: "-cc1"{{.*}} "-target-cpu" "gfx906" {{.*}}"-cuid=[[CUID]]"
 
 // DIFF: "-cc1"{{.*}} "-target-cpu" "gfx906" {{.*}}"-cuid=[[CUID:[0-9a-f]+]]"

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][Clang][CodeGen] Handle hip bin symbols properly. (PR #107458)

2024-09-06 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

you need to update lit test clang/test/CodeGenCUDA/device-stub.cu as it fails 
now

[clang/test/CodeGenCUDA/device-stub.cu](https://buildkite.com/llvm-project/github-pull-requests/builds/98392#0191c73e-12f8-4cf3-9618-d3fd752f9149)

https://github.com/llvm/llvm-project/pull/107458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][Clang][CodeGen] Handle hip bin symbols properly. (PR #107458)

2024-09-05 Thread Yaxun Liu via cfe-commits


@@ -840,8 +840,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
   FatBinStr = new llvm::GlobalVariable(
   CGM.getModule(), CGM.Int8Ty,
   /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
-  "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr,
-  llvm::GlobalVariable::NotThreadLocal);
+  "__hip_fatbin" + (CGM.getLangOpts().CUID.empty()
+? ""

yxsamliu wrote:

this change is unnecessary and may break some existing app

https://github.com/llvm/llvm-project/pull/107458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][Clang][CodeGen] Handle hip bin symbols properly. (PR #107458)

2024-09-05 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

The current behavior of clang is expected.

when gpu binary is not specified, it is expected to be used for -fgpu-rdc and 
the __hip_gpubin_handle_ symbol needs to be external and unique since they may 
need to be merged for partial linking. Make them internal will break partial 
linking.

https://github.com/llvm/llvm-project/pull/107458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Offload] Move HIP and CUDA to new driver by default (PR #84420)

2024-08-29 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> @yxsamliu Do you know what the next steps for merging this would be? I'd like 
> to get it into the Clang 20 release if possible. The only thing this loses 
> currently is managed variables being registered in RDC mode, but I'm going to 
> assume that's hardly seen in practice so I could probably punt that until 
> later. I unfortunately haven't figured out a way to reproduce the build 
> failures on rocBLAS that the fork saw. I think @saiislam was looking into 
> that but couldn't get docker to work.

I think at least we need to get this PR pass internal PSDB. We could discuss 
the docker issues internally.

https://github.com/llvm/llvm-project/pull/84420
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Fixing Clang HIP inconsistent order for template functions (PR #101627)

2024-08-28 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

is it already fixed by https://github.com/llvm/llvm-project/pull/102661 ?

https://github.com/llvm/llvm-project/pull/101627
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Fix amdgpu-arch for dll name on Windows (PR #101350)

2024-08-23 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

ping

https://github.com/llvm/llvm-project/pull/101350
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][CodeGen][SPIR-V][AMDGPU] Tweak AMDGCNSPIRV ABI to allow for the correct handling of aggregates passed to kernels / functions. (PR #102776)

2024-08-20 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/102776
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] search fatbin symbols for libs passed by -l (PR #104638)

2024-08-18 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu closed 
https://github.com/llvm/llvm-project/pull/104638
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] search fatbin symbols for libs passed by -l (PR #104638)

2024-08-18 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu updated 
https://github.com/llvm/llvm-project/pull/104638

>From 6e6bb355f2cf79f30d01c97b580d4354cbb7e727 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Fri, 16 Aug 2024 14:24:08 -0400
Subject: [PATCH] [HIP] search fatbin symbols for libs passed by -l

For -fgpu-rdc linking, clang needs to collect undefined fatbin
symbols and resolve them to the embedded fatbin.

This has been done for object files and archive files passed
as input files to clang.

However, the same action is not performed for archive files passed
through -l options, which causes missing fatbin symbols.
---
 clang/lib/Driver/ToolChains/HIPUtility.cpp | 75 --
 clang/test/Driver/hip-toolchain-rdc.hip| 23 +++
 2 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/HIPUtility.cpp 
b/clang/lib/Driver/ToolChains/HIPUtility.cpp
index f32a23f111e4bf..1b707376dea819 100644
--- a/clang/lib/Driver/ToolChains/HIPUtility.cpp
+++ b/clang/lib/Driver/ToolChains/HIPUtility.cpp
@@ -52,13 +52,16 @@ static std::string normalizeForBundler(const llvm::Triple 
&T,
 // input object or archive files.
 class HIPUndefinedFatBinSymbols {
 public:
-  HIPUndefinedFatBinSymbols(const Compilation &C)
-  : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID(
-  DiagnosticsEngine::Error,
-  "Error collecting HIP undefined fatbin symbols: %0")),
+  HIPUndefinedFatBinSymbols(const Compilation &C,
+const llvm::opt::ArgList &Args_)
+  : C(C), Args(Args_),
+DiagID(C.getDriver().getDiags().getCustomDiagID(
+DiagnosticsEngine::Error,
+"Error collecting HIP undefined fatbin symbols: %0")),
 Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)),
 Verbose(C.getArgs().hasArg(options::OPT_v)) {
 populateSymbols();
+processStaticLibraries();
 if (Verbose) {
   for (const auto &Name : FatBinSymbols)
 llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n";
@@ -76,8 +79,70 @@ class HIPUndefinedFatBinSymbols {
 return GPUBinHandleSymbols;
   }
 
+  // Collect symbols from static libraries specified by -l options.
+  void processStaticLibraries() {
+llvm::SmallVector LibNames;
+llvm::SmallVector LibPaths;
+llvm::SmallVector ExactLibNames;
+llvm::Triple Triple(C.getDriver().getTargetTriple());
+bool IsMSVC = Triple.isWindowsMSVCEnvironment();
+llvm::StringRef Ext = IsMSVC ? ".lib" : ".a";
+
+for (const auto *Arg : Args.filtered(options::OPT_l)) {
+  llvm::StringRef Value = Arg->getValue();
+  if (Value.starts_with(":"))
+ExactLibNames.push_back(Value.drop_front());
+  else
+LibNames.push_back(Value);
+}
+for (const auto *Arg : Args.filtered(options::OPT_L)) {
+  auto Path = Arg->getValue();
+  LibPaths.push_back(Path);
+  if (Verbose)
+llvm::errs() << "HIP fatbin symbol search uses library path:  " << Path
+ << "\n";
+}
+
+auto ProcessLib = [&](llvm::StringRef LibName, bool IsExact) {
+  llvm::SmallString<256> FullLibName(
+  IsExact  ? Twine(LibName).str()
+  : IsMSVC ? (Twine(LibName) + Ext).str()
+   : (Twine("lib") + LibName + Ext).str());
+
+  bool Found = false;
+  for (const auto Path : LibPaths) {
+llvm::SmallString<256> FullPath = Path;
+llvm::sys::path::append(FullPath, FullLibName);
+
+if (llvm::sys::fs::exists(FullPath)) {
+  if (Verbose)
+llvm::errs() << "HIP fatbin symbol search found library: "
+ << FullPath << "\n";
+  auto BufferOrErr = llvm::MemoryBuffer::getFile(FullPath);
+  if (!BufferOrErr) {
+errorHandler(llvm::errorCodeToError(BufferOrErr.getError()));
+continue;
+  }
+  processInput(BufferOrErr.get()->getMemBufferRef());
+  Found = true;
+  break;
+}
+  }
+  if (!Found && Verbose)
+llvm::errs() << "HIP fatbin symbol search could not find library: "
+ << FullLibName << "\n";
+};
+
+for (const auto LibName : ExactLibNames)
+  ProcessLib(LibName, true);
+
+for (const auto LibName : LibNames)
+  ProcessLib(LibName, false);
+  }
+
 private:
   const Compilation &C;
+  const llvm::opt::ArgList &Args;
   unsigned DiagID;
   bool Quiet;
   bool Verbose;
@@ -301,7 +366,7 @@ void HIP::constructGenerateObjFileFromHIPFatBinary(
   auto HostTriple =
   C.getSingleOffloadToolChain()->getTriple();
 
-  HIPUndefinedFatBinSymbols Symbols(C);
+  HIPUndefinedFatBinSymbols Symbols(C, Args);
 
   std::string PrimaryHipFatbinSymbol;
   std::string PrimaryGpuBinHandleSymbol;
diff --git a/clang/test/Driver/hip-toolchain-rdc.hip 
b/clang/test/Driver/hip-toolchain-rdc.hip
index 7e6697a0e254f6..ec79bf06afb92c 100644
--- a/clang/test/Driver/hip-toolchain-rdc.hip
+++ b/clang/te

[clang] [HIP] search fatbin symbols for libs passed by -l (PR #104638)

2024-08-18 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu updated 
https://github.com/llvm/llvm-project/pull/104638

>From 3c281d8cfc99674f2a4de0dfe5e1f02e35e68d6d Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Fri, 16 Aug 2024 14:24:08 -0400
Subject: [PATCH] [HIP] search fatbin symbols for libs passed by -l

For -fgpu-rdc linking, clang needs to collect undefined fatbin
symbols and resolve them to the embedded fatbin.

This has been done for object files and archive files passed
as input files to clang.

However, the same action is not performed for archive files passed
through -l options, which causes missing fatbin symbols.
---
 clang/lib/Driver/ToolChains/HIPUtility.cpp | 75 --
 clang/test/Driver/hip-toolchain-rdc.hip| 23 +++
 2 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/HIPUtility.cpp 
b/clang/lib/Driver/ToolChains/HIPUtility.cpp
index f32a23f111e4bf..f3adb8bbc72aea 100644
--- a/clang/lib/Driver/ToolChains/HIPUtility.cpp
+++ b/clang/lib/Driver/ToolChains/HIPUtility.cpp
@@ -52,13 +52,16 @@ static std::string normalizeForBundler(const llvm::Triple 
&T,
 // input object or archive files.
 class HIPUndefinedFatBinSymbols {
 public:
-  HIPUndefinedFatBinSymbols(const Compilation &C)
-  : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID(
-  DiagnosticsEngine::Error,
-  "Error collecting HIP undefined fatbin symbols: %0")),
+  HIPUndefinedFatBinSymbols(const Compilation &C,
+const llvm::opt::ArgList &Args_)
+  : C(C), Args(Args_),
+DiagID(C.getDriver().getDiags().getCustomDiagID(
+DiagnosticsEngine::Error,
+"Error collecting HIP undefined fatbin symbols: %0")),
 Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)),
 Verbose(C.getArgs().hasArg(options::OPT_v)) {
 populateSymbols();
+processStaticLibraries();
 if (Verbose) {
   for (const auto &Name : FatBinSymbols)
 llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n";
@@ -76,8 +79,70 @@ class HIPUndefinedFatBinSymbols {
 return GPUBinHandleSymbols;
   }
 
+  // Collect symbols from static libraries specified by -l options.
+  void processStaticLibraries() {
+llvm::SmallVector LibNames;
+llvm::SmallVector LibPaths;
+llvm::SmallVector ExactLibNames;
+llvm::Triple Triple(C.getDriver().getTargetTriple());
+bool IsMSVC = Triple.isWindowsMSVCEnvironment();
+llvm::StringRef Ext = IsMSVC ? ".lib" : ".a";
+
+for (const auto *Arg : Args.filtered(options::OPT_l)) {
+  llvm::StringRef Value = Arg->getValue();
+  if (Value.starts_with(":"))
+ExactLibNames.push_back(Value.drop_front());
+  else
+LibNames.push_back(Value);
+}
+for (const auto *Arg : Args.filtered(options::OPT_L)) {
+  auto Path = Arg->getValue();
+  LibPaths.push_back(Path);
+  if (Verbose)
+llvm::errs() << "HIP fatbin symbol search uses library path:  " << Path
+ << "\n";
+}
+
+auto ProcessLib = [&](llvm::StringRef LibName, bool IsExact) {
+  Twine LibNameTwine = IsExact  ? LibName
+   : IsMSVC ? LibName + Ext
+: Twine("lib") + LibName + Ext;
+  llvm::SmallString<256> FullLibName(LibNameTwine.str());
+
+  bool Found = false;
+  for (const auto Path : LibPaths) {
+llvm::SmallString<256> FullPath = Path;
+llvm::sys::path::append(FullPath, FullLibName);
+
+if (llvm::sys::fs::exists(FullPath)) {
+  if (Verbose)
+llvm::errs() << "HIP fatbin symbol search found library: "
+ << FullPath << "\n";
+  auto BufferOrErr = llvm::MemoryBuffer::getFile(FullPath);
+  if (!BufferOrErr) {
+errorHandler(llvm::errorCodeToError(BufferOrErr.getError()));
+continue;
+  }
+  processInput(BufferOrErr.get()->getMemBufferRef());
+  Found = true;
+  break;
+}
+  }
+  if (!Found && Verbose)
+llvm::errs() << "HIP fatbin symbol search could not find library: "
+ << FullLibName << "\n";
+};
+
+for (const auto LibName : ExactLibNames)
+  ProcessLib(LibName, true);
+
+for (const auto LibName : LibNames)
+  ProcessLib(LibName, false);
+  }
+
 private:
   const Compilation &C;
+  const llvm::opt::ArgList &Args;
   unsigned DiagID;
   bool Quiet;
   bool Verbose;
@@ -301,7 +366,7 @@ void HIP::constructGenerateObjFileFromHIPFatBinary(
   auto HostTriple =
   C.getSingleOffloadToolChain()->getTriple();
 
-  HIPUndefinedFatBinSymbols Symbols(C);
+  HIPUndefinedFatBinSymbols Symbols(C, Args);
 
   std::string PrimaryHipFatbinSymbol;
   std::string PrimaryGpuBinHandleSymbol;
diff --git a/clang/test/Driver/hip-toolchain-rdc.hip 
b/clang/test/Driver/hip-toolchain-rdc.hip
index 7e6697a0e254f6..ec79bf06afb92c 100644
--- a/clang/test/Driver/hi

[clang] [HIP] search fatbin symbols for libs passed by -l (PR #104638)

2024-08-18 Thread Yaxun Liu via cfe-commits


@@ -76,8 +79,75 @@ class HIPUndefinedFatBinSymbols {
 return GPUBinHandleSymbols;
   }
 
+  // Collect symbols from static libraries specified by -l options.
+  void processStaticLibraries() {
+llvm::SmallVector LibNames;
+llvm::SmallVector LibPaths;
+llvm::SmallVector ExactLibNames;
+llvm::Triple Triple(C.getDriver().getTargetTriple());
+bool IsMSVC = Triple.isWindowsMSVCEnvironment();
+llvm::StringRef Ext = IsMSVC ? ".lib" : ".a";
+
+for (const auto *Arg : Args.filtered(options::OPT_l)) {
+  llvm::StringRef Value = Arg->getValue();
+  if (Value.starts_with(":"))
+ExactLibNames.push_back(Value.drop_front());
+  else
+LibNames.push_back(Value);
+}
+for (const auto *Arg : Args.filtered(options::OPT_L)) {
+  auto Path = Arg->getValue();
+  LibPaths.push_back(Path);
+  if (Verbose)
+llvm::errs() << "HIP fatbin symbol search uses library path:  " << Path
+ << "\n";
+}
+
+auto ProcessLib = [&](llvm::StringRef LibName, bool IsExact) {
+  llvm::SmallString<256> FullLibName;
+  if (IsExact)
+FullLibName = LibName;
+  else {
+if (IsMSVC)
+  (llvm::Twine(LibName) + Ext).toVector(FullLibName);
+else
+  (llvm::Twine("lib") + LibName + Ext).toVector(FullLibName);
+  }
+
+  bool Found = false;
+  for (const auto &Path : LibPaths) {
+llvm::SmallString<256> FullPath = Path;
+llvm::sys::path::append(FullPath, FullLibName);
+
+if (llvm::sys::fs::exists(FullPath)) {
+  if (Verbose)
+llvm::errs() << "HIP fatbin symbol search found library: "
+ << FullPath << "\n";
+  auto BufferOrErr = llvm::MemoryBuffer::getFile(FullPath);
+  if (!BufferOrErr) {
+errorHandler(llvm::errorCodeToError(BufferOrErr.getError()));
+continue;
+  }
+  processInput(BufferOrErr.get()->getMemBufferRef());
+  Found = true;
+  break;
+}
+  }
+  if (!Found && Verbose)
+llvm::errs() << "HIP fatbin symbol search could not find library: "
+ << FullLibName << "\n";
+};
+
+for (const auto &LibName : ExactLibNames)

yxsamliu wrote:

will do

https://github.com/llvm/llvm-project/pull/104638
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] search fatbin symbols for libs passed by -l (PR #104638)

2024-08-18 Thread Yaxun Liu via cfe-commits


@@ -76,8 +79,75 @@ class HIPUndefinedFatBinSymbols {
 return GPUBinHandleSymbols;
   }
 
+  // Collect symbols from static libraries specified by -l options.
+  void processStaticLibraries() {
+llvm::SmallVector LibNames;
+llvm::SmallVector LibPaths;
+llvm::SmallVector ExactLibNames;
+llvm::Triple Triple(C.getDriver().getTargetTriple());
+bool IsMSVC = Triple.isWindowsMSVCEnvironment();
+llvm::StringRef Ext = IsMSVC ? ".lib" : ".a";
+
+for (const auto *Arg : Args.filtered(options::OPT_l)) {
+  llvm::StringRef Value = Arg->getValue();
+  if (Value.starts_with(":"))
+ExactLibNames.push_back(Value.drop_front());
+  else
+LibNames.push_back(Value);
+}
+for (const auto *Arg : Args.filtered(options::OPT_L)) {
+  auto Path = Arg->getValue();
+  LibPaths.push_back(Path);
+  if (Verbose)
+llvm::errs() << "HIP fatbin symbol search uses library path:  " << Path
+ << "\n";
+}
+
+auto ProcessLib = [&](llvm::StringRef LibName, bool IsExact) {
+  llvm::SmallString<256> FullLibName;
+  if (IsExact)
+FullLibName = LibName;
+  else {
+if (IsMSVC)
+  (llvm::Twine(LibName) + Ext).toVector(FullLibName);
+else
+  (llvm::Twine("lib") + LibName + Ext).toVector(FullLibName);
+  }

yxsamliu wrote:

will use the first suggestion. with minor change Twine("lib") for it to work

https://github.com/llvm/llvm-project/pull/104638
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] search fatbin symbols for libs passed by -l (PR #104638)

2024-08-16 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu created 
https://github.com/llvm/llvm-project/pull/104638

For -fgpu-rdc linking, clang needs to collect undefined fatbin symbols and 
resolve them to the embedded fatbin.

This has been done for object files and archive files passed as input files to 
clang.

However, the same action is not performed for archive files passed through -l 
options, which causes missing symbols.

This patch adds that.

>From 469df54001aed765ff968850603c768452193f2a Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Fri, 16 Aug 2024 14:24:08 -0400
Subject: [PATCH] [HIP] search fatbin symbols for libs passed by -l

For -fgpu-rdc linking, clang needs to collect undefined fatbin
symbols and resolve them to the embedded fatbin.

This has been done for object files and archive files passed
as input files to clang.

However, the same action is not performed for archive files passed
through -l options, which causes missing fatbin symbols.
---
 clang/lib/Driver/ToolChains/HIPUtility.cpp | 80 --
 clang/test/Driver/hip-toolchain-rdc.hip| 23 +++
 2 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/HIPUtility.cpp 
b/clang/lib/Driver/ToolChains/HIPUtility.cpp
index f32a23f111e4bf..bf11272fef867b 100644
--- a/clang/lib/Driver/ToolChains/HIPUtility.cpp
+++ b/clang/lib/Driver/ToolChains/HIPUtility.cpp
@@ -52,13 +52,16 @@ static std::string normalizeForBundler(const llvm::Triple 
&T,
 // input object or archive files.
 class HIPUndefinedFatBinSymbols {
 public:
-  HIPUndefinedFatBinSymbols(const Compilation &C)
-  : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID(
-  DiagnosticsEngine::Error,
-  "Error collecting HIP undefined fatbin symbols: %0")),
+  HIPUndefinedFatBinSymbols(const Compilation &C,
+const llvm::opt::ArgList &Args_)
+  : C(C), Args(Args_),
+DiagID(C.getDriver().getDiags().getCustomDiagID(
+DiagnosticsEngine::Error,
+"Error collecting HIP undefined fatbin symbols: %0")),
 Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)),
 Verbose(C.getArgs().hasArg(options::OPT_v)) {
 populateSymbols();
+processStaticLibraries();
 if (Verbose) {
   for (const auto &Name : FatBinSymbols)
 llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n";
@@ -76,8 +79,75 @@ class HIPUndefinedFatBinSymbols {
 return GPUBinHandleSymbols;
   }
 
+  // Collect symbols from static libraries specified by -l options.
+  void processStaticLibraries() {
+llvm::SmallVector LibNames;
+llvm::SmallVector LibPaths;
+llvm::SmallVector ExactLibNames;
+llvm::Triple Triple(C.getDriver().getTargetTriple());
+bool IsMSVC = Triple.isWindowsMSVCEnvironment();
+llvm::StringRef Ext = IsMSVC ? ".lib" : ".a";
+
+for (const auto *Arg : Args.filtered(options::OPT_l)) {
+  llvm::StringRef Value = Arg->getValue();
+  if (Value.starts_with(":"))
+ExactLibNames.push_back(Value.drop_front());
+  else
+LibNames.push_back(Value);
+}
+for (const auto *Arg : Args.filtered(options::OPT_L)) {
+  auto Path = Arg->getValue();
+  LibPaths.push_back(Path);
+  if (Verbose)
+llvm::errs() << "HIP fatbin symbol search uses library path:  " << Path
+ << "\n";
+}
+
+auto ProcessLib = [&](llvm::StringRef LibName, bool IsExact) {
+  llvm::SmallString<256> FullLibName;
+  if (IsExact)
+FullLibName = LibName;
+  else {
+if (IsMSVC)
+  (llvm::Twine(LibName) + Ext).toVector(FullLibName);
+else
+  (llvm::Twine("lib") + LibName + Ext).toVector(FullLibName);
+  }
+
+  bool Found = false;
+  for (const auto &Path : LibPaths) {
+llvm::SmallString<256> FullPath = Path;
+llvm::sys::path::append(FullPath, FullLibName);
+
+if (llvm::sys::fs::exists(FullPath)) {
+  if (Verbose)
+llvm::errs() << "HIP fatbin symbol search found library: "
+ << FullPath << "\n";
+  auto BufferOrErr = llvm::MemoryBuffer::getFile(FullPath);
+  if (!BufferOrErr) {
+errorHandler(llvm::errorCodeToError(BufferOrErr.getError()));
+continue;
+  }
+  processInput(BufferOrErr.get()->getMemBufferRef());
+  Found = true;
+  break;
+}
+  }
+  if (!Found && Verbose)
+llvm::errs() << "HIP fatbin symbol search could not find library: "
+ << FullLibName << "\n";
+};
+
+for (const auto &LibName : ExactLibNames)
+  ProcessLib(LibName, true);
+
+for (const auto &LibName : LibNames)
+  ProcessLib(LibName, false);
+  }
+
 private:
   const Compilation &C;
+  const llvm::opt::ArgList &Args;
   unsigned DiagID;
   bool Quiet;
   bool Verbose;
@@ -301,7 +371,7 @@ void HIP::constructGenerateObjFileFromHIPFatBinary(
   auto H

[clang] [Clang] Fix sema checks thinking kernels aren't kernels (PR #104460)

2024-08-15 Thread Yaxun Liu via cfe-commits


@@ -7163,7 +7165,8 @@ void Sema::ProcessDeclAttributeList(
 } else if (const auto *A = D->getAttr()) {
   Diag(D->getLocation(), diag::err_opencl_kernel_attr) << A;
   D->setInvalidDecl();
-} else if (!D->hasAttr()) {
+} else if (!D->hasAttr() &&

yxsamliu wrote:

This part checks amdgpu kernel attributes. I think we should move it inside the 
above if body and change it to
```
if (!D->hasAttr() && !FnTy->getCallConv() != 
CallingConv::CC_AMDGPUKernelCall) {
//...
}
```
instead of using `else if`.

https://github.com/llvm/llvm-project/pull/104460
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Fix sema checks thinking kernels aren't kernels (PR #104460)

2024-08-15 Thread Yaxun Liu via cfe-commits


@@ -7147,7 +7147,9 @@ void Sema::ProcessDeclAttributeList(
   // good to have a way to specify "these attributes must appear as a group",
   // for these. Additionally, it would be good to have a way to specify "these
   // attribute must never appear as a group" for attributes like cold and hot.
-  if (!D->hasAttr()) {

yxsamliu wrote:

This part checks OpenCL attributes. The condition should be kept unchanged.

https://github.com/llvm/llvm-project/pull/104460
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-08-15 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/96872
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Emit used function list in deterministic order. (PR #102661)

2024-08-12 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.

LGTM. Thanks

https://github.com/llvm/llvm-project/pull/102661
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu edited 
https://github.com/llvm/llvm-project/pull/102569
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> Thank you for the patch, but RFCs for Clang should be published in 
> https://discourse.llvm.org/c/clang/6. PRs doesn't have the visibility we want 
> RFCs to have.

Discourse topic created: 
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641.
 Thanks.

https://github.com/llvm/llvm-project/pull/102569
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-08 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu edited 
https://github.com/llvm/llvm-project/pull/102569
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [AMDGPU] Use the AMDGPUToolChain when targeting C/C++ directly (PR #99687)

2024-08-07 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

I feel choosing toolchain based on input files does not solve all the use cases.

You may want to handle the object files, bitcodes, or assembly files 
differently by using different toolchains, e.g. you may want to choose rocm 
toolchain or amdgpu toolchain or HIPAMD toolchain to hand the object file, even 
though it has the same extension. Although you could introduce -x for different 
object type, e.g. cl_obj, cxx_obj, or hip_obj, but that is quite cubbersome.

I am wondering whether we should introduce a clang option which allows us to 
directly choose a toolchain.

https://github.com/llvm/llvm-project/pull/99687
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][NFC] Make OffloadLTOMode getter a separate method (PR #101200)

2024-08-06 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> @yxsamliu Would you mind reviewing this change?

Sorry for the delay. LGTM

https://github.com/llvm/llvm-project/pull/101200
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Fix __clang_hip_cmath.hip for ambiguity (PR #101341)

2024-08-02 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu closed 
https://github.com/llvm/llvm-project/pull/101341
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [Clang] Suppress missing architecture error when doing LTO (PR #100652)

2024-07-31 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/100652
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Fix amdgpu-arch for dll name on Windows (PR #101350)

2024-07-31 Thread Yaxun Liu via cfe-commits


@@ -31,16 +43,108 @@ typedef hipError_t (*hipGetDeviceCount_t)(int *);
 typedef hipError_t (*hipDeviceGet_t)(int *, int);
 typedef hipError_t (*hipGetDeviceProperties_t)(hipDeviceProp_t *, int);
 
-int printGPUsByHIP() {
+extern cl::opt Verbose;
+
 #ifdef _WIN32
-  constexpr const char *DynamicHIPPath = "amdhip64.dll";
+std::vector getSearchPaths() {
+  std::vector Paths;
+
+  // Get the directory of the current executable
+  if (auto MainExe = sys::fs::getMainExecutable(nullptr, nullptr);
+  !MainExe.empty())
+Paths.push_back(sys::path::parent_path(MainExe).str());
+
+  // Get the system directory
+  char SystemDirectory[MAX_PATH];
+  if (GetSystemDirectoryA(SystemDirectory, MAX_PATH) > 0) {
+Paths.push_back(SystemDirectory);
+  }
+
+  // Get the Windows directory
+  char WindowsDirectory[MAX_PATH];
+  if (GetWindowsDirectoryA(WindowsDirectory, MAX_PATH) > 0) {

yxsamliu wrote:

yes. for cases where system or windows directories containing wide characters. 
done.

https://github.com/llvm/llvm-project/pull/101350
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Fix amdgpu-arch for dll name on Windows (PR #101350)

2024-07-31 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu updated 
https://github.com/llvm/llvm-project/pull/101350

>From e7c39dbcb05d8fa9232a68c90b0ec4fc4d2a126b Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Wed, 31 Jul 2024 09:23:05 -0400
Subject: [PATCH] Fix amdgpu-arch for dll name on Windows

Recently HIP runtime changed dll name to amdhip64_n.dll on Windows, where
n is ROCm major version number.

Fix amdgpu-arch to search for amdhip64_n.dll on Windows.
---
 clang/tools/amdgpu-arch/AMDGPUArch.cpp  |   3 +
 clang/tools/amdgpu-arch/AMDGPUArchByHIP.cpp | 123 +++-
 2 files changed, 122 insertions(+), 4 deletions(-)

diff --git a/clang/tools/amdgpu-arch/AMDGPUArch.cpp 
b/clang/tools/amdgpu-arch/AMDGPUArch.cpp
index 7ae57b7877e1f..fefd4f08d5ed2 100644
--- a/clang/tools/amdgpu-arch/AMDGPUArch.cpp
+++ b/clang/tools/amdgpu-arch/AMDGPUArch.cpp
@@ -21,6 +21,9 @@ static cl::opt Help("h", cl::desc("Alias for -help"), 
cl::Hidden);
 // Mark all our options with this category.
 static cl::OptionCategory AMDGPUArchCategory("amdgpu-arch options");
 
+cl::opt Verbose("verbose", cl::desc("Enable verbose output"),
+  cl::init(false), cl::cat(AMDGPUArchCategory));
+
 static void PrintVersion(raw_ostream &OS) {
   OS << clang::getClangToolFullVersion("amdgpu-arch") << '\n';
 }
diff --git a/clang/tools/amdgpu-arch/AMDGPUArchByHIP.cpp 
b/clang/tools/amdgpu-arch/AMDGPUArchByHIP.cpp
index 7338872dbf32f..0ae4cbe34e934 100644
--- a/clang/tools/amdgpu-arch/AMDGPUArchByHIP.cpp
+++ b/clang/tools/amdgpu-arch/AMDGPUArchByHIP.cpp
@@ -11,9 +11,22 @@
 //
 
//===--===//
 
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/DynamicLibrary.h"
 #include "llvm/Support/Error.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/Path.h"
+#include "llvm/Support/Process.h"
+#include "llvm/Support/Program.h"
 #include "llvm/Support/raw_ostream.h"
+#include 
+#include 
+#include 
+
+#ifdef _WIN32
+#include 
+#endif
 
 using namespace llvm;
 
@@ -31,16 +44,118 @@ typedef hipError_t (*hipGetDeviceCount_t)(int *);
 typedef hipError_t (*hipDeviceGet_t)(int *, int);
 typedef hipError_t (*hipGetDeviceProperties_t)(hipDeviceProp_t *, int);
 
-int printGPUsByHIP() {
+extern cl::opt Verbose;
+
 #ifdef _WIN32
-  constexpr const char *DynamicHIPPath = "amdhip64.dll";
+static std::vector getSearchPaths() {
+  std::vector Paths;
+
+  // Get the directory of the current executable
+  if (auto MainExe = sys::fs::getMainExecutable(nullptr, nullptr);
+  !MainExe.empty())
+Paths.push_back(sys::path::parent_path(MainExe).str());
+
+  // Get the system directory
+  wchar_t SystemDirectory[MAX_PATH];
+  if (GetSystemDirectoryW(SystemDirectory, MAX_PATH) > 0) {
+std::string Utf8SystemDir;
+if (convertUTF16ToUTF8String(
+ArrayRef(reinterpret_cast(SystemDirectory),
+wcslen(SystemDirectory)),
+Utf8SystemDir))
+  Paths.push_back(Utf8SystemDir);
+  }
+
+  // Get the Windows directory
+  wchar_t WindowsDirectory[MAX_PATH];
+  if (GetWindowsDirectoryW(WindowsDirectory, MAX_PATH) > 0) {
+std::string Utf8WindowsDir;
+if (convertUTF16ToUTF8String(
+ArrayRef(reinterpret_cast(WindowsDirectory),
+wcslen(WindowsDirectory)),
+Utf8WindowsDir))
+  Paths.push_back(Utf8WindowsDir);
+  }
+
+  // Get the current working directory
+  SmallVector CWD;
+  if (sys::fs::current_path(CWD))
+Paths.push_back(std::string(CWD.begin(), CWD.end()));
+
+  // Get the PATH environment variable
+  if (auto PathEnv = sys::Process::GetEnv("PATH")) {
+SmallVector PathList;
+StringRef(*PathEnv).split(PathList, sys::EnvPathSeparator);
+for (auto &Path : PathList)
+  Paths.push_back(Path.str());
+  }
+
+  return Paths;
+}
+
+// Custom comparison function for dll name
+static bool compareVersions(const std::string &a, const std::string &b) {
+  // Extract version numbers
+  int versionA = std::stoi(a.substr(a.find_last_of('_') + 1));
+  int versionB = std::stoi(b.substr(b.find_last_of('_') + 1));
+  return versionA > versionB;
+}
+
+#endif
+
+// On Windows, prefer amdhip64_n.dll where n is ROCm major version and greater
+// value of n takes precedence. If amdhip64_n.dll is not found, fall back to
+// amdhip64.dll. The reason is that a normal driver installation only has
+// amdhip64_n.dll but we do not know what n is since this progrm may be used
+// with a future version of HIP runtime.
+//
+// On Linux, always use default libamdhip64.so.
+static std::pair findNewestHIPDLL() {
+#ifdef _WIN32
+  StringRef HipDLLPrefix = "amdhip64_";
+  StringRef HipDLLSuffix = ".dll";
+
+  std::vector SearchPaths = getSearchPaths();
+  std::vector DLLNames;
+
+  for (const auto &Dir : SearchPaths) {
+std::error_code EC;
+for (sys::fs::directory_iterator DirIt(Dir, EC), DirEnd;
+ DirIt != DirEnd && !EC; DirIt.incr

[clang] Fix amdgpu-arch for dll name on Windows (PR #101350)

2024-07-31 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu created 
https://github.com/llvm/llvm-project/pull/101350

Recently HIP runtime changed dll name to amdhip64_n.dll on Windows, where n is 
ROCm major version number.

Fix amdgpu-arch to search for amdhip64_n.dll on Windows.

>From 8819e99b64f3293a758f8a81258a25c91fab6ef6 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Wed, 31 Jul 2024 09:23:05 -0400
Subject: [PATCH] Fix amdgpu-arch for dll name on Windows

Recently HIP runtime changed dll name to amdhip64_n.dll on Windows, where
n is ROCm major version number.

Fix amdgpu-arch to search for amdhip64_n.dll on Windows.
---
 clang/tools/amdgpu-arch/AMDGPUArch.cpp  |   3 +
 clang/tools/amdgpu-arch/AMDGPUArchByHIP.cpp | 112 +++-
 2 files changed, 111 insertions(+), 4 deletions(-)

diff --git a/clang/tools/amdgpu-arch/AMDGPUArch.cpp 
b/clang/tools/amdgpu-arch/AMDGPUArch.cpp
index 7ae57b7877e1f..fefd4f08d5ed2 100644
--- a/clang/tools/amdgpu-arch/AMDGPUArch.cpp
+++ b/clang/tools/amdgpu-arch/AMDGPUArch.cpp
@@ -21,6 +21,9 @@ static cl::opt Help("h", cl::desc("Alias for -help"), 
cl::Hidden);
 // Mark all our options with this category.
 static cl::OptionCategory AMDGPUArchCategory("amdgpu-arch options");
 
+cl::opt Verbose("verbose", cl::desc("Enable verbose output"),
+  cl::init(false), cl::cat(AMDGPUArchCategory));
+
 static void PrintVersion(raw_ostream &OS) {
   OS << clang::getClangToolFullVersion("amdgpu-arch") << '\n';
 }
diff --git a/clang/tools/amdgpu-arch/AMDGPUArchByHIP.cpp 
b/clang/tools/amdgpu-arch/AMDGPUArchByHIP.cpp
index 7338872dbf32f..f9beb5046568c 100644
--- a/clang/tools/amdgpu-arch/AMDGPUArchByHIP.cpp
+++ b/clang/tools/amdgpu-arch/AMDGPUArchByHIP.cpp
@@ -11,9 +11,21 @@
 //
 
//===--===//
 
+#include "llvm/Support/CommandLine.h"
 #include "llvm/Support/DynamicLibrary.h"
 #include "llvm/Support/Error.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/Path.h"
+#include "llvm/Support/Process.h"
+#include "llvm/Support/Program.h"
 #include "llvm/Support/raw_ostream.h"
+#include 
+#include 
+#include 
+
+#ifdef _WIN32
+#include 
+#endif
 
 using namespace llvm;
 
@@ -31,16 +43,108 @@ typedef hipError_t (*hipGetDeviceCount_t)(int *);
 typedef hipError_t (*hipDeviceGet_t)(int *, int);
 typedef hipError_t (*hipGetDeviceProperties_t)(hipDeviceProp_t *, int);
 
-int printGPUsByHIP() {
+extern cl::opt Verbose;
+
 #ifdef _WIN32
-  constexpr const char *DynamicHIPPath = "amdhip64.dll";
+std::vector getSearchPaths() {
+  std::vector Paths;
+
+  // Get the directory of the current executable
+  if (auto MainExe = sys::fs::getMainExecutable(nullptr, nullptr);
+  !MainExe.empty())
+Paths.push_back(sys::path::parent_path(MainExe).str());
+
+  // Get the system directory
+  char SystemDirectory[MAX_PATH];
+  if (GetSystemDirectoryA(SystemDirectory, MAX_PATH) > 0) {
+Paths.push_back(SystemDirectory);
+  }
+
+  // Get the Windows directory
+  char WindowsDirectory[MAX_PATH];
+  if (GetWindowsDirectoryA(WindowsDirectory, MAX_PATH) > 0) {
+Paths.push_back(WindowsDirectory);
+  }
+
+  // Get the current working directory
+  SmallVector CWD;
+  if (sys::fs::current_path(CWD))
+Paths.push_back(std::string(CWD.begin(), CWD.end()));
+
+  // Get the PATH environment variable
+  if (auto PathEnv = llvm::sys::Process::GetEnv("PATH")) {
+SmallVector PathList;
+StringRef(*PathEnv).split(PathList, sys::EnvPathSeparator);
+for (auto &Path : PathList)
+  Paths.push_back(Path.str());
+  }
+
+  return Paths;
+}
+
+// Custom comparison function for dll name
+bool compareVersions(const std::string &a, const std::string &b) {
+  // Extract version numbers
+  int versionA = std::stoi(a.substr(a.find_last_of('_') + 1));
+  int versionB = std::stoi(b.substr(b.find_last_of('_') + 1));
+  return versionA > versionB;
+}
+
+#endif
+
+// On Windows, prefer amdhip64_n.dll where n is ROCm major version and greater
+// value of n takes precedence. If amdhip64_n.dll is not found, fall back to
+// amdhip64.dll. The reason is that a normal driver installation only has
+// amdhip64_n.dll but we do not know what n is since this progrm may be used
+// with a future version of HIP runtime.
+//
+// On Linux, always use default libamdhip64.so.
+std::pair findNewestHIPDLL() {
+#ifdef _WIN32
+  const char *HipDLLPrefix = "amdhip64_";
+  const char *HipDLLSuffix = ".dll";
+
+  std::vector SearchPaths = getSearchPaths();
+  std::vector DLLNames;
+
+  for (const auto &Dir : SearchPaths) {
+std::error_code EC;
+for (sys::fs::directory_iterator DirIt(Dir, EC), DirEnd;
+ DirIt != DirEnd && !EC; DirIt.increment(EC)) {
+  StringRef Filename = sys::path::filename(DirIt->path());
+  if (Filename.starts_with(HipDLLPrefix) &&
+  Filename.ends_with(HipDLLSuffix))
+DLLNames.push_back(sys::path::convert_to_slash(DirIt->path()));
+}
+if (!DLLNames.empty())
+  break;
+  }
+
+  if (DL

[clang] [HIP] Fix __clang_hip_cmath.hip for ambiguity (PR #101341)

2024-07-31 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu created 
https://github.com/llvm/llvm-project/pull/101341

If there is a type T which can be converted to both float and double etc but 
itself is not specialized for __numeric_type, and it is called for math 
functions eg. fma, it will cause ambiguity with test function of __numeric_type.

Since test is not template, this error is not bypassed by SFINAE. This is a 
design flaw of __numeric_type. This patch fixes clang wrapper header to use 
SFINAE to avoid such ambiguity.

Fixes: SWDEV-461604

Fixes: https://github.com/llvm/llvm-project/issues/101239

>From 07d6d9392d3ec9f6054f988252da75203b346fd4 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Tue, 30 Jul 2024 16:51:23 -0400
Subject: [PATCH] [HIP] Fix __clang_hip_cmath.hip for ambiguity

If there is a type T which can be converted to both float and double etc but 
itself
is not specialized for __numeric_type, and it is called for math functions eg. 
fma,
it will cause ambiguity with test function of __numeric_type.

Since test is not template, this error is not bypassed by SFINAE. This is a 
design
flaw of __numeric_type. This patch fixes clang wrapper header to use SFINAE to
avoid such ambiguity.

Fixes: SWDEV-461604

Fixes: https://github.com/llvm/llvm-project/issues/101239
---
 clang/lib/Headers/__clang_hip_cmath.h|  7 ++-
 clang/test/Headers/__clang_hip_cmath.hip | 19 +++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/clang/lib/Headers/__clang_hip_cmath.h 
b/clang/lib/Headers/__clang_hip_cmath.h
index b52d6b7816611..7d982ad9af7ee 100644
--- a/clang/lib/Headers/__clang_hip_cmath.h
+++ b/clang/lib/Headers/__clang_hip_cmath.h
@@ -395,7 +395,12 @@ template  struct __numeric_type {
   // No support for long double, use double instead.
   static double __test(long double);
 
-  typedef decltype(__test(declval<_Tp>())) type;
+  template 
+  static auto __test_impl(int) -> decltype(__test(declval<_U>()));
+
+  template  static void __test_impl(...);
+
+  typedef decltype(__test_impl<_Tp>(0)) type;
   static const bool value = !is_same::value;
 };
 
diff --git a/clang/test/Headers/__clang_hip_cmath.hip 
b/clang/test/Headers/__clang_hip_cmath.hip
index ed1030b820627..0c9ff4cdd7808 100644
--- a/clang/test/Headers/__clang_hip_cmath.hip
+++ b/clang/test/Headers/__clang_hip_cmath.hip
@@ -87,3 +87,22 @@ extern "C" __device__ float test_sin_f32(float x) {
 extern "C" __device__ float test_cos_f32(float x) {
   return cos(x);
 }
+
+// Check user defined type which can be converted to float and double but not
+// specializes __numeric_type will not cause ambiguity diagnostics.
+struct user_bfloat16 {
+  __host__ __device__ user_bfloat16(float);
+  operator float();
+  operator double();
+};
+
+namespace user_namespace {
+  __device__ user_bfloat16 fma(const user_bfloat16 a, const user_bfloat16 b, 
const user_bfloat16 c) {
+return a;
+  }
+
+  __global__ void test_fma() {
+user_bfloat16 a = 1.0f, b = 2.0f;
+fma(a, b, b);
+  }
+}

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] fix host min/max in header (PR #82956)

2024-07-18 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

found another library using mixed min: 
https://github.com/ROCm/Tensile/issues/1977

https://github.com/llvm/llvm-project/pull/82956
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA][HIP] Fix template static member (PR #98580)

2024-07-12 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu closed 
https://github.com/llvm/llvm-project/pull/98580
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA][HIP] Fix template static member (PR #98580)

2024-07-11 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

sorry for the trouble. It is the same change but rebased to main branch.

https://github.com/llvm/llvm-project/pull/98580
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA][HIP] Fix template static member (PR #98580)

2024-07-11 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu created 
https://github.com/llvm/llvm-project/pull/98580

Should check host/device attributes before emitting static member of template 
instantiation.

Fixes: https://github.com/llvm/llvm-project/issues/98151

>From ba7ab88308c5af2e1c5e6c841524a932c42afeb2 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Thu, 11 Jul 2024 13:32:54 -0400
Subject: [PATCH] [CUDA][HIP] Fix template static member

Should check host/device attributes before emitting static member
of template instantiation.

Fixes: https://github.com/llvm/llvm-project/issues/98151
---
 clang/lib/CodeGen/CodeGenModule.cpp   |  3 +-
 .../template-class-static-member.cu   | 50 +++
 2 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/CodeGenCUDA/template-class-static-member.cu

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index 6c10b4a2edef8..599e20634bf72 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -5935,7 +5935,8 @@ static void 
ReplaceUsesOfNonProtoTypeWithRealFunction(llvm::GlobalValue *Old,
 
 void CodeGenModule::HandleCXXStaticMemberVarInstantiation(VarDecl *VD) {
   auto DK = VD->isThisDeclarationADefinition();
-  if (DK == VarDecl::Definition && VD->hasAttr())
+  if ((DK == VarDecl::Definition && VD->hasAttr()) ||
+  (LangOpts.CUDA && !shouldEmitCUDAGlobalVar(VD)))
 return;
 
   TemplateSpecializationKind TSK = VD->getTemplateSpecializationKind();
diff --git a/clang/test/CodeGenCUDA/template-class-static-member.cu 
b/clang/test/CodeGenCUDA/template-class-static-member.cu
new file mode 100644
index 0..d790d2dea66ba
--- /dev/null
+++ b/clang/test/CodeGenCUDA/template-class-static-member.cu
@@ -0,0 +1,50 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device \
+// RUN:   -emit-llvm -o - -x hip %s | FileCheck -check-prefix=DEV %s
+
+// RUN: %clang_cc1 -triple x86_64-gnu-linux -std=c++11 \
+// RUN:   -emit-llvm -o - -x hip %s | FileCheck -check-prefix=HOST %s
+
+// Negative tests.
+
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device \
+// RUN:   -emit-llvm -o - -x hip %s | FileCheck -check-prefix=DEV-NEG %s
+
+#include "Inputs/cuda.h"
+
+template 
+class A {
+static int h_member;
+__device__ static int d_member;
+__constant__ static int c_member;
+__managed__ static int m_member;
+const static int const_member = 0;
+};
+
+template 
+int A::h_member;
+
+template 
+__device__ int A::d_member;
+
+template 
+__constant__ int A::c_member;
+
+template 
+__managed__ int A::m_member;
+
+template 
+const int A::const_member;
+
+template class A;
+
+//DEV-DAG: @_ZN1AIiE8d_memberE = internal addrspace(1) global i32 0, comdat, 
align 4
+//DEV-DAG: @_ZN1AIiE8c_memberE = internal addrspace(4) global i32 0, comdat, 
align 4
+//DEV-DAG: @_ZN1AIiE8m_memberE = internal addrspace(1) externally_initialized 
global ptr addrspace(1) null
+//DEV-DAG: @_ZN1AIiE12const_memberE = internal addrspace(4) constant i32 0, 
comdat, align 4
+//DEV-NEG-NOT: @_ZN1AIiE8h_memberE
+
+//HOST-DAG: @_ZN1AIiE8h_memberE = weak_odr global i32 0, comdat, align 4
+//HOST-DAG: @_ZN1AIiE8d_memberE = internal global i32 undef, comdat, align 4
+//HOST-DAG: @_ZN1AIiE8c_memberE = internal global i32 undef, comdat, align 4
+//HOST-DAG: @_ZN1AIiE8m_memberE = internal externally_initialized global ptr 
null
+//HOST-DAG: @_ZN1AIiE12const_memberE = weak_odr constant i32 0, comdat, align 4

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA][HIP][NFC] add CodeGenModule::shouldEmitCUDAGlobalVar (PR #98543)

2024-07-11 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu closed 
https://github.com/llvm/llvm-project/pull/98543
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [nsan] Add shared runtime (PR #98415)

2024-07-10 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> It seems to cause a build failure:
> 
> https://lab.llvm.org/buildbot/#/builders/123/builds/1580

It seems the issue was due to old system linker. If I use lld to build 
compier-rt the build passes. I will fix the buildbot 
https://github.com/llvm/llvm-zorg/pull/225

https://github.com/llvm/llvm-project/pull/98415
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [nsan] Add shared runtime (PR #98415)

2024-07-10 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

It seems to cause a build failure:

https://lab.llvm.org/buildbot/#/builders/123/builds/1580

https://github.com/llvm/llvm-project/pull/98415
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add `__CLANG_GPU_DISABLE_MATH_WRAPPERS` macro for offloading math (PR #98234)

2024-07-10 Thread Yaxun Liu via cfe-commits


@@ -345,4 +349,5 @@ __DEVICE__ float ynf(int __a, float __b) { return 
__nv_ynf(__a, __b); }
 #pragma pop_macro("__DEVICE_VOID__")
 #pragma pop_macro("__FAST_OR_SLOW")
 
+#endif // __CLANG_GPU_DISABLE_MATH_WRAPPERS

yxsamliu wrote:

some non-libm functions e.g. `__fadd_rd`, are included in the if/endif. Do you 
plan to add them to libm?

https://github.com/llvm/llvm-project/pull/98234
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` (PR #98170)

2024-07-09 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu edited 
https://github.com/llvm/llvm-project/pull/98170
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` (PR #98170)

2024-07-09 Thread Yaxun Liu via cfe-commits


@@ -633,6 +633,17 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const 
JobAction &JA,
   else if (Args.hasArg(options::OPT_mcpu_EQ))
 CmdArgs.push_back(Args.MakeArgString(
 "-plugin-opt=mcpu=" + Args.getLastArgValue(options::OPT_mcpu_EQ)));
+

yxsamliu wrote:

HIPAMD toolchain has its lld command argument handling in 
AMDGCN::Linker::constructLldCommand. Also need change there.

https://github.com/llvm/llvm-project/pull/98170
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` (PR #98170)

2024-07-09 Thread Yaxun Liu via cfe-commits


@@ -633,6 +633,17 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const 
JobAction &JA,
   else if (Args.hasArg(options::OPT_mcpu_EQ))
 CmdArgs.push_back(Args.MakeArgString(
 "-plugin-opt=mcpu=" + Args.getLastArgValue(options::OPT_mcpu_EQ)));
+
+  // If the user's toolchain has the 'include/amdgcn-amd-amdhsa/` path, we
+  // assume it supports the standard C libraries for the GPU and include them.
+  bool HasLibC = getToolChain().getStdlibIncludePath().has_value();

yxsamliu wrote:

maybe refactor as a member of ToolChain::addOffloadLibCArgs

https://github.com/llvm/llvm-project/pull/98170
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Compiler messages on HIP SDK for Windows (PR #97668)

2024-07-09 Thread Yaxun Liu via cfe-commits


@@ -0,0 +1,7 @@
+// UNSUPPORTED: system-linux

yxsamliu wrote:

You may need `// REQUIRES: system-windows` since only on Windows these paths 
are not added.

https://github.com/llvm/llvm-project/pull/97668
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Compiler messages on HIP SDK for Windows (PR #97668)

2024-07-04 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

better add a lit test like 
https://github.com/llvm/llvm-project/blob/main/clang/test/Driver/rocm-detect.hip
 , but for windows only (`REQUIRES: system-windows`), using 
`--print-rocm-search-dirs`, and checks `ROCm installation search path`.

https://github.com/llvm/llvm-project/pull/97668
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][CodeGen][AMDGPU] Enable AMDGPU `printf` for `spirv64-amd-amdhsa` (PR #97132)

2024-07-02 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/97132
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][CodeGen] Add query for a target's flat address space (PR #95728)

2024-07-02 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> I still think we should not need this. DefaultIsPrivate is junk that needs to 
> be deleted. Querying for LangAS::Default should always give the answer 0 for 
> AMDGPU, which is what this is working around.
> 
> This clang notion of address space has nothing to do with your troubles with 
> llvm.used in the IR or SPIRV

LangAS::Default is not just determined by target. It also depends on language. 
For OpenCL 1.2 it is private. For example, the argument of `void foo(int*)` by 
language spec points to private addr space, and it translates to addr space 5 
instead of 0 in IR. (https://godbolt.org/z/E71E3Wb5e). Due to this, it is hard 
to let LangAS::Default maps to 0 for amdgpu target for all languages.

https://github.com/llvm/llvm-project/pull/95728
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][docs] Add preliminary documentation for SPIR-V support in the HIPAMD ToolChain (PR #96657)

2024-06-28 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/96657
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][CodeGen] Remove unnecessary ShouldLinkFiles conditional (PR #96951)

2024-06-27 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/96951
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][Clang][Sema] Fix crash when calling builtins with pointer arguments (PR #95957)

2024-06-26 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/95957
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][docs] Add preliminary documentation for SPIR-V support in the HIPAMD ToolChain (PR #96657)

2024-06-26 Thread Yaxun Liu via cfe-commits


@@ -284,3 +284,48 @@ Example Usage
   Base* basePtr = &obj;
   basePtr->virtualFunction(); // Allowed since obj is constructed in 
device code
}
+
+SPIR-V Support on HIPAMD ToolChain
+==
+
+The HIPAMD ToolChain supports targetting
+`AMDGCN Flavoured SPIR-V 
`_.
+The support for SPIR-V in the ROCm and HIPAMD ToolChain is under active
+development.
+
+Compilation Process
+---
+
+When compiling HIP programs with the intent of utilizing SPIR-V, the process
+diverges from the traditional compilation flow:
+
+Using ``--offload-arch=amdgcnspirv``
+
+
+- **Target Triple**: The ``--offload-arch=amdgcnspirv`` flag instructs the
+  compiler to use the target triple ``spirv64-amd-amdhsa``. This approach does
+  generates generic AMDGCN SPIR-V which retains architecture specific elements
+  without hardcoding them, thus allowing for optimal target specific code to be
+  generated at run time, when the concrete target is known.
+
+- **LLVM IR Translation**: The program is compiled to LLVM Intermediate
+  Representation (IR), which is subsequently translated into SPIR-V. In the
+  future, this translation step will be replaced by direct SPIR-V emission via
+  the SPIR-V Back-end.
+
+- **Clang Offload Bundler**: The resulting SPIR-V is embedded in the Clang
+  offload bundler with the bundle ID ``hipv4-hip-spirv64-amd-amdhsa-generic``.
+
+Mixed with Normal ``--offload-arch``
+
+
+**Mixing ``amdgcnspirv`` and concrete ``gfx###`` targets via ``--offload-arch``

yxsamliu wrote:

I think we need HIPAMD toolchain instead of HIPSPIRV toolchain because we want 
to locate the tools and device libraries on ROCm platform and do argument 
translation using amdgpu information like we are compiling for any amdgpu 
processors.

https://github.com/llvm/llvm-project/pull/96657
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][docs] Add preliminary documentation for SPIR-V support in the HIPAMD ToolChain (PR #96657)

2024-06-26 Thread Yaxun Liu via cfe-commits


@@ -284,3 +284,48 @@ Example Usage
   Base* basePtr = &obj;
   basePtr->virtualFunction(); // Allowed since obj is constructed in 
device code
}
+
+SPIR-V Support on HIPAMD ToolChain
+==
+
+The HIPAMD ToolChain supports targetting
+`AMDGCN Flavoured SPIR-V 
`_.
+The support for SPIR-V in the ROCm and HIPAMD ToolChain is under active
+development.
+
+Compilation Process
+---
+
+When compiling HIP programs with the intent of utilizing SPIR-V, the process
+diverges from the traditional compilation flow:
+
+Using ``--offload-arch=amdgcnspirv``
+
+
+- **Target Triple**: The ``--offload-arch=amdgcnspirv`` flag instructs the
+  compiler to use the target triple ``spirv64-amd-amdhsa``. This approach does
+  generates generic AMDGCN SPIR-V which retains architecture specific elements
+  without hardcoding them, thus allowing for optimal target specific code to be
+  generated at run time, when the concrete target is known.
+
+- **LLVM IR Translation**: The program is compiled to LLVM Intermediate
+  Representation (IR), which is subsequently translated into SPIR-V. In the
+  future, this translation step will be replaced by direct SPIR-V emission via
+  the SPIR-V Back-end.
+
+- **Clang Offload Bundler**: The resulting SPIR-V is embedded in the Clang
+  offload bundler with the bundle ID ``hipv4-hip-spirv64-amd-amdhsa-generic``.

yxsamliu wrote:

should it be ``hipv4-spirv64-amd-amdhsa-generic`` ?

https://github.com/llvm/llvm-project/pull/96657
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-26 Thread Yaxun Liu via cfe-commits


@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca [[STRUCT_ANON:%.*]], align 4
+// CHECK-NEXT:[[V:%.*]] = alloca <4 x i32>, align 16
+// CHECK-NEXT:store i8 1, ptr [[C]], align 1
+// CHECK-NEXT:store i16 1, ptr [[S]], align 2
+// CHECK-NEXT:store i32 1, ptr [[I]], align 4
+// CHECK-NEXT:store i64 1, ptr [[L]], align 8
+// CHECK-NEXT:store float 1.00e+00, ptr [[F]], align 4
+// CHECK-NEXT:store double 1.00e+00, ptr [[D]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load i8, ptr [[C]], align 1
+// CHECK-NEXT:[[CONV:%.*]] = sext i8 [[TMP0]] to i32

yxsamliu wrote:

never mind. it is expected by C

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-26 Thread Yaxun Liu via cfe-commits


@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca [[STRUCT_ANON:%.*]], align 4
+// CHECK-NEXT:[[V:%.*]] = alloca <4 x i32>, align 16
+// CHECK-NEXT:store i8 1, ptr [[C]], align 1
+// CHECK-NEXT:store i16 1, ptr [[S]], align 2
+// CHECK-NEXT:store i32 1, ptr [[I]], align 4
+// CHECK-NEXT:store i64 1, ptr [[L]], align 8
+// CHECK-NEXT:store float 1.00e+00, ptr [[F]], align 4
+// CHECK-NEXT:store double 1.00e+00, ptr [[D]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load i8, ptr [[C]], align 1
+// CHECK-NEXT:[[CONV:%.*]] = sext i8 [[TMP0]] to i32

yxsamliu wrote:

i8 and i16 are converted to i32 then passed to the vaarg func. Is that 
expected? Shouldn't they be passed as they are?

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins (PR #96738)

2024-06-26 Thread Yaxun Liu via cfe-commits


@@ -158,23 +158,85 @@ void test_ds_faddf(local float *out, float src) {
 }
 
 // CHECK-LABEL: @test_ds_fmin
-// CHECK: {{.*}}call{{.*}} float @llvm.amdgcn.ds.fmin.f32(ptr addrspace(3) 
%out, float %src, i32 0, i32 0, i1 false)
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src monotonic, align 
4{{$}}
+// CHECK: atomicrmw volatile fmin ptr addrspace(3) %out, float %src monotonic, 
align 4{{$}}
+
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src acquire, align 
4{{$}}
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src acquire, align 
4{{$}}
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src release, align 
4{{$}}
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src acq_rel, align 
4{{$}}
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src seq_cst, align 
4{{$}}
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src seq_cst, align 
4{{$}}
+
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src syncscope("agent") 
monotonic, align 4{{$}}
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src 
syncscope("workgroup") monotonic, align 4{{$}}
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src 
syncscope("wavefront") monotonic, align 4{{$}}
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src 
syncscope("singlethread") monotonic, align 4{{$}}
+// CHECK: atomicrmw fmin ptr addrspace(3) %out, float %src monotonic, align 
4{{$}}
+
 #if !defined(__SPIRV__)
 void test_ds_fminf(local float *out, float src) {
 #else
 void test_ds_fminf(__attribute__((address_space(3))) float *out, float src) {
 #endif
   *out = __builtin_amdgcn_ds_fminf(out, src, 0, 0, false);
+  *out = __builtin_amdgcn_ds_fminf(out, src, 0, 0, true);
+
+  // Test all orders.
+  *out = __builtin_amdgcn_ds_fminf(out, src, 1, 0, false);

yxsamliu wrote:

pls use clang predefined macros for atomic memory order and scope. same as below

https://github.com/llvm/llvm-project/pull/96738
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-24 Thread Yaxun Liu via cfe-commits


@@ -147,6 +147,14 @@ getNVIDIAOffloadTargetTriple(const Driver &D, const 
ArgList &Args,
 static std::optional
 getHIPOffloadTargetTriple(const Driver &D, const ArgList &Args) {
   if (!Args.hasArg(options::OPT_offload_EQ)) {
+auto OffloadArchs = Args.getAllArgValues(options::OPT_offload_arch_EQ);
+if (llvm::find(OffloadArchs, "amdgcnspirv") != OffloadArchs.cend()) {
+  if (OffloadArchs.size() == 1)
+return llvm::Triple("spirv64-amd-amdhsa");

yxsamliu wrote:

I am OK to commit this since the command line option won't change so users are 
not affected.

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-24 Thread Yaxun Liu via cfe-commits


@@ -147,6 +147,14 @@ getNVIDIAOffloadTargetTriple(const Driver &D, const 
ArgList &Args,
 static std::optional
 getHIPOffloadTargetTriple(const Driver &D, const ArgList &Args) {
   if (!Args.hasArg(options::OPT_offload_EQ)) {
+auto OffloadArchs = Args.getAllArgValues(options::OPT_offload_arch_EQ);
+if (llvm::find(OffloadArchs, "amdgcnspirv") != OffloadArchs.cend()) {
+  if (OffloadArchs.size() == 1)
+return llvm::Triple("spirv64-amd-amdhsa");

yxsamliu wrote:

Use a toolchain with spirv64 as triple will cause trouble for us to support 
mixed amdgcn and spirv fat binaries, which is critical for us.

Better to take the approach similar to 
https://github.com/llvm/llvm-project/pull/75357, i.e. treat spirv as a 
processor of amgcn triple, so that we can use HIPAMD toolchain for both spirv 
and real amdgcn processor.

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-21 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu closed 
https://github.com/llvm/llvm-project/pull/96262
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-21 Thread Yaxun Liu via cfe-commits


@@ -169,6 +180,11 @@
 // COMMON-UNSAFE-MATH-SAME: "-mlink-builtin-bitcode" 
"{{.*}}/amdgcn/bitcode/oclc_finite_only_off.bc"
 // COMMON-UNSAFE-MATH-SAME: "-mlink-builtin-bitcode" 
"{{.*}}/amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc"
 
+// ASAN-SAME: "-fsanitize=address"
+
+// NOASAN-NOT: "-fsanitize=address"
+// NOASAN-NOT: amdgcn/bitcode/asanrtl.bc

yxsamliu wrote:

understood. that's why I single out the negative tests to a separate run line. 
since there is only negative check in this run, they do not depend on order. 
hopefully it will be stable

https://github.com/llvm/llvm-project/pull/96262
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-20 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu created 
https://github.com/llvm/llvm-project/pull/96262

None

>From 16659ca492234b234cc5da2f36563e1d8009620f Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Thu, 20 Jun 2024 21:00:58 -0400
Subject: [PATCH] Enable ASAN in amdgpu toolchain for OpenCL

---
 clang/lib/Driver/ToolChains/AMDGPU.cpp |  6 ++
 clang/lib/Driver/ToolChains/AMDGPU.h   |  3 +++
 clang/test/Driver/rocm-device-libs.cl  | 16 
 3 files changed, 25 insertions(+)

diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp 
b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 20f879e2f75cb..453daed7cc7d5 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -14,6 +14,7 @@
 #include "clang/Driver/DriverDiagnostic.h"
 #include "clang/Driver/InputInfo.h"
 #include "clang/Driver/Options.h"
+#include "clang/Driver/SanitizerArgs.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Option/ArgList.h"
 #include "llvm/Support/Error.h"
@@ -946,6 +947,11 @@ void ROCMToolChain::addClangTargetOptions(
   DriverArgs, LibDeviceFile, Wave64, DAZ, FiniteOnly, UnsafeMathOpt,
   FastRelaxedMath, CorrectSqrt, ABIVer, false));
 
+  if (getSanitizerArgs(DriverArgs).needsAsanRt()) {
+CC1Args.push_back("-mlink-bitcode-file");
+CC1Args.push_back(
+DriverArgs.MakeArgString(RocmInstallation->getAsanRTLPath()));
+  }
   for (StringRef BCFile : BCLibs) {
 CC1Args.push_back("-mlink-builtin-bitcode");
 CC1Args.push_back(DriverArgs.MakeArgString(BCFile));
diff --git a/clang/lib/Driver/ToolChains/AMDGPU.h 
b/clang/lib/Driver/ToolChains/AMDGPU.h
index 13c0e138f08f3..7e70dae8ce152 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.h
+++ b/clang/lib/Driver/ToolChains/AMDGPU.h
@@ -140,6 +140,9 @@ class LLVM_LIBRARY_VISIBILITY ROCMToolChain : public 
AMDGPUToolChain {
   getCommonDeviceLibNames(const llvm::opt::ArgList &DriverArgs,
   const std::string &GPUArch,
   bool isOpenMP = false) const;
+  SanitizerMask getSupportedSanitizers() const override {
+return SanitizerKind::Address;
+  }
 };
 
 } // end namespace toolchains
diff --git a/clang/test/Driver/rocm-device-libs.cl 
b/clang/test/Driver/rocm-device-libs.cl
index 415719105d5dc..6837e219dc35d 100644
--- a/clang/test/Driver/rocm-device-libs.cl
+++ b/clang/test/Driver/rocm-device-libs.cl
@@ -132,9 +132,20 @@
 // RUN:   %S/opencl.cl \
 // RUN: 2>&1 | FileCheck  
--check-prefixes=COMMON,COMMON-DEFAULT,GFX900-DEFAULT,GFX900,WAVE64 %s
 
+// RUN: %clang -### -target amdgcn-amd-amdhsa \
+// RUN:   -x cl -mcpu=gfx908:xnack+ -fsanitize=address \
+// RUN:   --rocm-path=%S/Inputs/rocm \
+// RUN:   %s \
+// RUN: 2>&1 | FileCheck  --check-prefixes=ASAN,COMMON %s
 
+// RUN: %clang -### -target amdgcn-amd-amdhsa \
+// RUN:   -x cl -mcpu=gfx908:xnack+ \
+// RUN:   --rocm-path=%S/Inputs/rocm \
+// RUN:   %s \
+// RUN: 2>&1 | FileCheck  --check-prefixes=NOASAN %s
 
 // COMMON: "-triple" "amdgcn-amd-amdhsa"
+// ASAN-SAME: "-mlink-bitcode-file" "{{.*}}/amdgcn/bitcode/asanrtl.bc"
 // COMMON-SAME: "-mlink-builtin-bitcode" "{{.*}}/amdgcn/bitcode/opencl.bc"
 // COMMON-SAME: "-mlink-builtin-bitcode" "{{.*}}/amdgcn/bitcode/ocml.bc"
 // COMMON-SAME: "-mlink-builtin-bitcode" "{{.*}}/amdgcn/bitcode/ockl.bc"
@@ -169,6 +180,11 @@
 // COMMON-UNSAFE-MATH-SAME: "-mlink-builtin-bitcode" 
"{{.*}}/amdgcn/bitcode/oclc_finite_only_off.bc"
 // COMMON-UNSAFE-MATH-SAME: "-mlink-builtin-bitcode" 
"{{.*}}/amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc"
 
+// ASAN-SAME: "-fsanitize=address"
+
+// NOASAN-NOT: "-fsanitize=address"
+// NOASAN-NOT: amdgcn/bitcode/asanrtl.bc
+
 // WAVE64: "-mlink-builtin-bitcode" 
"{{.*}}/amdgcn/bitcode/oclc_wavefrontsize64_on.bc"
 // WAVE32: "-mlink-builtin-bitcode" 
"{{.*}}/amdgcn/bitcode/oclc_wavefrontsize64_off.bc"
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-20 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

maybe add a test for non-constant offset?

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a builtin for `llvm.amdgcn.make.buffer.rsrc` intrinsic (PR #95276)

2024-06-20 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-18 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.

LGTM. Thanks.

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-18 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

These builtins generate atomic instructions in IR but the builtin function name 
does not have atomic. Is that a concern? Should they be renamed with atomic in 
name?

https://github.com/llvm/llvm-project/pull/95395
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][Clang][Sema] Fix crash when calling builtins with pointer arguments (PR #95957)

2024-06-18 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/95957
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-17 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.

LGTM. Thanks

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Yaxun Liu via cfe-commits


@@ -0,0 +1,19 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wno-unused-value %s
+

yxsamliu wrote:

Add a run line:

// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64 -aux-triple amdgcn 
-Wno-unused-value %s

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-13 Thread Yaxun Liu via cfe-commits


@@ -0,0 +1,6 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -x hip -aux-triple 
amdgcn-amd-amdhsa %s -fsyntax-only -verify
+
+#define __device__ __attribute__((device))
+
+__device__ __amdgcn_buffer_rsrc_t test_buffer_rsrc_t_device() {} // 
expected-warning {{non-void function does not return a value}}
+__amdgcn_buffer_rsrc_t test_buffer_rsrc_t_host() {} // expected-error 
{{'__amdgcn_buffer_rsrc_t' can only be used in device-side function}}

yxsamliu wrote:

As discussed in https://github.com/llvm/llvm-project/pull/69366, I think the 
trend is to make HIP more like C++ where every function is both device and host 
function, and de-emphasize handling based on host/device attributes. Ideally, 
we can imagine we are compiling a HIP program for a processor that has the 
capability of both the host CPU and the device GPU, so that we can ignore 
host/device difference during semantic checking, and we defer the diagnosing to 
codegen or linker.

The reason is that C++ is not designed with host/device in mind and the current 
parser/sema does not consider host/device attributes in many cases, especially 
about templates. Adding more host/device based sema seems to make things more 
complicated and not to help making generic C++ code (e.g. the standard C++ 
library) work for both host/device. Another reason not to emphasize the 
host/device difference is that difference in device/host AST risks violation of 
ODR and causes issues difficult to diagnose.

In a word, I would not recommend restricting a type to device only. 

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Yaxun Liu via cfe-commits


@@ -0,0 +1,95 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
verde -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
tonga -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
gfx1100 -emit-llvm -o - %s | FileCheck %s
+
+// CHECK-LABEL: @test_amdgcn_make_buffer_rsrc_p0(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[TMP0:%.*]] = tail call ptr addrspace(8) 
@llvm.amdgcn.make.buffer.rsrc.p0(ptr [[P:%.*]], i16 [[STRIDE:%.*]], i32 
[[NUM:%.*]], i32 [[FLAGS:%.*]])
+// CHECK-NEXT:ret ptr addrspace(8) [[TMP0]]
+//
+__buffer_rsrc_t test_amdgcn_make_buffer_rsrc_p0(void *p, short stride, int 
num, int flags) {
+  return __builtin_amdgcn_make_buffer_rsrc(p, stride, num, flags);

yxsamliu wrote:

> No, we don't allow to have that. Per the discussion with @arsenm , 
> `__buffer_rsrc_t` is a sizeless target opaque type. It can't be used in 
> anywhere that requires its size to be known.

If you cannot assign it to a variable, how are you going to use it? Can you 
provide some pseudo code about how to use the returned value of 
`__builtin_amdgcn_make_buffer_rsrc` ?

https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> > I am wondering whether prefix the builtin type with `__amdgcn_` would be 
> > better since I envision risk of conflicting with reserved names of other 
> > compilers or standard libraries.
> 
> In the patch where the type was introduced we had a brief back-and-forth. I 
> checked the reference type WASM introduced and they don't have prefix. I 
> don't think in the future we'd have a cross-platform/-compiler type called 
> `__buffer_rsrc_t`, and if it happens, it is not supposed to have `__` prefix. 
> However, I'm by no means a language expert, so I'm fine if we really want to 
> add that.

we are introducing `__buffer_rsrc_t` in global namespace, which is seen in any 
other namespace. Imagine some libstdc++ or libc++ header files use the same 
name in some namespaces and a HIP program includes these header files, there 
may be compilation error.

A search of `__buffer` shows libstdc++ and libc++ do use names starting with 
`__buffer`:

https://github.com/search?q=repo%3Agcc-mirror%2Fgcc%20path%3A%2F%5Elibstdc%5C%2B%5C%2B-v3%5C%2Finclude%5C%2F%2F%20__buffer&type=code

https://github.com/search?q=repo%3Allvm%2Fllvm-project+path%3A%2F%5Elibcxx%5C%2F%2F+__buffer&type=code&p=1

I understand the chance of conflict is low. It may be like the chance of 
hitting by a meteor. However, if we prefix with `__amdgcn_`, there is no such 
risk. And we have the benefit to clearly indicate it is a amdgcn 
target-specific type.

https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Yaxun Liu via cfe-commits


@@ -0,0 +1,95 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
verde -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
tonga -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu 
gfx1100 -emit-llvm -o - %s | FileCheck %s
+
+// CHECK-LABEL: @test_amdgcn_make_buffer_rsrc_p0(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[TMP0:%.*]] = tail call ptr addrspace(8) 
@llvm.amdgcn.make.buffer.rsrc.p0(ptr [[P:%.*]], i16 [[STRIDE:%.*]], i32 
[[NUM:%.*]], i32 [[FLAGS:%.*]])
+// CHECK-NEXT:ret ptr addrspace(8) [[TMP0]]
+//
+__buffer_rsrc_t test_amdgcn_make_buffer_rsrc_p0(void *p, short stride, int 
num, int flags) {
+  return __builtin_amdgcn_make_buffer_rsrc(p, stride, num, flags);

yxsamliu wrote:

I am wondering whether we can have test like

```
__buffer_rsrc_t x = __builtin_amdgcn_make_buffer_rsrc(p, stride, num, flags);
```
```

struct X {
__buffer_rsrc_t src;
int a;
};
```
Otherwise, the usefulness of the builtin type is quite limited.

https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

I am wondering whether prefix the builtin type with `__amdgcn_` would be better 
since I envision risk of conflicting with reserved names of other compilers or 
standard libraries.

https://github.com/llvm/llvm-project/pull/95276
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-12 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> > how does a user initialize/populate this type of objects? by calling a 
> > builtin function?
> 
> yes. The builtin functions will come next.
> 
> > need a SemaCUDA test to make sure it is defined with %clang_cc1 -triple 
> > x86_64-unknown-gnu-linux -aux-triple amdgcn-amd-amdhsa , like 
> > https://github.com/llvm/llvm-project/blob/main/clang/test/SemaCUDA/builtin-mangled-name.cu
> 
> Why do we need this to be defined in that way? This should only be used in 
> kernel. I purposely made the type available only when the target triple is 
> amdgcn. @yxsamliu

HIP is single source program, meaning the same source code is visible to both 
device compilation and host compilation. If you have code that only compiles in 
device compilation, you have to put `#if __HIP_DEVICE_COMPILE__ #endif` around 
it. That would clutter the program. The difference between source code seen in 
device compilation and host compilation also increases the risk of violation of 
one-definition-rule. Therefore, in general, we request types, function 
declarations/definitions and pre-defined macros to be visible to both device 
and host compilation. This is true to the existing amdgcn builtin functions. 
Due to the mechanism already in place about -aux-triple, we see device builtin 
functions in current host compilation, otherwise, HIP programs will be 
cluttered with `#if __HIP_DEVICE_COMPILE__ #endif` everywhere. We just request 
the newly added builtin type to follow this existing convention.

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-12 Thread Yaxun Liu via cfe-commits


@@ -128,12 +128,13 @@ enum class CudaArch {
   GFX12_GENERIC,
   GFX1200,
   GFX1201,
+  AMDGCNSPIRV,
   Generic, // A processor model named 'generic' if the target backend defines a
// public one.
   LAST,
 
   CudaDefault = CudaArch::SM_52,
-  HIPDefault = CudaArch::GFX906,
+  HIPDefault = CudaArch::AMDGCNSPIRV,

yxsamliu wrote:

Did this patch pass our internal PSDB?

I would recommend deferring changing the default offload arch to future while 
checkin the other changes first.

I doubt the downstream build is ready to have llvm-link and llvm-spirv. Some 
HIP apps do a test compilation of HIP program with default offload arch and 
this will break them.

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-11 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

how does a user initialize/populate this type of objects? by calling a builtin 
function?

need a SemaCUDA test to make sure it is defined with  %clang_cc1 -triple 
x86_64-unknown-gnu-linux -aux-triple amdgcn-amd-amdhsa , like 
https://github.com/llvm/llvm-project/blob/main/clang/test/SemaCUDA/builtin-mangled-name.cu

https://github.com/llvm/llvm-project/pull/94830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [CUDA][HIP] warn incompatible redeclare (PR #77359)

2024-06-10 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu closed 
https://github.com/llvm/llvm-project/pull/77359
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [CUDA][HIP] warn incompatible redeclare (PR #77359)

2024-06-07 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu edited 
https://github.com/llvm/llvm-project/pull/77359
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [CUDA][HIP] warn incompatible redeclare (PR #77359)

2024-06-07 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu updated 
https://github.com/llvm/llvm-project/pull/77359

>From dd589653a94faba3a458134b5713a82886271c86 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Thu, 30 May 2024 16:02:37 -0400
Subject: [PATCH 1/2] [CUDA][HIP] warn incompatible redeclare

nvcc warns about the following code:

but clang does not since clang allows device function to
overload host function.

Users want clang to emit similar warning to help code to be
compatible with nvcc.

Since this may cause regression with existing code, the warning
is off by default and can be enabled by -Woffload-incompatible-redeclare.

It won't cause warning in system headers, even with
-Woffload-incompatible-redeclare.
---
 .../clang/Basic/DiagnosticSemaKinds.td|  5 +++
 clang/lib/Sema/SemaCUDA.cpp   | 41 +++
 clang/test/SemaCUDA/function-redclare.cu  | 19 +
 llvm/docs/CompileCudaWithLLVM.rst | 11 +
 4 files changed, 60 insertions(+), 16 deletions(-)
 create mode 100644 clang/test/SemaCUDA/function-redclare.cu

diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 9f0b6f5a36389..2394ef7de6494 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -9013,6 +9013,11 @@ def err_cuda_ovl_target : Error<
   "cannot overload %select{__device__|__global__|__host__|__host__ 
__device__}2 function %3">;
 def note_cuda_ovl_candidate_target_mismatch : Note<
 "candidate template ignored: target attributes do not match">;
+def warn_offload_incompatible_redeclare : Warning<
+  "target-attribute based function overloads are not supported by NVCC and 
will be treated as a function redeclaration:"
+  "new declaration is %select{__device__|__global__|__host__|__host__ 
__device__}0 function, "
+  "old declaration is %select{__device__|__global__|__host__|__host__ 
__device__}1 function">,
+  InGroup>, DefaultIgnore;
 
 def err_cuda_device_builtin_surftex_cls_template : Error<
 "illegal device builtin %select{surface|texture}0 reference "
diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index 80ea43dc5316e..580b9872c6a1d 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -1018,24 +1018,33 @@ void SemaCUDA::checkTargetOverload(FunctionDecl *NewFD,
 // HD/global functions "exist" in some sense on both the host and device, 
so
 // should have the same implementation on both sides.
 if (NewTarget != OldTarget &&
-((NewTarget == CUDAFunctionTarget::HostDevice &&
-  !(getLangOpts().OffloadImplicitHostDeviceTemplates &&
-isImplicitHostDeviceFunction(NewFD) &&
-OldTarget == CUDAFunctionTarget::Device)) ||
- (OldTarget == CUDAFunctionTarget::HostDevice &&
-  !(getLangOpts().OffloadImplicitHostDeviceTemplates &&
-isImplicitHostDeviceFunction(OldFD) &&
-NewTarget == CUDAFunctionTarget::Device)) ||
- (NewTarget == CUDAFunctionTarget::Global) ||
- (OldTarget == CUDAFunctionTarget::Global)) &&
 !SemaRef.IsOverload(NewFD, OldFD, /* UseMemberUsingDeclRules = */ 
false,
 /* ConsiderCudaAttrs = */ false)) {
-  Diag(NewFD->getLocation(), diag::err_cuda_ovl_target)
-  << llvm::to_underlying(NewTarget) << NewFD->getDeclName()
-  << llvm::to_underlying(OldTarget) << OldFD;
-  Diag(OldFD->getLocation(), diag::note_previous_declaration);
-  NewFD->setInvalidDecl();
-  break;
+  if ((NewTarget == CUDAFunctionTarget::HostDevice &&
+   !(getLangOpts().OffloadImplicitHostDeviceTemplates &&
+ isImplicitHostDeviceFunction(NewFD) &&
+ OldTarget == CUDAFunctionTarget::Device)) ||
+  (OldTarget == CUDAFunctionTarget::HostDevice &&
+   !(getLangOpts().OffloadImplicitHostDeviceTemplates &&
+ isImplicitHostDeviceFunction(OldFD) &&
+ NewTarget == CUDAFunctionTarget::Device)) ||
+  (NewTarget == CUDAFunctionTarget::Global) ||
+  (OldTarget == CUDAFunctionTarget::Global)) {
+Diag(NewFD->getLocation(), diag::err_cuda_ovl_target)
+<< llvm::to_underlying(NewTarget) << NewFD->getDeclName()
+<< llvm::to_underlying(OldTarget) << OldFD;
+Diag(OldFD->getLocation(), diag::note_previous_declaration);
+NewFD->setInvalidDecl();
+break;
+  }
+  if ((NewTarget == CUDAFunctionTarget::Host &&
+   OldTarget == CUDAFunctionTarget::Device) ||
+  (NewTarget == CUDAFunctionTarget::Device &&
+   OldTarget == CUDAFunctionTarget::Host)) {
+Diag(NewFD->getLocation(), diag::warn_offload_incompatible_redeclare)
+<< llvm::to_underlying(NewTarget) << 
llvm::to_underlying(OldTarget);
+Diag(OldFD->getLocation(), diag::note_previous_declaration);
+  }
 }
   }
 }
diff --git a/c

[clang] [llvm] [CUDA][HIP] warn incompatible redeclare (PR #77359)

2024-06-07 Thread Yaxun Liu via cfe-commits


@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only \
+// RUN:   -isystem %S/Inputs -verify %s
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN:   -isystem %S/Inputs -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only \
+// RUN:   -isystem %S/Inputs -verify=redecl -Woffload-incompatible-redeclare %s
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN:   -isystem %S/Inputs -fcuda-is-device -Woffload-incompatible-redeclare 
-verify=redecl %s
+
+// expected-no-diagnostics
+#include "cuda.h"
+
+__device__ void f(); // redecl-note {{previous declaration is here}}
+
+void f() {} // redecl-warning {{incompatible host/device attribute with 
redeclaration: new declaration is __host__ function, old declaration is 
__device__ function. It will cause warning with nvcc}}

yxsamliu wrote:

> IMO that looks like a nice "this is incompatible with X" warning like we have 
> for GCC, different C/C++ versions etc., but I'm not an expert here, so maybe 
> this isn't actually an incompatibility? FWIW neither the diagnostic nor your 
> comment make this really clear to me. (Also I'd drop the `It will cause 
> warning with nvcc` and make the flag something like `-Wnvcc-compat`)

sorry. I missed your comments. will rename the flag

https://github.com/llvm/llvm-project/pull/77359
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [CUDA][HIP] warn incompatible redeclare (PR #77359)

2024-06-07 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu updated 
https://github.com/llvm/llvm-project/pull/77359

>From dd589653a94faba3a458134b5713a82886271c86 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Thu, 30 May 2024 16:02:37 -0400
Subject: [PATCH] [CUDA][HIP] warn incompatible redeclare

nvcc warns about the following code:

but clang does not since clang allows device function to
overload host function.

Users want clang to emit similar warning to help code to be
compatible with nvcc.

Since this may cause regression with existing code, the warning
is off by default and can be enabled by -Woffload-incompatible-redeclare.

It won't cause warning in system headers, even with
-Woffload-incompatible-redeclare.
---
 .../clang/Basic/DiagnosticSemaKinds.td|  5 +++
 clang/lib/Sema/SemaCUDA.cpp   | 41 +++
 clang/test/SemaCUDA/function-redclare.cu  | 19 +
 llvm/docs/CompileCudaWithLLVM.rst | 11 +
 4 files changed, 60 insertions(+), 16 deletions(-)
 create mode 100644 clang/test/SemaCUDA/function-redclare.cu

diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 9f0b6f5a36389..2394ef7de6494 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -9013,6 +9013,11 @@ def err_cuda_ovl_target : Error<
   "cannot overload %select{__device__|__global__|__host__|__host__ 
__device__}2 function %3">;
 def note_cuda_ovl_candidate_target_mismatch : Note<
 "candidate template ignored: target attributes do not match">;
+def warn_offload_incompatible_redeclare : Warning<
+  "target-attribute based function overloads are not supported by NVCC and 
will be treated as a function redeclaration:"
+  "new declaration is %select{__device__|__global__|__host__|__host__ 
__device__}0 function, "
+  "old declaration is %select{__device__|__global__|__host__|__host__ 
__device__}1 function">,
+  InGroup>, DefaultIgnore;
 
 def err_cuda_device_builtin_surftex_cls_template : Error<
 "illegal device builtin %select{surface|texture}0 reference "
diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index 80ea43dc5316e..580b9872c6a1d 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -1018,24 +1018,33 @@ void SemaCUDA::checkTargetOverload(FunctionDecl *NewFD,
 // HD/global functions "exist" in some sense on both the host and device, 
so
 // should have the same implementation on both sides.
 if (NewTarget != OldTarget &&
-((NewTarget == CUDAFunctionTarget::HostDevice &&
-  !(getLangOpts().OffloadImplicitHostDeviceTemplates &&
-isImplicitHostDeviceFunction(NewFD) &&
-OldTarget == CUDAFunctionTarget::Device)) ||
- (OldTarget == CUDAFunctionTarget::HostDevice &&
-  !(getLangOpts().OffloadImplicitHostDeviceTemplates &&
-isImplicitHostDeviceFunction(OldFD) &&
-NewTarget == CUDAFunctionTarget::Device)) ||
- (NewTarget == CUDAFunctionTarget::Global) ||
- (OldTarget == CUDAFunctionTarget::Global)) &&
 !SemaRef.IsOverload(NewFD, OldFD, /* UseMemberUsingDeclRules = */ 
false,
 /* ConsiderCudaAttrs = */ false)) {
-  Diag(NewFD->getLocation(), diag::err_cuda_ovl_target)
-  << llvm::to_underlying(NewTarget) << NewFD->getDeclName()
-  << llvm::to_underlying(OldTarget) << OldFD;
-  Diag(OldFD->getLocation(), diag::note_previous_declaration);
-  NewFD->setInvalidDecl();
-  break;
+  if ((NewTarget == CUDAFunctionTarget::HostDevice &&
+   !(getLangOpts().OffloadImplicitHostDeviceTemplates &&
+ isImplicitHostDeviceFunction(NewFD) &&
+ OldTarget == CUDAFunctionTarget::Device)) ||
+  (OldTarget == CUDAFunctionTarget::HostDevice &&
+   !(getLangOpts().OffloadImplicitHostDeviceTemplates &&
+ isImplicitHostDeviceFunction(OldFD) &&
+ NewTarget == CUDAFunctionTarget::Device)) ||
+  (NewTarget == CUDAFunctionTarget::Global) ||
+  (OldTarget == CUDAFunctionTarget::Global)) {
+Diag(NewFD->getLocation(), diag::err_cuda_ovl_target)
+<< llvm::to_underlying(NewTarget) << NewFD->getDeclName()
+<< llvm::to_underlying(OldTarget) << OldFD;
+Diag(OldFD->getLocation(), diag::note_previous_declaration);
+NewFD->setInvalidDecl();
+break;
+  }
+  if ((NewTarget == CUDAFunctionTarget::Host &&
+   OldTarget == CUDAFunctionTarget::Device) ||
+  (NewTarget == CUDAFunctionTarget::Device &&
+   OldTarget == CUDAFunctionTarget::Host)) {
+Diag(NewFD->getLocation(), diag::warn_offload_incompatible_redeclare)
+<< llvm::to_underlying(NewTarget) << 
llvm::to_underlying(OldTarget);
+Diag(OldFD->getLocation(), diag::note_previous_declaration);
+  }
 }
   }
 }
diff --git a/clang

[clang] [CUDA][HIP] warn incompatible redeclare (PR #77359)

2024-06-07 Thread Yaxun Liu via cfe-commits


@@ -9013,6 +9013,12 @@ def err_cuda_ovl_target : Error<
   "cannot overload %select{__device__|__global__|__host__|__host__ 
__device__}2 function %3">;
 def note_cuda_ovl_candidate_target_mismatch : Note<
 "candidate template ignored: target attributes do not match">;
+def warn_offload_incompatible_redeclare : Warning<

yxsamliu wrote:

will document it

https://github.com/llvm/llvm-project/pull/77359
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Fix passing target id features to AMDGPU offloading (PR #94765)

2024-06-07 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/94765
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA][HIP] warn incompatible redeclare (PR #77359)

2024-06-07 Thread Yaxun Liu via cfe-commits


@@ -9013,6 +9013,12 @@ def err_cuda_ovl_target : Error<
   "cannot overload %select{__device__|__global__|__host__|__host__ 
__device__}2 function %3">;
 def note_cuda_ovl_candidate_target_mismatch : Note<
 "candidate template ignored: target attributes do not match">;
+def warn_offload_incompatible_redeclare : Warning<
+  "incompatible host/device attribute with redeclaration: "
+  "new declaration is %select{__device__|__global__|__host__|__host__ 
__device__}0 function, "
+  "old declaration is %select{__device__|__global__|__host__|__host__ 
__device__}1 function. "
+  "It will cause warning with nvcc">,

yxsamliu wrote:

will modify the diagnostic message

https://github.com/llvm/llvm-project/pull/77359
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][CodeGen] `used` globals are fake (PR #93601)

2024-06-06 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/93601
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA][HIP] warn incompatible redeclare (PR #77359)

2024-06-06 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

ping

Our users keep requesting this feature since they want their HIP code works 
with both nvcc and clang. I tested it with real HIP apps and did not see 
warnings emitted for clang wrapper headers and HIP system headers. Only 
warnings for users' own code were emitted. Also since this warning is off by 
default, it won't affect normal users.

https://github.com/llvm/llvm-project/pull/77359
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][HIP] Suppress availability diagnostics for mismatched host/device overloads (PR #93546)

2024-06-05 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> Ping.

You situation is similar to overloading resolution of functions called in 
global variable initializer. You may consider using a similar approach as 
https://reviews.llvm.org/D158247

https://github.com/llvm/llvm-project/pull/93546
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] [Clang] Implement CWG2137 (list-initialization from objects of the same type) (PR #94355)

2024-06-04 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> @yxsamliu Re: [#77768 
> (comment)](https://github.com/llvm/llvm-project/pull/77768#issuecomment-1957171805):
>  That is the expected behaviour, since CWG2137 expressly wants to use 
> initializer_list constructors over non-initializer_list constructors 
> (especially copy constructors)

Agreed. We will fix on app side.

Do you have a plan to reland this PR since we have a library hipDF depending on 
it. Thanks.

https://github.com/llvm/llvm-project/pull/94355
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA][HIP] Fix std::min in wrapper header (PR #93976)

2024-06-03 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu closed 
https://github.com/llvm/llvm-project/pull/93976
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA] Fix a couple of driver tests that really weren't being run (PR #93960)

2024-05-31 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.


https://github.com/llvm/llvm-project/pull/93960
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA][HIP] Fix std::min in wrapper header (PR #93976)

2024-05-31 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu created 
https://github.com/llvm/llvm-project/pull/93976

The std::min behaves like 'ahttps://github.com/llvm/llvm-project/issues/93962

Fixes: https://github.com/ROCm/HIP/issues/3502

>From ac8100056c81d1b4d3d40c31574be93ca78cf80c Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Fri, 31 May 2024 11:33:32 -0400
Subject: [PATCH] [CUDA][HIP] Fix std::min in wrapper header

The std::min behaves like 'ahttps://github.com/llvm/llvm-project/issues/93962

Fixes: https://github.com/ROCm/HIP/issues/3502
---
 clang/lib/Headers/cuda_wrappers/algorithm|  2 +-
 clang/test/Headers/cuda_wrapper_algorithm.cu | 48 
 2 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/Headers/cuda_wrapper_algorithm.cu

diff --git a/clang/lib/Headers/cuda_wrappers/algorithm 
b/clang/lib/Headers/cuda_wrappers/algorithm
index f14a0b00bb046..3f59f28ae35b3 100644
--- a/clang/lib/Headers/cuda_wrappers/algorithm
+++ b/clang/lib/Headers/cuda_wrappers/algorithm
@@ -99,7 +99,7 @@ template 
 __attribute__((enable_if(true, "")))
 inline _CPP14_CONSTEXPR __host__ __device__ const __T &
 min(const __T &__a, const __T &__b) {
-  return __a < __b ? __a : __b;
+  return __b < __a ? __b : __a;
 }
 
 #pragma pop_macro("_CPP14_CONSTEXPR")
diff --git a/clang/test/Headers/cuda_wrapper_algorithm.cu 
b/clang/test/Headers/cuda_wrapper_algorithm.cu
new file mode 100644
index 0..d514285f7e17b
--- /dev/null
+++ b/clang/test/Headers/cuda_wrapper_algorithm.cu
@@ -0,0 +1,48 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// RUN: %clang_cc1 \
+// RUN:   -internal-isystem %S/../../lib/Headers/cuda_wrappers \
+// RUN:   -internal-isystem %S/Inputs/include \
+// RUN:   -triple x86_64-unknown-unknown \
+// RUN:   -emit-llvm %s -O1 -o - \
+// RUN:   | FileCheck %s
+
+#define __host__ __attribute__((host))
+#define __device__ __attribute__((device))
+
+#include 
+
+extern "C" bool cmp(double a, double b) { return ahttps://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][CodeGen] Global constructors/destructors are globals (PR #93914)

2024-05-31 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> > Perhaps an alternative is to tweak LangRef wording to say that that these 
> > are always emitted as unqualified ptrs, and that their ephemeral nature 
> > implies that their AS is meaningless?
> 
> I think this is the correct way to handle it. Also we'll need a few 
> stripPointerCasts added somewhere

+1

https://github.com/llvm/llvm-project/pull/93914
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA][HIP] warn incompatible redeclare (PR #77359)

2024-05-30 Thread Yaxun Liu via cfe-commits


@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only \
+// RUN:   -isystem %S/Inputs -verify %s
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN:   -isystem %S/Inputs -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only \
+// RUN:   -isystem %S/Inputs -verify=redecl -Woffload-incompatible-redeclare %s
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN:   -isystem %S/Inputs -fcuda-is-device -Woffload-incompatible-redeclare 
-verify=redecl %s
+
+// expected-no-diagnostics
+#include "cuda.h"
+
+__device__ void f(); // redecl-note {{previous declaration is here}}
+
+void f() {} // redecl-warning {{incompatible host/device attribute with 
redeclaration: new declaration is __host__ function, old declaration is 
__device__ function. It will cause warning with nvcc}}

yxsamliu wrote:

I checked with real HIP apps and the warnings will only show up for user's code 
or header files. There are no warnings for host/device redeclarations in HIP or 
clang headers since they are included as system headers.

https://github.com/llvm/llvm-project/pull/77359
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][CodeGen] Global constructors/destructors are globals (PR #93914)

2024-05-30 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

llvm datalayout defines 

P - program addr space for functions
G - global addr space for global variables

https://llvm.org/docs/LangRef.html#langref-datalayout

should we use P for llvm.global_ctors instead of G?

https://github.com/llvm/llvm-project/pull/93914
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   8   9   10   >