[clang] [Clang][OpenMP] Fix multi arch compilation for -march option (PR #92290)

2024-05-16 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam closed 
https://github.com/llvm/llvm-project/pull/92290
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][OpenMP] Fix multi arch compilation for -march option (PR #92290)

2024-05-16 Thread Saiyedul Islam via cfe-commits

saiislam wrote:

Thanks for the review and comments. Closing the PR.

https://github.com/llvm/llvm-project/pull/92290


[clang] [Clang][OpenMP] Fix multi arch compilation for -march option (PR #92290)

2024-05-15 Thread Saiyedul Islam via cfe-commits

saiislam wrote:

> > If `-march` is the wrong option then let's start deprecating it and remove 
> > it altogether in the next llvm release. But, as long as it is here, it 
> > should be equivalent to `--offload-arch`.
> 
> Honestly not a bad idea. I could make a patch warning users to use 
> `--offload-arch` instead for now.

Sure, let's do that. But, let this land as long as this option is supported.

https://github.com/llvm/llvm-project/pull/92290


[clang] [Clang][OpenMP] Fix multi arch compilation for -march option (PR #92290)

2024-05-15 Thread Saiyedul Islam via cfe-commits

saiislam wrote:

> > > I don't think we want to support this. `-march` was the wrong option to 
> > > use in the first place, and upstream LLVM never supported specifying 
> > > multiple device images with `-march` so there isn't a legacy argument in 
> > > trunk. However, AOMP did support this and if it's deemed too disruptive 
> > > to request users move to `--offload-arch=a,b,c` then we can carry that 
> > > change in the fork.
> > > > It will fix tests like: 
> > > > [targetid_multi_image](https://github.com/ROCm/aomp/tree/aomp-dev/test/smoke/targetid_multi_image)
> > > 
> > > 
> > > I think the easier way to fix this is to update the Makefile.
> > 
> > 
> > Irrespective of what AOMP does, I think it makes sense to ensure parity 
> > between the two ways of specifying architecture. People have been 
> > historically using `-Xopenmp-target -march` style, and using the same for 
> > multiple architectures seems to be the most obvious choice. Isn't it quite 
> > confusing to tell the users that they can use `offload-arch` style for 
> > single as well as multiple archs, but can use `-march` style only for 
> > single arch?
> 
> `-march` was the wrong option to use for this from the beginning. It's 
> supposed to be an overriding option and it shouldn't be overloaded to mean 
> something different here. In LLVM / trunk we never supported multiple 
> architectures with the `-march` option so I don't see any reason to start 
> now. `--offload-arch=` is a complete replacement for this behavior and I 
> consider the single `-march` option to be legacy. Even within this it's 
> divergent because HIP / OpenCL / AMDGPU use `-mcpu` but the OpenMP toolchain 
> ignored that and uses `-march=`.
> 
> Using `--offload-arch=` is a direct replacement for `-march` in all 
> use-cases. It's also easier to use and interoperable with CUDA. I would just 
> change the test, you can replace every use of `-march` with `--offload-arch` 
> and it will work. See the following.
> 
> ```
> > clang input.c -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
>   -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx1030 \
>   -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a \
>   -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_89
> > llvm-objdump --offloading a.out
> 
> a.out:  file format elf64-x86-64
> 
> OFFLOADING IMAGE [0]:
> kind            elf
> arch            sm_89
> triple          nvptx64-nvidia-cuda
> producer        openmp
> 
> OFFLOADING IMAGE [1]:
> kind            elf
> arch            gfx90a
> triple          amdgcn-amd-amdhsa
> producer        openmp
> 
> OFFLOADING IMAGE [2]:
> kind            elf
> arch            gfx1030
> triple          amdgcn-amd-amdhsa
> producer        openmp
> ```

If `-march` is the wrong option then let's start deprecating it and remove it 
altogether in the next llvm release. But, as long as it is here, it should be 
equivalent to `--offload-arch`.

https://github.com/llvm/llvm-project/pull/92290


[clang] [Clang][OpenMP] Fix multi arch compilation for -march option (PR #92290)

2024-05-15 Thread Saiyedul Islam via cfe-commits

saiislam wrote:

> I don't think we want to support this. `-march` was the wrong option to use 
> in the first place, and upstream LLVM never supported specifying multiple 
> device images with `-march` so there isn't a legacy argument in trunk. 
> However, AOMP did support this and if it's deemed too disruptive to request 
> users move to `--offload-arch=a,b,c` then we can carry that change in the 
> fork.
> 
> > It will fix tests like: 
> > [targetid_multi_image](https://github.com/ROCm/aomp/tree/aomp-dev/test/smoke/targetid_multi_image)
> 
> I think the easier way to fix this is to update the Makefile.

Irrespective of what AOMP does, I think it makes sense to ensure parity between 
the two ways of specifying architecture. People have been historically using 
`-Xopenmp-target -march` style, and using the same for multiple architectures 
seems to be the most obvious choice.
Isn't it quite confusing to tell the users that they can use `offload-arch` 
style for single as well as multiple archs, but can use `-march` style only for 
single arch?

https://github.com/llvm/llvm-project/pull/92290


[clang] [Clang][OpenMP] Fix multi arch compilation for -march option (PR #92290)

2024-05-15 Thread Saiyedul Islam via cfe-commits

saiislam wrote:

It will fix tests like: 
[targetid_multi_image](https://github.com/ROCm/aomp/tree/aomp-dev/test/smoke/targetid_multi_image)

https://github.com/llvm/llvm-project/pull/92290


[clang] [Clang][OpenMP] Fix multi arch compilation for -march option (PR #92290)

2024-05-15 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam created 
https://github.com/llvm/llvm-project/pull/92290

The legacy handling of multiple target architectures specified with
`-Xopenmp-target=<triple> -march=<arch>` was processing only a single
architecture; this patch fixes that. It also fixes the use of a
comma-separated list to specify multiple archs for a single triple.
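
As a rough illustration of the comma handling: the `Driver.cpp` hunk in the patch iterates over `llvm::split(Arg->getValue(), ",")`. The standalone sketch below imitates that loop using only the C++ standard library. `splitArchList` is a hypothetical helper, not a Clang function, and the order-preserving de-duplication is an assumption rather than confirmed driver behavior.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Clang): split the value of a
// `-march=` / `--offload-arch=` style option on commas, keeping the
// first occurrence of each architecture in order of appearance.
std::vector<std::string> splitArchList(std::string_view Value) {
  std::vector<std::string> Archs;
  std::unordered_set<std::string> Seen;
  std::size_t Begin = 0;
  while (Begin <= Value.size()) {
    std::size_t End = Value.find(',', Begin);
    if (End == std::string_view::npos)
      End = Value.size();
    std::string Arch(Value.substr(Begin, End - Begin));
    // The hunk in the patch treats an empty piece like "native"; this
    // sketch simply drops empty pieces (e.g. after a trailing comma).
    if (!Arch.empty() && Seen.insert(Arch).second)
      Archs.push_back(std::move(Arch));
    Begin = End + 1;
  }
  return Archs;
}
```

With this sketch, `splitArchList("gfx90a:xnack+,gfx90a:xnack-")` yields two entries, matching the two device compilations that the `CHECK-PHASES-MULTI` test lines expect for either spelling of the option.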

>From a6611634d03d102a8b69df8ff20d324efd81ae48 Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Wed, 15 May 2024 11:07:08 -0500
Subject: [PATCH] [Clang][OpenMP] Fix multi arch compilation for -march option

Legacy toolchain to handle multiple target architectures specified
using `-Xopenmp-target= -march=` was only processing
a single architecture. This patch also fixes the use of comma to
specify multiple archs for a single triple.
---
 clang/lib/Driver/Driver.cpp |  3 +-
 clang/test/Driver/amdgpu-openmp-toolchain.c | 42 ++---
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 2868b4f2b02e9..9ba148dd93d0a 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4486,7 +4486,8 @@ Driver::getOffloadArchs(Compilation , const 
llvm::opt::DerivedArgList ,
 
 // Add or remove the seen architectures in order of appearance. If an
 // invalid architecture is given we simply exit.
-if (Arg->getOption().matches(options::OPT_offload_arch_EQ)) {
+if (Arg->getOption().matches(options::OPT_offload_arch_EQ)||
+Arg->getOption().matches(options::OPT_march_EQ)) {
   for (StringRef Arch : llvm::split(Arg->getValue(), ",")) {
 if (Arch == "native" || Arch.empty()) {
   auto GPUsOrErr = TC->getSystemGPUArchs(Args);
diff --git a/clang/test/Driver/amdgpu-openmp-toolchain.c 
b/clang/test/Driver/amdgpu-openmp-toolchain.c
index 849afb871ddbf..f4172841e160d 100644
--- a/clang/test/Driver/amdgpu-openmp-toolchain.c
+++ b/clang/test/Driver/amdgpu-openmp-toolchain.c
@@ -18,18 +18,48 @@
 // CHECK-PHASES: 0: input, "[[INPUT:.+]]", c, (host-openmp)
 // CHECK-PHASES: 1: preprocessor, {0}, cpp-output, (host-openmp)
 // CHECK-PHASES: 2: compiler, {1}, ir, (host-openmp)
-// CHECK-PHASES: 3: input, "[[INPUT]]", c, (device-openmp)
-// CHECK-PHASES: 4: preprocessor, {3}, cpp-output, (device-openmp)
-// CHECK-PHASES: 5: compiler, {4}, ir, (device-openmp)
-// CHECK-PHASES: 6: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, 
"device-openmp (amdgcn-amd-amdhsa)" {5}, ir
-// CHECK-PHASES: 7: backend, {6}, ir, (device-openmp)
-// CHECK-PHASES: 8: offload, "device-openmp (amdgcn-amd-amdhsa)" {7}, ir
+// CHECK-PHASES: 3: input, "[[INPUT]]", c, (device-openmp, gfx906)
+// CHECK-PHASES: 4: preprocessor, {3}, cpp-output, (device-openmp, gfx906)
+// CHECK-PHASES: 5: compiler, {4}, ir, (device-openmp, gfx906)
+// CHECK-PHASES: 6: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, 
"device-openmp (amdgcn-amd-amdhsa:gfx906)" {5}, ir
+// CHECK-PHASES: 7: backend, {6}, ir, (device-openmp, gfx906)
+// CHECK-PHASES: 8: offload, "device-openmp (amdgcn-amd-amdhsa:gfx906)" {7}, ir
 // CHECK-PHASES: 9: clang-offload-packager, {8}, image, (device-openmp)
 // CHECK-PHASES: 10: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, 
"device-openmp (x86_64-unknown-linux-gnu)" {9}, ir
 // CHECK-PHASES: 11: backend, {10}, assembler, (host-openmp)
 // CHECK-PHASES: 12: assembler, {11}, object, (host-openmp)
 // CHECK-PHASES: 13: clang-linker-wrapper, {12}, image, (host-openmp)
 
+// RUN:   %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp 
-fopenmp-targets=amdgcn-amd-amdhsa \
+// RUN:   -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a:xnack+ \
+// RUN:   -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a:xnack- %s 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-PHASES-MULTI %s
+
+// RUN:   %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp 
-fopenmp-targets=amdgcn-amd-amdhsa \
+// RUN:   -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a:xnack+,gfx90a:xnack- 
%s 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-PHASES-MULTI %s
+
+// CHECK-PHASES-MULTI: 0: input, "[[INPUT:.+]]", c, (host-openmp)
+// CHECK-PHASES-MULTI: 1: preprocessor, {0}, cpp-output, (host-openmp)
+// CHECK-PHASES-MULTI: 2: compiler, {1}, ir, (host-openmp)
+// CHECK-PHASES-MULTI: 3: input, "[[INPUT]]", c, (device-openmp, gfx90a:xnack+)
+// CHECK-PHASES-MULTI: 4: preprocessor, {3}, cpp-output, (device-openmp, 
gfx90a:xnack+)
+// CHECK-PHASES-MULTI: 5: compiler, {4}, ir, (device-openmp, gfx90a:xnack+)
+// CHECK-PHASES-MULTI: 6: offload, "host-openmp (x86_64-unknown-linux-gnu)" 
{2}, "device-openmp (amdgcn-amd-amdhsa:gfx90a:xnack+)" {5}, ir
+// CHECK-PHASES-MULTI: 7: backend, {6}, ir, (device-openmp, gfx90a:xnack+)
+// CHECK-PHASES-MULTI: 8: offload, "device-openmp 
(amdgcn-amd-amdhsa:gfx90a:xnack+)" {7}, ir
+// CHECK-PHASES-MULTI: 9: input, "[[INPUT]]", c, (device-openmp, gfx90a:xnack-)
+// CHECK-PHASES-MULTI: 10: preprocessor, {9}, cpp-output, (device-openmp, 
gfx90a:xnack-)
+// 

[clang] [OpenMP][Clang] Enable inscan modifier for generic datatypes (PR #82220)

2024-02-18 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam commented:

Maybe merge the two PRs into one?
The two are not independent.

https://github.com/llvm/llvm-project/pull/82220


[clang] [OpenMP][Clang] Handle unsupported inscan modifier for generic types (PR #79431)

2024-01-31 Thread Saiyedul Islam via cfe-commits


@@ -19520,6 +19520,13 @@ static bool actOnOMPReductionKindClause(
   bool FirstIter = true;
   for (Expr *RefExpr : VarList) {
 assert(RefExpr && "nullptr expr in OpenMP reduction clause.");
+if (ClauseKind == OMPC_reduction &&
+RD.RedModifier == OMPC_REDUCTION_inscan && RefExpr->isTypeDependent()) {
+  S.Diag(RefExpr->getExprLoc(),
+ diag::err_omp_inscan_reduction_on_template_type);
+  continue;
+}
+

saiislam wrote:

> This is definitely wrong, templates should be supported

Did you mean that templates are currently supported, or did you mean that they 
should be supported?

https://github.com/llvm/llvm-project/pull/79431


[llvm] [clang] [lld] [AMDGPU] Rename COV module flag to amdhsa_code_object_version (PR #79905)

2024-01-30 Thread Saiyedul Islam via cfe-commits


@@ -25,4 +25,4 @@ entry:
 }
 
 !llvm.module.flags = !{!0}
-!0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
+!0 = !{i32 1, !"amdhsa_code_object_version", i32 500}

saiislam wrote:

Can we remove the explicit mention of the COV5 module flag in all these test
files, given that the current default is COV5?
Or maybe a separate patch would be better for that.
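
For context on the flag value: in the tests quoted in this thread, code object version 5 appears as the integer 500 and version 4 as 400, both in the `amdhsa_code_object_version` module flag and in `__oclc_ABI_version`. A minimal sketch of that relationship, assuming the encoding is simply the version multiplied by 100; the function names are hypothetical and are not LLVM APIs:

```cpp
#include <cstdint>

// Assumption inferred from the test values in this thread (400 for V4,
// 500 for V5): the stored integer is the version multiplied by 100.
constexpr std::uint32_t encodeCodeObjectVersion(std::uint32_t Version) {
  return Version * 100;
}

// Inverse of the above; also a hypothetical helper, not an LLVM function.
constexpr std::uint32_t decodeCodeObjectVersion(std::uint32_t FlagValue) {
  return FlagValue / 100;
}

static_assert(encodeCodeObjectVersion(5) == 500, "COV5 is stored as 500");
static_assert(decodeCodeObjectVersion(400) == 4, "400 decodes to COV4");
```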

https://github.com/llvm/llvm-project/pull/79905


[clang] [OpenMP][Clang] Handle unsupported inscan modifier for generic types (PR #79431)

2024-01-25 Thread Saiyedul Islam via cfe-commits


@@ -12,28 +12,6 @@
 
 void foo() {}
 
-template 

saiislam wrote:

How was this test case working until now, if templates were not supported in
scan?

https://github.com/llvm/llvm-project/pull/79431


[flang] [mlir] [llvm] [clang] [AMDGPU] Update llvm-objdump lit tests for COV5 (PR #79039)

2024-01-23 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam closed 
https://github.com/llvm/llvm-project/pull/79039


[flang] [mlir] [llvm] [clang] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #79038)

2024-01-23 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam closed 
https://github.com/llvm/llvm-project/pull/79038


[llvm] [clang] [mlir] [flang] [AMDGPU] Update llvm-objdump lit tests for COV5 (PR #79039)

2024-01-23 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam updated 
https://github.com/llvm/llvm-project/pull/79039

>From 384a90e5f161e4647a6ab803906a93f730c5df4b Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Mon, 22 Jan 2024 13:11:22 -0600
Subject: [PATCH 1/2] [AMDGPU] Change default AMDHSA Code Object version to 5

Also update LIT tests, docs, and release notes for Clang and
LLVM.

For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
---
 clang/docs/ReleaseNotes.rst   |  3 +++
 clang/include/clang/Driver/Options.td |  4 ++--
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../CodeGenCUDA/amdgpu-code-object-version.cu |  2 +-
 clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu   |  4 ++--
 clang/test/CodeGenHIP/default-attributes.hip  |  4 ++--
 clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl |  4 ++--
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl   | 10 +-
 flang/test/Driver/driver-help-hidden.f90  |  2 +-
 flang/test/Driver/driver-help.f90 |  4 ++--
 llvm/docs/AMDGPUUsage.rst | 15 +++
 llvm/docs/ReleaseNotes.rst|  2 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp   |  2 +-
 .../Dialect/GPU/Transforms/SerializeToHsaco.cpp   |  2 +-
 .../Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp|  1 +
 mlir/test/Target/LLVMIR/rocdl.mlir|  2 +-
 16 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 5846503af3acdfe..069dfcd22e3b667 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -1104,6 +1104,9 @@ AMDGPU Support
   arguments in C ABI. Callee is responsible for allocating stack memory and
   copying the value of the struct if modified. Note that AMDGPU backend still
   supports byval for struct arguments.
+- The default value for ``-mcode-object-version`` is now 5.
+  See `AMDHSA Code Object V5 Metadata <https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata>`_
+  for more details.
 
 X86 Support
 ^^^
diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index f9e883e3e22de86..d4b82b301f12e64 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4777,12 +4777,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
   NegFlag>, Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
   Values<"none,4,5">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
   NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
   "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/test/CodeGen/amdgpu-address-spaces.cpp 
b/clang/test/CodeGen/amdgpu-address-spaces.cpp
index 0a808aa6cc75ed3..ae2c61439f4ca53 100644
--- a/clang/test/CodeGen/amdgpu-address-spaces.cpp
+++ b/clang/test/CodeGen/amdgpu-address-spaces.cpp
@@ -29,7 +29,7 @@ int [[clang::address_space(999)]] bbb = 1234;
 // CHECK: @u = addrspace(5) global i32 undef, align 4
 // CHECK: @aaa = addrspace(6) global i32 1000, align 4
 // CHECK: @bbb = addrspace(999) global i32 1234, align 4
-// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr 
addrspace(4) constant i32 400
+// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr 
addrspace(4) constant i32 500
 //.
 // CHECK-LABEL: define dso_local amdgpu_kernel void @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu 
b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index ff5deaf9ab850d2..3cb6632fc0b63d3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
 // Create module flag for code object version.
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o - %s | FileCheck %s -check-prefix=V4
+// RUN:   -o - %s | FileCheck %s -check-prefix=V5
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=4 -o - %s | FileCheck -check-prefix=V4 %s
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu 
b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index 282e0a49b9aa10b..0c846e0936b58b1 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
+// RUN: -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s 
\
 // RUN: | 

[mlir] [flang] [clang] [llvm] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #79038)

2024-01-23 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam updated 
https://github.com/llvm/llvm-project/pull/79038

>From 384a90e5f161e4647a6ab803906a93f730c5df4b Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Mon, 22 Jan 2024 13:11:22 -0600
Subject: [PATCH] [AMDGPU] Change default AMDHSA Code Object version to 5

Also update LIT tests, docs, and release notes for Clang and
LLVM.

For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
---
 clang/docs/ReleaseNotes.rst   |  3 +++
 clang/include/clang/Driver/Options.td |  4 ++--
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../CodeGenCUDA/amdgpu-code-object-version.cu |  2 +-
 clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu   |  4 ++--
 clang/test/CodeGenHIP/default-attributes.hip  |  4 ++--
 clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl |  4 ++--
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl   | 10 +-
 flang/test/Driver/driver-help-hidden.f90  |  2 +-
 flang/test/Driver/driver-help.f90 |  4 ++--
 llvm/docs/AMDGPUUsage.rst | 15 +++
 llvm/docs/ReleaseNotes.rst|  2 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp   |  2 +-
 .../Dialect/GPU/Transforms/SerializeToHsaco.cpp   |  2 +-
 .../Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp|  1 +
 mlir/test/Target/LLVMIR/rocdl.mlir|  2 +-
 16 files changed, 34 insertions(+), 29 deletions(-)


[llvm] [mlir] [clang] [flang] [AMDGPU] Update llvm-objdump lit tests for COV5 (PR #79039)

2024-01-23 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam updated 
https://github.com/llvm/llvm-project/pull/79039

>From 04377914831f54f0572d5b1b233826fd0e204685 Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Mon, 22 Jan 2024 13:11:22 -0600
Subject: [PATCH 1/2] [AMDGPU] Change default AMDHSA Code Object version to 5

Also update LIT tests, docs, and release notes for Clang and
LLVM.

For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
---
 clang/docs/ReleaseNotes.rst   |  3 +++
 clang/include/clang/Driver/Options.td |  4 ++--
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../CodeGenCUDA/amdgpu-code-object-version.cu |  2 +-
 clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu   |  4 ++--
 clang/test/CodeGenHIP/default-attributes.hip  |  4 ++--
 clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl |  4 ++--
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl   | 10 +-
 flang/test/Driver/driver-help-hidden.f90  |  2 +-
 flang/test/Driver/driver-help.f90 |  2 +-
 llvm/docs/AMDGPUUsage.rst | 15 +++
 llvm/docs/ReleaseNotes.rst|  2 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp   |  2 +-
 .../Dialect/GPU/Transforms/SerializeToHsaco.cpp   |  2 +-
 .../Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp|  1 +
 mlir/test/Target/LLVMIR/rocdl.mlir|  2 +-
 16 files changed, 33 insertions(+), 28 deletions(-)


[clang] [llvm] [mlir] [flang] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #79038)

2024-01-23 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam updated 
https://github.com/llvm/llvm-project/pull/79038

>From 04377914831f54f0572d5b1b233826fd0e204685 Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Mon, 22 Jan 2024 13:11:22 -0600
Subject: [PATCH] [AMDGPU] Change default AMDHSA Code Object version to 5

Also update LIT tests, docs, and release notes for Clang and
LLVM.

For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
---
 clang/docs/ReleaseNotes.rst   |  3 +++
 clang/include/clang/Driver/Options.td |  4 ++--
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../CodeGenCUDA/amdgpu-code-object-version.cu |  2 +-
 clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu   |  4 ++--
 clang/test/CodeGenHIP/default-attributes.hip  |  4 ++--
 clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl |  4 ++--
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl   | 10 +-
 flang/test/Driver/driver-help-hidden.f90  |  2 +-
 flang/test/Driver/driver-help.f90 |  2 +-
 llvm/docs/AMDGPUUsage.rst | 15 +++
 llvm/docs/ReleaseNotes.rst|  2 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp   |  2 +-
 .../Dialect/GPU/Transforms/SerializeToHsaco.cpp   |  2 +-
 .../Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp|  1 +
 mlir/test/Target/LLVMIR/rocdl.mlir|  2 +-
 16 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 5846503af3acdfe..069dfcd22e3b667 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -1104,6 +1104,9 @@ AMDGPU Support
   arguments in C ABI. Callee is responsible for allocating stack memory and
   copying the value of the struct if modified. Note that AMDGPU backend still
   supports byval for struct arguments.
+- The default value for ``-mcode-object-version`` is now 5.
+  See `AMDHSA Code Object V5 Metadata 
`_
+  for more details.
 
 X86 Support
 ^^^
diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index f9e883e3e22de86..d4b82b301f12e64 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4777,12 +4777,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
   NegFlag>, Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
   Values<"none,4,5">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
   NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
   "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/test/CodeGen/amdgpu-address-spaces.cpp b/clang/test/CodeGen/amdgpu-address-spaces.cpp
index 0a808aa6cc75ed3..ae2c61439f4ca53 100644
--- a/clang/test/CodeGen/amdgpu-address-spaces.cpp
+++ b/clang/test/CodeGen/amdgpu-address-spaces.cpp
@@ -29,7 +29,7 @@ int [[clang::address_space(999)]] bbb = 1234;
 // CHECK: @u = addrspace(5) global i32 undef, align 4
 // CHECK: @aaa = addrspace(6) global i32 1000, align 4
 // CHECK: @bbb = addrspace(999) global i32 1234, align 4
-// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
 //.
 // CHECK-LABEL: define dso_local amdgpu_kernel void @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index ff5deaf9ab850d2..3cb6632fc0b63d3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
 // Create module flag for code object version.
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o - %s | FileCheck %s -check-prefix=V4
+// RUN:   -o - %s | FileCheck %s -check-prefix=V5
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=4 -o - %s | FileCheck -check-prefix=V4 %s
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index 282e0a49b9aa10b..0c846e0936b58b1 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
+// RUN: -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
 // RUN: | FileCheck 

[clang] [llvm] [flang] [mlir] [AMDGPU] Update llvm-objdump lit tests for COV5 (PR #79039)

2024-01-22 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam edited 
https://github.com/llvm/llvm-project/pull/79039


[mlir] [flang] [llvm] [clang] [AMDGPU] Update llvm-objdump lit tests for COV5 (PR #79039)

2024-01-22 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam updated 
https://github.com/llvm/llvm-project/pull/79039

>From 9791643aa93f70bcaf89cd9ca679dbd1bed58676 Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Mon, 22 Jan 2024 13:11:22 -0600
Subject: [PATCH 1/2] [AMDGPU] Change default AMDHSA Code Object version to 5

Also update LIT tests, docs, and release notes for Clang and
LLVM.

For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
---
 clang/docs/ReleaseNotes.rst   |  2 ++
 clang/include/clang/Driver/Options.td |  4 ++--
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../CodeGenCUDA/amdgpu-code-object-version.cu |  2 +-
 clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu   |  4 ++--
 clang/test/CodeGenHIP/default-attributes.hip  |  4 ++--
 clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl |  4 ++--
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl   | 10 +-
 flang/test/Driver/driver-help-hidden.f90  |  2 +-
 flang/test/Driver/driver-help.f90 |  2 +-
 llvm/docs/AMDGPUUsage.rst | 15 +++
 llvm/docs/ReleaseNotes.rst|  2 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp   |  2 +-
 .../Dialect/GPU/Transforms/SerializeToHsaco.cpp   |  2 +-
 .../Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp|  1 +
 mlir/test/Target/LLVMIR/rocdl.mlir|  2 +-
 16 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 5846503af3acdfe..472e9fcde2c468a 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -1104,6 +1104,8 @@ AMDGPU Support
   arguments in C ABI. Callee is responsible for allocating stack memory and
   copying the value of the struct if modified. Note that AMDGPU backend still
   supports byval for struct arguments.
+- The default value for ``-mcode-object-version`` is now 5. See :ref:`AMDHSA 
code object version `
+  for more details.
 
 X86 Support
 ^^^
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index f9e883e3e22de86..d4b82b301f12e64 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4777,12 +4777,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
   NegFlag>, Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
   Values<"none,4,5">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
   NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
   "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/test/CodeGen/amdgpu-address-spaces.cpp b/clang/test/CodeGen/amdgpu-address-spaces.cpp
index 0a808aa6cc75ed3..ae2c61439f4ca53 100644
--- a/clang/test/CodeGen/amdgpu-address-spaces.cpp
+++ b/clang/test/CodeGen/amdgpu-address-spaces.cpp
@@ -29,7 +29,7 @@ int [[clang::address_space(999)]] bbb = 1234;
 // CHECK: @u = addrspace(5) global i32 undef, align 4
 // CHECK: @aaa = addrspace(6) global i32 1000, align 4
 // CHECK: @bbb = addrspace(999) global i32 1234, align 4
-// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
 //.
 // CHECK-LABEL: define dso_local amdgpu_kernel void @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index ff5deaf9ab850d2..3cb6632fc0b63d3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
 // Create module flag for code object version.
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o - %s | FileCheck %s -check-prefix=V4
+// RUN:   -o - %s | FileCheck %s -check-prefix=V5
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=4 -o - %s | FileCheck -check-prefix=V4 %s
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index 282e0a49b9aa10b..0c846e0936b58b1 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
+// RUN: -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
 // RUN: | FileCheck -check-prefix=PRECOV5 %s
 
 
 // RUN: %clang_cc1 -triple 

[mlir] [flang] [llvm] [clang] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #79038)

2024-01-22 Thread Saiyedul Islam via cfe-commits

saiislam wrote:

> Should get a mention in the release notes

Thanks for pointing it out. I have updated it.

https://github.com/llvm/llvm-project/pull/79038


[mlir] [flang] [llvm] [clang] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #79038)

2024-01-22 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam updated 
https://github.com/llvm/llvm-project/pull/79038

>From 9791643aa93f70bcaf89cd9ca679dbd1bed58676 Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Mon, 22 Jan 2024 13:11:22 -0600
Subject: [PATCH] [AMDGPU] Change default AMDHSA Code Object version to 5

Also update LIT tests, docs, and release notes for Clang and
LLVM.

For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
---
 clang/docs/ReleaseNotes.rst   |  2 ++
 clang/include/clang/Driver/Options.td |  4 ++--
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../CodeGenCUDA/amdgpu-code-object-version.cu |  2 +-
 clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu   |  4 ++--
 clang/test/CodeGenHIP/default-attributes.hip  |  4 ++--
 clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl |  4 ++--
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl   | 10 +-
 flang/test/Driver/driver-help-hidden.f90  |  2 +-
 flang/test/Driver/driver-help.f90 |  2 +-
 llvm/docs/AMDGPUUsage.rst | 15 +++
 llvm/docs/ReleaseNotes.rst|  2 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp   |  2 +-
 .../Dialect/GPU/Transforms/SerializeToHsaco.cpp   |  2 +-
 .../Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp|  1 +
 mlir/test/Target/LLVMIR/rocdl.mlir|  2 +-
 16 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 5846503af3acdf..472e9fcde2c468 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -1104,6 +1104,8 @@ AMDGPU Support
   arguments in C ABI. Callee is responsible for allocating stack memory and
   copying the value of the struct if modified. Note that AMDGPU backend still
   supports byval for struct arguments.
+- The default value for ``-mcode-object-version`` is now 5. See :ref:`AMDHSA 
code object version `
+  for more details.
 
 X86 Support
 ^^^
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index f9e883e3e22de8..d4b82b301f12e6 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4777,12 +4777,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
   NegFlag>, Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
   Values<"none,4,5">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
   NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
   "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/test/CodeGen/amdgpu-address-spaces.cpp b/clang/test/CodeGen/amdgpu-address-spaces.cpp
index 0a808aa6cc75ed..ae2c61439f4ca5 100644
--- a/clang/test/CodeGen/amdgpu-address-spaces.cpp
+++ b/clang/test/CodeGen/amdgpu-address-spaces.cpp
@@ -29,7 +29,7 @@ int [[clang::address_space(999)]] bbb = 1234;
 // CHECK: @u = addrspace(5) global i32 undef, align 4
 // CHECK: @aaa = addrspace(6) global i32 1000, align 4
 // CHECK: @bbb = addrspace(999) global i32 1234, align 4
-// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
 //.
 // CHECK-LABEL: define dso_local amdgpu_kernel void @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index ff5deaf9ab850d..3cb6632fc0b63d 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
 // Create module flag for code object version.
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o - %s | FileCheck %s -check-prefix=V4
+// RUN:   -o - %s | FileCheck %s -check-prefix=V5
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=4 -o - %s | FileCheck -check-prefix=V4 %s
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index 282e0a49b9aa10..0c846e0936b58b 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
+// RUN: -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
 // RUN: | FileCheck -check-prefix=PRECOV5 %s
 
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// 

[mlir] [llvm] [clang] [AMDGPU] Update llvm-objdump lit tests for COV5 (PR #79039)

2024-01-22 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam edited 
https://github.com/llvm/llvm-project/pull/79039


[mlir] [llvm] [clang] [AMDGPU] Update llvm-objdump lit tests for COV5 (PR #79039)

2024-01-22 Thread Saiyedul Islam via cfe-commits


@@ -99,6 +99,7 @@ class ROCDLDialectLLVMIRTranslationInterface
   if (!llvmFunc->hasFnAttribute("amdgpu-flat-work-group-size")) {
 llvmFunc->addFnAttr("amdgpu-flat-work-group-size", "1,256");
   }
+  llvmFunc->addFnAttr("amdgpu-implicitarg-num-bytes", "256");

saiislam wrote:

This PR depends on #79038 (the other review), and both need to land 
together. When I created the stacked PRs, GitHub appears to have pulled both 
commits into this PR.

You may use [this](https://github.com/llvm/llvm-project/pull/79039/commits/a66ac33975381b1acdae0c177842d2f711ad9ab9)
to see the changes of this review only.

https://github.com/llvm/llvm-project/pull/79039


[llvm] [clang] [mlir] [AMDGPU] Update llvm-objdump lit tests for COV5 (PR #79039)

2024-01-22 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam created 
https://github.com/llvm/llvm-project/pull/79039

Depends on #79038, which makes COV5 the default code
object version.

>From 4c156a11e943b85c1fe9f7f0ff5b651cf4d3946d Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Mon, 22 Jan 2024 13:11:22 -0600
Subject: [PATCH 1/2] [AMDGPU] Change default AMDHSA Code Object version to 5

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
---
 clang/include/clang/Driver/Options.td |  4 ++--
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../CodeGenCUDA/amdgpu-code-object-version.cu |  2 +-
 clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu   |  4 ++--
 clang/test/CodeGenHIP/default-attributes.hip  |  4 ++--
 clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl |  4 ++--
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl   | 10 +-
 llvm/docs/AMDGPUUsage.rst | 15 +++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp   |  2 +-
 .../Dialect/GPU/Transforms/SerializeToHsaco.cpp   |  2 +-
 .../Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp|  1 +
 mlir/test/Target/LLVMIR/rocdl.mlir|  2 +-
 12 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index f9e883e3e22de86..d4b82b301f12e64 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4777,12 +4777,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
   NegFlag>, Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
   Values<"none,4,5">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
   NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
   "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/test/CodeGen/amdgpu-address-spaces.cpp b/clang/test/CodeGen/amdgpu-address-spaces.cpp
index 0a808aa6cc75ed3..ae2c61439f4ca53 100644
--- a/clang/test/CodeGen/amdgpu-address-spaces.cpp
+++ b/clang/test/CodeGen/amdgpu-address-spaces.cpp
@@ -29,7 +29,7 @@ int [[clang::address_space(999)]] bbb = 1234;
 // CHECK: @u = addrspace(5) global i32 undef, align 4
 // CHECK: @aaa = addrspace(6) global i32 1000, align 4
 // CHECK: @bbb = addrspace(999) global i32 1234, align 4
-// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
 //.
 // CHECK-LABEL: define dso_local amdgpu_kernel void @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index ff5deaf9ab850d2..3cb6632fc0b63d3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
 // Create module flag for code object version.
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o - %s | FileCheck %s -check-prefix=V4
+// RUN:   -o - %s | FileCheck %s -check-prefix=V5
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=4 -o - %s | FileCheck -check-prefix=V4 %s
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index 282e0a49b9aa10b..0c846e0936b58b1 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
+// RUN: -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
 // RUN: | FileCheck -check-prefix=PRECOV5 %s
 
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip %s \
+// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
 // RUN: | FileCheck -check-prefix=COV5 %s
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenHIP/default-attributes.hip b/clang/test/CodeGenHIP/default-attributes.hip
index 80aa1ee0700628f..9c9ea521271b99b 100644
--- a/clang/test/CodeGenHIP/default-attributes.hip
+++ b/clang/test/CodeGenHIP/default-attributes.hip
@@ -46,11 +46,11 @@ __global__ void kernel() {
 // OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // OPT: attributes #1 = { mustprogress nofree norecurse 

[llvm] [clang] [mlir] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #79038)

2024-01-22 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam created 
https://github.com/llvm/llvm-project/pull/79038

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Corresponding llvm-objdump AMDGPU lit tests are updated
in a follow-up PR.

>From 4c156a11e943b85c1fe9f7f0ff5b651cf4d3946d Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Mon, 22 Jan 2024 13:11:22 -0600
Subject: [PATCH] [AMDGPU] Change default AMDHSA Code Object version to 5

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
---
 clang/include/clang/Driver/Options.td |  4 ++--
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../CodeGenCUDA/amdgpu-code-object-version.cu |  2 +-
 clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu   |  4 ++--
 clang/test/CodeGenHIP/default-attributes.hip  |  4 ++--
 clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl |  4 ++--
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl   | 10 +-
 llvm/docs/AMDGPUUsage.rst | 15 +++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp   |  2 +-
 .../Dialect/GPU/Transforms/SerializeToHsaco.cpp   |  2 +-
 .../Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp|  1 +
 mlir/test/Target/LLVMIR/rocdl.mlir|  2 +-
 12 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index f9e883e3e22de86..d4b82b301f12e64 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4777,12 +4777,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
   NegFlag>, Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
   Values<"none,4,5">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
   NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
   "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/test/CodeGen/amdgpu-address-spaces.cpp b/clang/test/CodeGen/amdgpu-address-spaces.cpp
index 0a808aa6cc75ed3..ae2c61439f4ca53 100644
--- a/clang/test/CodeGen/amdgpu-address-spaces.cpp
+++ b/clang/test/CodeGen/amdgpu-address-spaces.cpp
@@ -29,7 +29,7 @@ int [[clang::address_space(999)]] bbb = 1234;
 // CHECK: @u = addrspace(5) global i32 undef, align 4
 // CHECK: @aaa = addrspace(6) global i32 1000, align 4
 // CHECK: @bbb = addrspace(999) global i32 1234, align 4
-// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400
+// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
 //.
 // CHECK-LABEL: define dso_local amdgpu_kernel void @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index ff5deaf9ab850d2..3cb6632fc0b63d3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
 // Create module flag for code object version.
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o - %s | FileCheck %s -check-prefix=V4
+// RUN:   -o - %s | FileCheck %s -check-prefix=V5
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=4 -o - %s | FileCheck -check-prefix=V4 %s
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index 282e0a49b9aa10b..0c846e0936b58b1 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
+// RUN: -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
 // RUN: | FileCheck -check-prefix=PRECOV5 %s
 
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip %s \
+// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
 // RUN: | FileCheck -check-prefix=COV5 %s
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenHIP/default-attributes.hip b/clang/test/CodeGenHIP/default-attributes.hip
index 80aa1ee0700628f..9c9ea521271b99b 100644
--- a/clang/test/CodeGenHIP/default-attributes.hip
+++ b/clang/test/CodeGenHIP/default-attributes.hip
@@ -46,11 +46,11 @@ __global__ void kernel() {
 // OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind 
willreturn memory(none) 

[llvm] [clang] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-18 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam approved this pull request.

I tested different combinations of compatible TargetIDs, and the patch seems 
to work fine.
Thanks a lot for working on this.
LGTM!

https://github.com/llvm/llvm-project/pull/78359


[clang] [clang-offload-bundler] Add support for -check-input-archive (PR #73709)

2023-11-28 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam approved this pull request.

Thanks, LGTM!

https://github.com/llvm/llvm-project/pull/73709


[llvm] [clang] [mlir] [lld] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #73000)

2023-11-22 Thread Saiyedul Islam via cfe-commits

saiislam wrote:

> (patches like this should probably be broken up - test changes to the 
> defaults in lld and llvm for instance don't depend on the change to the clang 
> driver which is the only real semantic change in this patch, right? So 
> probably only change the semantics of clang, and the tests that need to 
> change there - then follow-up with separate commits that update other test 
> coverage to be closer to the current defaults)

Thanks for the suggestion! Based on the comments from @dwblaikie and 
@JonChesterfield, I have broken this PR into 7 commits. Each commit deals with 
a specific subset (clang, lld, mlir, llvm manual tests, llvm autogenerated 
tests, etc.).
Hopefully this will make the reviewers' job easier.

I will squash these 7 commits into a single commit at the time of 
merging.

https://github.com/llvm/llvm-project/pull/73000


[llvm] [mlir] [clang] [lld] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #73000)

2023-11-21 Thread Saiyedul Islam via cfe-commits


@@ -88,7 +88,7 @@ class TargetOptions {
 COV_5 = 500,
   };
   /// \brief Code object version for AMDGPU.
-  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_None;
+  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_5;

saiislam wrote:

The default was never `none`. It gets overwritten anyway by 
`MarshallingInfoEnum,"COV_5">`, which was 
previously COV_4.

It was defined as COV_None in the commit [[NFC] Fix uninitalized member variable use in ASTReader::ParseTargetOptions()](https://github.com/llvm/llvm-project/commit/6f2a865d2f6bc426a61939a0a1acfcb25d5c1a18).

I changed it to COV_5 because that is the new default, and leaving it as 
COV_None might cause confusion.
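
To make the point about the member initializer concrete, here is a minimal standalone sketch. It is not Clang's actual code: `TargetOptionsSketch`, `parseCodeObjectVersion`, and `buildTargetOptions` are hypothetical stand-ins for the real option-marshalling machinery. The enumerator values 400 and 500 follow the `__oclc_ABI_version` constants checked in the tests, and `COV_None = 0` is assumed from the `i32 0` check used with `-mcode-object-version=none`.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Mirrors the enumerators named in the patch; the numeric values are the
// ones the lit tests expect in __oclc_ABI_version (assumption for COV_None).
enum class CodeObjectVersionKind { COV_None = 0, COV_4 = 400, COV_5 = 500 };

// Hypothetical stand-in for clang::TargetOptions.
struct TargetOptionsSketch {
  // In-class initializer: only observable if nothing assigns the field later.
  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_5;
};

// Hypothetical helper: maps the accepted flag values "none", "4", "5".
// Returns std::nullopt for anything else.
std::optional<CodeObjectVersionKind> parseCodeObjectVersion(std::string_view V) {
  if (V == "none") return CodeObjectVersionKind::COV_None;
  if (V == "4") return CodeObjectVersionKind::COV_4;
  if (V == "5") return CodeObjectVersionKind::COV_5;
  return std::nullopt;
}

// Hypothetical model of option marshalling: the field is always assigned,
// first from the option's declared default and then from the flag if one
// was given, so the in-class initializer above never survives.
TargetOptionsSketch buildTargetOptions(std::optional<std::string_view> Flag,
                                       CodeObjectVersionKind OptionDefault) {
  TargetOptionsSketch Opts;
  Opts.CodeObjectVersion = OptionDefault;
  if (Flag)
    if (auto Parsed = parseCodeObjectVersion(*Flag))
      Opts.CodeObjectVersion = *Parsed;
  return Opts;
}
```

Under this model, changing the in-class initializer only affects readers of the header; the effective default comes from the value passed as `OptionDefault`.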

https://github.com/llvm/llvm-project/pull/73000


[openmp] [clang] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (PR #71234)

2023-11-08 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam closed 
https://github.com/llvm/llvm-project/pull/71234


[openmp] [clang] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (PR #71234)

2023-11-08 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam updated 
https://github.com/llvm/llvm-project/pull/71234

>From a6627248612fd2ab577b456a791e08164674efcc Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Fri, 3 Nov 2023 16:16:25 -0500
Subject: [PATCH 1/3] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL

Fixes the DeviceRTL compilation to ensure it is ABI agnostic.
Uses already available global variable "oclc_ABI_version" instead
of "llvm.amdgcn.abi.version".

It also adds some minor fields in ImplicitArg structure.
---
 clang/include/clang/Basic/TargetOptions.h |  2 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |  6 +-
 clang/lib/CodeGen/Targets/AMDGPU.cpp  |  5 +-
 clang/test/CodeGen/amdgpu-abi-version.c   |  4 +-
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../amdgpu-code-object-version-linking.cu | 16 +++---
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |  6 +-
 .../plugins-nextgen/amdgpu/src/rtl.cpp| 57 +++
 .../amdgpu/utils/UtilitiesRTL.h   |  4 +-
 9 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/clang/include/clang/Basic/TargetOptions.h b/clang/include/clang/Basic/TargetOptions.h
index ba3acd029587160..7497e580d27338d 100644
--- a/clang/include/clang/Basic/TargetOptions.h
+++ b/clang/include/clang/Basic/TargetOptions.h
@@ -88,7 +88,7 @@ class TargetOptions {
 COV_5 = 500,
   };
   /// \brief Code object version for AMDGPU.
-  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_None;
+  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_4;
 
   /// \brief Enumeration values for AMDGPU printf lowering scheme
   enum class AMDGPUPrintfKind {
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index e7e498e8a933131..d49c44dbaace3a8 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -17432,11 +17432,11 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction ) 
{
 /// Emit code based on Code Object ABI version.
 /// COV_4: Emit code to use dispatch ptr
 /// COV_5: Emit code to use implicitarg ptr
-/// COV_NONE : Emit code to load a global variable "llvm.amdgcn.abi.version"
+/// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"
 ///and use its value for COV_4 or COV_5 approach. It is used for
 ///compiling device libraries in an ABI-agnostic way.
 ///
-/// Note: "llvm.amdgcn.abi.version" is supposed to be emitted and intialized by
+/// Note: "__oclc_ABI_version" is supposed to be emitted and intialized by
 ///   clang during compilation of user code.
 Value *EmitAMDGPUWorkGroupSize(CodeGenFunction , unsigned Index) {
   llvm::LoadInst *LD;
@@ -17444,7 +17444,7 @@ Value *EmitAMDGPUWorkGroupSize(CodeGenFunction , 
unsigned Index) {
   auto Cov = CGF.getTarget().getTargetOpts().CodeObjectVersion;
 
   if (Cov == clang::TargetOptions::COV_None) {
-StringRef Name = "llvm.amdgcn.abi.version";
+StringRef Name = "__oclc_ABI_version";
 auto *ABIVersionC = CGF.CGM.getModule().getNamedGlobal(Name);
 if (!ABIVersionC)
   ABIVersionC = new llvm::GlobalVariable(
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 0411846cf9b02bd..d793d27e0db8b80 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -362,11 +362,14 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
 /// AMDGPU ROCm device libraries.
 void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
 CodeGen::CodeGenModule ) const {
-  StringRef Name = "llvm.amdgcn.abi.version";
+  StringRef Name = "__oclc_ABI_version";
   llvm::GlobalVariable *OriginalGV = CGM.getModule().getNamedGlobal(Name);
   if (OriginalGV && !llvm::GlobalVariable::isExternalLinkage(OriginalGV->getLinkage()))
 return;
 
+  if(CGM.getTarget().getTargetOpts().CodeObjectVersion == clang::TargetOptions::COV_None)
+return;
+
   auto *Type = llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), 32);
   llvm::Constant *COV = llvm::ConstantInt::get(
   Type, CGM.getTarget().getTargetOpts().CodeObjectVersion);
diff --git a/clang/test/CodeGen/amdgpu-abi-version.c b/clang/test/CodeGen/amdgpu-abi-version.c
index d1189545139e2a6..4e5ad87655f2305 100644
--- a/clang/test/CodeGen/amdgpu-abi-version.c
+++ b/clang/test/CodeGen/amdgpu-abi-version.c
@@ -2,14 +2,14 @@
 // RUN: %clang_cc1 -cc1 -triple amdgcn-amd-amdhsa -emit-llvm -mcode-object-version=none %s -o - | FileCheck %s
 
 //.
-// CHECK: @llvm.amdgcn.abi.version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 0
+// CHECK: @__oclc_ABI_version = external addrspace(4) global i32
 //.
 // CHECK-LABEL: define dso_local i32 @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:[[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) 
[[RETVAL]] to ptr
-// CHECK-NEXT:[[TMP0:%.*]] = load 

[clang] [openmp] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (PR #71234)

2023-11-08 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam updated 
https://github.com/llvm/llvm-project/pull/71234

>From 91c64e83b3d8d405e71f8e3108483b88ee4758d8 Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Fri, 3 Nov 2023 16:16:25 -0500
Subject: [PATCH 1/3] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL

Fixes the DeviceRTL compilation to ensure it is ABI agnostic.
Uses the already available global variable "oclc_ABI_version" instead
of "llvm.amdgcn.abi.version".

It also adds some minor fields to the ImplicitArg structure.
---
 clang/include/clang/Basic/TargetOptions.h |  2 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |  6 +-
 clang/lib/CodeGen/Targets/AMDGPU.cpp  |  5 +-
 clang/test/CodeGen/amdgpu-abi-version.c   |  4 +-
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../amdgpu-code-object-version-linking.cu | 16 +++---
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |  6 +-
 .../plugins-nextgen/amdgpu/src/rtl.cpp| 57 +++
 .../amdgpu/utils/UtilitiesRTL.h   |  4 +-
 9 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/clang/include/clang/Basic/TargetOptions.h 
b/clang/include/clang/Basic/TargetOptions.h
index ba3acd029587160..7497e580d27338d 100644
--- a/clang/include/clang/Basic/TargetOptions.h
+++ b/clang/include/clang/Basic/TargetOptions.h
@@ -88,7 +88,7 @@ class TargetOptions {
 COV_5 = 500,
   };
   /// \brief Code object version for AMDGPU.
-  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_None;
+  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_4;
 
   /// \brief Enumeration values for AMDGPU printf lowering scheme
   enum class AMDGPUPrintfKind {
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 5ab81cc605819c3..44a8133ff61ce67 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -17468,11 +17468,11 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 /// Emit code based on Code Object ABI version.
 /// COV_4: Emit code to use dispatch ptr
 /// COV_5: Emit code to use implicitarg ptr
-/// COV_NONE : Emit code to load a global variable "llvm.amdgcn.abi.version"
+/// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"
 ///and use its value for COV_4 or COV_5 approach. It is used for
 ///compiling device libraries in an ABI-agnostic way.
 ///
-/// Note: "llvm.amdgcn.abi.version" is supposed to be emitted and intialized by
+/// Note: "__oclc_ABI_version" is supposed to be emitted and intialized by
 ///   clang during compilation of user code.
 Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
   llvm::LoadInst *LD;
@@ -17480,7 +17480,7 @@ Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
   auto Cov = CGF.getTarget().getTargetOpts().CodeObjectVersion;
 
   if (Cov == clang::TargetOptions::COV_None) {
-StringRef Name = "llvm.amdgcn.abi.version";
+StringRef Name = "__oclc_ABI_version";
 auto *ABIVersionC = CGF.CGM.getModule().getNamedGlobal(Name);
 if (!ABIVersionC)
   ABIVersionC = new llvm::GlobalVariable(
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp 
b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 0411846cf9b02bd..d793d27e0db8b80 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -362,11 +362,14 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
 /// AMDGPU ROCm device libraries.
 void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
     CodeGen::CodeGenModule &CGM) const {
-  StringRef Name = "llvm.amdgcn.abi.version";
+  StringRef Name = "__oclc_ABI_version";
   llvm::GlobalVariable *OriginalGV = CGM.getModule().getNamedGlobal(Name);
   if (OriginalGV && !llvm::GlobalVariable::isExternalLinkage(OriginalGV->getLinkage()))
     return;
 
+  if(CGM.getTarget().getTargetOpts().CodeObjectVersion == clang::TargetOptions::COV_None)
+    return;
+
   auto *Type = llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), 32);
   llvm::Constant *COV = llvm::ConstantInt::get(
   Type, CGM.getTarget().getTargetOpts().CodeObjectVersion);
diff --git a/clang/test/CodeGen/amdgpu-abi-version.c 
b/clang/test/CodeGen/amdgpu-abi-version.c
index d1189545139e2a6..4e5ad87655f2305 100644
--- a/clang/test/CodeGen/amdgpu-abi-version.c
+++ b/clang/test/CodeGen/amdgpu-abi-version.c
@@ -2,14 +2,14 @@
 // RUN: %clang_cc1 -cc1 -triple amdgcn-amd-amdhsa -emit-llvm 
-mcode-object-version=none %s -o - | FileCheck %s
 
 //.
-// CHECK: @llvm.amdgcn.abi.version = weak_odr hidden local_unnamed_addr 
addrspace(4) constant i32 0
+// CHECK: @__oclc_ABI_version = external addrspace(4) global i32
 //.
 // CHECK-LABEL: define dso_local i32 @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:[[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) 
[[RETVAL]] to ptr
-// CHECK-NEXT:[[TMP0:%.*]] = load 

[openmp] [clang] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (PR #71234)

2023-11-06 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam updated 
https://github.com/llvm/llvm-project/pull/71234

>From 36976c1a97518c9cdf080d80b5fab2b16837b055 Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Fri, 3 Nov 2023 16:16:25 -0500
Subject: [PATCH 1/3] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL

Fixes the DeviceRTL compilation to ensure it is ABI agnostic.
Uses the already available global variable "oclc_ABI_version" instead
of "llvm.amdgcn.abi.version".

It also adds some minor fields to the ImplicitArg structure.
---
 clang/include/clang/Basic/TargetOptions.h |  2 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |  6 +-
 clang/lib/CodeGen/Targets/AMDGPU.cpp  |  5 +-
 clang/test/CodeGen/amdgpu-abi-version.c   |  4 +-
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../amdgpu-code-object-version-linking.cu | 16 +++---
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |  6 +-
 .../plugins-nextgen/amdgpu/src/rtl.cpp| 57 +++
 .../amdgpu/utils/UtilitiesRTL.h   |  4 +-
 9 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/clang/include/clang/Basic/TargetOptions.h 
b/clang/include/clang/Basic/TargetOptions.h
index 8bb03249b7f8308..f9c03b61a2a827c 100644
--- a/clang/include/clang/Basic/TargetOptions.h
+++ b/clang/include/clang/Basic/TargetOptions.h
@@ -88,7 +88,7 @@ class TargetOptions {
 COV_5 = 500,
   };
   /// \brief Code object version for AMDGPU.
-  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_None;
+  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_4;
 
   /// \brief Enumeration values for AMDGPU printf lowering scheme
   enum class AMDGPUPrintfKind {
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 972aa1c708e5f65..8a9d83fd29e2b77 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -17468,11 +17468,11 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 /// Emit code based on Code Object ABI version.
 /// COV_4: Emit code to use dispatch ptr
 /// COV_5: Emit code to use implicitarg ptr
-/// COV_NONE : Emit code to load a global variable "llvm.amdgcn.abi.version"
+/// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"
 ///and use its value for COV_4 or COV_5 approach. It is used for
 ///compiling device libraries in an ABI-agnostic way.
 ///
-/// Note: "llvm.amdgcn.abi.version" is supposed to be emitted and intialized by
+/// Note: "__oclc_ABI_version" is supposed to be emitted and intialized by
 ///   clang during compilation of user code.
 Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
   llvm::LoadInst *LD;
@@ -17480,7 +17480,7 @@ Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
   auto Cov = CGF.getTarget().getTargetOpts().CodeObjectVersion;
 
   if (Cov == clang::TargetOptions::COV_None) {
-StringRef Name = "llvm.amdgcn.abi.version";
+StringRef Name = "__oclc_ABI_version";
 auto *ABIVersionC = CGF.CGM.getModule().getNamedGlobal(Name);
 if (!ABIVersionC)
   ABIVersionC = new llvm::GlobalVariable(
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp 
b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 0411846cf9b02bd..d793d27e0db8b80 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -362,11 +362,14 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
 /// AMDGPU ROCm device libraries.
 void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
     CodeGen::CodeGenModule &CGM) const {
-  StringRef Name = "llvm.amdgcn.abi.version";
+  StringRef Name = "__oclc_ABI_version";
   llvm::GlobalVariable *OriginalGV = CGM.getModule().getNamedGlobal(Name);
   if (OriginalGV && !llvm::GlobalVariable::isExternalLinkage(OriginalGV->getLinkage()))
     return;
 
+  if(CGM.getTarget().getTargetOpts().CodeObjectVersion == clang::TargetOptions::COV_None)
+    return;
+
   auto *Type = llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), 32);
   llvm::Constant *COV = llvm::ConstantInt::get(
   Type, CGM.getTarget().getTargetOpts().CodeObjectVersion);
diff --git a/clang/test/CodeGen/amdgpu-abi-version.c 
b/clang/test/CodeGen/amdgpu-abi-version.c
index d1189545139e2a6..4e5ad87655f2305 100644
--- a/clang/test/CodeGen/amdgpu-abi-version.c
+++ b/clang/test/CodeGen/amdgpu-abi-version.c
@@ -2,14 +2,14 @@
 // RUN: %clang_cc1 -cc1 -triple amdgcn-amd-amdhsa -emit-llvm 
-mcode-object-version=none %s -o - | FileCheck %s
 
 //.
-// CHECK: @llvm.amdgcn.abi.version = weak_odr hidden local_unnamed_addr 
addrspace(4) constant i32 0
+// CHECK: @__oclc_ABI_version = external addrspace(4) global i32
 //.
 // CHECK-LABEL: define dso_local i32 @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:[[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) 
[[RETVAL]] to ptr
-// CHECK-NEXT:[[TMP0:%.*]] = load 

[openmp] [clang] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (PR #71234)

2023-11-06 Thread Saiyedul Islam via cfe-commits


@@ -88,7 +88,7 @@ class TargetOptions {
 COV_5 = 500,
   };
   /// \brief Code object version for AMDGPU.
-  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_None;
+  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_4;

saiislam wrote:

Apologies. This change is not relevant to this PR. Removed it.

https://github.com/llvm/llvm-project/pull/71234


[openmp] [clang] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (PR #71234)

2023-11-03 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam updated 
https://github.com/llvm/llvm-project/pull/71234

>From 36976c1a97518c9cdf080d80b5fab2b16837b055 Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Fri, 3 Nov 2023 16:16:25 -0500
Subject: [PATCH 1/2] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL

Fixes the DeviceRTL compilation to ensure it is ABI agnostic.
Uses the already available global variable "oclc_ABI_version" instead
of "llvm.amdgcn.abi.version".

It also adds some minor fields to the ImplicitArg structure.
---
 clang/include/clang/Basic/TargetOptions.h |  2 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |  6 +-
 clang/lib/CodeGen/Targets/AMDGPU.cpp  |  5 +-
 clang/test/CodeGen/amdgpu-abi-version.c   |  4 +-
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../amdgpu-code-object-version-linking.cu | 16 +++---
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |  6 +-
 .../plugins-nextgen/amdgpu/src/rtl.cpp| 57 +++
 .../amdgpu/utils/UtilitiesRTL.h   |  4 +-
 9 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/clang/include/clang/Basic/TargetOptions.h 
b/clang/include/clang/Basic/TargetOptions.h
index 8bb03249b7f8308..f9c03b61a2a827c 100644
--- a/clang/include/clang/Basic/TargetOptions.h
+++ b/clang/include/clang/Basic/TargetOptions.h
@@ -88,7 +88,7 @@ class TargetOptions {
 COV_5 = 500,
   };
   /// \brief Code object version for AMDGPU.
-  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_None;
+  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_4;
 
   /// \brief Enumeration values for AMDGPU printf lowering scheme
   enum class AMDGPUPrintfKind {
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 972aa1c708e5f65..8a9d83fd29e2b77 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -17468,11 +17468,11 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 /// Emit code based on Code Object ABI version.
 /// COV_4: Emit code to use dispatch ptr
 /// COV_5: Emit code to use implicitarg ptr
-/// COV_NONE : Emit code to load a global variable "llvm.amdgcn.abi.version"
+/// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"
 ///and use its value for COV_4 or COV_5 approach. It is used for
 ///compiling device libraries in an ABI-agnostic way.
 ///
-/// Note: "llvm.amdgcn.abi.version" is supposed to be emitted and intialized by
+/// Note: "__oclc_ABI_version" is supposed to be emitted and intialized by
 ///   clang during compilation of user code.
 Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
   llvm::LoadInst *LD;
@@ -17480,7 +17480,7 @@ Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
   auto Cov = CGF.getTarget().getTargetOpts().CodeObjectVersion;
 
   if (Cov == clang::TargetOptions::COV_None) {
-StringRef Name = "llvm.amdgcn.abi.version";
+StringRef Name = "__oclc_ABI_version";
 auto *ABIVersionC = CGF.CGM.getModule().getNamedGlobal(Name);
 if (!ABIVersionC)
   ABIVersionC = new llvm::GlobalVariable(
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp 
b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 0411846cf9b02bd..d793d27e0db8b80 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -362,11 +362,14 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
 /// AMDGPU ROCm device libraries.
 void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
     CodeGen::CodeGenModule &CGM) const {
-  StringRef Name = "llvm.amdgcn.abi.version";
+  StringRef Name = "__oclc_ABI_version";
   llvm::GlobalVariable *OriginalGV = CGM.getModule().getNamedGlobal(Name);
   if (OriginalGV && !llvm::GlobalVariable::isExternalLinkage(OriginalGV->getLinkage()))
     return;
 
+  if(CGM.getTarget().getTargetOpts().CodeObjectVersion == clang::TargetOptions::COV_None)
+    return;
+
   auto *Type = llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), 32);
   llvm::Constant *COV = llvm::ConstantInt::get(
   Type, CGM.getTarget().getTargetOpts().CodeObjectVersion);
diff --git a/clang/test/CodeGen/amdgpu-abi-version.c 
b/clang/test/CodeGen/amdgpu-abi-version.c
index d1189545139e2a6..4e5ad87655f2305 100644
--- a/clang/test/CodeGen/amdgpu-abi-version.c
+++ b/clang/test/CodeGen/amdgpu-abi-version.c
@@ -2,14 +2,14 @@
 // RUN: %clang_cc1 -cc1 -triple amdgcn-amd-amdhsa -emit-llvm 
-mcode-object-version=none %s -o - | FileCheck %s
 
 //.
-// CHECK: @llvm.amdgcn.abi.version = weak_odr hidden local_unnamed_addr 
addrspace(4) constant i32 0
+// CHECK: @__oclc_ABI_version = external addrspace(4) global i32
 //.
 // CHECK-LABEL: define dso_local i32 @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:[[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) 
[[RETVAL]] to ptr
-// CHECK-NEXT:[[TMP0:%.*]] = load 

[openmp] [clang] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (PR #71234)

2023-11-03 Thread Saiyedul Islam via cfe-commits


@@ -3086,10 +3139,14 @@ Error AMDGPUKernelTy::launchImpl(GenericDeviceTy 
,
   // Only COV5 implicitargs needs to be set. COV4 implicitargs are not used.
   if (getImplicitArgsSize() == sizeof(utils::AMDGPUImplicitArgsTy)) {
 ImplArgs->BlockCountX = NumBlocks;
+ImplArgs->BlockCountY = 1;
+ImplArgs->BlockCountZ = 1;
 ImplArgs->GroupSizeX = NumThreads;
 ImplArgs->GroupSizeY = 1;
 ImplArgs->GroupSizeZ = 1;
 ImplArgs->GridDims = 1;
+ImplArgs->HeapV1Ptr =

saiislam wrote:

Agreed. Removing the field.

https://github.com/llvm/llvm-project/pull/71234


[openmp] [clang] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (PR #71234)

2023-11-03 Thread Saiyedul Islam via cfe-commits


@@ -17468,19 +17468,19 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 /// Emit code based on Code Object ABI version.
 /// COV_4: Emit code to use dispatch ptr
 /// COV_5: Emit code to use implicitarg ptr
-/// COV_NONE : Emit code to load a global variable "llvm.amdgcn.abi.version"
+/// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"

saiislam wrote:

`oclc_ABI_version` already exists upstream, and it contains exactly the same 
information that `llvm.amdgcn.abi.version` was supposed to store. So I 
replaced the latter with the former.

This patch ensures that DeviceRTL is indeed ABI agnostic.

I agree with your suggestion that it would be great not to deal with these 
things in the driver.
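
For readers following the thread, here is a minimal sketch of what "ABI agnostic" means for the `COV_NONE` case described in the doc comment above: the library does not bake in a code object version, but reads a global at run time and picks the COV_4 or COV_5 source accordingly. This is a plain C++ model, not Clang or DeviceRTL code. Every name in it (`ModelABIVersion`, `ModelDispatchPacket`, `ModelImplicitArgs`, `modelWorkGroupSize`) is hypothetical, the `>= COV_5` comparison is an assumption about how the selection could be expressed, and `COV_4 = 400` is assumed by analogy with the `COV_5 = 500` enumerator shown in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model for illustration only; none of these names are Clang or
// DeviceRTL APIs. COV_5 = 500 appears in the TargetOptions.h hunk; COV_4 = 400
// is assumed to follow the same pattern.
enum CodeObjectVersion : std::int32_t { COV_None = 0, COV_4 = 400, COV_5 = 500 };

// Stand-in for the version global that user code is expected to define and
// that an ABI-agnostic library only references as an external symbol.
std::int32_t ModelABIVersion = COV_5;

// Stand-ins for the two places the workgroup size can be read from.
struct ModelDispatchPacket { std::uint16_t WorkgroupSize[3]; }; // COV_4 path
struct ModelImplicitArgs { std::uint16_t GroupSize[3]; };       // COV_5 path

// Choose the source at run time instead of when the library is built.
inline std::uint16_t modelWorkGroupSize(const ModelDispatchPacket &DP,
                                        const ModelImplicitArgs &IA,
                                        unsigned Index) {
  assert(Index < 3 && "only the X, Y and Z dimensions exist");
  return ModelABIVersion >= COV_5 ? IA.GroupSize[Index]
                                  : DP.WorkgroupSize[Index];
}
```

With the model global set to `COV_4` the function reads from the dispatch-packet stand-in, and with `COV_5` it reads from the implicit-args stand-in, so the same compiled library code can serve both ABIs.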

https://github.com/llvm/llvm-project/pull/71234


[openmp] [clang] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (PR #71234)

2023-11-03 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam created 
https://github.com/llvm/llvm-project/pull/71234

Fixes the DeviceRTL compilation to ensure it is ABI agnostic. Uses the already 
available global variable "oclc_ABI_version" instead of 
"llvm.amdgcn.abi.version".

It also adds some minor fields to the ImplicitArg structure.

>From 36976c1a97518c9cdf080d80b5fab2b16837b055 Mon Sep 17 00:00:00 2001
From: Saiyedul Islam 
Date: Fri, 3 Nov 2023 16:16:25 -0500
Subject: [PATCH] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL

Fixes the DeviceRTL compilation to ensure it is ABI agnostic.
Uses the already available global variable "oclc_ABI_version" instead
of "llvm.amdgcn.abi.version".

It also adds some minor fields to the ImplicitArg structure.
---
 clang/include/clang/Basic/TargetOptions.h |  2 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |  6 +-
 clang/lib/CodeGen/Targets/AMDGPU.cpp  |  5 +-
 clang/test/CodeGen/amdgpu-abi-version.c   |  4 +-
 clang/test/CodeGen/amdgpu-address-spaces.cpp  |  2 +-
 .../amdgpu-code-object-version-linking.cu | 16 +++---
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |  6 +-
 .../plugins-nextgen/amdgpu/src/rtl.cpp| 57 +++
 .../amdgpu/utils/UtilitiesRTL.h   |  4 +-
 9 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/clang/include/clang/Basic/TargetOptions.h 
b/clang/include/clang/Basic/TargetOptions.h
index 8bb03249b7f8308..f9c03b61a2a827c 100644
--- a/clang/include/clang/Basic/TargetOptions.h
+++ b/clang/include/clang/Basic/TargetOptions.h
@@ -88,7 +88,7 @@ class TargetOptions {
 COV_5 = 500,
   };
   /// \brief Code object version for AMDGPU.
-  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_None;
+  CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_4;
 
   /// \brief Enumeration values for AMDGPU printf lowering scheme
   enum class AMDGPUPrintfKind {
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 972aa1c708e5f65..8a9d83fd29e2b77 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -17468,11 +17468,11 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 /// Emit code based on Code Object ABI version.
 /// COV_4: Emit code to use dispatch ptr
 /// COV_5: Emit code to use implicitarg ptr
-/// COV_NONE : Emit code to load a global variable "llvm.amdgcn.abi.version"
+/// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"
 ///and use its value for COV_4 or COV_5 approach. It is used for
 ///compiling device libraries in an ABI-agnostic way.
 ///
-/// Note: "llvm.amdgcn.abi.version" is supposed to be emitted and intialized by
+/// Note: "__oclc_ABI_version" is supposed to be emitted and intialized by
 ///   clang during compilation of user code.
 Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
   llvm::LoadInst *LD;
@@ -17480,7 +17480,7 @@ Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
   auto Cov = CGF.getTarget().getTargetOpts().CodeObjectVersion;
 
   if (Cov == clang::TargetOptions::COV_None) {
-StringRef Name = "llvm.amdgcn.abi.version";
+StringRef Name = "__oclc_ABI_version";
 auto *ABIVersionC = CGF.CGM.getModule().getNamedGlobal(Name);
 if (!ABIVersionC)
   ABIVersionC = new llvm::GlobalVariable(
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp 
b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 0411846cf9b02bd..d793d27e0db8b80 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -362,11 +362,14 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
 /// AMDGPU ROCm device libraries.
 void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
     CodeGen::CodeGenModule &CGM) const {
-  StringRef Name = "llvm.amdgcn.abi.version";
+  StringRef Name = "__oclc_ABI_version";
   llvm::GlobalVariable *OriginalGV = CGM.getModule().getNamedGlobal(Name);
   if (OriginalGV && !llvm::GlobalVariable::isExternalLinkage(OriginalGV->getLinkage()))
     return;
 
+  if(CGM.getTarget().getTargetOpts().CodeObjectVersion == clang::TargetOptions::COV_None)
+    return;
+
   auto *Type = llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), 32);
   llvm::Constant *COV = llvm::ConstantInt::get(
   Type, CGM.getTarget().getTargetOpts().CodeObjectVersion);
diff --git a/clang/test/CodeGen/amdgpu-abi-version.c 
b/clang/test/CodeGen/amdgpu-abi-version.c
index d1189545139e2a6..4e5ad87655f2305 100644
--- a/clang/test/CodeGen/amdgpu-abi-version.c
+++ b/clang/test/CodeGen/amdgpu-abi-version.c
@@ -2,14 +2,14 @@
 // RUN: %clang_cc1 -cc1 -triple amdgcn-amd-amdhsa -emit-llvm 
-mcode-object-version=none %s -o - | FileCheck %s
 
 //.
-// CHECK: @llvm.amdgcn.abi.version = weak_odr hidden local_unnamed_addr 
addrspace(4) constant i32 0
+// CHECK: @__oclc_ABI_version = external addrspace(4) global i32
 //.
 // CHECK-LABEL: define dso_local i32 @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
 // 

[clang] [Clang][OpenMP] Check if value is contained in array, not if it's contained in the first element (PR #69462)

2023-10-19 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam approved this pull request.

LGTM, thanks!

https://github.com/llvm/llvm-project/pull/69462
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][HIP] Remove 'clangPostLink' from SDL handling (PR #67366)

2023-09-26 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam approved this pull request.

LGTM, thanks!

https://github.com/llvm/llvm-project/pull/67366


[clang] [AMDGPU] Make default AMDHSA Code Object Version to be 5 (PR #65410)

2023-09-12 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam closed 
https://github.com/llvm/llvm-project/pull/65410


[clang] [AMDGPU] Make default AMDHSA Code Object Version to be 5 (PR #65410)

2023-09-05 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam review_requested 
https://github.com/llvm/llvm-project/pull/65410


[clang] [AMDGPU] Make default AMDHSA Code Object Version to be 5 (PR #65410)

2023-09-05 Thread Saiyedul Islam via cfe-commits

https://github.com/saiislam labeled 
https://github.com/llvm/llvm-project/pull/65410


[clang] f616c3e - [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-29 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2023-08-29T06:35:44-05:00
New Revision: f616c3eeb43f3732f53f81d291723a6a34af2de1

URL: 
https://github.com/llvm/llvm-project/commit/f616c3eeb43f3732f53f81d291723a6a34af2de1
DIFF: 
https://github.com/llvm/llvm-project/commit/f616c3eeb43f3732f53f81d291723a6a34af2de1.diff

LOG: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

Update DeviceRTL and the AMDGPU plugin to support code
object version 5. Default is code object version 4.

CodeGen for __builtin_amdgpu_workgroup_size generates code
for cov4 as well as cov5 if -mcode-object-version=none
is specified. DeviceRTL compilation passes this argument
via Xclang option to generate abi-agnostic code.

Generated code for the above builtin uses a clang
control constant "llvm.amdgcn.abi.version" to branch on
the abi version, which is available during linking of
user's OpenMP code. Load of this constant gets eliminated
during linking.
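
The selection described above can be sketched in plain C++, outside of LLVM. This is only an illustrative model, not the CodeGen implementation: `WorkGroupSizeSlot` and `selectWorkGroupSizeSlot` are hypothetical names, and the enum values are assumed to mirror the code object version constants in `clang::TargetOptions`. The byte offsets (12 + Index * 2 for the implicit kernarg segment, 4 + Index * 2 for the dispatch packet) and the "version >= COV_5" comparison are taken from the diff in this commit.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: values mirror the code object version constants in
// clang::TargetOptions (COV_None, COV_4, COV_5).
enum CodeObjectVersion : std::int32_t { COV_None = 0, COV_4 = 400, COV_5 = 500 };

// Hypothetical record of which base pointer and byte offset the load uses.
struct WorkGroupSizeSlot {
  bool UseImplicitArgPtr; // true: implicit kernarg segment, false: dispatch packet
  unsigned ByteOffset;    // offset of the 16-bit workgroup size field
};

// Models the branch on the ABI version value that the ABI-agnostic
// (-mcode-object-version=none) code performs.
WorkGroupSizeSlot selectWorkGroupSizeSlot(std::int32_t ABIVersion,
                                          unsigned Index) {
  assert(Index < 3 && "Index is 0, 1, or 2 for the x, y, or z dimension");
  const bool IsCOV5 = ABIVersion >= COV_5;
  if (IsCOV5)
    return {true, 12 + Index * 2}; // indexing the implicit kernarg segment
  return {false, 4 + Index * 2};   // indexing the HSA kernel_dispatch_packet
}
```

Once the ABI version is a known constant, as it is when the user's code is linked, only one of the two branches remains, which is the effect the commit message describes.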

AMDGPU plugin queries the ELF for code object version
and then prepares various implicitargs accordingly.

Differential Revision: https://reviews.llvm.org/D139730

Reviewed By: jhuber6, yaxunl

Added: 
clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu

Modified: 
clang/lib/CodeGen/CGBuiltin.cpp
clang/lib/CodeGen/CodeGenModule.cpp
clang/lib/CodeGen/CodeGenModule.h
clang/lib/CodeGen/TargetInfo.h
clang/lib/CodeGen/Targets/AMDGPU.cpp
clang/lib/Driver/ToolChain.cpp
clang/lib/Driver/ToolChains/Clang.cpp
clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
clang/test/CodeGenOpenCL/opencl_types.cl
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
openmp/libomptarget/DeviceRTL/CMakeLists.txt
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
openmp/libomptarget/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h

Removed: 




diff  --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 070246f099e2e9..a513eae46e358e 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -27,6 +27,7 @@
 #include "clang/AST/OSLog.h"
 #include "clang/Basic/TargetBuiltins.h"
 #include "clang/Basic/TargetInfo.h"
+#include "clang/Basic/TargetOptions.h"
 #include "clang/CodeGen/CGFunctionInfo.h"
 #include "clang/Frontend/FrontendDiagnostic.h"
 #include "llvm/ADT/APFloat.h"
@@ -17098,24 +17099,61 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 }
 
 // \p Index is 0, 1, and 2 for x, y, and z dimension, respectively.
+/// Emit code based on Code Object ABI version.
+/// COV_4: Emit code to use dispatch ptr
+/// COV_5: Emit code to use implicitarg ptr
+/// COV_NONE : Emit code to load a global variable "llvm.amdgcn.abi.version"
+///and use its value for COV_4 or COV_5 approach. It is used for
+///compiling device libraries in an ABI-agnostic way.
+///
+/// Note: "llvm.amdgcn.abi.version" is supposed to be emitted and intialized by
+///   clang during compilation of user code.
 Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
-  bool IsCOV_5 = CGF.getTarget().getTargetOpts().CodeObjectVersion ==
- clang::TargetOptions::COV_5;
-  Constant *Offset;
-  Value *DP;
-  if (IsCOV_5) {
+  llvm::LoadInst *LD;
+
+  auto Cov = CGF.getTarget().getTargetOpts().CodeObjectVersion;
+
+  if (Cov == clang::TargetOptions::COV_None) {
+auto *ABIVersionC = CGF.CGM.GetOrCreateLLVMGlobal(
+"llvm.amdgcn.abi.version", CGF.Int32Ty, LangAS::Default, nullptr,
+CodeGen::NotForDefinition);
+
+// This load will be eliminated by the IPSCCP because it is constant
+// weak_odr without externally_initialized. Either changing it to weak or
+// adding externally_initialized will keep the load.
+Value *ABIVersion = CGF.Builder.CreateAlignedLoad(CGF.Int32Ty, ABIVersionC,
+  CGF.CGM.getIntAlign());
+
+Value *IsCOV5 = CGF.Builder.CreateICmpSGE(
+ABIVersion,
+llvm::ConstantInt::get(CGF.Int32Ty, clang::TargetOptions::COV_5));
+
 // Indexing the implicit kernarg segment.
-Offset = llvm::ConstantInt::get(CGF.Int32Ty, 12 + Index * 2);
-DP = EmitAMDGPUImplicitArgPtr(CGF);
-  } else {
+Value *ImplicitGEP = CGF.Builder.CreateConstGEP1_32(
+CGF.Int8Ty, EmitAMDGPUImplicitArgPtr(CGF), 12 + Index * 2);
+
 // Indexing the HSA kernel_dispatch_packet struct.
-Offset = llvm::ConstantInt::get(CGF.Int32Ty, 4 + Index * 2);
-DP = EmitAMDGPUDispatchPtr(CGF);
+Value *DispatchGEP = CGF.Builder.CreateConstGEP1_32(
+CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF), 4 + Index * 2);
+
+auto Result = CGF.Builder.CreateSelect(IsCOV5, ImplicitGEP, DispatchGEP);
+LD = CGF.Builder.CreateLoad(
+Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
+  } else {
+Value *GEP = nullptr;
+if (Cov == clang::TargetOptions::COV_5) {
+  // Indexing the 

[clang] 0cecc6e - [OpenMP] Add lit test for metadirective device arch inspired

2022-09-06 Thread Saiyedul Islam via cfe-commits

Author: Animesh Kumar
Date: 2022-09-06T07:10:15-05:00
New Revision: 0cecc6e8e27c9913cd2d82a77941dc3a6d11318f

URL: 
https://github.com/llvm/llvm-project/commit/0cecc6e8e27c9913cd2d82a77941dc3a6d11318f
DIFF: 
https://github.com/llvm/llvm-project/commit/0cecc6e8e27c9913cd2d82a77941dc3a6d11318f.diff

LOG: [OpenMP] Add lit test for metadirective device arch inspired
from sollve

This lit test is added based upon the tests present in the
tests/5.0/metadirective directory of the SOLLVE repo
https://github.com/SOLLVE/sollve_vv

Reviewed By: saiislam

Differential Revision: https://reviews.llvm.org/D131763

Added: 
clang/test/OpenMP/metadirective_device_arch_codegen.cpp

Modified: 
clang/test/OpenMP/metadirective_ast_print.c

Removed: 




diff  --git a/clang/test/OpenMP/metadirective_ast_print.c 
b/clang/test/OpenMP/metadirective_ast_print.c
index 6c75cb0592d6..ddd5b8633cc5 100644
--- a/clang/test/OpenMP/metadirective_ast_print.c
+++ b/clang/test/OpenMP/metadirective_ast_print.c
@@ -1,6 +1,10 @@
 // RUN: %clang_cc1 -verify -fopenmp -triple x86_64-unknown-linux-gnu -x c 
-std=c99 -ast-print %s -o - | FileCheck %s
 
 // RUN: %clang_cc1 -verify -fopenmp-simd -triple x86_64-unknown-linux-gnu -x c 
-std=c99 -ast-print %s -o - | FileCheck %s
+
+// RUN: %clang_cc1 -verify -fopenmp -triple amdgcn-amd-amdhsa -x c -std=c99 
-ast-print %s -o - | FileCheck %s --check-prefix=CHECK-AMDGCN
+
+// RUN: %clang_cc1 -verify -fopenmp-simd -triple amdgcn-amd-amdhsa -x c 
-std=c99 -ast-print %s -o - | FileCheck %s --check-prefix=CHECK-AMDGCN
 // expected-no-diagnostics
 
 #ifndef HEADER
@@ -57,6 +61,12 @@ void foo(void) {
 for (int j = 0; j < 16; j++)
   array[i] = i;
   }
+
+#pragma omp metadirective when(device={arch("amdgcn")}: \
+teams distribute parallel for)\
+default(parallel for)
+  for (int i = 0; i < 100; i++)
+  ;
 }
 
 // CHECK: void bar(void);
@@ -83,5 +93,7 @@ void foo(void) {
 // CHECK-NEXT: for (int i = 0; i < 16; i++) {
 // CHECK-NEXT: #pragma omp simd
 // CHECK-NEXT: for (int j = 0; j < 16; j++)
+// CHECK-AMDGCN: #pragma omp teams distribute parallel for
+// CHECK-AMDGCN-NEXT: for (int i = 0; i < 100; i++)
 
 #endif

diff  --git a/clang/test/OpenMP/metadirective_device_arch_codegen.cpp 
b/clang/test/OpenMP/metadirective_device_arch_codegen.cpp
new file mode 100644
index ..eac71d0e5b5a
--- /dev/null
+++ b/clang/test/OpenMP/metadirective_device_arch_codegen.cpp
@@ -0,0 +1,65 @@
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -w -std=c++11 -triple 
x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o 
%t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -w -std=c++11 -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device 
-fopenmp-host-ir-file-path %t-ppc-host.bc -target-cpu gfx906 -o - | FileCheck %s
+// expected-no-diagnostics
+
+
+/*===---===
 
+
+Inspired from SOLLVE tests:
+ - 5.0/metadirective/test_metadirective_arch_is_nvidia.c
+
+
+======*/
+
+
+#define N 1024
+
+int metadirective1() {
+   
+   int v1[N], v2[N], v3[N];
+
+   int target_device_num, host_device_num, default_device;
+   int errors = 0;
+
+   #pragma omp target map(to:v1,v2) map(from:v3, target_device_num) 
device(default_device)
+   {
+  #pragma omp metadirective \
+   when(device={arch("amdgcn")}: teams distribute parallel 
for) \
+   default(parallel for)
+
+ for (int i = 0; i < N; i++) {
+   #pragma omp atomic write
+v3[i] = v1[i] * v2[i];
+ }
+   }
+
+   return errors;
+}
+
+// CHECK-LABEL: define weak_odr amdgpu_kernel void {{.+}}metadirective1
+// CHECK: entry:
+// CHECK: %{{[0-9]}} = call i32 @__kmpc_target_init
+// CHECK: user_code.entry:
+// CHECK: call void @__omp_outlined__
+// CHECK-NOT: call void @__kmpc_parallel_51
+// CHECK: ret void
+
+
+// CHECK-LABEL: define internal void @__omp_outlined__
+// CHECK: entry:
+// CHECK: call void @__kmpc_distribute_static_init
+// CHECK: omp.loop.exit:  
+// CHECK: call void @__kmpc_distribute_static_fini
+
+
+// CHECK-LABEL: define internal void @__omp_outlined__.{{[0-9]+}}
+// CHECK: entry:
+// CHECK: call void @__kmpc_for_static_init_4
+// CHECK: omp.inner.for.body:
+// CHECK: store atomic {{.*}} monotonic
+// CHECK: omp.loop.exit:
+// CHECK-NEXT: call void @__kmpc_distribute_static_fini
+// CHECK-NEXT: ret void
+





[clang] 0f6cbde - [clang-offload-bundler] fix "no output file" issue with -outputs

2022-04-08 Thread Saiyedul Islam via cfe-commits

Author: Siu Chi Chan
Date: 2022-04-08T17:11:27Z
New Revision: 0f6cbdee576160a3b40139bf66b864ce05a1e28f

URL: 
https://github.com/llvm/llvm-project/commit/0f6cbdee576160a3b40139bf66b864ce05a1e28f
DIFF: 
https://github.com/llvm/llvm-project/commit/0f6cbdee576160a3b40139bf66b864ce05a1e28f.diff

LOG: [clang-offload-bundler] fix "no output file" issue with -outputs

Fix backward compatibility issue due to D120662.

Change-Id: I7cd0f704aabbaac7dcf59fd4b73b4f0e0cdfa69f

Reviewed By: yaxunl, saiislam

Differential Revision: https://reviews.llvm.org/D123387

Added: 


Modified: 
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp

Removed: 




diff  --git a/clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp 
b/clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
index 6da77cccafbfc..1792923dd0159 100644
--- a/clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
+++ b/clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
@@ -1450,7 +1450,7 @@ int main(int argc, const char **argv) {
 return 0;
   }
 
-  if (OutputFileNames.getNumOccurrences() == 0) {
+  if (OutputFileNames.size() == 0) {
 reportError(
 createStringError(errc::invalid_argument, "no output file 
specified!"));
   }





[clang] 1e78d07 - [clang-offload-bundler] Fix typo in a test case

2022-03-02 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2022-03-02T13:33:56Z
New Revision: 1e78d07dc9cdb786eadb30dd4a4f9b2c5d8ba8eb

URL: 
https://github.com/llvm/llvm-project/commit/1e78d07dc9cdb786eadb30dd4a4f9b2c5d8ba8eb
DIFF: 
https://github.com/llvm/llvm-project/commit/1e78d07dc9cdb786eadb30dd4a4f9b2c5d8ba8eb.diff

LOG: [clang-offload-bundler] Fix typo in a test case

The intermediate file of one of the tests was being overwritten due
to a name clash.

Added: 


Modified: 
clang/test/Driver/clang-offload-bundler.c

Removed: 




diff  --git a/clang/test/Driver/clang-offload-bundler.c 
b/clang/test/Driver/clang-offload-bundler.c
index 3fde2233fe72..cb4092a8eaef 100644
--- a/clang/test/Driver/clang-offload-bundler.c
+++ b/clang/test/Driver/clang-offload-bundler.c
@@ -406,8 +406,8 @@
 // Create few code object bundles and archive them to create an input archive
 // RUN: clang-offload-bundler -type=o 
-targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx906,openmp-amdgcn-amd-amdhsa--gfx908
 -inputs=%t.o,%t.tgt1,%t.tgt2 -outputs=%t.simple.bundle
 // RUN: clang-offload-bundler -type=o 
-targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa--gfx903 
-inputs=%t.o,%t.tgt1 -outputs=%t.simple1.bundle
-// RUN: clang-offload-bundler -type=o 
-targets=host-%itanium_abi_triple,hip-amdgcn-amd-amdhsa--gfx906 
-inputs=%t.o,%t.tgt1 -outputs=%t.simple1.bundle
-// RUN: llvm-ar cr %t.input-archive.a %t.simple.bundle %t.simple1.bundle
+// RUN: clang-offload-bundler -type=o 
-targets=host-%itanium_abi_triple,hip-amdgcn-amd-amdhsa--gfx906 
-inputs=%t.o,%t.tgt1 -outputs=%t.simple2.bundle
+// RUN: llvm-ar cr %t.input-archive.a %t.simple.bundle %t.simple1.bundle 
%t.simple2.bundle
 
 // RUN: clang-offload-bundler -unbundle -type=a 
-targets=openmp-amdgcn-amd-amdhsa-gfx906,openmp-amdgcn-amd-amdhsa-gfx908 
-inputs=%t.input-archive.a 
-outputs=%t-archive-gfx906-simple.a,%t-archive-gfx908-simple.a
 // RUN: llvm-ar t %t-archive-gfx906-simple.a | FileCheck %s 
-check-prefix=GFX906





[clang] 7a02abf - [clang-offload-bundler] HIP and OpenMP compatibility for linking heterogeneous archive library

2022-03-01 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2022-03-02T07:55:06Z
New Revision: 7a02abf06ff94762d8cbce71d0249df25d64721b

URL: 
https://github.com/llvm/llvm-project/commit/7a02abf06ff94762d8cbce71d0249df25d64721b
DIFF: 
https://github.com/llvm/llvm-project/commit/7a02abf06ff94762d8cbce71d0249df25d64721b.diff

LOG: [clang-offload-bundler] HIP and OpenMP compatibility for linking 
heterogeneous archive library

The `hip-openmp-compatible` flag treats the hip and hipv4 offload kinds
as compatible with the openmp offload kind while extracting code objects
from a heterogeneous archive library. The reverse is also considered
compatible if the HIP code was compiled with -fgpu-rdc.

This flag only relaxes the compatibility criteria on `OffloadKind`;
the remaining components, such as `Triple` and `GPUArch`, still need to
be compatible.
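
As a rough illustration of the rule stated above, the following standalone C++ sketch models the relaxed check. It is not the bundler's code: `OffloadTarget`, `offloadKindsCompatible`, and `isCodeObjectCompatible` are hypothetical names, and comparing `Triple` and `GPUArch` by exact string equality is a simplifying assumption (the real tool may normalize triples and apply its own architecture compatibility rules).

```cpp
#include <string>

// Hypothetical decomposition of a bundle entry ID such as
// "openmp-amdgcn-amd-amdhsa--gfx906" into its three components.
struct OffloadTarget {
  std::string Kind;    // "openmp", "hip", "hipv4", ...
  std::string Triple;  // e.g. "amdgcn-amd-amdhsa"
  std::string GPUArch; // e.g. "gfx906"
};

// Offload kinds match exactly, or, when the flag is set, a hip/hipv4 kind
// is accepted against an openmp kind in either direction.
bool offloadKindsCompatible(const std::string &A, const std::string &B,
                            bool HipOpenMPCompatible) {
  if (A == B)
    return true;
  if (!HipOpenMPCompatible)
    return false;
  auto IsHip = [](const std::string &K) { return K == "hip" || K == "hipv4"; };
  return (IsHip(A) && B == "openmp") || (A == "openmp" && IsHip(B));
}

// The flag relaxes only the kind comparison; the triple and GPU
// architecture must still agree (exact equality is assumed here).
bool isCodeObjectCompatible(const OffloadTarget &CodeObject,
                            const OffloadTarget &Target,
                            bool HipOpenMPCompatible) {
  return offloadKindsCompatible(CodeObject.Kind, Target.Kind,
                                HipOpenMPCompatible) &&
         CodeObject.Triple == Target.Triple &&
         CodeObject.GPUArch == Target.GPUArch;
}
```

Under this model, an openmp gfx906 code object satisfies a hip gfx906 target only when the flag is passed, and never satisfies a gfx908 target.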

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D120697

Added: 


Modified: 
clang/lib/Driver/ToolChains/CommonArgs.cpp
clang/test/Driver/clang-offload-bundler-asserts-on.c
clang/test/Driver/clang-offload-bundler.c
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp

Removed: 




diff  --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index f0c32cd02f55a..5330071ec4641 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1777,6 +1777,12 @@ bool tools::GetSDLFromOffloadArchive(
 std::string AdditionalArgs("-allow-missing-bundles");
 UBArgs.push_back(C.getArgs().MakeArgString(AdditionalArgs));
 
+// Add this flag to treat hip and hipv4 offload kinds as compatible with
+// openmp offload kind while extracting code objects from a heterogenous
+// archive library. Vice versa is also considered compatible.
+std::string HipCompatibleArgs("-hip-openmp-compatible");
+UBArgs.push_back(C.getArgs().MakeArgString(HipCompatibleArgs));
+
 C.addCommand(std::make_unique<Command>(
 JA, T, ResponseFileSupport::AtFileCurCP(), UBProgram, UBArgs, Inputs,
 InputInfo(&JA, C.getArgs().MakeArgString(OutputLib))));

diff  --git a/clang/test/Driver/clang-offload-bundler-asserts-on.c 
b/clang/test/Driver/clang-offload-bundler-asserts-on.c
index 7622998c9c182..f97171dbd169c 100644
--- a/clang/test/Driver/clang-offload-bundler-asserts-on.c
+++ b/clang/test/Driver/clang-offload-bundler-asserts-on.c
@@ -24,6 +24,10 @@
 // BUNDLECOMPATIBILITY: Compatible: Exact match:[CodeObject: 
openmp-amdgcn-amd-amdhsa-gfx906]   :   [Target: 
openmp-amdgcn-amd-amdhsa--gfx906]
 // BUNDLECOMPATIBILITY: Compatible: Exact match:[CodeObject: 
openmp-amdgcn-amd-amdhsa--gfx908]  :   [Target: 
openmp-amdgcn-amd-amdhsa-gfx908]
 
+// RUN: clang-offload-bundler -unbundle -type=a 
-targets=hip-amdgcn-amd-amdhsa--gfx906,hipv4-amdgcn-amd-amdhsa-gfx908 
-inputs=%t.input-archive.a 
-outputs=%t-hip-archive-gfx906-simple.a,%t-hipv4-archive-gfx908-simple.a 
-hip-openmp-compatible -debug-only=CodeObjectCompatibility 2>&1 | FileCheck %s 
-check-prefix=HIPOpenMPCOMPATIBILITY
+// HIPOpenMPCOMPATIBILITY: Compatible: Code Objects are compatible
[CodeObject: openmp-amdgcn-amd-amdhsa-gfx906]   :   [Target: 
hip-amdgcn-amd-amdhsa--gfx906]
+// HIPOpenMPCOMPATIBILITY: Compatible: Code Objects are compatible
[CodeObject: openmp-amdgcn-amd-amdhsa--gfx908]  :   [Target: 
hipv4-amdgcn-amd-amdhsa-gfx908]
+
 // Some code so that we can create a binary out of this file.
 int A = 0;
 void test_func(void) {

diff  --git a/clang/test/Driver/clang-offload-bundler.c 
b/clang/test/Driver/clang-offload-bundler.c
index eab4dbc7e3be0..3fde2233fe720 100644
--- a/clang/test/Driver/clang-offload-bundler.c
+++ b/clang/test/Driver/clang-offload-bundler.c
@@ -406,6 +406,7 @@
 // Create few code object bundles and archive them to create an input archive
 // RUN: clang-offload-bundler -type=o 
-targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx906,openmp-amdgcn-amd-amdhsa--gfx908
 -inputs=%t.o,%t.tgt1,%t.tgt2 -outputs=%t.simple.bundle
 // RUN: clang-offload-bundler -type=o 
-targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa--gfx903 
-inputs=%t.o,%t.tgt1 -outputs=%t.simple1.bundle
+// RUN: clang-offload-bundler -type=o 
-targets=host-%itanium_abi_triple,hip-amdgcn-amd-amdhsa--gfx906 
-inputs=%t.o,%t.tgt1 -outputs=%t.simple1.bundle
 // RUN: llvm-ar cr %t.input-archive.a %t.simple.bundle %t.simple1.bundle
 
 // RUN: clang-offload-bundler -unbundle -type=a 
-targets=openmp-amdgcn-amd-amdhsa-gfx906,openmp-amdgcn-amd-amdhsa-gfx908 
-inputs=%t.input-archive.a 
-outputs=%t-archive-gfx906-simple.a,%t-archive-gfx908-simple.a
@@ -423,6 +424,19 @@
 // RUN: cat %t-archive-gfx803-empty.a | FileCheck %s -check-prefix=EMPTYARCHIVE
 // EMPTYARCHIVE: !
 
+// Check compatibility of OpenMP code objects found in the heterogeneous 
archive library with HIP code objects of the target
+// RUN: clang-offload-bundler 

[clang] 4db88a5 - [OpenMP][Clang] Move partial support of reverse offload to a future version

2022-02-08 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2022-02-09T07:11:41Z
New Revision: 4db88a54b6d4bd38fe38dbe57ec2a156ff3c144e

URL: 
https://github.com/llvm/llvm-project/commit/4db88a54b6d4bd38fe38dbe57ec2a156ff3c144e
DIFF: 
https://github.com/llvm/llvm-project/commit/4db88a54b6d4bd38fe38dbe57ec2a156ff3c144e.diff

LOG: [OpenMP][Clang] Move partial support of reverse offload to a future version

OpenMP 5.2 requires that unimplemented clauses of the requires directive
terminate compilation with an error. Move the current partial support of
reverse_offload to a distant future version, 9.9 (-fopenmp-version=99 in
the tests), so that existing code can be tested and maintained until a
complete implementation is available.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D119256

Added: 


Modified: 
clang/test/OpenMP/requires_ast_print.cpp
clang/test/OpenMP/requires_messages.cpp
clang/test/OpenMP/requires_target_messages.cpp
clang/test/OpenMP/target_ast_print.cpp
clang/test/OpenMP/target_device_codegen.cpp
llvm/include/llvm/Frontend/OpenMP/OMP.td

Removed: 




diff  --git a/clang/test/OpenMP/requires_ast_print.cpp 
b/clang/test/OpenMP/requires_ast_print.cpp
index e884c71c86635..8343608070c18 100644
--- a/clang/test/OpenMP/requires_ast_print.cpp
+++ b/clang/test/OpenMP/requires_ast_print.cpp
@@ -5,6 +5,14 @@
 // RUN: %clang_cc1 -verify -fopenmp-simd -ast-print %s | FileCheck %s
 // RUN: %clang_cc1 -fopenmp-simd -x c++ -std=c++11 -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp-simd -std=c++11 -include-pch %t -fsyntax-only 
-verify %s -ast-print | FileCheck %s
+
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=99 -DOMP99 -ast-print %s 
| FileCheck --check-prefixes=CHECK,REV %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-version=99 -DOMP99 -x c++ -std=c++11 
-emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-version=99 -DOMP99 -std=c++11 
-include-pch %t -fsyntax-only -verify %s -ast-print | FileCheck 
--check-prefixes=CHECK,REV %s
+
+// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-version=99 -DOMP99 
-ast-print %s | FileCheck --check-prefixes=CHECK,REV %s
+// RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=99 -DOMP99 -x c++ -std=c++11 
-emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=99 -DOMP99 -std=c++11 
-include-pch %t -fsyntax-only -verify %s -ast-print | FileCheck 
--check-prefixes=CHECK,REV %s
 // expected-no-diagnostics
 
 #ifndef HEADER
@@ -16,8 +24,10 @@
 #pragma omp requires unified_shared_memory
 // CHECK:#pragma omp requires unified_shared_memory
 
+#ifdef OMP99
 #pragma omp requires reverse_offload
-// CHECK:#pragma omp requires reverse_offload
+// REV:#pragma omp requires reverse_offload
+#endif
 
 #pragma omp requires dynamic_allocators
 // CHECK:#pragma omp requires dynamic_allocators

diff  --git a/clang/test/OpenMP/requires_messages.cpp 
b/clang/test/OpenMP/requires_messages.cpp
index 72a6c5022975e..10d311631b100 100644
--- a/clang/test/OpenMP/requires_messages.cpp
+++ b/clang/test/OpenMP/requires_messages.cpp
@@ -1,9 +1,10 @@
 // RUN: %clang_cc1 -verify -fopenmp -ferror-limit 100  %s -Wuninitialized
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=99 -DOMP99 
-verify=expected,rev -ferror-limit 100  %s -Wuninitialized
 
 int a;
-#pragma omp requires unified_address allocate(a) // expected-note 
{{unified_address clause previously used here}} expected-note {{unified_address 
clause previously used here}} expected-note {{unified_address clause previously 
used here}} expected-note {{unified_address clause previously used here}} 
expected-note {{unified_address clause previously used here}} 
expected-note{{unified_address clause previously used here}} expected-error 
{{unexpected OpenMP clause 'allocate' in directive '#pragma omp requires'}}
+#pragma omp requires unified_address allocate(a) // rev-note {{unified_address 
clause previously used here}} expected-note {{unified_address clause previously 
used here}} expected-note {{unified_address clause previously used here}} 
expected-note {{unified_address clause previously used here}} expected-note 
{{unified_address clause previously used here}} expected-note{{unified_address 
clause previously used here}} expected-error {{unexpected OpenMP clause 
'allocate' in directive '#pragma omp requires'}}
 
-#pragma omp requires unified_shared_memory // expected-note 
{{unified_shared_memory clause previously used here}} 
expected-note{{unified_shared_memory clause previously used here}}
+#pragma omp requires unified_shared_memory // rev-note {{unified_shared_memory 
clause previously used here}} expected-note{{unified_shared_memory clause 
previously used here}}
 
 #pragma omp requires unified_shared_memory, unified_shared_memory // 
expected-error {{Only one unified_shared_memory clause can appear on a requires 
directive in a single translation unit}} expected-error {{directive '#pragma 
omp requires' cannot contain more than one 'unified_shared_memory' 

[clang] ae9c074 - [OpenMP][Clang] Allow ancestor device modifier only with reverse offloading

2022-02-04 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2022-02-04T12:10:14Z
New Revision: ae9c0740648fd8f7010c895ddcf78380da94dd57

URL: 
https://github.com/llvm/llvm-project/commit/ae9c0740648fd8f7010c895ddcf78380da94dd57
DIFF: 
https://github.com/llvm/llvm-project/commit/ae9c0740648fd8f7010c895ddcf78380da94dd57.diff

LOG: [OpenMP][Clang] Allow ancestor device modifier only with reverse offloading

OpenMP Spec 5.0 [2.12.5, Restrictions]: If a device clause in which the
ancestor device-modifier appears is present on the target construct,
then a requires directive with the reverse_offload clause must be
specified.
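
The restriction can be modeled with a small standalone C++ sketch. This is not the Sema code from the patch: `DeviceModifier` and `checkDeviceClause` are hypothetical names introduced for illustration, and only the diagnostic text is taken from the diff below.

```cpp
// Hypothetical stand-in for the modifiers a device clause may carry.
enum class DeviceModifier { None, DeviceNum, Ancestor };

// Returns the diagnostic text when the restriction is violated, or nullptr
// when the clause is acceptable. HasRequiresReverseOffload models whether a
// "#pragma omp requires reverse_offload" directive has been seen.
const char *checkDeviceClause(DeviceModifier Modifier,
                              bool HasRequiresReverseOffload) {
  if (Modifier == DeviceModifier::Ancestor && !HasRequiresReverseOffload)
    return "Device clause with ancestor device-modifier used without "
           "specifying 'requires reverse_offload'";
  return nullptr;
}
```

In user code, this corresponds to placing `#pragma omp requires reverse_offload` at file scope before any `#pragma omp target device(ancestor : 1)`, as the updated tests in this commit do.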

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D118887

Added: 
clang/test/OpenMP/target_device_ancestor_messages.cpp

Modified: 
clang/include/clang/Basic/DiagnosticSemaKinds.td
clang/lib/Sema/SemaOpenMP.cpp
clang/test/OpenMP/target_ast_print.cpp
clang/test/OpenMP/target_device_codegen.cpp

Removed: 




diff  --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 46316bd5d6b2b..d5e653a7fa192 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -10696,6 +10696,8 @@ def err_omp_directive_before_requires : Error <
   "'%0' region encountered before requires directive with '%1' clause">;
 def note_omp_requires_encountered_directive : Note <
   "'%0' previously encountered here">;
+def err_omp_device_ancestor_without_requires_reverse_offload : Error <
+  "Device clause with ancestor device-modifier used without specifying 
'requires reverse_offload'">;
 def err_omp_invalid_scope : Error <
   "'#pragma omp %0' directive must appear only in file scope">;
 def note_omp_invalid_length_on_this_ptr_mapping : Note <

diff  --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index a500ad4f02209..a4092f0d2b543 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -18759,6 +18759,18 @@ OMPClause 
*Sema::ActOnOpenMPDeviceClause(OpenMPDeviceClauseModifier Modifier,
   if (ErrorFound)
 return nullptr;
 
+  // OpenMP 5.0 [2.12.5, Restrictions]
+  // In case of ancestor device-modifier, a requires directive with
+  // the reverse_offload clause must be specified.
+  if (Modifier == OMPC_DEVICE_ancestor) {
+if (!DSAStack->hasRequiresDeclWithClause<OMPReverseOffloadClause>()) {
+  targetDiag(
+  StartLoc,
+  diag::err_omp_device_ancestor_without_requires_reverse_offload);
+  ErrorFound = true;
+}
+  }
+
   OpenMPDirectiveKind DKind = DSAStack->getCurrentDirective();
   OpenMPDirectiveKind CaptureRegion =
   getOpenMPCaptureRegionForClause(DKind, OMPC_device, LangOpts.OpenMP);

diff  --git a/clang/test/OpenMP/target_ast_print.cpp 
b/clang/test/OpenMP/target_ast_print.cpp
index 8464b6b3d16df..7d6cd9e14c4e3 100644
--- a/clang/test/OpenMP/target_ast_print.cpp
+++ b/clang/test/OpenMP/target_ast_print.cpp
@@ -342,7 +342,7 @@ int main (int argc, char **argv) {
 // RUN: %clang_cc1 -DOMP5 -verify -fopenmp-simd -fopenmp-version=50 -ast-print 
%s | FileCheck %s --check-prefix OMP5
 // RUN: %clang_cc1 -DOMP5 -fopenmp-simd -fopenmp-version=50 -x c++ -std=c++11 
-emit-pch -o %t %s
 // RUN: %clang_cc1 -DOMP5 -fopenmp-simd -fopenmp-version=50 -std=c++11 
-include-pch %t -fsyntax-only -verify %s -ast-print | FileCheck %s 
--check-prefix OMP5
-
+#pragma omp requires reverse_offload
 typedef void **omp_allocator_handle_t;
 extern const omp_allocator_handle_t omp_null_allocator;
 extern const omp_allocator_handle_t omp_default_mem_alloc;

diff  --git a/clang/test/OpenMP/target_device_ancestor_messages.cpp 
b/clang/test/OpenMP/target_device_ancestor_messages.cpp
new file mode 100644
index 0..bc1d668d19143
--- /dev/null
+++ b/clang/test/OpenMP/target_device_ancestor_messages.cpp
@@ -0,0 +1,7 @@
+// RUN: %clang_cc1 -triple=x86_64 -verify -fopenmp -fopenmp-targets=x86_64 -x 
c++ -fexceptions -fcxx-exceptions %s
+// RUN: %clang_cc1 -triple=x86_64 -verify -fopenmp-simd 
-fopenmp-targets=x86_64 -x c++ -fexceptions -fcxx-exceptions %s
+
+void bar() {
+#pragma omp target device(ancestor : 1) // expected-error {{Device clause with 
ancestor device-modifier used without specifying 'requires reverse_offload'}}
+  ;
+}

diff  --git a/clang/test/OpenMP/target_device_codegen.cpp 
b/clang/test/OpenMP/target_device_codegen.cpp
index abdefef5cc076..f77d2362ddf9a 100644
--- a/clang/test/OpenMP/target_device_codegen.cpp
+++ b/clang/test/OpenMP/target_device_codegen.cpp
@@ -11,7 +11,7 @@
 // expected-no-diagnostics
 #ifndef HEADER
 #define HEADER
-
+#pragma omp requires reverse_offload
 void foo(int n) {
 
   // CHECK:   [[N:%.+]] = load i32, i32* [[N_ADDR:%.+]],





[clang] 52fddcd - [clang-format] Format ParseOpenMP.cpp changes

2022-01-27 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2022-01-27T09:00:34Z
New Revision: 52fddcdd9c90a550d7a50cbc2013be3314f91d08

URL: 
https://github.com/llvm/llvm-project/commit/52fddcdd9c90a550d7a50cbc2013be3314f91d08
DIFF: 
https://github.com/llvm/llvm-project/commit/52fddcdd9c90a550d7a50cbc2013be3314f91d08.diff

LOG: [clang-format] Format ParseOpenMP.cpp changes

Properly format D116549.

Added: 


Modified: 
clang/lib/Parse/ParseOpenMP.cpp

Removed: 




diff  --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp
index de3d58baf84c9..8ad5edb1bcd63 100644
--- a/clang/lib/Parse/ParseOpenMP.cpp
+++ b/clang/lib/Parse/ParseOpenMP.cpp
@@ -2210,12 +2210,12 @@ Parser::DeclGroupPtrTy Parser::ParseOpenMPDeclarativeDirectiveWithExtDecl(
 VariantMatchInfo VMI;
 TI.getAsVariantMatchInfo(ASTCtx, VMI);
 
-std::function<void(StringRef)> DiagUnknownTrait = [this, Loc](
-  StringRef ISATrait) {
-  // TODO Track the selector locations in a way that is accessible here to
-  // improve the diagnostic location.
-  Diag(Loc, diag::warn_unknown_declare_variant_isa_trait) << ISATrait;
-};
+std::function<void(StringRef)> DiagUnknownTrait =
+[this, Loc](StringRef ISATrait) {
+  // TODO Track the selector locations in a way that is accessible here
+  // to improve the diagnostic location.
+  Diag(Loc, diag::warn_unknown_declare_variant_isa_trait) << ISATrait;
+};
 TargetOMPContext OMPCtx(
 ASTCtx, std::move(DiagUnknownTrait),
 /* CurrentFunctionDecl */ nullptr,
@@ -2551,12 +2551,12 @@ Parser::ParseOpenMPDeclarativeOrExecutableDirective(ParsedStmtContext StmtCtx) {
 TPA.Revert();
 // End of the first iteration. Parser is reset to the start of metadirective
 
-std::function<void(StringRef)> DiagUnknownTrait = [this, Loc](
-  StringRef ISATrait) {
-  // TODO Track the selector locations in a way that is accessible here to
-  // improve the diagnostic location.
-  Diag(Loc, diag::warn_unknown_declare_variant_isa_trait) << ISATrait;
-};
+std::function<void(StringRef)> DiagUnknownTrait =
+[this, Loc](StringRef ISATrait) {
+  // TODO Track the selector locations in a way that is accessible here
+  // to improve the diagnostic location.
+  Diag(Loc, diag::warn_unknown_declare_variant_isa_trait) << ISATrait;
+};
 TargetOMPContext OMPCtx(ASTContext, std::move(DiagUnknownTrait),
 /* CurrentFunctionDecl */ nullptr,
 ArrayRef());





[clang] 6ee9654 - [Doc] Fix wrong indentation

2022-01-19 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2022-01-19T11:15:31Z
New Revision: 6ee965471363972fafbed60ad8d27d0f666f8671

URL: 
https://github.com/llvm/llvm-project/commit/6ee965471363972fafbed60ad8d27d0f666f8671
DIFF: 
https://github.com/llvm/llvm-project/commit/6ee965471363972fafbed60ad8d27d0f666f8671.diff

LOG: [Doc] Fix wrong indentation

Handle Sphinx's warning at line 218.

Added: 


Modified: 
clang/docs/ClangOffloadWrapper.rst

Removed: 




diff --git a/clang/docs/ClangOffloadWrapper.rst b/clang/docs/ClangOffloadWrapper.rst
index 2a1e9f362608..efd042509547 100644
--- a/clang/docs/ClangOffloadWrapper.rst
+++ b/clang/docs/ClangOffloadWrapper.rst
@@ -214,7 +214,10 @@ For each offloading target, device ELF code objects are generated by ``clang``,
 
   * At compile time, the ``clang-offload-wrapper`` tool takes the following
 actions:
+
 * It embeds the ELF code objects for the device into the host code (see
   :ref:`openmp-device-binary_embedding`).
+
   * At execution time:
+
 * The global constructor gets run and it registers the device image.





[clang] 0731f6b - [Doc] Add documentation for the clang-offload-wrapper tool (NFC)

2022-01-19 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2022-01-19T10:45:13Z
New Revision: 0731f6ba4f5773071a7b91248caf79a287703a28

URL: 
https://github.com/llvm/llvm-project/commit/0731f6ba4f5773071a7b91248caf79a287703a28
DIFF: 
https://github.com/llvm/llvm-project/commit/0731f6ba4f5773071a7b91248caf79a287703a28.diff

LOG: [Doc] Add documentation for the clang-offload-wrapper tool (NFC)

Add the missing documentation for this tool.

Reviewed By: sdmitriev

Differential Revision: https://reviews.llvm.org/D117120

Added: 
clang/docs/ClangOffloadWrapper.rst

Modified: 
clang/docs/index.rst

Removed: 




diff --git a/clang/docs/ClangOffloadWrapper.rst b/clang/docs/ClangOffloadWrapper.rst
new file mode 100644
index 0..2a1e9f3626080
--- /dev/null
+++ b/clang/docs/ClangOffloadWrapper.rst
@@ -0,0 +1,220 @@
+=====================
+Clang Offload Wrapper
+=====================
+
+.. contents::
+   :local:
+
+.. _clang-offload-wrapper:
+
+Introduction
+============
+
+This tool is used in OpenMP offloading toolchain to embed device code objects
+(usually ELF) into a wrapper host llvm IR (bitcode) file. The wrapper host IR
+is then assembled and linked with host code objects to generate the executable
+binary. See :ref:`image-binary-embedding-execution` for more details.
+
+Usage
+=====
+
+This tool can be used as follows:
+
+.. code-block:: console
+
+  $ clang-offload-wrapper -help
+  OVERVIEW: A tool to create a wrapper bitcode for offload target binaries.
+  Takes offload target binaries as input and produces bitcode file containing
+  target binaries packaged as data and initialization code which registers
+  target binaries in offload runtime.
+  USAGE: clang-offload-wrapper [options] 
+  OPTIONS:
+  Generic Options:
+    --help             - Display available options (--help-hidden for more)
+    --help-list        - Display list of available options (--help-list-hidden for more)
+    --version          - Display the version of this program
+  clang-offload-wrapper options:
+    -o=                - Output filename
+    --target=          - Target triple for the output module
+
+Example
+=======
+
+.. code-block:: console
+
+  clang-offload-wrapper -target host-triple -o host-wrapper.bc gfx90a-binary.out
+
+.. _openmp-device-binary_embedding:
+
+OpenMP Device Binary Embedding
+==============================
+
+Various structures and functions used in the wrapper host IR form the interface
+between the executable binary and the OpenMP runtime.
+
+Enum Types
+----------
+
+:ref:`table-offloading-declare-target-flags` lists different flag for
+offloading entries.
+
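+  .. table:: Offloading Declare Target Flags Enum
+    :name: table-offloading-declare-target-flags
+
+    +-------------------------+-------+------------------------------------------------------------------+
+    |          Name           | Value | Description                                                      |
+    +=========================+=======+==================================================================+
+    | OMP_DECLARE_TARGET_LINK | 0x01  | Mark the entry as having a 'link' attribute (w.r.t. link clause) |
+    +-------------------------+-------+------------------------------------------------------------------+
+    | OMP_DECLARE_TARGET_CTOR | 0x02  | Mark the entry as being a global constructor                     |
+    +-------------------------+-------+------------------------------------------------------------------+
+    | OMP_DECLARE_TARGET_DTOR | 0x04  | Mark the entry as being a global destructor                      |
+    +-------------------------+-------+------------------------------------------------------------------+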
+
+Structure Types
+---------------
+
+:ref:`table-tgt_offload_entry`, :ref:`table-tgt_device_image`, and
+:ref:`table-tgt_bin_desc` are the structures used in the wrapper host IR.
+
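+  .. table:: __tgt_offload_entry structure
+    :name: table-tgt_offload_entry
+
+    +---------+------------+-------------------------------------------------------------------+
+    |   Type  | Identifier | Description                                                       |
+    +=========+============+===================================================================+
+    |  void*  |    addr    | Address of global symbol within device image (function or global) |
+    +---------+------------+-------------------------------------------------------------------+
+    |  char*  |    name    | Name of the symbol                                                |
+    +---------+------------+-------------------------------------------------------------------+
+    |  size_t |    size    | Size of the entry info (0 if it is a function)                    |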

[clang] 876b5ea - [OpenMP][Clang] Allow passing target features in ISA trait for metadirective clause

2022-01-11 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2022-01-12T05:24:49Z
New Revision: 876b5ea96bf5890074aec61bc2c6a37b2cdc0617

URL: 
https://github.com/llvm/llvm-project/commit/876b5ea96bf5890074aec61bc2c6a37b2cdc0617
DIFF: 
https://github.com/llvm/llvm-project/commit/876b5ea96bf5890074aec61bc2c6a37b2cdc0617.diff

LOG: [OpenMP][Clang] Allow passing target features in ISA trait for metadirective clause

Passing any feature in the device-isa trait which is not supported by the host
was causing a compilation failure.

Differential Revision: https://reviews.llvm.org/D116549
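
For readers unfamiliar with the construct, here is a minimal sketch of a
metadirective that uses a device ISA trait, modelled on the "sse2" case in the
test added by this commit. The names Calls, bar, and runWithIsaSelector are
made up for illustration. The sketch assumes that a compiler without OpenMP
enabled simply ignores the pragma.

```cpp
#include <atomic>
#include <cassert>

// Counts how many times the guarded statement ran (hypothetical helper state).
static std::atomic<int> Calls{0};

static void bar() { ++Calls; }

// Hypothetical wrapper around a metadirective with an ISA selector.
// With OpenMP enabled on an x86_64 target that knows "sse2", the 'parallel'
// variant is expected to be chosen; otherwise the 'default(single)' variant
// applies. Without OpenMP the pragma is ignored and bar() runs once.
int runWithIsaSelector() {
  Calls = 0;
#pragma omp metadirective when(device = {isa("sse2")} : parallel) default(single)
  bar();
  int Result = Calls.load();
  assert(Result >= 1 && "bar() must have run at least once");
  return Result;
}
```

The return value depends on the thread count when 'parallel' is selected, so
the sketch only checks that the statement ran at least once.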

Added: 
clang/test/OpenMP/metadirective_device_isa_codegen.cpp
clang/test/OpenMP/metadirective_device_isa_codegen_amdgcn.cpp

Modified: 
clang/include/clang/Basic/DiagnosticParseKinds.td
clang/include/clang/Basic/DiagnosticSemaKinds.td
clang/lib/Parse/ParseOpenMP.cpp
clang/test/OpenMP/metadirective_messages.cpp

Removed: 




diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td
index 193dff8b9c8f6..770ddb3ab16f6 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1376,7 +1376,7 @@ def warn_omp_declare_variant_string_literal_or_identifier
   "%select{set|selector|property}0; "
   "%select{set|selector|property}0 skipped">,
   InGroup;
-def warn_unknown_begin_declare_variant_isa_trait
+def warn_unknown_declare_variant_isa_trait
 : Warning<"isa trait '%0' is not known to the current target; verify the "
   "spelling or consider restricting the context selector with the "
   "'arch' selector further">,

diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 1c8cd79910add..0a659688d82e0 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -10810,11 +10810,6 @@ def err_omp_non_lvalue_in_map_or_motion_clauses: Error<
   "expected addressable lvalue in '%0' clause">;
 def err_omp_var_expected : Error<
   "expected variable of the '%0' type%select{|, not %2}1">;
-def warn_unknown_declare_variant_isa_trait
-: Warning<"isa trait '%0' is not known to the current target; verify the "
-  "spelling or consider restricting the context selector with the "
-  "'arch' selector further">,
-  InGroup;
 def err_omp_non_pointer_type_array_shaping_base : Error<
   "expected expression with a pointer to a complete type as a base of an array "
   "shaping operation">;

diff --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp
index 0089da741b597..de3d58baf84c9 100644
--- a/clang/lib/Parse/ParseOpenMP.cpp
+++ b/clang/lib/Parse/ParseOpenMP.cpp
@@ -2214,7 +2214,7 @@ Parser::DeclGroupPtrTy Parser::ParseOpenMPDeclarativeDirectiveWithExtDecl(
   StringRef ISATrait) {
   // TODO Track the selector locations in a way that is accessible here to
   // improve the diagnostic location.
-  Diag(Loc, diag::warn_unknown_begin_declare_variant_isa_trait) << ISATrait;
+  Diag(Loc, diag::warn_unknown_declare_variant_isa_trait) << ISATrait;
 };
 TargetOMPContext OMPCtx(
 ASTCtx, std::move(DiagUnknownTrait),
@@ -2551,7 +2551,13 @@ Parser::ParseOpenMPDeclarativeOrExecutableDirective(ParsedStmtContext StmtCtx) {
 TPA.Revert();
 // End of the first iteration. Parser is reset to the start of metadirective
 
-TargetOMPContext OMPCtx(ASTContext, /* DiagUnknownTrait */ nullptr,
+std::function<void(StringRef)> DiagUnknownTrait = [this, Loc](
+  StringRef ISATrait) {
+  // TODO Track the selector locations in a way that is accessible here to
+  // improve the diagnostic location.
+  Diag(Loc, diag::warn_unknown_declare_variant_isa_trait) << ISATrait;
+};
+TargetOMPContext OMPCtx(ASTContext, std::move(DiagUnknownTrait),
 /* CurrentFunctionDecl */ nullptr,
 ArrayRef());
 

diff --git a/clang/test/OpenMP/metadirective_device_isa_codegen.cpp b/clang/test/OpenMP/metadirective_device_isa_codegen.cpp
new file mode 100644
index 0..a1954c7b8bf1f
--- /dev/null
+++ b/clang/test/OpenMP/metadirective_device_isa_codegen.cpp
@@ -0,0 +1,32 @@
+// RUN: %clang_cc1 -verify -w -fopenmp -x c++ -triple x86_64-unknown-linux -emit-llvm %s -fexceptions -fcxx-exceptions -o - -fsanitize-address-use-after-scope | FileCheck %s
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+void bar();
+
+void x86_64_device_isa_selected() {
+#pragma omp metadirective when(device = {isa("sse2")} \
+   : parallel) default(single)
+  bar();
+}
+// CHECK-LABEL: void @_Z26x86_64_device_isa_selectedv()
+// 

[clang] 49f23af - [OpenMP] Add nec and nvidia as compiler vendors for OpenMP

2022-01-04 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2022-01-04T12:30:43Z
New Revision: 49f23afdc3453ad6834f32f69b48aa88b5d17338

URL: 
https://github.com/llvm/llvm-project/commit/49f23afdc3453ad6834f32f69b48aa88b5d17338
DIFF: 
https://github.com/llvm/llvm-project/commit/49f23afdc3453ad6834f32f69b48aa88b5d17338.diff

LOG: [OpenMP] Add nec and nvidia as compiler vendors for OpenMP

OpenMP specifications 5.0[1] and 5.1[2] recognize nec and nvidia as known
compiler vendors. Their absence was causing a compilation error in one
of the vendor-based metadirective tests of the sollve_vv project[3].

[1] https://www.openmp.org/wp-content/uploads/Context-Definitions-5.0-v1.0.pdf
[2] https://www.openmp.org/wp-content/uploads/OpenMP-API-Additional-Definitions-2-0.pdf
[3] https://github.com/SOLLVE/sollve_vv/blob/master/tests/5.0/metadirective/test_metadirective_arch_nvidia_or_amd.c

Differential Revision: https://reviews.llvm.org/D116540
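
As an illustration of what the new vendor names allow, here is a minimal
sketch of a declare variant directive that matches on vendor(nvidia). The
functions computeNvidia, compute, and selectedVariant are made up for this
example and do not appear in the commit. The sketch assumes the pragma is
ignored when OpenMP is disabled.

```cpp
#include <cassert>

// Hypothetical specialised implementation, selected only when the compiler
// identifies itself as the 'nvidia' vendor.
int computeNvidia() { return 1; }

// Base function. With this commit, 'nvidia' (and 'nec') are accepted as known
// vendor properties rather than being reported as unknown.
#pragma omp declare variant(computeNvidia) match(implementation = {vendor(nvidia)})
int compute() { return 0; }

// Returns 0 when the base function is called and 1 when the variant is
// substituted; which one happens depends on the compiler vendor.
int selectedVariant() {
  int Result = compute();
  assert(Result == 0 || Result == 1);
  return Result;
}
```

Under Clang the vendor is 'llvm', so the base function would normally be the
one called; the variant only applies under a compiler that reports 'nvidia'.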

Added: 


Modified: 
clang/test/OpenMP/begin_declare_variant_messages.c
clang/test/OpenMP/declare_variant_messages.c
clang/test/OpenMP/declare_variant_messages.cpp
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def

Removed: 




diff --git a/clang/test/OpenMP/begin_declare_variant_messages.c b/clang/test/OpenMP/begin_declare_variant_messages.c
index 5922153b24457..e419ec2345d6b 100644
--- a/clang/test/OpenMP/begin_declare_variant_messages.c
+++ b/clang/test/OpenMP/begin_declare_variant_messages.c
@@ -54,15 +54,15 @@ const int var;
 #pragma omp end declare variant
 #pragma omp begin declare variant match(implementation={vendor}) // expected-warning {{the context selector 'vendor' in context set 'implementation' requires a context property defined in parentheses; selector ignored}} expected-note {{the ignored selector spans until here}}
 #pragma omp end declare variant
-#pragma omp begin declare variant match(implementation={vendor(}) // expected-error {{expected ')'}} expected-warning {{expected identifier or string literal describing a context property; property skipped}} expected-note {{context property options are: 'amd' 'arm' 'bsc' 'cray' 'fujitsu' 'gnu' 'ibm' 'intel' 'llvm' 'pgi' 'ti' 'unknown'}} expected-note {{to match this '('}}
+#pragma omp begin declare variant match(implementation={vendor(}) // expected-error {{expected ')'}} expected-warning {{expected identifier or string literal describing a context property; property skipped}} expected-note {{context property options are: 'amd' 'arm' 'bsc' 'cray' 'fujitsu' 'gnu' 'ibm' 'intel' 'llvm' 'nec' 'nvidia' 'pgi' 'ti' 'unknown'}} expected-note {{to match this '('}}
 #pragma omp end declare variant
-#pragma omp begin declare variant match(implementation={vendor()}) // expected-warning {{expected identifier or string literal describing a context property; property skipped}} expected-note {{context property options are: 'amd' 'arm' 'bsc' 'cray' 'fujitsu' 'gnu' 'ibm' 'intel' 'llvm' 'pgi' 'ti' 'unknown'}}
+#pragma omp begin declare variant match(implementation={vendor()}) // expected-warning {{expected identifier or string literal describing a context property; property skipped}} expected-note {{context property options are: 'amd' 'arm' 'bsc' 'cray' 'fujitsu' 'gnu' 'ibm' 'intel' 'llvm' 'nec' 'nvidia' 'pgi' 'ti' 'unknown'}}
 #pragma omp end declare variant
 #pragma omp begin declare variant match(implementation={vendor(score ibm)}) // expected-error {{expected '(' after 'score'}} expected-warning {{expected '':'' after the score expression; '':'' assumed}}
 #pragma omp end declare variant
-#pragma omp begin declare variant match(implementation={vendor(score( ibm)}) // expected-error {{use of undeclared identifier 'ibm'}} expected-error {{expected ')'}} expected-warning {{expected '':'' after the score expression; '':'' assumed}} expected-warning {{expected identifier or string literal describing a context property; property skipped}} expected-note {{context property options are: 'amd' 'arm' 'bsc' 'cray' 'fujitsu' 'gnu' 'ibm' 'intel' 'llvm' 'pgi' 'ti' 'unknown'}} expected-note {{to match this '('}}
+#pragma omp begin declare variant match(implementation={vendor(score( ibm)}) // expected-error {{use of undeclared identifier 'ibm'}} expected-error {{expected ')'}} expected-warning {{expected '':'' after the score expression; '':'' assumed}} expected-warning {{expected identifier or string literal describing a context property; property skipped}} expected-note {{context property options are: 'amd' 'arm' 'bsc' 'cray' 'fujitsu' 'gnu' 'ibm' 'intel' 'llvm' 'nec' 'nvidia' 'pgi' 'ti' 'unknown'}} expected-note {{to match this '('}}
 #pragma omp end declare variant
-#pragma omp begin declare variant match(implementation={vendor(score(2 ibm)}) 
// expected-error {{expected ')'}} expected-error {{expected ')'}} 
expected-warning {{expected '':'' after the score expression; '':'' assumed}} 
expected-warning {{expected identifier or string literal 

[clang] 3235726 - [Clang][NFC] Fix multiline comment prefixes in function headers

2022-01-04 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2022-01-04T11:51:31Z
New Revision: 32357266fd055e0eba63fc321f31a1c88eae0ea8

URL: 
https://github.com/llvm/llvm-project/commit/32357266fd055e0eba63fc321f31a1c88eae0ea8
DIFF: 
https://github.com/llvm/llvm-project/commit/32357266fd055e0eba63fc321f31a1c88eae0ea8.diff

LOG: [Clang][NFC] Fix multiline comment prefixes in function headers

Cleanup of D105191 after latest clang-format changes.

Reviewed By: MyDeveloperDay

Differential Revision: https://reviews.llvm.org/D111545

Added: 


Modified: 
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
clang/lib/Driver/ToolChains/Cuda.cpp

Removed: 




diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index f282f04b79311..198e3546d4fa2 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -131,9 +131,8 @@ const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand(
   }
 
   AddStaticDeviceLibsLinking(C, *this, JA, Inputs, Args, CmdArgs, "amdgcn",
-  SubArchName,
-  /* bitcode SDL?*/ true,
-  /* PostClang Link? */ false);
+ SubArchName, /*isBitCodeSDL=*/true,
+ /*postClangLink=*/false);
   // Add an intermediate output file.
   CmdArgs.push_back("-o");
   const char *OutputFileName =

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index ee573b89bed13..7324339efaa62 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -612,8 +612,9 @@ void NVPTX::OpenMPLinker::ConstructJob(Compilation &C, const JobAction &JA,
 CmdArgs.push_back(CubinF);
   }
 
-  AddStaticDeviceLibsLinking(C, *this, JA, Inputs, Args, CmdArgs, "nvptx", GPUArch,
-  false, false);
+  AddStaticDeviceLibsLinking(C, *this, JA, Inputs, Args, CmdArgs, "nvptx",
+ GPUArch, /*isBitCodeSDL=*/false,
+ /*postClangLink=*/false);
 
   // Find nvlink and pass it as "--nvlink-path=" argument of
   // clang-nvlink-wrapper.
@@ -752,8 +753,9 @@ void CudaToolChain::addClangTargetOptions(
 
 addOpenMPDeviceRTL(getDriver(), DriverArgs, CC1Args, BitcodeSuffix,
getTriple());
-AddStaticDeviceLibsPostLinking(getDriver(), DriverArgs, CC1Args, "nvptx", GpuArch,
-/* bitcode SDL?*/ true, /* PostClang Link? */ true);
+AddStaticDeviceLibsPostLinking(getDriver(), DriverArgs, CC1Args, "nvptx",
+   GpuArch, /*isBitCodeSDL=*/true,
+   /*postClangLink=*/true);
   }
 }
 





[clang] f565488 - [Clang][clang-nvlink-wrapper] Pass nvlink path to the wrapper

2021-10-12 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-10-12T16:15:52Z
New Revision: f56548829c4c696d798c252bf097b71538bd45d7

URL: 
https://github.com/llvm/llvm-project/commit/f56548829c4c696d798c252bf097b71538bd45d7
DIFF: 
https://github.com/llvm/llvm-project/commit/f56548829c4c696d798c252bf097b71538bd45d7.diff

LOG: [Clang][clang-nvlink-wrapper] Pass nvlink path to the wrapper

Add support for a "--nvlink-path" option in clang-nvlink-wrapper, which
takes the path of the nvlink binary.

Static Device Library support for OpenMP (D105191) now searches for the
nvlink binary and passes its location via this option. In the absence
of this option, the nvlink binary is searched for in the PATH locations.
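
The lookup order described above can be sketched in plain C++ as follows.
This is only an illustration under stated assumptions: resolveNvlinkPath and
its Exists callback are hypothetical names, the ':' separator assumes a
POSIX-style PATH, and the real wrapper uses LLVM's own option parsing and
program lookup rather than this code.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical helper (not part of clang-nvlink-wrapper): returns the nvlink
// path to use, or an empty string when none is found.
std::string resolveNvlinkPath(
    const std::string &UserPath, const std::string &PathEnv,
    const std::function<bool(const std::string &)> &Exists) {
  assert(Exists && "an existence check must be supplied");
  // An explicit --nvlink-path value always wins.
  if (!UserPath.empty())
    return UserPath;
  // Otherwise walk the PATH-style list (the ':' separator is an assumption).
  std::istringstream Dirs(PathEnv);
  std::string Dir;
  while (std::getline(Dirs, Dir, ':')) {
    if (Dir.empty())
      continue;
    std::string Candidate = Dir + "/nvlink";
    if (Exists(Candidate))
      return Candidate;
  }
  return std::string();
}
```

Passing the existence check as a callback keeps the sketch free of filesystem
access, so the precedence rule can be exercised in isolation.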

Differential Revision: https://reviews.llvm.org/D111488

Added: 


Modified: 
clang/lib/Driver/ToolChains/Cuda.cpp
clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp

Removed: 




diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 18351dae39f7e..0ad1ffb079b31 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -613,6 +613,11 @@ void NVPTX::OpenMPLinker::ConstructJob(Compilation &C, const JobAction &JA,
   AddStaticDeviceLibsLinking(C, *this, JA, Inputs, Args, CmdArgs, "nvptx", GPUArch,
   false, false);
 
+  // Find nvlink and pass it as "--nvlink-path=" argument of
+  // clang-nvlink-wrapper.
+  CmdArgs.push_back(Args.MakeArgString(
+  Twine("--nvlink-path=" + getToolChain().GetProgramPath("nvlink"))));
+
   const char *Exec =
       Args.MakeArgString(getToolChain().GetProgramPath("clang-nvlink-wrapper"));
   C.addCommand(std::make_unique(

diff --git a/clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp b/clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
index 5c8b7b9db6884..bc5b9a9f1fde7 100644
--- a/clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
+++ b/clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
@@ -25,6 +25,7 @@
 /// 2. nvlink -o a.out-openmp-nvptx64 /tmp/a.cubin /tmp/b.cubin
 //===-===//
 
+#include "clang/Basic/Version.h"
 #include "llvm/Object/Archive.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Errc.h"
@@ -41,6 +42,19 @@ using namespace llvm;
 
 static cl::opt<bool> Help("h", cl::desc("Alias for -help"), cl::Hidden);
 
+// Mark all our options with this category, everything else (except for -help)
+// will be hidden.
+static cl::OptionCategory
+ClangNvlinkWrapperCategory("clang-nvlink-wrapper options");
+
+static cl::opt<std::string> NvlinkUserPath("nvlink-path",
+                                           cl::desc("Path of nvlink binary"),
+                                           cl::cat(ClangNvlinkWrapperCategory));
+
+// Do not parse nvlink options
+static cl::list<std::string>
+NVArgs(cl::Sink, cl::desc("..."));
+
 static Error runNVLink(std::string NVLinkPath,
SmallVectorImpl ) {
   std::vector NVLArgs;
@@ -119,8 +133,20 @@ static Error cleanupTmpFiles(SmallVectorImpl 
) {
   return Error::success();
 }
 
+static void PrintVersion(raw_ostream &OS) {
+  OS << clang::getClangToolFullVersion("clang-nvlink-wrapper") << '\n';
+}
+
 int main(int argc, const char **argv) {
   sys::PrintStackTraceOnErrorSignal(argv[0]);
+  cl::SetVersionPrinter(PrintVersion);
+  cl::HideUnrelatedOptions(ClangNvlinkWrapperCategory);
+  cl::ParseCommandLineOptions(
+  argc, argv,
+  "A wrapper tool over nvlink program. It transparently passes every \n"
+  "input option and objects to nvlink except archive files and path of \n"
+  "nvlink binary. It reads each input archive file to extract archived \n"
+  "cubin files as temporary files.\n");
 
   if (Help) {
 cl::PrintHelpMessage();
@@ -132,12 +158,7 @@ int main(int argc, const char **argv) {
 exit(1);
   };
 
-  ErrorOr NvlinkPath = sys::findProgramByName("nvlink");
-  if (!NvlinkPath) {
-reportError(createStringError(NvlinkPath.getError(),
-  "unable to find 'nvlink' in path"));
-  }
-
+  std::string NvlinkPath;
   SmallVector Argv(argv, argv + argc);
   SmallVector ArgvSubst;
   SmallVector TmpFiles;
@@ -145,8 +166,7 @@ int main(int argc, const char **argv) {
   StringSaver Saver(Alloc);
   cl::ExpandResponseFiles(Saver, cl::TokenizeGNUCommandLine, Argv);
 
-  for (size_t i = 1; i < Argv.size(); ++i) {
-std::string Arg = Argv[i];
+  for (const std::string &Arg : NVArgs) {
 if (sys::path::extension(Arg) == ".a") {
   if (Error Err = extractArchiveFiles(Arg, ArgvSubst, TmpFiles))
 reportError(std::move(Err));
@@ -155,7 +175,19 @@ int main(int argc, const char **argv) {
 }
   }
 
-  if (Error Err = runNVLink(NvlinkPath.get(), ArgvSubst))
+  NvlinkPath = NvlinkUserPath;
+
+  // If user hasn't specified nvlink binary then search it in PATH
+  if (NvlinkPath.empty()) {
+ErrorOr NvlinkPathErr = 

[clang] 35ebe4c - [Clang][OpenMP] Add partial support for Static Device Libraries

2021-10-08 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-10-08T09:37:51Z
New Revision: 35ebe4cc24f87397762e35831953c4bfe5f52def

URL: 
https://github.com/llvm/llvm-project/commit/35ebe4cc24f87397762e35831953c4bfe5f52def
DIFF: 
https://github.com/llvm/llvm-project/commit/35ebe4cc24f87397762e35831953c4bfe5f52def.diff

LOG: [Clang][OpenMP] Add partial support for Static Device Libraries

An archive containing device code object files can be passed to the
clang command line for linking. For each given offload target, clang
creates a device-specific archive, which is passed either to llvm-link
if the target is amdgpu, or to clang-nvlink-wrapper if the target is
nvptx. The -L/-l flags are used to specify these fat archives on the
command line, e.g.:
  clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib

It currently doesn't support linking an archive directly, like:
  clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a

Linking with x86 offload also does not work.
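
The OffloadBundler hunks in this commit find the GPU architecture by scanning
the toolchain arguments for a "-march=" prefix and appending its value to the
bundle triple. Here is a rough standalone sketch of that scan, using only the
standard library. The names startsWithInsensitive and appendOpenMPArch are
hypothetical, and the triple strings used with it are examples only.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Case-insensitive prefix test, standing in for the LLVM StringRef helper
// used in the diff.
static bool startsWithInsensitive(const std::string &S,
                                  const std::string &Prefix) {
  if (S.size() < Prefix.size())
    return false;
  return std::equal(Prefix.begin(), Prefix.end(), S.begin(),
                    [](char A, char B) {
                      return std::tolower(static_cast<unsigned char>(A)) ==
                             std::tolower(static_cast<unsigned char>(B));
                    });
}

// Hypothetical helper: appends "-<arch>" to Triple using the first
// "-march=<arch>" argument found; returns Triple unchanged if there is none.
std::string appendOpenMPArch(std::string Triple,
                             const std::vector<std::string> &Args) {
  assert(!Triple.empty() && "expected a non-empty bundle triple");
  const std::string Flag = "-march=";
  for (const std::string &Arg : Args) {
    if (startsWithInsensitive(Arg, Flag)) {
      Triple += '-';
      Triple += Arg.substr(Flag.size());
      break; // Only the first match is used, as in the diff.
    }
  }
  return Triple;
}
```

As the TODO in the diff notes, string-scanning for "-march=" is a stopgap;
storing the GPU architecture with each toolchain would avoid it.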

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D105191

Added: 
clang/test/Driver/Inputs/openmp_static_device_link/libFatArchive.a
clang/test/Driver/fat_archive_amdgpu.cpp
clang/test/Driver/fat_archive_nvptx.cpp

Modified: 
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Driver/ToolChains/CommonArgs.cpp
clang/lib/Driver/ToolChains/CommonArgs.h
clang/lib/Driver/ToolChains/Cuda.cpp
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp

Removed: 




diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index 135e3694434db..5400e26177291 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -114,6 +114,10 @@ const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand(
 }
   }
 
+  AddStaticDeviceLibsLinking(C, *this, JA, Inputs, Args, CmdArgs, "amdgcn",
+  SubArchName,
+  /* bitcode SDL?*/ true,
+  /* PostClang Link? */ false);
   // Add an intermediate output file.
   CmdArgs.push_back("-o");
   const char *OutputFileName =

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 3f98914bd1904..c636c25a1dc81 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7745,12 +7745,28 @@ void OffloadBundler::ConstructJob(Compilation &C, const JobAction &JA,
 Triples += Action::GetOffloadKindName(CurKind);
 Triples += '-';
 Triples += CurTC->getTriple().normalize();
-if ((CurKind == Action::OFK_HIP || CurKind == Action::OFK_OpenMP ||
- CurKind == Action::OFK_Cuda) &&
+if ((CurKind == Action::OFK_HIP || CurKind == Action::OFK_Cuda) &&
 CurDep->getOffloadingArch()) {
   Triples += '-';
   Triples += CurDep->getOffloadingArch();
 }
+
+// TODO: Replace parsing of -march flag. Can be done by storing GPUArch
+//   with each toolchain.
+StringRef GPUArchName;
+if (CurKind == Action::OFK_OpenMP) {
+  // Extract GPUArch from -march argument in TC argument list.
+  for (unsigned ArgIndex = 0; ArgIndex < TCArgs.size(); ArgIndex++) {
+auto ArchStr = StringRef(TCArgs.getArgString(ArgIndex));
+auto Arch = ArchStr.startswith_insensitive("-march=");
+if (Arch) {
+  GPUArchName = ArchStr.substr(7);
+  Triples += "-";
+  break;
+}
+  }
+  Triples += GPUArchName.str();
+}
   }
   CmdArgs.push_back(TCArgs.MakeArgString(Triples));
 
@@ -7824,12 +7840,27 @@ void OffloadBundler::ConstructJobMultipleOutputs(
 Triples += '-';
 Triples += Dep.DependentToolChain->getTriple().normalize();
 if ((Dep.DependentOffloadKind == Action::OFK_HIP ||
- Dep.DependentOffloadKind == Action::OFK_OpenMP ||
  Dep.DependentOffloadKind == Action::OFK_Cuda) &&
 !Dep.DependentBoundArch.empty()) {
   Triples += '-';
   Triples += Dep.DependentBoundArch;
 }
+// TODO: Replace parsing of -march flag. Can be done by storing GPUArch
+//   with each toolchain.
+StringRef GPUArchName;
+if (Dep.DependentOffloadKind == Action::OFK_OpenMP) {
+  // Extract GPUArch from -march argument in TC argument list.
+  for (unsigned ArgIndex = 0; ArgIndex < TCArgs.size(); ArgIndex++) {
+StringRef ArchStr = StringRef(TCArgs.getArgString(ArgIndex));
+auto Arch = ArchStr.startswith_insensitive("-march=");
+if (Arch) {
+  GPUArchName = ArchStr.substr(7);
+  Triples += "-";
+  break;
+}
+  }
+  Triples += GPUArchName.str();
+}
   }
 
   CmdArgs.push_back(TCArgs.MakeArgString(Triples));

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 9f1895466c98d..c3abdf446cfaf 

[clang] 94e2b02 - Revert "[Clang][OpenMP] Add partial support for Static Device Libraries"

2021-10-07 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-10-07T14:13:24Z
New Revision: 94e2b0258a176c7451dd8291cdf060ea048fee44

URL: 
https://github.com/llvm/llvm-project/commit/94e2b0258a176c7451dd8291cdf060ea048fee44
DIFF: 
https://github.com/llvm/llvm-project/commit/94e2b0258a176c7451dd8291cdf060ea048fee44.diff

LOG: Revert "[Clang][OpenMP] Add partial support for Static Device Libraries"

This reverts commit 4c4117089599cb5b6c6fa5635c28462ffd1bddf4.

Added: 


Modified: 
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Driver/ToolChains/CommonArgs.cpp
clang/lib/Driver/ToolChains/CommonArgs.h
clang/lib/Driver/ToolChains/Cuda.cpp
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp

Removed: 
clang/test/Driver/Inputs/openmp_static_device_link/libFatArchive.a
clang/test/Driver/fat_archive_amdgpu.cpp
clang/test/Driver/fat_archive_nvptx.cpp



diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index 5400e26177291..135e3694434db 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -114,10 +114,6 @@ const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand(
 }
   }
 
-  AddStaticDeviceLibsLinking(C, *this, JA, Inputs, Args, CmdArgs, "amdgcn",
-  SubArchName,
-  /* bitcode SDL?*/ true,
-  /* PostClang Link? */ false);
   // Add an intermediate output file.
   CmdArgs.push_back("-o");
   const char *OutputFileName =

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 65dfe0ae0221d..369c12aea5231 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7734,28 +7734,12 @@ void OffloadBundler::ConstructJob(Compilation &C, const JobAction &JA,
 Triples += Action::GetOffloadKindName(CurKind);
 Triples += '-';
 Triples += CurTC->getTriple().normalize();
-if ((CurKind == Action::OFK_HIP || CurKind == Action::OFK_Cuda) &&
+if ((CurKind == Action::OFK_HIP || CurKind == Action::OFK_OpenMP ||
+ CurKind == Action::OFK_Cuda) &&
 CurDep->getOffloadingArch()) {
   Triples += '-';
   Triples += CurDep->getOffloadingArch();
 }
-
-// TODO: Replace parsing of -march flag. Can be done by storing GPUArch
-//   with each toolchain.
-StringRef GPUArchName;
-if (CurKind == Action::OFK_OpenMP) {
-  // Extract GPUArch from -march argument in TC argument list.
-  for (unsigned ArgIndex = 0; ArgIndex < TCArgs.size(); ArgIndex++) {
-auto ArchStr = StringRef(TCArgs.getArgString(ArgIndex));
-auto Arch = ArchStr.startswith_insensitive("-march=");
-if (Arch) {
-  GPUArchName = ArchStr.substr(7);
-  Triples += "-";
-  break;
-}
-  }
-  Triples += GPUArchName.str();
-}
   }
   CmdArgs.push_back(TCArgs.MakeArgString(Triples));
 
@@ -7829,27 +7813,12 @@ void OffloadBundler::ConstructJobMultipleOutputs(
 Triples += '-';
 Triples += Dep.DependentToolChain->getTriple().normalize();
 if ((Dep.DependentOffloadKind == Action::OFK_HIP ||
+ Dep.DependentOffloadKind == Action::OFK_OpenMP ||
  Dep.DependentOffloadKind == Action::OFK_Cuda) &&
 !Dep.DependentBoundArch.empty()) {
   Triples += '-';
   Triples += Dep.DependentBoundArch;
 }
-// TODO: Replace parsing of -march flag. Can be done by storing GPUArch
-//   with each toolchain.
-StringRef GPUArchName;
-if (Dep.DependentOffloadKind == Action::OFK_OpenMP) {
-  // Extract GPUArch from -march argument in TC argument list.
-  for (uint ArgIndex = 0; ArgIndex < TCArgs.size(); ArgIndex++) {
-StringRef ArchStr = StringRef(TCArgs.getArgString(ArgIndex));
-auto Arch = ArchStr.startswith_insensitive("-march=");
-if (Arch) {
-  GPUArchName = ArchStr.substr(7);
-  Triples += "-";
-  break;
-}
-  }
-  Triples += GPUArchName.str();
-}
   }
 
   CmdArgs.push_back(TCArgs.MakeArgString(Triples));

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index c3abdf446cfaf..9f1895466c98d 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -34,7 +34,6 @@
 #include "clang/Driver/Util.h"
 #include "clang/Driver/XRayArgs.h"
 #include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringSwitch.h"
@@ -1588,292 +1587,6 @@ void tools::addX86AlignBranchArgs(const Driver &D, const ArgList &Args,
   }
 }
 
-/// SDLSearch: Search for Static Device Library
-/// The search for SDL bitcode files is consistent with how static host
-/// libraries are 

[clang] 1097f48 - Revert "[Clang][OpenMP] Fix fat archive tests for Mac and Windows"

2021-10-07 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-10-07T14:13:24Z
New Revision: 1097f48e3dc3ab6811419f867350e79e280fb0eb

URL: 
https://github.com/llvm/llvm-project/commit/1097f48e3dc3ab6811419f867350e79e280fb0eb
DIFF: 
https://github.com/llvm/llvm-project/commit/1097f48e3dc3ab6811419f867350e79e280fb0eb.diff

LOG: Revert "[Clang][OpenMP] Fix fat archive tests for Mac and Windows"

This reverts commit 2baf7ad6d27fc9c08dd6eb9f8581d7e1353d4ece.

Added: 


Modified: 
clang/test/Driver/fat_archive_amdgpu.cpp
clang/test/Driver/fat_archive_nvptx.cpp

Removed: 




diff --git a/clang/test/Driver/fat_archive_amdgpu.cpp b/clang/test/Driver/fat_archive_amdgpu.cpp
index 5b162f99326f..b64ba8b97478 100644
--- a/clang/test/Driver/fat_archive_amdgpu.cpp
+++ b/clang/test/Driver/fat_archive_amdgpu.cpp
@@ -10,7 +10,7 @@
 // CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" 
"amdgcn-amd-amdhsa"{{.*}}"-emit-llvm-bc"{{.*}}"-target-cpu" 
"[[GPU:gfx[0-9]+]]"{{.*}}"-o" "[[HOSTBC:.*.bc]]" "-x" "c++"{{.*}}.cpp
 // CHECK: clang-offload-bundler" "-unbundle" "-type=a" 
"-inputs={{.*}}/Inputs/openmp_static_device_link/libFatArchive.a" 
"-targets=openmp-amdgcn-amd-amdhsa-[[GPU]]" 
"-outputs=[[DEVICESPECIFICARCHIVE:.*.a]]" "-allow-missing-bundles"
 // CHECK: llvm-link{{.*}}"[[HOSTBC]]" "[[DEVICESPECIFICARCHIVE]]" "-o" 
"{{.*}}-[[GPU]]-linked-{{.*}}.bc"
-// CHECK: ld"{{.*}}" "-L{{.*}}/Inputs/openmp_static_device_link" "{{.*}} 
"-lFatArchive" "{{.*}}" "-lomp"
+// CHECK: ld"{{.*}}" "-L{{.*}}/Inputs/openmp_static_device_link" "{{.*}} 
"-lFatArchive" "{{.*}}" "-lomp{{.*}}-lomptarget"
 // expected-no-diagnostics
 
 #ifndef HEADER

diff --git a/clang/test/Driver/fat_archive_nvptx.cpp b/clang/test/Driver/fat_archive_nvptx.cpp
index f04ede4cf526..72e20d00651e 100644
--- a/clang/test/Driver/fat_archive_nvptx.cpp
+++ b/clang/test/Driver/fat_archive_nvptx.cpp
@@ -10,7 +10,7 @@
 // CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" "nvptx64"{{.*}}"-target-cpu" 
"[[GPU:sm_[0-9]+]]"{{.*}}"-o" "[[HOSTBC:.*.s]]" "-x" "c++"{{.*}}.cpp
 // CHECK: clang-offload-bundler" "-unbundle" "-type=a" 
"-inputs={{.*}}/Inputs/openmp_static_device_link/libFatArchive.a" 
"-targets=openmp-nvptx64-[[GPU]]" "-outputs=[[DEVICESPECIFICARCHIVE:.*.a]]" 
"-allow-missing-bundles"
 // CHECK: clang-nvlink-wrapper{{.*}}"-o" "{{.*}}.out" "-arch" "[[GPU]]" 
"{{.*}}[[DEVICESPECIFICARCHIVE]]"
-// CHECK: ld"{{.*}}" "-L{{.*}}/Inputs/openmp_static_device_link" "{{.*}} 
"-lFatArchive" "{{.*}}" "-lomp{{.*}}"
+// CHECK: ld"{{.*}}" "-L{{.*}}/Inputs/openmp_static_device_link" "{{.*}} 
"-lFatArchive" "{{.*}}" "-lomp{{.*}}-lomptarget"
 // expected-no-diagnostics
 
 #ifndef HEADER



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 3eb44f4 - Revert "[Clang][OpenMP] Fix windows buildbot failure for D105191"

2021-10-07 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-10-07T14:13:24Z
New Revision: 3eb44f4d28df3d9e9528b8b9f8f6b93ab4c2af67

URL: 
https://github.com/llvm/llvm-project/commit/3eb44f4d28df3d9e9528b8b9f8f6b93ab4c2af67
DIFF: 
https://github.com/llvm/llvm-project/commit/3eb44f4d28df3d9e9528b8b9f8f6b93ab4c2af67.diff

LOG: Revert "[Clang][OpenMP] Fix windows buildbot failure for D105191"

This reverts commit 06404d5488ea505b00f711393973db3ae32d01e9.

Added: 


Modified: 
clang/lib/Driver/ToolChains/Clang.cpp

Removed: 




diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 8f0082dcadc56..65dfe0ae0221d 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7839,7 +7839,7 @@ void OffloadBundler::ConstructJobMultipleOutputs(
 StringRef GPUArchName;
 if (Dep.DependentOffloadKind == Action::OFK_OpenMP) {
   // Extract GPUArch from -march argument in TC argument list.
-  for (unsigned ArgIndex = 0; ArgIndex < TCArgs.size(); ArgIndex++) {
+  for (uint ArgIndex = 0; ArgIndex < TCArgs.size(); ArgIndex++) {
 StringRef ArchStr = StringRef(TCArgs.getArgString(ArgIndex));
 auto Arch = ArchStr.startswith_insensitive("-march=");
 if (Arch) {





[clang] 2baf7ad - [Clang][OpenMP] Fix fat archive tests for Mac and Windows

2021-10-07 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-10-07T13:38:46Z
New Revision: 2baf7ad6d27fc9c08dd6eb9f8581d7e1353d4ece

URL: 
https://github.com/llvm/llvm-project/commit/2baf7ad6d27fc9c08dd6eb9f8581d7e1353d4ece
DIFF: 
https://github.com/llvm/llvm-project/commit/2baf7ad6d27fc9c08dd6eb9f8581d7e1353d4ece.diff

LOG: [Clang][OpenMP] Fix fat archive tests for Mac and Windows

Fixes missing libomptarget on Mac and Windows in check lines. Issue
was introduced by D105191.

Differential Revision: https://reviews.llvm.org/D111311

Added: 


Modified: 
clang/test/Driver/fat_archive_amdgpu.cpp
clang/test/Driver/fat_archive_nvptx.cpp

Removed: 




diff --git a/clang/test/Driver/fat_archive_amdgpu.cpp b/clang/test/Driver/fat_archive_amdgpu.cpp
index b64ba8b97478..5b162f99326f 100644
--- a/clang/test/Driver/fat_archive_amdgpu.cpp
+++ b/clang/test/Driver/fat_archive_amdgpu.cpp
@@ -10,7 +10,7 @@
 // CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" 
"amdgcn-amd-amdhsa"{{.*}}"-emit-llvm-bc"{{.*}}"-target-cpu" 
"[[GPU:gfx[0-9]+]]"{{.*}}"-o" "[[HOSTBC:.*.bc]]" "-x" "c++"{{.*}}.cpp
 // CHECK: clang-offload-bundler" "-unbundle" "-type=a" 
"-inputs={{.*}}/Inputs/openmp_static_device_link/libFatArchive.a" 
"-targets=openmp-amdgcn-amd-amdhsa-[[GPU]]" 
"-outputs=[[DEVICESPECIFICARCHIVE:.*.a]]" "-allow-missing-bundles"
 // CHECK: llvm-link{{.*}}"[[HOSTBC]]" "[[DEVICESPECIFICARCHIVE]]" "-o" 
"{{.*}}-[[GPU]]-linked-{{.*}}.bc"
-// CHECK: ld"{{.*}}" "-L{{.*}}/Inputs/openmp_static_device_link" "{{.*}} 
"-lFatArchive" "{{.*}}" "-lomp{{.*}}-lomptarget"
+// CHECK: ld"{{.*}}" "-L{{.*}}/Inputs/openmp_static_device_link" "{{.*}} 
"-lFatArchive" "{{.*}}" "-lomp"
 // expected-no-diagnostics
 
 #ifndef HEADER

diff --git a/clang/test/Driver/fat_archive_nvptx.cpp b/clang/test/Driver/fat_archive_nvptx.cpp
index 72e20d00651e..f04ede4cf526 100644
--- a/clang/test/Driver/fat_archive_nvptx.cpp
+++ b/clang/test/Driver/fat_archive_nvptx.cpp
@@ -10,7 +10,7 @@
 // CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" "nvptx64"{{.*}}"-target-cpu" 
"[[GPU:sm_[0-9]+]]"{{.*}}"-o" "[[HOSTBC:.*.s]]" "-x" "c++"{{.*}}.cpp
 // CHECK: clang-offload-bundler" "-unbundle" "-type=a" 
"-inputs={{.*}}/Inputs/openmp_static_device_link/libFatArchive.a" 
"-targets=openmp-nvptx64-[[GPU]]" "-outputs=[[DEVICESPECIFICARCHIVE:.*.a]]" 
"-allow-missing-bundles"
 // CHECK: clang-nvlink-wrapper{{.*}}"-o" "{{.*}}.out" "-arch" "[[GPU]]" 
"{{.*}}[[DEVICESPECIFICARCHIVE]]"
-// CHECK: ld"{{.*}}" "-L{{.*}}/Inputs/openmp_static_device_link" "{{.*}} 
"-lFatArchive" "{{.*}}" "-lomp{{.*}}-lomptarget"
+// CHECK: ld"{{.*}}" "-L{{.*}}/Inputs/openmp_static_device_link" "{{.*}} 
"-lFatArchive" "{{.*}}" "-lomp{{.*}}"
 // expected-no-diagnostics
 
 #ifndef HEADER





[clang] 06404d5 - [Clang][OpenMP] Fix windows buildbot failure for D105191

2021-10-06 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-10-07T05:54:56Z
New Revision: 06404d5488ea505b00f711393973db3ae32d01e9

URL: 
https://github.com/llvm/llvm-project/commit/06404d5488ea505b00f711393973db3ae32d01e9
DIFF: 
https://github.com/llvm/llvm-project/commit/06404d5488ea505b00f711393973db3ae32d01e9.diff

LOG: [Clang][OpenMP] Fix windows buildbot failure for D105191

Fixes 4c4117089599cb5b6c6fa5635c28462ffd1bddf4.

Added: 


Modified: 
clang/lib/Driver/ToolChains/Clang.cpp

Removed: 




diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 65dfe0ae0221d..8f0082dcadc56 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7839,7 +7839,7 @@ void OffloadBundler::ConstructJobMultipleOutputs(
 StringRef GPUArchName;
 if (Dep.DependentOffloadKind == Action::OFK_OpenMP) {
   // Extract GPUArch from -march argument in TC argument list.
-  for (uint ArgIndex = 0; ArgIndex < TCArgs.size(); ArgIndex++) {
+  for (unsigned ArgIndex = 0; ArgIndex < TCArgs.size(); ArgIndex++) {
 StringRef ArchStr = StringRef(TCArgs.getArgString(ArgIndex));
 auto Arch = ArchStr.startswith_insensitive("-march=");
 if (Arch) {





[clang] 4c41170 - [Clang][OpenMP] Add partial support for Static Device Libraries

2021-10-06 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-10-07T04:45:19Z
New Revision: 4c4117089599cb5b6c6fa5635c28462ffd1bddf4

URL: 
https://github.com/llvm/llvm-project/commit/4c4117089599cb5b6c6fa5635c28462ffd1bddf4
DIFF: 
https://github.com/llvm/llvm-project/commit/4c4117089599cb5b6c6fa5635c28462ffd1bddf4.diff

LOG: [Clang][OpenMP] Add partial support for Static Device Libraries

An archive containing device code object files can be passed to the
clang command line for linking. For each given offload target, clang
creates a device-specific archive, which is passed either to llvm-link
if the target is amdgpu or to clang-nvlink-wrapper if the target is
nvptx. The -L/-l flags are used to specify these fat archives on the
command line. E.g.
  clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib

It currently doesn't support linking an archive directly, like:
  clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a

Linking with x86 offload also does not work.
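
As a rough illustration of the Clang.cpp part of this change, the sketch
below builds an OpenMP bundle target string by scanning an argument list
for a "-march=" prefix (compared case-insensitively) and appending the
architecture after the triple. It uses only the C++ standard library;
startsWithInsensitive and makeOpenMPBundleTarget are hypothetical names
for this example, not functions in the clang driver.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: case-insensitive prefix test on plain strings.
bool startsWithInsensitive(std::string_view S, std::string_view Prefix) {
  if (S.size() < Prefix.size())
    return false;
  return std::equal(Prefix.begin(), Prefix.end(), S.begin(),
                    [](unsigned char A, unsigned char B) {
                      return std::tolower(A) == std::tolower(B);
                    });
}

// Hypothetical helper: "openmp-<triple>[-<arch>]", where <arch> is taken
// from the first "-march=" argument, if there is one.
std::string makeOpenMPBundleTarget(std::string_view Triple,
                                   const std::vector<std::string> &Args) {
  std::string Target = "openmp-";
  Target += Triple;
  for (const std::string &Arg : Args) {
    if (startsWithInsensitive(Arg, "-march=")) {
      Target += '-';
      Target += Arg.substr(7); // Skip the 7 characters of "-march=".
      break;
    }
  }
  return Target;
}

// Usage sketch with made-up arguments.
inline void demoBundleTarget() {
  assert(makeOpenMPBundleTarget("nvptx64", {"-O2", "-march=sm_70"}) ==
         "openmp-nvptx64-sm_70");
}
```

Only the first match is used, so a command line with several -march
values would yield a single architecture under this scheme.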

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D105191

Added: 
clang/test/Driver/Inputs/openmp_static_device_link/libFatArchive.a
clang/test/Driver/fat_archive_amdgpu.cpp
clang/test/Driver/fat_archive_nvptx.cpp

Modified: 
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Driver/ToolChains/CommonArgs.cpp
clang/lib/Driver/ToolChains/CommonArgs.h
clang/lib/Driver/ToolChains/Cuda.cpp
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp

Removed: 




diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index 135e3694434db..5400e26177291 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -114,6 +114,10 @@ const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand(
 }
   }
 
+  AddStaticDeviceLibsLinking(C, *this, JA, Inputs, Args, CmdArgs, "amdgcn",
+  SubArchName,
+  /* bitcode SDL?*/ true,
+  /* PostClang Link? */ false);
   // Add an intermediate output file.
   CmdArgs.push_back("-o");
   const char *OutputFileName =

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 369c12aea5231..65dfe0ae0221d 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7734,12 +7734,28 @@ void OffloadBundler::ConstructJob(Compilation &C, const JobAction &JA,
 Triples += Action::GetOffloadKindName(CurKind);
 Triples += '-';
 Triples += CurTC->getTriple().normalize();
-if ((CurKind == Action::OFK_HIP || CurKind == Action::OFK_OpenMP ||
- CurKind == Action::OFK_Cuda) &&
+if ((CurKind == Action::OFK_HIP || CurKind == Action::OFK_Cuda) &&
 CurDep->getOffloadingArch()) {
   Triples += '-';
   Triples += CurDep->getOffloadingArch();
 }
+
+// TODO: Replace parsing of -march flag. Can be done by storing GPUArch
+//   with each toolchain.
+StringRef GPUArchName;
+if (CurKind == Action::OFK_OpenMP) {
+  // Extract GPUArch from -march argument in TC argument list.
+  for (unsigned ArgIndex = 0; ArgIndex < TCArgs.size(); ArgIndex++) {
+auto ArchStr = StringRef(TCArgs.getArgString(ArgIndex));
+auto Arch = ArchStr.startswith_insensitive("-march=");
+if (Arch) {
+  GPUArchName = ArchStr.substr(7);
+  Triples += "-";
+  break;
+}
+  }
+  Triples += GPUArchName.str();
+}
   }
   CmdArgs.push_back(TCArgs.MakeArgString(Triples));
 
@@ -7813,12 +7829,27 @@ void OffloadBundler::ConstructJobMultipleOutputs(
 Triples += '-';
 Triples += Dep.DependentToolChain->getTriple().normalize();
 if ((Dep.DependentOffloadKind == Action::OFK_HIP ||
- Dep.DependentOffloadKind == Action::OFK_OpenMP ||
  Dep.DependentOffloadKind == Action::OFK_Cuda) &&
 !Dep.DependentBoundArch.empty()) {
   Triples += '-';
   Triples += Dep.DependentBoundArch;
 }
+// TODO: Replace parsing of -march flag. Can be done by storing GPUArch
+//   with each toolchain.
+StringRef GPUArchName;
+if (Dep.DependentOffloadKind == Action::OFK_OpenMP) {
+  // Extract GPUArch from -march argument in TC argument list.
+  for (uint ArgIndex = 0; ArgIndex < TCArgs.size(); ArgIndex++) {
+StringRef ArchStr = StringRef(TCArgs.getArgString(ArgIndex));
+auto Arch = ArchStr.startswith_insensitive("-march=");
+if (Arch) {
+  GPUArchName = ArchStr.substr(7);
+  Triples += "-";
+  break;
+}
+  }
+  Triples += GPUArchName.str();
+}
   }
 
   CmdArgs.push_back(TCArgs.MakeArgString(Triples));

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 9f1895466c98d..c3abdf446cfaf 100644

[clang] ee31ad0 - [clang-offload-bundler][docs][NFC] Add archive unbundling documentation

2021-09-21 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-09-21T19:24:44+05:30
New Revision: ee31ad0ab5f7d443b0bd582582a3524dcf7f13f0

URL: 
https://github.com/llvm/llvm-project/commit/ee31ad0ab5f7d443b0bd582582a3524dcf7f13f0
DIFF: 
https://github.com/llvm/llvm-project/commit/ee31ad0ab5f7d443b0bd582582a3524dcf7f13f0.diff

LOG: [clang-offload-bundler][docs][NFC] Add archive unbundling documentation

Add documentation of unbundling of heterogeneous device archives to
create device specific archives, as introduced by D93525. Also, add
documentation for supported text file formats.
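
The bundled text layout that this documentation describes (each bundle
wrapped between a start and an end marker comment carrying the bundle
entry ID) can be sketched as below. This is an illustration only:
bundleText is a hypothetical function, and OFFLOAD_BUNDLER_MAGIC_STR is
kept as the placeholder used in the documentation rather than the real
magic string emitted by clang-offload-bundler.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch: concatenate text bundles, wrapping each one in
// marker comments. Comment is the target format's comment leader,
// e.g. "//" or "#". Each entry is a (bundle entry ID, text) pair.
std::string
bundleText(std::string_view Comment,
           const std::vector<std::pair<std::string, std::string>> &Entries) {
  // Placeholder taken from the documentation, not the real magic string.
  const std::string Magic = "OFFLOAD_BUNDLER_MAGIC_STR";
  std::string Out;
  for (const auto &[ID, Text] : Entries) {
    Out += Comment;
    Out += ' ' + Magic + "__START__ " + ID + '\n';
    Out += Text;
    Out += '\n';
    Out += Comment;
    Out += ' ' + Magic + "__END__ " + ID + '\n';
  }
  return Out;
}

// Usage sketch with a made-up single entry.
inline void demoBundleText() {
  std::vector<std::pair<std::string, std::string>> Entries = {
      {"openmp-nvptx64-sm_70", "device text"}};
  assert(bundleText("//", Entries) ==
         "// OFFLOAD_BUNDLER_MAGIC_STR__START__ openmp-nvptx64-sm_70\n"
         "device text\n"
         "// OFFLOAD_BUNDLER_MAGIC_STR__END__ openmp-nvptx64-sm_70\n");
}
```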

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D110083

Added: 


Modified: 
clang/docs/ClangOffloadBundler.rst

Removed: 




diff --git a/clang/docs/ClangOffloadBundler.rst b/clang/docs/ClangOffloadBundler.rst
index a0e446f766eea..312c45602c991 100644
--- a/clang/docs/ClangOffloadBundler.rst
+++ b/clang/docs/ClangOffloadBundler.rst
@@ -30,9 +30,62 @@ includes an ``init`` function that will use the runtime corresponding to the
 offload kind (see :ref:`clang-offload-kind-table`) to load the offload code
 objects appropriate to the devices present when the host program is executed.
 
+Supported File Formats
+======================
+Several text and binary file formats are supported for bundling/unbundling. See
+:ref:`supported-file-formats-table` for a list of currently supported formats.
+
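+  .. table:: Supported File Formats
+     :name: supported-file-formats-table
+
+     +--------------------+----------------+-------------+
+     | File Format        | File Extension | Text/Binary |
+     +====================+================+=============+
+     | CPP output         | i              | Text        |
+     +--------------------+----------------+-------------+
+     | C++ CPP output     | ii             | Text        |
+     +--------------------+----------------+-------------+
+     | CUDA/HIP output    | cui            | Text        |
+     +--------------------+----------------+-------------+
+     | Dependency         | d              | Text        |
+     +--------------------+----------------+-------------+
+     | LLVM               | ll             | Text        |
+     +--------------------+----------------+-------------+
+     | LLVM Bitcode       | bc             | Binary      |
+     +--------------------+----------------+-------------+
+     | Assembler          | s              | Text        |
+     +--------------------+----------------+-------------+
+     | Object             | o              | Binary      |
+     +--------------------+----------------+-------------+
+     | Archive of objects | a              | Binary      |
+     +--------------------+----------------+-------------+
+     | Precompiled header | gch            | Binary      |
+     +--------------------+----------------+-------------+
+     | Clang AST file     | ast            | Binary      |
+     +--------------------+----------------+-------------+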
+
+.. _clang-bundled-code-object-layout-text:
+
+Bundled Text File Layout
+========================
+
+The format of the bundled files is currently very simple: text formats are
+concatenated with comments that have a magic string and bundle entry ID in
+between.
+
+::
+
+  "Comment OFFLOAD_BUNDLER_MAGIC_STR__START__ 1st Bundle Entry ID"
+  Bundle 1
+  "Comment OFFLOAD_BUNDLER_MAGIC_STR__END__ 1st Bundle Entry ID"
+  ...
+  "Comment OFFLOAD_BUNDLER_MAGIC_STR__START__ Nth Bundle Entry ID"
+  Bundle N
+  "Comment OFFLOAD_BUNDLER_MAGIC_STR__END__ Nth Bundle Entry ID"
+
 .. _clang-bundled-code-object-layout:
 
-Bundled Code Object Layout
+Bundled Binary File Layout
 ==
 
 The layout of a bundled code object is defined by the following table:
@@ -209,3 +262,35 @@ Target specific information is available for the following:
   supported.
 
 Most other targets do not support target IDs.
+
+Archive Unbundling
+==================
+
+Unbundling of heterogeneous device archive is done to create device specific
+archives. Heterogeneous Device Archive is in a format compatible with GNU ar
+utility and contains a collection of bundled device binaries where each bundle
+file will contain device binaries for a host and one or more targets. The
+output device specific archive is in a format compatible with GNU ar utility
+and contains a collection of device binaries for a specific target.
+
+  Heterogeneous Device Archive, HDA = {F1.X, F2.X, ..., FN.Y}
+  where, Fi = Bundle{Host-DeviceBinary, T1-DeviceBinary, T2-DeviceBinary, ...,
+ Tm-DeviceBinary},
+ Ti = {Target i, qualified using Bundle Entry ID},
+ X/Y = \*.bc for AMDGPU and \*.cubin for NVPTX
+
+  Device Specific Archive, DSA(Tk) = {F1-Tk-DeviceBinary.X, F2-Tk-DeviceBinary.X, ...
+  FN-Tk-DeviceBinary.Y}
+  where, Fi-Tj-DeviceBinary.X represents device binary of i-th 

[clang] 4a25c3f - [clang-offload-bundler] Fix compatibility testing for non-assert builds

2021-09-10 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-09-10T18:57:03+05:30
New Revision: 4a25c3fb61942c629907ae50462087c3fbb8703a

URL: 
https://github.com/llvm/llvm-project/commit/4a25c3fb61942c629907ae50462087c3fbb8703a
DIFF: 
https://github.com/llvm/llvm-project/commit/4a25c3fb61942c629907ae50462087c3fbb8703a.diff

LOG: [clang-offload-bundler] Fix compatibility testing for non-assert builds

The test using -debug-only=CodeObjectCompatibility was failing in
non-assert builds, so it has been moved to a different file that
requires asserts.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109592

Added: 
clang/test/Driver/clang-offload-bundler-asserts-on.c

Modified: 
clang/test/Driver/clang-offload-bundler.c

Removed: 




diff --git a/clang/test/Driver/clang-offload-bundler-asserts-on.c b/clang/test/Driver/clang-offload-bundler-asserts-on.c
new file mode 100644
index 0000000000000..c11028d16343a
--- /dev/null
+++ b/clang/test/Driver/clang-offload-bundler-asserts-on.c
@@ -0,0 +1,31 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: asserts
+// UNSUPPORTED: darwin
+
+// Generate the file we can bundle.
+// RUN: %clang -O0 -target %itanium_abi_triple %s -c -o %t.o
+
+//
+// Generate a couple of files to bundle with.
+//
+// RUN: echo 'Content of device file 1' > %t.tgt1
+// RUN: echo 'Content of device file 2' > %t.tgt2
+
+//
+// Check code object compatibility for archive unbundling
+//
+// Create few code object bundles and archive them to create an input archive
+// RUN: clang-offload-bundler -type=o 
-targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx906,openmp-amdgcn-amd-amdhsa--gfx908
 -inputs=%t.o,%t.tgt1,%t.tgt2 -outputs=%t.simple.bundle
+// RUN: clang-offload-bundler -type=o 
-targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa--gfx903 
-inputs=%t.o,%t.tgt1 -outputs=%t.simple1.bundle
+// RUN: llvm-ar cr %t.input-archive.a %t.simple.bundle %t.simple1.bundle
+
+// Tests to check compatibility between Bundle Entry ID formats i.e. between presence/absence of extra hyphen in case of missing environment field
+// RUN: clang-offload-bundler -unbundle -type=a 
-targets=openmp-amdgcn-amd-amdhsa--gfx906,openmp-amdgcn-amd-amdhsa-gfx908 
-inputs=%t.input-archive.a 
-outputs=%t-archive-gfx906-simple.a,%t-archive-gfx908-simple.a 
-debug-only=CodeObjectCompatibility 2>&1 | FileCheck %s 
-check-prefix=BUNDLECOMPATIBILITY
+// BUNDLECOMPATIBILITY: Compatible: Exact match:[CodeObject: 
openmp-amdgcn-amd-amdhsa-gfx906]   :   [Target: 
openmp-amdgcn-amd-amdhsa--gfx906]
+// BUNDLECOMPATIBILITY: Compatible: Exact match:[CodeObject: 
openmp-amdgcn-amd-amdhsa--gfx908]  :   [Target: 
openmp-amdgcn-amd-amdhsa-gfx908]
+
+// Some code so that we can create a binary out of this file.
+int A = 0;
+void test_func(void) {
+  ++A;
+}

diff --git a/clang/test/Driver/clang-offload-bundler.c b/clang/test/Driver/clang-offload-bundler.c
index d201dd4103892..9eb305d4d0eeb 100644
--- a/clang/test/Driver/clang-offload-bundler.c
+++ b/clang/test/Driver/clang-offload-bundler.c
@@ -401,11 +401,6 @@
 // RUN: cat %t-archive-gfx803-empty.a | FileCheck %s -check-prefix=EMPTYARCHIVE
 // EMPTYARCHIVE: !<arch>
 
-// Tests to check compatibility between Bundle Entry ID formats i.e. between presence/absence of extra hyphen in case of missing environment field
-// RUN: clang-offload-bundler -unbundle -type=a 
-targets=openmp-amdgcn-amd-amdhsa--gfx906,openmp-amdgcn-amd-amdhsa-gfx908 
-inputs=%t.input-archive.a 
-outputs=%t-archive-gfx906-simple.a,%t-archive-gfx908-simple.a 
-debug-only=CodeObjectCompatibility 2>&1 | FileCheck %s 
-check-prefix=BUNDLECOMPATIBILITY
-// BUNDLECOMPATIBILITY: Compatible: Exact match:[CodeObject: 
openmp-amdgcn-amd-amdhsa-gfx906]   :   [Target: 
openmp-amdgcn-amd-amdhsa--gfx906]
-// BUNDLECOMPATIBILITY: Compatible: Exact match:[CodeObject: 
openmp-amdgcn-amd-amdhsa--gfx908]  :   [Target: 
openmp-amdgcn-amd-amdhsa-gfx908]
-
 // Some code so that we can create a binary out of this file.
 int A = 0;
 void test_func(void) {





[clang] 543604f - [clang-nvlink-wrapper][docs][NFC] Fix sphinx warning about asterisk

2021-09-09 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-09-09T23:55:15+05:30
New Revision: 543604f30eddc5c9390d0fb01b0ac67937cbba0e

URL: 
https://github.com/llvm/llvm-project/commit/543604f30eddc5c9390d0fb01b0ac67937cbba0e
DIFF: 
https://github.com/llvm/llvm-project/commit/543604f30eddc5c9390d0fb01b0ac67937cbba0e.diff

LOG: [clang-nvlink-wrapper][docs][NFC] Fix sphinx warning about asterisk

Sphinx was emitting a warning about the unescaped special symbol *. This
was an issue on systems that treat warnings as errors.

Added: 


Modified: 
clang/docs/ClangNvlinkWrapper.rst

Removed: 




diff --git a/clang/docs/ClangNvlinkWrapper.rst b/clang/docs/ClangNvlinkWrapper.rst
index 193c0d420a21..0505d5f678c1 100644
--- a/clang/docs/ClangNvlinkWrapper.rst
+++ b/clang/docs/ClangNvlinkWrapper.rst
@@ -14,7 +14,7 @@ This tool works as a wrapper over the ``nvlink`` program. It is required
 because ``nvlink`` does not support linking of archive files implicitly. It
 transparently passes every input option and object to ``nvlink`` except archive
 files. It reads each input archive file to extract the archived cubin files as
-temporary files. These temporary (*.cubin) files are passed to ``nvlink``.
+temporary files. These temporary (\*.cubin) files are passed to ``nvlink``.
 
 Use Case
 





[clang] 9838076 - [clang-offload-bundler] Make Bundle Entry ID backward compatible

2021-09-08 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-09-08T16:06:12+05:30
New Revision: 98380762c3b734c23d206182605ab9e035c93caa

URL: 
https://github.com/llvm/llvm-project/commit/98380762c3b734c23d206182605ab9e035c93caa
DIFF: 
https://github.com/llvm/llvm-project/commit/98380762c3b734c23d206182605ab9e035c93caa.diff

LOG: [clang-offload-bundler] Make Bundle Entry ID backward compatible

Earlier, BundleEntryID used to be <OffloadKind>-<Triple>-<GPUArch>.
This used to work because the clang-offload-bundler didn't need
GPUArch explicitly for any bundling/unbundling action. With
unbundleArchive it needs GPUArch to ensure compatibility between
device specific code objects. D93525 enforced triples to have
separators for all 4 components irrespective of number of
components, like "amdgcn-amd-amdhsa--". It was required
to correctly parse a possible 4th environment component or a GPU.
But, this condition is breaking backward compatibility with
archive libraries compiled with compilers older than D93525.

This patch allows triples to have any number of components, with
or without the extra separator for an empty environment field. Thus,
both of the following bundle entry IDs are treated as the same:
openmp-amdgcn-amd-amdhsa--gfx906
openmp-amdgcn-amd-amdhsa-gfx906
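
To make the equivalence concrete, here is a minimal sketch that treats
two bundle entry IDs as equal when they differ only by empty
hyphen-separated fields. It is a simplification written for this note,
not the comparison implemented in clang-offload-bundler;
splitNonEmptyFields and sameBundleEntryID are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split on '-' and drop empty fields, so an empty
// environment component ("amdhsa--gfx906") leaves no trace.
std::vector<std::string> splitNonEmptyFields(std::string_view ID) {
  std::vector<std::string> Fields;
  std::size_t Start = 0;
  while (Start <= ID.size()) {
    std::size_t End = ID.find('-', Start);
    if (End == std::string_view::npos)
      End = ID.size();
    if (End > Start)
      Fields.emplace_back(ID.substr(Start, End - Start));
    Start = End + 1;
  }
  return Fields;
}

// Two IDs match if their non-empty fields are identical and in order.
bool sameBundleEntryID(std::string_view A, std::string_view B) {
  return splitNonEmptyFields(A) == splitNonEmptyFields(B);
}

// Usage sketch with the two IDs from the commit message.
inline void demoSameBundleEntryID() {
  assert(sameBundleEntryID("openmp-amdgcn-amd-amdhsa--gfx906",
                           "openmp-amdgcn-amd-amdhsa-gfx906"));
}
```

A field-by-field comparison like this ignores where the empty field sits,
which is looser than a real triple parser would be.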

Reviewed By: yaxunl, grokos

Differential Revision: https://reviews.llvm.org/D106809

Added: 


Modified: 
clang/docs/ClangOffloadBundler.rst
clang/lib/Driver/ToolChains/Clang.cpp
clang/test/Driver/clang-offload-bundler.c
clang/test/Driver/hip-rdc-device-only.hip
clang/test/Driver/hip-toolchain-rdc-separate.hip
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp

Removed: 




diff --git a/clang/docs/ClangOffloadBundler.rst b/clang/docs/ClangOffloadBundler.rst
index c92d8a94cfb54..a0e446f766eea 100644
--- a/clang/docs/ClangOffloadBundler.rst
+++ b/clang/docs/ClangOffloadBundler.rst
@@ -121,15 +121,7 @@ Where:
   = 
==
 
 **target-triple**
-The target triple of the code object:
-
-.. code::
-
-      <Architecture>-<Vendor>-<OS>-<Environment>
-
-It is required to have all four components present, if target-id is present.
-Components are hyphen separated. If a component is not specified then the
-empty string must be used in its place.
+The target triple of the code object.
 
 **target-id**
   The canonical target ID of the code object. Present only if the target

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 65e377f6a7f7a..5d817aa480bf4 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7686,16 +7686,12 @@ void OffloadBundler::ConstructJob(Compilation &C, const JobAction &JA,
   });
 }
 Triples += Action::GetOffloadKindName(CurKind);
-Triples += "-";
-std::string NormalizedTriple = CurTC->getTriple().normalize();
-Triples += NormalizedTriple;
-
-if (CurDep->getOffloadingArch() != nullptr) {
-  // If OffloadArch is present it can only appear as the 6th hypen
-  // sepearated field of Bundle Entry ID. So, pad required number of
-  // hyphens in Triple.
-  for (int i = 4 - StringRef(NormalizedTriple).count("-"); i > 0; i--)
-Triples += "-";
+Triples += '-';
+Triples += CurTC->getTriple().normalize();
+if ((CurKind == Action::OFK_HIP || CurKind == Action::OFK_OpenMP ||
+ CurKind == Action::OFK_Cuda) &&
+CurDep->getOffloadingArch()) {
+  Triples += '-';
   Triples += CurDep->getOffloadingArch();
 }
   }
@@ -7768,17 +7764,13 @@ void OffloadBundler::ConstructJobMultipleOutputs(
 
 auto &Dep = DepInfo[I];
 Triples += Action::GetOffloadKindName(Dep.DependentOffloadKind);
-Triples += "-";
-std::string NormalizedTriple =
-Dep.DependentToolChain->getTriple().normalize();
-Triples += NormalizedTriple;
-
-if (!Dep.DependentBoundArch.empty()) {
-  // If OffloadArch is present it can only appear as the 6th hypen
-  // sepearated field of Bundle Entry ID. So, pad required number of
-  // hyphens in Triple.
-  for (int i = 4 - StringRef(NormalizedTriple).count("-"); i > 0; i--)
-Triples += "-";
+Triples += '-';
+Triples += Dep.DependentToolChain->getTriple().normalize();
+if ((Dep.DependentOffloadKind == Action::OFK_HIP ||
+ Dep.DependentOffloadKind == Action::OFK_OpenMP ||
+ Dep.DependentOffloadKind == Action::OFK_Cuda) &&
+!Dep.DependentBoundArch.empty()) {
+  Triples += '-';
   Triples += Dep.DependentBoundArch;
 }
   }

diff --git a/clang/test/Driver/clang-offload-bundler.c b/clang/test/Driver/clang-offload-bundler.c
index e1afa19570ec3..d201dd4103892 100644
--- a/clang/test/Driver/clang-offload-bundler.c
+++ b/clang/test/Driver/clang-offload-bundler.c
@@ -382,16 +382,30 @@
 // Check archive unbundling
 //
 // Create few code object bundles and archive them to create an input archive
-// RUN: 

[clang] e158363 - [clang-nvlink-wrapper] Add documentation in clang docs

2021-09-06 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-09-06T11:43:58+05:30
New Revision: e15836361cdfeb3a717a2ebae94c68286111369b

URL: 
https://github.com/llvm/llvm-project/commit/e15836361cdfeb3a717a2ebae94c68286111369b
DIFF: 
https://github.com/llvm/llvm-project/commit/e15836361cdfeb3a717a2ebae94c68286111369b.diff

LOG: [clang-nvlink-wrapper] Add documentation in clang docs

Add documentation of clang-nvlink-wrapper tool in clang.
Add it to the release notes of clang. Fix a small MSVC
warning.
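
The processing steps listed in the new documentation (extract the cubin
files from each archive, then substitute them for the archive on the
nvlink command line) can be sketched roughly as follows. Archive
extraction itself is not modeled: the map from archive path to extracted
cubin paths is assumed to be given, and buildNVLinkCommand is a
hypothetical name rather than a function in ClangNvlinkWrapper.cpp.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: build an nvlink command line in which every input
// that has an entry in Extracted (archive path -> extracted cubin paths)
// is replaced by those cubin paths. All other arguments pass through.
std::vector<std::string> buildNVLinkCommand(
    const std::vector<std::string> &Args,
    const std::map<std::string, std::vector<std::string>> &Extracted) {
  std::vector<std::string> Cmd{"nvlink"};
  for (const std::string &Arg : Args) {
    auto It = Extracted.find(Arg);
    if (It != Extracted.end())
      Cmd.insert(Cmd.end(), It->second.begin(), It->second.end());
    else
      Cmd.push_back(Arg);
  }
  return Cmd;
}

// Usage sketch mirroring the example in the documentation.
inline void demoNVLinkCommand() {
  std::map<std::string, std::vector<std::string>> Extracted;
  Extracted["/tmp/libTest-nvptx-sm_50.a"] =
      std::vector<std::string>{"/tmp/a.cubin", "/tmp/b.cubin"};
  std::vector<std::string> Cmd = buildNVLinkCommand(
      {"main.cubin", "/tmp/libTest-nvptx-sm_50.a", "-o", "main-linked.out"},
      Extracted);
  assert(Cmd.size() == 6 && Cmd[2] == "/tmp/a.cubin");
}
```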

Differential Revision: https://reviews.llvm.org/D109225

Added: 
clang/docs/ClangNvlinkWrapper.rst

Modified: 
clang/docs/ReleaseNotes.rst
clang/docs/index.rst
clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp

Removed: 




diff --git a/clang/docs/ClangNvlinkWrapper.rst b/clang/docs/ClangNvlinkWrapper.rst
new file mode 100644
index 0000000000000..193c0d420a210
--- /dev/null
+++ b/clang/docs/ClangNvlinkWrapper.rst
@@ -0,0 +1,57 @@
+====================
+Clang Nvlink Wrapper
+====================
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+============
+
+This tool works as a wrapper over the ``nvlink`` program. It is required
+because ``nvlink`` does not support linking of archive files implicitly. It
+transparently passes every input option and object to ``nvlink`` except archive
+files. It reads each input archive file to extract the archived cubin files as
+temporary files. These temporary (*.cubin) files are passed to ``nvlink``.
+
+Use Case
+========
+
+During linking of heterogeneous device archive libraries with an OpenMP
+program, the :doc:`ClangOffloadBundler` creates a device specific archive of
+cubin files. Such an archive is then passed to this wrapper tool to extract
+cubin files before passing to ``nvlink``.
+
+Working
+=======
+
+**Inputs**
+
+  A command line generated by the OpenMP-Clang driver targeting NVPTX,
+  containing a set of flags, cubin object files, and zero or more archive
+  files.
+
+Example::
+
+  clang-nvlink-wrapper main.cubin /tmp/libTest-nvptx-sm_50.a -o main-linked.out
+
+**Processing**
+
+  1. From each archive file extract all cubin files as temporary files and
+ store their names in a list, `CubinFiles`.
+  2. Create a new command line, `NVLinkCommand`, such that
+ * Program is ``nvlink``
+ * All input flags are transparently passed on as flags
+ * All input archive file are replaced with `CubinFiles`
+  3. Execute NVLinkCommand
+
+::
+
+  1. Extract (libTest-nvptx-sm_50.a) => /tmp/a.cubin /tmp/b.cubin
+  2. nvlink -o a.out-openmp-nvptx64 main.cubin /tmp/a.cubin /tmp/b.cubin
+  
+**Output**
+
+  Output file generated by ``nvlink`` which links all cubin files.

diff  --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index c62cefc54a663..6caf24ca4ef77 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -147,7 +147,8 @@ ABI Changes in Clang
 OpenMP Support in Clang
 ---
 
-- ...
+- ``clang-nvlink-wrapper`` tool introduced to support linking of cubin files archived in an archive. See :doc:`ClangNvlinkWrapper`.
+
 
 CUDA Support in Clang
 -

diff  --git a/clang/docs/index.rst b/clang/docs/index.rst
index 149f5680b3338..bf598b1eda035 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -80,6 +80,7 @@ Using Clang Tools
    ClangFormat
    ClangFormatStyleOptions
    ClangFormattedStatus
+   ClangNvlinkWrapper
    ClangOffloadBundler
 
 Design Documents

diff  --git a/clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp 
b/clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
index 00c371e35e75c..5c8b7b9db6884 100644
--- a/clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
+++ b/clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
@@ -13,7 +13,7 @@
 /// These temp (*.cubin) files are passed to nvlink, because nvlink does not
 /// support linking of archive files implicitly.
 ///
-/// During linking of heteregenous device archive libraries, the
+/// During linking of heterogeneous device archive libraries, the
 /// clang-offload-bundler creates a device specific archive of cubin files.
 /// Such an archive is then passed to this tool to extract cubin files before
 /// passing to nvlink.
@@ -60,7 +60,7 @@ static Error extractArchiveFiles(StringRef Filename,
   std::vector<std::unique_ptr<MemoryBuffer>> ArchiveBuffers;
 
   ErrorOr<std::unique_ptr<MemoryBuffer>> BufOrErr =
-      MemoryBuffer::getFileOrSTDIN(Filename, -1, false);
+      MemoryBuffer::getFileOrSTDIN(Filename, false, false);
   if (std::error_code EC = BufOrErr.getError())
     return createFileError(Filename, EC);
 





[clang] 83f3782 - [clang-nvlink-wrapper] Wrapper around nvlink for archive files

2021-09-01 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-09-01T16:00:29+05:30
New Revision: 83f3782c6129e7a5df3faaf0ae576611d16a8d49

URL: 
https://github.com/llvm/llvm-project/commit/83f3782c6129e7a5df3faaf0ae576611d16a8d49
DIFF: 
https://github.com/llvm/llvm-project/commit/83f3782c6129e7a5df3faaf0ae576611d16a8d49.diff

LOG: [clang-nvlink-wrapper] Wrapper around nvlink for archive files

 nvlink does not support linking of cubin files archived in an archive.
 This tool extracts all the cubin files in the given device-specific archive
 and passes them to nvlink. It is required for linking static device libraries
 for nvptx.
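
 The argument rewriting described in this log can be sketched in plain C++.
 This is only an illustration under stated assumptions: `buildNvlinkArgs`,
 `isArchive`, and the `Extractor` callback are hypothetical names that do not
 appear in the patch, and archives are recognized here by a `.a` suffix for
 brevity, whereas the tool itself reads archive members through
 llvm/Object/Archive.h.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback: returns the temporary cubin files extracted from
// one archive. The real tool reads members via llvm/Object/Archive.h.
using Extractor =
    std::function<std::vector<std::string>(const std::string &)>;

// Assumption for this sketch only: an archive is any argument ending in ".a".
static bool isArchive(const std::string &Arg) {
  return Arg.size() > 2 && Arg.compare(Arg.size() - 2, 2, ".a") == 0;
}

// Forward every flag and object unchanged; replace each archive argument
// with the cubin files extracted from it, keeping the original order.
static std::vector<std::string>
buildNvlinkArgs(const std::vector<std::string> &Args,
                const Extractor &Extract) {
  std::vector<std::string> NVLinkArgs;
  for (const std::string &Arg : Args) {
    if (!isArchive(Arg)) {
      NVLinkArgs.push_back(Arg);
      continue;
    }
    for (const std::string &Cubin : Extract(Arg))
      NVLinkArgs.push_back(Cubin);
  }
  return NVLinkArgs;
}
```

 Passing the extractor in as a callback keeps the rewriting step testable
 without touching the file system; the resulting vector is what would be
 handed to nvlink.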

 Reviewed By: ye-luo

 Differential Revision: https://reviews.llvm.org/D108291

Added: 
clang/tools/clang-nvlink-wrapper/CMakeLists.txt
clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp

Modified: 
clang/tools/CMakeLists.txt

Removed: 




diff  --git a/clang/tools/CMakeLists.txt b/clang/tools/CMakeLists.txt
index c929f6e665e2c..38b7496b97f72 100644
--- a/clang/tools/CMakeLists.txt
+++ b/clang/tools/CMakeLists.txt
@@ -8,6 +8,7 @@ add_clang_subdirectory(clang-format)
 add_clang_subdirectory(clang-format-vs)
 add_clang_subdirectory(clang-fuzzer)
 add_clang_subdirectory(clang-import-test)
+add_clang_subdirectory(clang-nvlink-wrapper)
 add_clang_subdirectory(clang-offload-bundler)
 add_clang_subdirectory(clang-offload-wrapper)
 add_clang_subdirectory(clang-scan-deps)

diff  --git a/clang/tools/clang-nvlink-wrapper/CMakeLists.txt 
b/clang/tools/clang-nvlink-wrapper/CMakeLists.txt
new file mode 100644
index 0..033392f1c2bdc
--- /dev/null
+++ b/clang/tools/clang-nvlink-wrapper/CMakeLists.txt
@@ -0,0 +1,25 @@
+set(LLVM_LINK_COMPONENTS BitWriter Core Object Support)
+
+if(NOT CLANG_BUILT_STANDALONE)
+  set(tablegen_deps intrinsics_gen)
+endif()
+
+add_clang_executable(clang-nvlink-wrapper
+  ClangNvlinkWrapper.cpp
+
+  DEPENDS
+  ${tablegen_deps}
+  )
+
+set(CLANG_NVLINK_WRAPPER_LIB_DEPS
+  clangBasic
+  )
+
+add_dependencies(clang clang-nvlink-wrapper)
+
+target_link_libraries(clang-nvlink-wrapper
+  PRIVATE
+  ${CLANG_NVLINK_WRAPPER_LIB_DEPS}
+  )
+
+install(TARGETS clang-nvlink-wrapper RUNTIME DESTINATION bin)

diff  --git a/clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp 
b/clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
new file mode 100644
index 0..00c371e35e75c
--- /dev/null
+++ b/clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
@@ -0,0 +1,164 @@
+//===-- clang-nvlink-wrapper/ClangNvlinkWrapper.cpp - wrapper over nvlink-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===-===//
+///
+/// \file
+/// This tool works as a wrapper over nvlink program. It transparently passes
+/// every input option and objects to nvlink except archive files. It reads
+/// each input archive file to extract archived cubin files as temporary files.
+/// These temp (*.cubin) files are passed to nvlink, because nvlink does not
+/// support linking of archive files implicitly.
+///
+/// During linking of heteregenous device archive libraries, the
+/// clang-offload-bundler creates a device specific archive of cubin files.
+/// Such an archive is then passed to this tool to extract cubin files before
+/// passing to nvlink.
+///
+/// Example:
+/// clang-nvlink-wrapper -o a.out-openmp-nvptx64 /tmp/libTest-nvptx-sm_50.a
+///
+/// 1. Extract (libTest-nvptx-sm_50.a) => /tmp/a.cubin /tmp/b.cubin
+/// 2. nvlink -o a.out-openmp-nvptx64 /tmp/a.cubin /tmp/b.cubin
+//===-===//
+
+#include "llvm/Object/Archive.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Errc.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Path.h"
+#include "llvm/Support/Program.h"
+#include "llvm/Support/Signals.h"
+#include "llvm/Support/StringSaver.h"
+#include "llvm/Support/WithColor.h"
+#include "llvm/Support/raw_ostream.h"
+
+using namespace llvm;
+
+static cl::opt<bool> Help("h", cl::desc("Alias for -help"), cl::Hidden);
+
+static Error runNVLink(std::string NVLinkPath,
+                       SmallVectorImpl<std::string> &Args) {
+  std::vector<StringRef> NVLArgs;
+  NVLArgs.push_back(NVLinkPath);
+  for (auto &Arg : Args) {
+    NVLArgs.push_back(Arg);
+  }
+
+  if (sys::ExecuteAndWait(NVLinkPath.c_str(), NVLArgs))
+    return createStringError(inconvertibleErrorCode(), "'nvlink' failed");
+  return Error::success();
+}
+
+static Error extractArchiveFiles(StringRef Filename,
+ SmallVectorImpl ,
+ SmallVectorImpl ) {
+  std::vector<std::unique_ptr<MemoryBuffer>> ArchiveBuffers;
+
+  ErrorOr<std::unique_ptr<MemoryBuffer>> BufOrErr =
+  MemoryBuffer::getFileOrSTDIN(Filename, -1, 

[clang] 94d5f2a - [Clang] Add test dependency on llvm-ar

2021-07-07 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-07-07T14:30:57+05:30
New Revision: 94d5f2afbef0bc18cf92f6d147c336da990e7a40

URL: 
https://github.com/llvm/llvm-project/commit/94d5f2afbef0bc18cf92f6d147c336da990e7a40
DIFF: 
https://github.com/llvm/llvm-project/commit/94d5f2afbef0bc18cf92f6d147c336da990e7a40.diff

LOG: [Clang] Add test dependency on llvm-ar

Following clang driver tests invoke llvm-ar, so ensure that
check-clang rebuilds llvm-ar.

 * test/Driver/clang-offload-bundler.c
 * test/Driver/hip-link-save-temps.hip
 * test/Driver/hip-link-static-library.hip
 * test/Driver/hip-toolchain-rdc-static-lib.hip

Differential Revision: https://reviews.llvm.org/D105285

Added: 


Modified: 
clang/test/CMakeLists.txt

Removed: 




diff  --git a/clang/test/CMakeLists.txt b/clang/test/CMakeLists.txt
index 8e3460f75c37..e2f6d6772dea 100644
--- a/clang/test/CMakeLists.txt
+++ b/clang/test/CMakeLists.txt
@@ -108,6 +108,7 @@ if( NOT CLANG_BUILT_STANDALONE )
 llvm-config
 FileCheck count not
 llc
+llvm-ar
 llvm-as
 llvm-bcanalyzer
 llvm-cat





[clang] f7ce532 - [clang-offload-bundler] Add unbundling of archives containing bundled object files into device specific archives

2021-06-30 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2021-06-30T17:55:50+05:30
New Revision: f7ce532d622dc26eddd25f87faec0ff35dc0c2e9

URL: 
https://github.com/llvm/llvm-project/commit/f7ce532d622dc26eddd25f87faec0ff35dc0c2e9
DIFF: 
https://github.com/llvm/llvm-project/commit/f7ce532d622dc26eddd25f87faec0ff35dc0c2e9.diff

LOG: [clang-offload-bundler] Add unbundling of archives containing bundled 
object files into device specific archives

This patch adds unbundling support of an archive file. It takes an
archive file along with a set of offload targets as input.
Output is a device specific archive for each given offload target.
Input archive contains bundled code objects bundled using
clang-offload-bundler. Each generated device specific archive contains
a set of device code object files which are named as
-.

Entries in input archive can be of any binary type which is
supported by clang-offload-bundler, like *.bc. Output archives will
contain files in same type.

Example Usage:
  clang-offload-bundler --unbundle --inputs=lib-generic.a -type=a
  -targets=openmp-amdgcn-amdhsa--gfx906,openmp-amdgcn-amdhsa--gfx908
  -outputs=devicelib-gfx906.a,deviceLib-gfx908.a
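
The driver change in this patch pads the normalized triple with hyphens so
that an offload arch, when present, is always the sixth hyphen-separated
field of the bundle entry ID. A minimal standalone sketch of that padding
follows. `makeBundleEntryID` is a hypothetical helper name, and the triple
strings used with it are illustrative assumptions, not values taken from the
patch.

```cpp
#include <algorithm>
#include <string>

// Builds "<kind>-<triple>" and, if an offload arch is given, pads the triple
// with hyphens up to four components before appending the arch, so the arch
// lands in the sixth hyphen-separated field.
static std::string makeBundleEntryID(const std::string &Kind,
                                     const std::string &NormalizedTriple,
                                     const std::string &OffloadArch) {
  std::string ID = Kind + "-" + NormalizedTriple;
  if (OffloadArch.empty())
    return ID;
  auto Hyphens =
      std::count(NormalizedTriple.begin(), NormalizedTriple.end(), '-');
  // Same loop shape as in the patch: one hyphen per missing separator,
  // counting the separator that precedes the arch itself.
  for (auto I = 4 - Hyphens; I > 0; --I)
    ID += "-";
  return ID + OffloadArch;
}
```

With a three-component triple such as "amdgcn-amd-amdhsa" the environment
field is left empty, which is why two consecutive hyphens appear before the
arch.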

Reviewed By: jdoerfert, yaxunl

Differential Revision: https://reviews.llvm.org/D93525

Added: 


Modified: 
clang/docs/ClangOffloadBundler.rst
clang/lib/Driver/ToolChains/Clang.cpp
clang/test/Driver/clang-offload-bundler.c
clang/test/Driver/hip-rdc-device-only.hip
clang/test/Driver/hip-toolchain-rdc-separate.hip
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp

Removed: 




diff  --git a/clang/docs/ClangOffloadBundler.rst 
b/clang/docs/ClangOffloadBundler.rst
index 68c5116b235f4..c92d8a94cfb54 100644
--- a/clang/docs/ClangOffloadBundler.rst
+++ b/clang/docs/ClangOffloadBundler.rst
@@ -121,7 +121,15 @@ Where:
   = 
==
 
 **target-triple**
-  The target triple of the code object.
+The target triple of the code object:
+
+.. code::
+
+  <Architecture>-<Vendor>-<OS>-<Environment>
+
+It is required to have all four components present, if target-id is present.
+Components are hyphen separated. If a component is not specified then the
+empty string must be used in its place.
 
 **target-id**
   The canonical target ID of the code object. Present only if the target

diff  --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index c265e1c4e53cb..00939eae42998 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7629,10 +7629,16 @@ void OffloadBundler::ConstructJob(Compilation , const 
JobAction ,
   });
 }
 Triples += Action::GetOffloadKindName(CurKind);
-Triples += '-';
-Triples += CurTC->getTriple().normalize();
-if (CurKind == Action::OFK_HIP && CurDep->getOffloadingArch()) {
-  Triples += '-';
+Triples += "-";
+std::string NormalizedTriple = CurTC->getTriple().normalize();
+Triples += NormalizedTriple;
+
+if (CurDep->getOffloadingArch() != nullptr) {
+  // If OffloadArch is present it can only appear as the 6th hypen
+  // sepearated field of Bundle Entry ID. So, pad required number of
+  // hyphens in Triple.
+  for (int i = 4 - StringRef(NormalizedTriple).count("-"); i > 0; i--)
+Triples += "-";
   Triples += CurDep->getOffloadingArch();
 }
   }
@@ -7702,11 +7708,17 @@ void OffloadBundler::ConstructJobMultipleOutputs(
 
 auto &Dep = DepInfo[I];
 Triples += Action::GetOffloadKindName(Dep.DependentOffloadKind);
-Triples += '-';
-Triples += Dep.DependentToolChain->getTriple().normalize();
-if (Dep.DependentOffloadKind == Action::OFK_HIP &&
-!Dep.DependentBoundArch.empty()) {
-  Triples += '-';
+Triples += "-";
+std::string NormalizedTriple =
+Dep.DependentToolChain->getTriple().normalize();
+Triples += NormalizedTriple;
+
+if (!Dep.DependentBoundArch.empty()) {
+  // If OffloadArch is present it can only appear as the 6th hypen
+  // sepearated field of Bundle Entry ID. So, pad required number of
+  // hyphens in Triple.
+  for (int i = 4 - StringRef(NormalizedTriple).count("-"); i > 0; i--)
+Triples += "-";
   Triples += Dep.DependentBoundArch;
 }
   }

diff  --git a/clang/test/Driver/clang-offload-bundler.c 
b/clang/test/Driver/clang-offload-bundler.c
index faa6c5161a8f9..e1afa19570ec3 100644
--- a/clang/test/Driver/clang-offload-bundler.c
+++ b/clang/test/Driver/clang-offload-bundler.c
@@ -46,6 +46,7 @@
 // CK-HELP: {{.*}}bc {{.*}}- llvm-bc
 // CK-HELP: {{.*}}s {{.*}}- assembler
 // CK-HELP: {{.*}}o {{.*}}- object
+// CK-HELP: {{.*}}a {{.*}}- archive of objects
 // CK-HELP: {{.*}}gch {{.*}}- precompiled-header
 // CK-HELP: {{.*}}ast {{.*}}- clang AST file
 // CK-HELP: {{.*}}-unbundle {{.*}}- Unbundle bundled file into several output 
files.
@@ -103,6 +104,9 @@
 // RUN: 

[clang] a1ac047 - [OpenMP] Fix a failing test after D85214

2020-08-27 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2020-08-27T20:57:17Z
New Revision: a1ac047b3453f205eacc36adc787ac31b952a502

URL: 
https://github.com/llvm/llvm-project/commit/a1ac047b3453f205eacc36adc787ac31b952a502
DIFF: 
https://github.com/llvm/llvm-project/commit/a1ac047b3453f205eacc36adc787ac31b952a502.diff

LOG: [OpenMP] Fix a failing test after D85214

Removed version 45 testing from a failing test for now.

Added: 


Modified: 
clang/test/OpenMP/declare_target_ast_print.cpp

Removed: 




diff  --git a/clang/test/OpenMP/declare_target_ast_print.cpp 
b/clang/test/OpenMP/declare_target_ast_print.cpp
index 4831b3bde7a4..c086f8526147 100644
--- a/clang/test/OpenMP/declare_target_ast_print.cpp
+++ b/clang/test/OpenMP/declare_target_ast_print.cpp
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -I %S/Inputs -ast-print %s | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-version=45 -x c++ -std=c++11 -I %S/Inputs -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-version=45 -std=c++11 -include-pch %t -fsyntax-only -I %S/Inputs -verify %s -ast-print | FileCheck %s
+// RUN: %clang_cc1 -verify -fopenmp -I %S/Inputs -ast-print %s | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -I %S/Inputs -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -std=c++11 -include-pch %t -fsyntax-only -I %S/Inputs -verify %s -ast-print | FileCheck %s
 
 // RUN: %clang_cc1 -verify -fopenmp -I %S/Inputs -ast-print %s | FileCheck %s --check-prefix=CHECK --check-prefix=OMP50
 // RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -I %S/Inputs -emit-pch -o %t %s





[clang] eaa341f - [OpenMP] Ensure testing for versions 4.5 and default - Part 1

2020-08-13 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2020-08-13T07:37:10Z
New Revision: eaa341fbea961894759355256d25d785509002ef

URL: 
https://github.com/llvm/llvm-project/commit/eaa341fbea961894759355256d25d785509002ef
DIFF: 
https://github.com/llvm/llvm-project/commit/eaa341fbea961894759355256d25d785509002ef.diff

LOG: [OpenMP] Ensure testing for versions 4.5 and default - Part 1

Many OpenMP Clang tests do not RUN for version 4.5 and the default
version. This first patch in the series only handles test cases
which do not require any modifications in the CHECK lines after
adding RUN lines for default version.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84844

Added: 


Modified: 
clang/test/OpenMP/cancel_ast_print.cpp
clang/test/OpenMP/cancel_codegen.cpp
clang/test/OpenMP/cancel_codegen_cleanup.cpp
clang/test/OpenMP/cancel_if_messages.cpp
clang/test/OpenMP/capturing_in_templates.cpp
clang/test/OpenMP/distribute_parallel_for_if_codegen.cpp
clang/test/OpenMP/distribute_parallel_for_if_messages.cpp
clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp
clang/test/OpenMP/distribute_parallel_for_reduction_codegen.cpp
clang/test/OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp
clang/test/OpenMP/nvptx_target_requires_unified_shared_memory.cpp
clang/test/OpenMP/parallel_default_messages.cpp
clang/test/OpenMP/parallel_for_if_messages.cpp
clang/test/OpenMP/parallel_if_codegen.cpp
clang/test/OpenMP/parallel_if_messages.cpp
clang/test/OpenMP/parallel_master_if_messages.cpp
clang/test/OpenMP/parallel_sections_if_messages.cpp
clang/test/OpenMP/report_default_DSA.cpp
clang/test/OpenMP/target_ast_print.cpp
clang/test/OpenMP/target_enter_data_ast_print.cpp
clang/test/OpenMP/target_enter_data_if_messages.cpp
clang/test/OpenMP/target_exit_data_ast_print.cpp
clang/test/OpenMP/target_exit_data_if_messages.cpp
clang/test/OpenMP/target_if_messages.cpp
clang/test/OpenMP/target_parallel_codegen.cpp
clang/test/OpenMP/target_parallel_for_codegen.cpp
clang/test/OpenMP/target_parallel_if_messages.cpp
clang/test/OpenMP/target_parallel_num_threads_codegen.cpp
clang/test/OpenMP/target_teams_distribute_codegen.cpp
clang/test/OpenMP/target_teams_distribute_if_messages.cpp
clang/test/OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp
clang/test/OpenMP/target_teams_distribute_parallel_for_if_messages.cpp
clang/test/OpenMP/target_teams_if_messages.cpp
clang/test/OpenMP/target_teams_num_teams_codegen.cpp
clang/test/OpenMP/target_teams_thread_limit_codegen.cpp
clang/test/OpenMP/target_update_if_messages.cpp
clang/test/OpenMP/task_if_codegen.cpp
clang/test/OpenMP/task_if_messages.cpp
clang/test/OpenMP/teams_distribute_parallel_for_if_codegen.cpp
clang/test/OpenMP/teams_distribute_parallel_for_if_messages.cpp
clang/test/OpenMP/teams_distribute_parallel_for_num_threads_codegen.cpp
clang/test/OpenMP/teams_distribute_parallel_for_simd_num_threads_codegen.cpp

Removed: 




diff  --git a/clang/test/OpenMP/cancel_ast_print.cpp 
b/clang/test/OpenMP/cancel_ast_print.cpp
index b376d4ec2807..f5173ed4ca51 100644
--- a/clang/test/OpenMP/cancel_ast_print.cpp
+++ b/clang/test/OpenMP/cancel_ast_print.cpp
@@ -5,6 +5,15 @@
 // RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-version=45 -ast-print %s | FileCheck %s
 // RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=45 -x c++ -std=c++11 -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=45 -std=c++11 -include-pch %t -fsyntax-only -verify %s -ast-print | FileCheck %s
+
+// RUN: %clang_cc1 -verify -fopenmp -ast-print %s | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -std=c++11 -include-pch %t -fsyntax-only -verify %s -ast-print | FileCheck %s
+
+// RUN: %clang_cc1 -verify -fopenmp-simd -ast-print %s | FileCheck %s
+// RUN: %clang_cc1 -fopenmp-simd -x c++ -std=c++11 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp-simd -std=c++11 -include-pch %t -fsyntax-only -verify %s -ast-print | FileCheck %s
+
 // expected-no-diagnostics
 
 #ifndef HEADER

diff  --git a/clang/test/OpenMP/cancel_codegen.cpp 
b/clang/test/OpenMP/cancel_codegen.cpp
index 0942c7cf4236..80e2e294a60c 100644
--- a/clang/test/OpenMP/cancel_codegen.cpp
+++ b/clang/test/OpenMP/cancel_codegen.cpp
@@ -10,6 +10,20 @@
 // RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=45 -x c++ -std=c++11 -triple 
x86_64-apple-darwin13.4.0 -emit-pch -o %t %s
 // RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=45 -std=c++11 -include-pch 
%t -fsyntax-only -verify %s -triple x86_64-apple-darwin13.4.0 -emit-llvm -o - | 
FileCheck --check-prefix SIMD-ONLY0 %s
 // SIMD-ONLY0-NOT: {{__kmpc|__tgt}}
+
+// RUN: %clang_cc1 -verify -fopenmp -triple x86_64-apple-darwin13.4.0 
-emit-llvm -o - %s | FileCheck %s 

[clang] 160ff83 - [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3

2020-08-02 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2020-08-03T05:38:39Z
New Revision: 160ff83765ac284f3c7dd7b25d4ef105b9952ac0

URL: 
https://github.com/llvm/llvm-project/commit/160ff83765ac284f3c7dd7b25d4ef105b9952ac0
DIFF: 
https://github.com/llvm/llvm-project/commit/160ff83765ac284f3c7dd7b25d4ef105b9952ac0.diff

LOG: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3

Provides AMDGCN and NVPTX specific specialization of getGPUWarpSize,
getGPUThreadID, and getGPUNumThreads methods. Adds tests for AMDGCN
codegen for these methods in generic and simd modes. Also changes the
precondition in InitTempAlloca to be slightly more permissive. Useful for
AMDGCN OpenMP codegen where allocas are created with a cast to an
address space.
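
The specialization pattern described in this log, where a generic GPU runtime
defers target-specific queries to subclasses, can be modelled without any LLVM
dependency. The sketch below is illustrative only: the class names are
hypothetical, the methods return plain integers where the real ones emit IR
and return llvm::Value pointers, and the warp sizes (32 for NVPTX, 64 for
AMDGCN) are assumptions made for the example.

```cpp
#include <cstdint>

// Simplified stand-in for the generic GPU runtime: target-specific queries
// are virtual, shared logic is written once in terms of them.
class GPURuntimeSketch {
public:
  virtual ~GPURuntimeSketch() = default;
  virtual uint32_t getGPUWarpSize() const = 0;

  // Shared helper that relies only on the virtual query above.
  uint32_t warpsNeeded(uint32_t NumThreads) const {
    uint32_t WarpSize = getGPUWarpSize();
    return (NumThreads + WarpSize - 1) / WarpSize; // round up
  }
};

class NVPTXRuntimeSketch final : public GPURuntimeSketch {
public:
  // Assumed warp size for this illustration.
  uint32_t getGPUWarpSize() const override { return 32; }
};

class AMDGCNRuntimeSketch final : public GPURuntimeSketch {
public:
  // Assumed wavefront size for this illustration.
  uint32_t getGPUWarpSize() const override { return 64; }
};
```

Code written against the base class does not need to know which target it
runs on, which is the point of moving these queries behind virtual methods.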

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84260

Added: 
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
clang/test/OpenMP/amdgcn_target_codegen.cpp
clang/test/OpenMP/amdgcn_target_init_temp_alloca.cpp

Modified: 
clang/lib/CodeGen/CGExpr.cpp
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h
clang/lib/CodeGen/CMakeLists.txt
clang/lib/CodeGen/CodeGenModule.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index ab29e32929ce..5d74d91065f5 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -125,8 +125,13 @@ Address 
CodeGenFunction::CreateDefaultAlignTempAlloca(llvm::Type *Ty,
 }
 
 void CodeGenFunction::InitTempAlloca(Address Var, llvm::Value *Init) {
-  assert(isa<llvm::AllocaInst>(Var.getPointer()));
-  auto *Store = new llvm::StoreInst(Init, Var.getPointer(), /*volatile*/ false,
+  auto *Alloca = Var.getPointer();
+  assert(isa<llvm::AllocaInst>(Alloca) ||
+         (isa<llvm::AddrSpaceCastInst>(Alloca) &&
+          isa<llvm::AllocaInst>(
+              cast<llvm::AddrSpaceCastInst>(Alloca)->getPointerOperand())));
+
+  auto *Store = new llvm::StoreInst(Init, Alloca, /*volatile*/ false,
                                     Var.getAlignment().getAsAlign());
   llvm::BasicBlock *Block = AllocaInsertPt->getParent();
   Block->getInstList().insertAfter(AllocaInsertPt->getIterator(), Store);

diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
new file mode 100644
index ..ccffdf43549f
--- /dev/null
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
@@ -0,0 +1,61 @@
+//===-- CGOpenMPRuntimeAMDGCN.cpp - Interface to OpenMP AMDGCN Runtimes --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This provides a class for OpenMP runtime code generation specialized to
+// AMDGCN targets from generalized CGOpenMPRuntimeGPU class.
+//
+//===--===//
+
+#include "CGOpenMPRuntimeAMDGCN.h"
+#include "CGOpenMPRuntimeGPU.h"
+#include "CodeGenFunction.h"
+#include "clang/AST/Attr.h"
+#include "clang/AST/DeclOpenMP.h"
+#include "clang/AST/StmtOpenMP.h"
+#include "clang/AST/StmtVisitor.h"
+#include "clang/Basic/Cuda.h"
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/IR/IntrinsicsAMDGPU.h"
+
+using namespace clang;
+using namespace CodeGen;
+using namespace llvm::omp;
+
+CGOpenMPRuntimeAMDGCN::CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM)
+    : CGOpenMPRuntimeGPU(CGM) {
+  if (!CGM.getLangOpts().OpenMPIsDevice)
+    llvm_unreachable("OpenMP AMDGCN can only handle device code.");
+}
+
+llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUWarpSize(CodeGenFunction &CGF) {
+  CGBuilderTy &Bld = CGF.Builder;
+  // return constant compile-time target-specific warp size
+  unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
+  return Bld.getInt32(WarpSize);
+}
+
+llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUThreadID(CodeGenFunction &CGF) {
+  CGBuilderTy &Bld = CGF.Builder;
+  llvm::Function *F =
+      CGF.CGM.getIntrinsic(llvm::Intrinsic::amdgcn_workitem_id_x);
+  return Bld.CreateCall(F, llvm::None, "nvptx_tid");
+}
+
+llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUNumThreads(CodeGenFunction &CGF) {
+  CGBuilderTy &Bld = CGF.Builder;
+  llvm::Module *M = &CGF.CGM.getModule();
+  const char *LocSize = "__ockl_get_local_size";
+  llvm::Function *F = M->getFunction(LocSize);
+  if (!F) {
+    F = llvm::Function::Create(
+        llvm::FunctionType::get(CGF.Int64Ty, {CGF.Int32Ty}, false),
+        llvm::GlobalVariable::ExternalLinkage, LocSize, &CGF.CGM.getModule());
+  }
+  return Bld.CreateTrunc(
+      Bld.CreateCall(F, {Bld.getInt32(0)}, "nvptx_num_threads"), CGF.Int32Ty);
+}

diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h 
b/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
new file mode 100644
index 

[Differential] D83492: [OpenMP] Use common interface to access GPU Grid Values

2020-07-21 Thread Saiyedul Islam via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGfc7d2908ab38: [OpenMP] Use common interface to access GPU 
Grid Values (authored by saiislam).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83492/new/


  https://reviews.llvm.org/D83492

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp



Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -20,6 +20,7 @@
 #include "clang/AST/StmtVisitor.h"
 #include "clang/Basic/Cuda.h"
 #include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/Frontend/OpenMP/OMPGridValues.h"
 #include "llvm/IR/IntrinsicsNVPTX.h"
 
 using namespace clang;
@@ -195,11 +196,9 @@
 /// code.  For all practical purposes this is fine because the configuration
 /// is the same for all known NVPTX architectures.
 enum MachineConfiguration : unsigned {
-  WarpSize = 32,
-  /// Number of bits required to represent a lane identifier, which is
-  /// computed as log_2(WarpSize).
-  LaneIDBits = 5,
-  LaneIDMask = WarpSize - 1,
+  /// See "llvm/Frontend/OpenMP/OMPGridValues.h" for various related target
+  /// specific Grid Values like GV_Warp_Size, GV_Warp_Size_Log2,
+  /// and GV_Warp_Size_Log2_Mask.
 
   /// Global memory alignment for performance.
   GlobalMemoryAlignment = 128,
@@ -431,6 +430,7 @@
 assert(!GlobalizedRD &&
"Record for globalized variables is built already.");
 ArrayRef EscapedDeclsForParallel, EscapedDeclsForTeams;
+unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
 if (IsInTTDRegion)
   EscapedDeclsForTeams = EscapedDecls.getArrayRef();
 else
@@ -634,6 +634,8 @@
 /// on the NVPTX device, to generate more efficient code.
 static llvm::Value *getNVPTXWarpID(CodeGenFunction &CGF) {
   CGBuilderTy &Bld = CGF.Builder;
+  unsigned LaneIDBits =
+      CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size_Log2);
   return Bld.CreateAShr(getNVPTXThreadID(CGF), LaneIDBits, "nvptx_warp_id");
 }
 
@@ -642,6 +644,8 @@
 /// on the NVPTX device, to generate more efficient code.
 static llvm::Value *getNVPTXLaneID(CodeGenFunction &CGF) {
   CGBuilderTy &Bld = CGF.Builder;
+  unsigned LaneIDMask = CGF.getContext().getTargetInfo().getGridValue(
+      llvm::omp::GV_Warp_Size_Log2_Mask);
   return Bld.CreateAnd(getNVPTXThreadID(CGF), Bld.getInt32(LaneIDMask),
                        "nvptx_lane_id");
 }
@@ -2058,6 +2062,7 @@
   const RecordDecl *GlobalizedRD = nullptr;
   llvm::SmallVector LastPrivatesReductions;
   llvm::SmallDenseMap MappedDeclsFields;
+  unsigned WarpSize = CGM.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
   // Globalize team reductions variable unconditionally in all modes.
   if (getExecutionMode() != CGOpenMPRuntimeGPU::EM_SPMD)
 getTeamsReductionVars(CGM.getContext(), D, LastPrivatesReductions);
@@ -3233,6 +3238,7 @@
   "__openmp_nvptx_data_transfer_temporary_storage";
   llvm::GlobalVariable *TransferMedium =
   M.getGlobalVariable(TransferMediumName);
+  unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
   if (!TransferMedium) {
 auto *Ty = llvm::ArrayType::get(CGM.Int32Ty, WarpSize);
 unsigned SharedAddressSpace = C.getTargetAddressSpace(LangAS::cuda_shared);



[clang] fc7d290 - [OpenMP] Use common interface to access GPU Grid Values

2020-07-20 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2020-07-21T05:25:46Z
New Revision: fc7d2908ab38e1934b3b6a8ab3ec5c674484434b

URL: 
https://github.com/llvm/llvm-project/commit/fc7d2908ab38e1934b3b6a8ab3ec5c674484434b
DIFF: 
https://github.com/llvm/llvm-project/commit/fc7d2908ab38e1934b3b6a8ab3ec5c674484434b.diff

LOG: [OpenMP] Use common interface to access GPU Grid Values

Use common interface for accessing target specific GPU grid values in NVPTX
OpenMP codegen as proposed in https://reviews.llvm.org/D80917
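
The idea of replacing hard-coded constants with a per-target lookup can be
sketched as follows. Everything in the `sketch` namespace is hypothetical and
only mirrors the names used in the patch; the table values are the constants
the patch removes from MachineConfiguration (WarpSize = 32, LaneIDBits = 5,
LaneIDMask = WarpSize - 1), and a real target would supply its own table.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical indices mirroring the three grid values named in the patch.
enum GridValue : unsigned {
  GV_Warp_Size,
  GV_Warp_Size_Log2,
  GV_Warp_Size_Log2_Mask,
  GV_Count
};

// Values for a 32-lane warp, taken from the constants removed by the patch.
constexpr uint32_t Warp32Values[GV_Count] = {32, 5, 31};

constexpr uint32_t getGridValue(const uint32_t (&Table)[GV_Count],
                                GridValue Idx) {
  return Table[Idx];
}

// Warp ID: shift the thread ID right by log2(warp size).
constexpr uint32_t warpID(const uint32_t (&Table)[GV_Count],
                          uint32_t ThreadID) {
  return ThreadID >> getGridValue(Table, GV_Warp_Size_Log2);
}

// Lane ID: mask the thread ID with (warp size - 1).
constexpr uint32_t laneID(const uint32_t (&Table)[GV_Count],
                          uint32_t ThreadID) {
  return ThreadID & getGridValue(Table, GV_Warp_Size_Log2_Mask);
}

} // namespace sketch
```

Because the shift and mask come from the table, the same warpID and laneID
code would work for a target with a different warp size once a second table
is provided.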

Originally authored by Greg Rodgers (@gregrodgers).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83492

Added: 


Modified: 
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 92eca33ee97d..1cd89c540f47 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -20,6 +20,7 @@
 #include "clang/AST/StmtVisitor.h"
 #include "clang/Basic/Cuda.h"
 #include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/Frontend/OpenMP/OMPGridValues.h"
 #include "llvm/IR/IntrinsicsNVPTX.h"
 
 using namespace clang;
@@ -195,11 +196,9 @@ class ExecutionRuntimeModesRAII {
 /// code.  For all practical purposes this is fine because the configuration
 /// is the same for all known NVPTX architectures.
 enum MachineConfiguration : unsigned {
-  WarpSize = 32,
-  /// Number of bits required to represent a lane identifier, which is
-  /// computed as log_2(WarpSize).
-  LaneIDBits = 5,
-  LaneIDMask = WarpSize - 1,
+  /// See "llvm/Frontend/OpenMP/OMPGridValues.h" for various related target
+  /// specific Grid Values like GV_Warp_Size, GV_Warp_Size_Log2,
+  /// and GV_Warp_Size_Log2_Mask.
 
   /// Global memory alignment for performance.
   GlobalMemoryAlignment = 128,
@@ -431,6 +430,7 @@ class CheckVarsEscapingDeclContext final
 assert(!GlobalizedRD &&
"Record for globalized variables is built already.");
 ArrayRef EscapedDeclsForParallel, EscapedDeclsForTeams;
+unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
 if (IsInTTDRegion)
   EscapedDeclsForTeams = EscapedDecls.getArrayRef();
 else
@@ -634,6 +634,8 @@ static llvm::Value *getNVPTXThreadID(CodeGenFunction &CGF) {
 /// on the NVPTX device, to generate more efficient code.
 static llvm::Value *getNVPTXWarpID(CodeGenFunction &CGF) {
   CGBuilderTy &Bld = CGF.Builder;
+  unsigned LaneIDBits =
+      CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size_Log2);
   return Bld.CreateAShr(getNVPTXThreadID(CGF), LaneIDBits, "nvptx_warp_id");
 }
 
@@ -642,6 +644,8 @@ static llvm::Value *getNVPTXWarpID(CodeGenFunction &CGF) {
 /// on the NVPTX device, to generate more efficient code.
 static llvm::Value *getNVPTXLaneID(CodeGenFunction &CGF) {
   CGBuilderTy &Bld = CGF.Builder;
+  unsigned LaneIDMask = CGF.getContext().getTargetInfo().getGridValue(
+      llvm::omp::GV_Warp_Size_Log2_Mask);
   return Bld.CreateAnd(getNVPTXThreadID(CGF), Bld.getInt32(LaneIDMask),
                        "nvptx_lane_id");
 }
@@ -2058,6 +2062,7 @@ llvm::Function 
*CGOpenMPRuntimeGPU::emitTeamsOutlinedFunction(
   const RecordDecl *GlobalizedRD = nullptr;
   llvm::SmallVector LastPrivatesReductions;
   llvm::SmallDenseMap MappedDeclsFields;
+  unsigned WarpSize = CGM.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
   // Globalize team reductions variable unconditionally in all modes.
   if (getExecutionMode() != CGOpenMPRuntimeGPU::EM_SPMD)
 getTeamsReductionVars(CGM.getContext(), D, LastPrivatesReductions);
@@ -3233,6 +3238,7 @@ static llvm::Value *emitInterWarpCopyFunction(CodeGenModule &CGM,
   "__openmp_nvptx_data_transfer_temporary_storage";
   llvm::GlobalVariable *TransferMedium =
   M.getGlobalVariable(TransferMediumName);
+  unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
   if (!TransferMedium) {
 auto *Ty = llvm::ArrayType::get(CGM.Int32Ty, WarpSize);
 unsigned SharedAddressSpace = C.getTargetAddressSpace(LangAS::cuda_shared);





[clang] 0882c9d - [AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values

2020-07-07 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2020-07-07T06:36:25Z
New Revision: 0882c9d4fc49858338c9655154f1ad8357a8e516

URL: 
https://github.com/llvm/llvm-project/commit/0882c9d4fc49858338c9655154f1ad8357a8e516
DIFF: 
https://github.com/llvm/llvm-project/commit/0882c9d4fc49858338c9655154f1ad8357a8e516.diff

LOG: [AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values

__builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)

As the AMDGCN IR intrinsics for atomic inc/dec perform an unsigned
comparison, these Clang builtins should also take unsigned types instead
of signed int types.
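
To illustrate why the signedness of the comparison matters, here is a minimal host-side sketch. It assumes the wrapping rule "new = (old >= limit) ? 0 : old + 1" for inc and "new = (old == 0 || old > limit) ? limit : old - 1" for dec; that rule is an assumption of this sketch and is not stated in the commit. The names modelWrappingInc and modelWrappingDec are hypothetical, and the functions are plain, non-atomic models rather than the builtins themselves.

```cpp
#include <cassert>
#include <cstdint>

namespace inc_dec_model {

// Hypothetical model of a wrapping increment. T selects whether the
// comparison against Limit is signed or unsigned. Not atomic.
template <typename T> T modelWrappingInc(T Old, T Limit) {
  return Old >= Limit ? T(0) : T(Old + 1);
}

// Hypothetical model of a wrapping decrement. Not atomic.
template <typename T> T modelWrappingDec(T Old, T Limit) {
  return (Old == 0 || Old > Limit) ? Limit : T(Old - 1);
}

// The same bit pattern (0x80000000) used as a limit gives different results
// depending on whether the comparison is unsigned or signed.
inline void demo() {
  assert(modelWrappingInc<std::uint32_t>(5u, 0x80000000u) == 6u);
  assert(modelWrappingInc<std::int32_t>(5, INT32_MIN) == 0);
}

} // namespace inc_dec_model
```

With a signed limit whose top bit is set, every non-negative old value compares greater than or equal to the limit, so the modelled counter wraps immediately; the unsigned version keeps counting.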

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83121

Added: 


Modified: 
clang/include/clang/Basic/BuiltinsAMDGPU.def
clang/test/CodeGenCXX/builtin-amdgcn-atomic-inc-dec.cpp
clang/test/Sema/builtin-amdgcn-atomic-inc-dec-failure.cpp
clang/test/SemaOpenCL/builtins-amdgcn-error.cl

Removed: 




diff  --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 60be0525fabc..042a86368559 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -60,11 +60,11 @@ BUILTIN(__builtin_amdgcn_ds_gws_sema_br, "vUiUi", "n")
 BUILTIN(__builtin_amdgcn_ds_gws_sema_p, "vUi", "n")
 BUILTIN(__builtin_amdgcn_fence, "vUicC*", "n")
 
-BUILTIN(__builtin_amdgcn_atomic_inc32, "ZiZiD*ZiUicC*", "n")
-BUILTIN(__builtin_amdgcn_atomic_inc64, "WiWiD*WiUicC*", "n")
+BUILTIN(__builtin_amdgcn_atomic_inc32, "UZiUZiD*UZiUicC*", "n")
+BUILTIN(__builtin_amdgcn_atomic_inc64, "UWiUWiD*UWiUicC*", "n")
 
-BUILTIN(__builtin_amdgcn_atomic_dec32, "ZiZiD*ZiUicC*", "n")
-BUILTIN(__builtin_amdgcn_atomic_dec64, "WiWiD*WiUicC*", "n")
+BUILTIN(__builtin_amdgcn_atomic_dec32, "UZiUZiD*UZiUicC*", "n")
+BUILTIN(__builtin_amdgcn_atomic_dec64, "UWiUWiD*UWiUicC*", "n")
 
 // FIXME: Need to disallow constant address space.
 BUILTIN(__builtin_amdgcn_div_scale, "dddbb*", "n")

diff  --git a/clang/test/CodeGenCXX/builtin-amdgcn-atomic-inc-dec.cpp 
b/clang/test/CodeGenCXX/builtin-amdgcn-atomic-inc-dec.cpp
index 535c3d754954..77ea3d485c8a 100644
--- a/clang/test/CodeGenCXX/builtin-amdgcn-atomic-inc-dec.cpp
+++ b/clang/test/CodeGenCXX/builtin-amdgcn-atomic-inc-dec.cpp
@@ -2,9 +2,9 @@
 // RUN: %clang_cc1 %s -x hip -fcuda-is-device -emit-llvm -O0 -o - \
 // RUN:   -triple=amdgcn-amd-amdhsa  | opt -S | FileCheck %s
 
-__attribute__((device)) void test_non_volatile_parameter32(int *ptr) {
+__attribute__((device)) void test_non_volatile_parameter32(__UINT32_TYPE__ 
*ptr) {
   // CHECK-LABEL: test_non_volatile_parameter32
-  int res;
+  __UINT32_TYPE__ res;
   // CHECK: %ptr.addr = alloca i32*, align 8, addrspace(5)
   // CHECK-NEXT: %ptr.addr.ascast = addrspacecast i32* addrspace(5)* %ptr.addr 
to i32**
   // CHECK-NEXT: %res = alloca i32, align 4, addrspace(5)
@@ -25,9 +25,9 @@ __attribute__((device)) void 
test_non_volatile_parameter32(int *ptr) {
   res = __builtin_amdgcn_atomic_dec32(ptr, *ptr, __ATOMIC_SEQ_CST, 
"workgroup");
 }
 
-__attribute__((device)) void test_non_volatile_parameter64(__INT64_TYPE__ 
*ptr) {
+__attribute__((device)) void test_non_volatile_parameter64(__UINT64_TYPE__ 
*ptr) {
   // CHECK-LABEL: test_non_volatile_parameter64
-  __INT64_TYPE__ res;
+  __UINT64_TYPE__ res;
   // CHECK: %ptr.addr = alloca i64*, align 8, addrspace(5)
   // CHECK-NEXT: %ptr.addr.ascast = addrspacecast i64* addrspace(5)* %ptr.addr 
to i64**
   // CHECK-NEXT: %res = alloca i64, align 8, addrspace(5)
@@ -48,9 +48,9 @@ __attribute__((device)) void 
test_non_volatile_parameter64(__INT64_TYPE__ *ptr)
   res = __builtin_amdgcn_atomic_dec64(ptr, *ptr, __ATOMIC_SEQ_CST, 
"workgroup");
 }
 
-__attribute__((device)) void test_volatile_parameter32(volatile int *ptr) {
+__attribute__((device)) void test_volatile_parameter32(volatile 
__UINT32_TYPE__ *ptr) {
   // CHECK-LABEL: test_volatile_parameter32
-  int res;
+  __UINT32_TYPE__ res;
   // CHECK: %ptr.addr = alloca i32*, align 8, addrspace(5)
   // CHECK-NEXT: %ptr.addr.ascast = addrspacecast i32* addrspace(5)* %ptr.addr 
to i32**
   // CHECK-NEXT: %res = alloca i32, align 4, addrspace(5)
@@ -71,9 +71,9 @@ __attribute__((device)) void 
test_volatile_parameter32(volatile int *ptr) {
   res = __builtin_amdgcn_atomic_dec32(ptr, *ptr, __ATOMIC_SEQ_CST, 
"workgroup");
 }
 
-__attribute__((device)) void test_volatile_parameter64(volatile __INT64_TYPE__ 
*ptr) {
+__attribute__((device)) void test_volatile_parameter64(volatile 
__UINT64_TYPE__ *ptr) {
   // CHECK-LABEL: 

[clang] 4022bc2 - [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 2

2020-06-10 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2020-06-10T18:09:59Z
New Revision: 4022bc2a6c5e585d99a76b1b2f14c2986b823ce9

URL: 
https://github.com/llvm/llvm-project/commit/4022bc2a6c5e585d99a76b1b2f14c2986b823ce9
DIFF: 
https://github.com/llvm/llvm-project/commit/4022bc2a6c5e585d99a76b1b2f14c2986b823ce9.diff

LOG: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 2

Summary:
Add a new include file to support platform-dependent grid constants. It
will be used by clang, libomptarget plugins, and deviceRTLs to access
constant values consistently, with fast access in the deviceRTLs.
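
The idea can be sketched as a per-target array indexed by an enum, with the target object holding a pointer to its table. The sketch below is standalone and does not use the real header: the names GridIndex, NvptxLikeValues, AmdgcnLikeValues, and TargetSketch are hypothetical, and the table values (warp size 32 vs. 64) are illustrative assumptions rather than a copy of OMPGridValues.h.

```cpp
#include <cassert>

namespace grid_sketch {

// Hypothetical index enum, loosely modelled on the GVIDX idea.
enum GridIndex : unsigned {
  WarpSize,
  WarpSizeLog2,
  WarpSizeLog2Mask,
  NumGridValues
};

// Illustrative per-target tables (assumed values).
constexpr unsigned NvptxLikeValues[NumGridValues] = {32, 5, 31};
constexpr unsigned AmdgcnLikeValues[NumGridValues] = {64, 6, 63};

// Hypothetical stand-in for a target description that owns a table pointer.
struct TargetSketch {
  const unsigned *GridValues = nullptr;

  unsigned getGridValue(GridIndex Idx) const {
    assert(GridValues != nullptr && "GridValues not initialized");
    return GridValues[Idx];
  }
};

// Warp and lane IDs derived from the table instead of hard-coded constants.
inline unsigned warpId(const TargetSketch &T, unsigned ThreadId) {
  return ThreadId >> T.getGridValue(WarpSizeLog2);
}

inline unsigned laneId(const TargetSketch &T, unsigned ThreadId) {
  return ThreadId & T.getGridValue(WarpSizeLog2Mask);
}

} // namespace grid_sketch
```

Because callers only see the lookup, the same warp/lane computation works for either table without target-specific constants at the call site.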

Originally authored by Greg Rodgers (@gregrodgers).

Reviewers: arsenm, sameerds, jdoerfert, yaxunl, b-sumner, scchan, 
JonChesterfield

Reviewed By: arsenm

Subscribers: llvm-commits, pdhaliwal, jholewinski, jvesely, wdng, nhaehnle, 
guansong, kerbowa, sstefan1, cfe-commits, ronlieb, gregrodgers

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80917

Added: 
llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h

Modified: 
clang/include/clang/Basic/TargetInfo.h
clang/lib/Basic/Targets/AMDGPU.cpp
clang/lib/Basic/Targets/NVPTX.cpp

Removed: 




diff  --git a/clang/include/clang/Basic/TargetInfo.h 
b/clang/include/clang/Basic/TargetInfo.h
index 4a7257d1b426..b1a4017b4c76 100644
--- a/clang/include/clang/Basic/TargetInfo.h
+++ b/clang/include/clang/Basic/TargetInfo.h
@@ -15,6 +15,7 @@
 #define LLVM_CLANG_BASIC_TARGETINFO_H
 
 #include "clang/Basic/AddressSpaces.h"
+#include "clang/Basic/CodeGenOptions.h"
 #include "clang/Basic/LLVM.h"
 #include "clang/Basic/LangOptions.h"
 #include "clang/Basic/Specifiers.h"
@@ -29,6 +30,7 @@
 #include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Triple.h"
+#include "llvm/Frontend/OpenMP/OMPGridValues.h"
 #include "llvm/Support/DataTypes.h"
 #include "llvm/Support/VersionTuple.h"
 #include 
@@ -198,6 +200,9 @@ class TargetInfo : public virtual TransferrableTargetInfo,
   unsigned char RegParmMax, SSERegParmMax;
   TargetCXXABI TheCXXABI;
   const LangASMap *AddrSpaceMap;
+  const unsigned *GridValues =
+  nullptr; // Array of target-specific GPU grid values that must be
+   // consistent between host RTL (plugin), device RTL, and clang.
 
   mutable StringRef PlatformName;
   mutable VersionTuple PlatformMinVersion;
@@ -1321,6 +1326,12 @@ class TargetInfo : public virtual 
TransferrableTargetInfo,
 return LangAS::Default;
   }
 
+  /// Return a target-specific GPU grid value based on the GVIDX enum \param gv
+  unsigned getGridValue(llvm::omp::GVIDX gv) const {
+assert(GridValues != nullptr && "GridValues not initialized");
+return GridValues[gv];
+  }
+
   /// Retrieve the name of the platform as it is used in the
   /// availability attribute.
   StringRef getPlatformName() const { return PlatformName; }

diff  --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index b9d7640a10b8..8017b9ee402f 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -17,6 +17,7 @@
 #include "clang/Basic/MacroBuilder.h"
 #include "clang/Basic/TargetBuiltins.h"
 #include "llvm/ADT/StringSwitch.h"
+#include "llvm/Frontend/OpenMP/OMPGridValues.h"
 #include "llvm/IR/DataLayout.h"
 
 using namespace clang;
@@ -286,6 +287,7 @@ AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple &Triple,
   resetDataLayout(isAMDGCN(getTriple()) ? DataLayoutStringAMDGCN
 : DataLayoutStringR600);
   assert(DataLayout->getAllocaAddrSpace() == Private);
+  GridValues = llvm::omp::AMDGPUGpuGridValues;
 
   setAddressSpaceMap(Triple.getOS() == llvm::Triple::Mesa3D ||
  !isAMDGCN(Triple));

diff  --git a/clang/lib/Basic/Targets/NVPTX.cpp 
b/clang/lib/Basic/Targets/NVPTX.cpp
index 39b07872b142..fda9fad777d4 100644
--- a/clang/lib/Basic/Targets/NVPTX.cpp
+++ b/clang/lib/Basic/Targets/NVPTX.cpp
@@ -16,6 +16,7 @@
 #include "clang/Basic/MacroBuilder.h"
 #include "clang/Basic/TargetBuiltins.h"
 #include "llvm/ADT/StringSwitch.h"
+#include "llvm/Frontend/OpenMP/OMPGridValues.h"
 
 using namespace clang;
 using namespace clang::targets;
@@ -62,6 +63,7 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple ,
   TLSSupported = false;
   VLASupported = false;
   AddrSpaceMap = 
+  GridValues = llvm::omp::NVPTXGpuGridValues;
   UseAddrSpaceMapMangling = true;
 
   // Define available target features

diff  --git a/llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h 
b/llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h
new file mode 100644
index ..3ae4a2edbf96
--- /dev/null
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h
@@ -0,0 +1,131 @@
+//--- OMPGridValues.h - Language-specific address spaces --*- C++ -*-//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois 

[clang] 675cefb - [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics

2020-06-09 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2020-06-09T17:02:58Z
New Revision: 675cefbf60270f59057972e33365a09590fb3694

URL: 
https://github.com/llvm/llvm-project/commit/675cefbf60270f59057972e33365a09590fb3694
DIFF: 
https://github.com/llvm/llvm-project/commit/675cefbf60270f59057972e33365a09590fb3694.diff

LOG: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics

Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)

The first and second arguments are passed transparently to the amdgcn
atomic inc/dec intrinsic. The fifth argument of the intrinsic is set to
true if the first argument of the builtin is a volatile pointer. The third
argument of the builtin is one of the memory-ordering specifiers
__ATOMIC_ACQUIRE, __ATOMIC_RELEASE, __ATOMIC_ACQ_REL, or __ATOMIC_SEQ_CST,
following C++11 memory model semantics. It is mapped to the corresponding
LLVM atomic memory ordering for the atomic inc/dec instruction using
Clang's atomic C ABI. The fourth argument is an AMDGPU-specific
synchronization scope given as a string.
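
The ordering translation described above can be sketched in isolation as follows. This is a simplified standalone model, not the Clang implementation: the Ordering enum and mapOrdering are hypothetical names, and the sketch assumes a compiler (GCC or Clang) that predefines the __ATOMIC_* macros. Specifiers the summary does not list (relaxed, consume) fall through to a caller-supplied default, assumed here to be sequentially consistent.

```cpp
#include <cassert>

namespace order_sketch {

// Hypothetical stand-in for the orderings the builtins accept.
enum class Ordering {
  Acquire,
  Release,
  AcquireRelease,
  SequentiallyConsistent
};

// Map a C ABI memory-order constant to the sketch's Ordering enum.
// Unlisted values (relaxed, consume) keep the default (an assumption).
inline Ordering
mapOrdering(int CABIOrder,
            Ordering Default = Ordering::SequentiallyConsistent) {
  switch (CABIOrder) {
  case __ATOMIC_ACQUIRE:
    return Ordering::Acquire;
  case __ATOMIC_RELEASE:
    return Ordering::Release;
  case __ATOMIC_ACQ_REL:
    return Ordering::AcquireRelease;
  case __ATOMIC_SEQ_CST:
    return Ordering::SequentiallyConsistent;
  default:
    return Default;
  }
}

} // namespace order_sketch
```

A call such as mapOrdering(__ATOMIC_ACQ_REL) mirrors passing __ATOMIC_ACQ_REL as the third builtin argument; the scope string would be handled separately.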

Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert

Reviewed By: arsenm, sameerds

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, 
jfb, kerbowa, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D80804

Added: 
clang/test/CodeGenCXX/builtin-amdgcn-atomic-inc-dec.cpp
clang/test/Sema/builtin-amdgcn-atomic-inc-dec-failure.cpp

Modified: 
clang/include/clang/Basic/BuiltinsAMDGPU.def
clang/lib/CodeGen/CGBuiltin.cpp
clang/lib/CodeGen/CodeGenFunction.h
clang/lib/Sema/SemaChecking.cpp
clang/test/SemaOpenCL/builtins-amdgcn-error.cl

Removed: 




diff  --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 28379142b05a..9add10c64962 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -60,6 +60,12 @@ BUILTIN(__builtin_amdgcn_ds_gws_sema_br, "vUiUi", "n")
 BUILTIN(__builtin_amdgcn_ds_gws_sema_p, "vUi", "n")
 BUILTIN(__builtin_amdgcn_fence, "vUicC*", "n")
 
+BUILTIN(__builtin_amdgcn_atomic_inc32, "ZiZiD*ZiUicC*", "n")
+BUILTIN(__builtin_amdgcn_atomic_inc64, "WiWiD*WiUicC*", "n")
+
+BUILTIN(__builtin_amdgcn_atomic_dec32, "ZiZiD*ZiUicC*", "n")
+BUILTIN(__builtin_amdgcn_atomic_dec64, "WiWiD*WiUicC*", "n")
+
 // FIXME: Need to disallow constant address space.
 BUILTIN(__builtin_amdgcn_div_scale, "dddbb*", "n")
 BUILTIN(__builtin_amdgcn_div_scalef, "fffbb*", "n")

diff  --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index bfc78ce94892..f0092e2fa7ec 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -14301,8 +14301,49 @@ Value *EmitAMDGPUWorkGroupSize(CodeGenFunction , 
unsigned Index) {
 }
 } // namespace
 
+// For processing memory ordering and memory scope arguments of various
+// amdgcn builtins.
+// \p Order takes a C++11 compatible memory-ordering specifier and converts
+// it into LLVM's memory ordering specifier using atomic C ABI, and writes
+// to \p AO. \p Scope takes a const char * and converts it into AMDGCN
+// specific SyncScopeID and writes it to \p SSID.
+bool CodeGenFunction::ProcessOrderScopeAMDGCN(Value *Order, Value *Scope,
+                                              llvm::AtomicOrdering &AO,
+                                              llvm::SyncScope::ID &SSID) {
+  if (isa<llvm::ConstantInt>(Order)) {
+    int ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+
+    // Map C11/C++11 memory ordering to LLVM memory ordering
+    switch (static_cast<llvm::AtomicOrderingCABI>(ord)) {
+    case llvm::AtomicOrderingCABI::acquire:
+      AO = llvm::AtomicOrdering::Acquire;
+      break;
+    case llvm::AtomicOrderingCABI::release:
+      AO = llvm::AtomicOrdering::Release;
+      break;
+    case llvm::AtomicOrderingCABI::acq_rel:
+      AO = llvm::AtomicOrdering::AcquireRelease;
+      break;
+    case llvm::AtomicOrderingCABI::seq_cst:
+      AO = llvm::AtomicOrdering::SequentiallyConsistent;
+      break;
+    case llvm::AtomicOrderingCABI::consume:
+    case llvm::AtomicOrderingCABI::relaxed:
+      break;
+    }
+
+    StringRef scp;
+    llvm::getConstantStringInfo(Scope, scp);
+    SSID = getLLVMContext().getOrInsertSyncScopeID(scp);
+    return true;
+  }
+  return false;
+}
+
 Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   const CallExpr *E) {
+  llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
+  llvm::SyncScope::ID SSID;
   switch (BuiltinID) {
   case 

[clang] 602d9b0 - [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

2020-05-27 Thread Saiyedul Islam via cfe-commits

Author: Saiyedul Islam
Date: 2020-05-27T07:51:27Z
New Revision: 602d9b0afc77828f419869289b159a567c62ae81

URL: 
https://github.com/llvm/llvm-project/commit/602d9b0afc77828f419869289b159a567c62ae81
DIFF: 
https://github.com/llvm/llvm-project/commit/602d9b0afc77828f419869289b159a567c62ae81.diff

LOG: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

Summary:
Allow AMDGCN as a GPU offloading target for OpenMP during compiler
invocation and allow setting CUDAMode for it.
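
As a rough illustration of the option logic this change extends, the sketch below models how a device compilation for a GPU target could enable CUDA mode and disable exceptions. It is a standalone model, not the CompilerInvocation code: Arch, DeviceOptions, isGpuArch, and configure are hypothetical names introduced here.

```cpp
#include <cassert>

namespace offload_sketch {

// Hypothetical subset of architectures relevant to the sketch.
enum class Arch { X86_64, PPC64LE, NVPTX64, AMDGCN };

// Hypothetical result of option processing.
struct DeviceOptions {
  bool CUDAMode;
  bool Exceptions;
};

// After the change, both NVPTX and AMDGCN count as GPU offload targets.
inline bool isGpuArch(Arch A) {
  return A == Arch::NVPTX64 || A == Arch::AMDGCN;
}

// CUDA mode is honoured only for device compilations on a GPU target;
// exceptions are switched off for those same compilations.
inline DeviceOptions configure(bool IsDevice, Arch A, bool CudaModeRequested,
                               bool ExceptionsRequested) {
  const bool GpuDevice = IsDevice && isGpuArch(A);
  DeviceOptions Opts;
  Opts.CUDAMode = GpuDevice && CudaModeRequested;
  Opts.Exceptions = ExceptionsRequested && !GpuDevice;
  return Opts;
}

} // namespace offload_sketch
```

For example, configure(true, Arch::AMDGCN, true, true) yields CUDA mode on and exceptions off, while the same request on a host compilation leaves CUDA mode off.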

Originally authored by Greg Rodgers (@gregrodgers).

Reviewers: ronlieb, yaxunl, b-sumner, scchan, JonChesterfield, jdoerfert, 
sameerds, msearles, hliao, arsenm

Reviewed By: sameerds

Subscribers: sstefan1, jvesely, wdng, arsenm, guansong, dexonsmith, 
cfe-commits, llvm-commits, gregrodgers

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D79754

Added: 
clang/test/OpenMP/amdgcn_device_function_call.cpp

Modified: 
clang/lib/AST/Decl.cpp
clang/lib/Frontend/CompilerInvocation.cpp
clang/test/Driver/openmp-offload-gpu.c
clang/test/OpenMP/target_parallel_no_exceptions.cpp
llvm/include/llvm/ADT/Triple.h

Removed: 




diff  --git a/clang/lib/AST/Decl.cpp b/clang/lib/AST/Decl.cpp
index 27b3ae3ef00e..e6800073ee58 100644
--- a/clang/lib/AST/Decl.cpp
+++ b/clang/lib/AST/Decl.cpp
@@ -3224,6 +3224,15 @@ unsigned FunctionDecl::getBuiltinID(bool 
ConsiderWrapperFunctions) const {
   !(BuiltinID == Builtin::BIprintf || BuiltinID == Builtin::BImalloc))
 return 0;
 
+  // As AMDGCN implementation of OpenMP does not have a device-side standard
+  // library, none of the predefined library functions except printf and malloc
+  // should be treated as a builtin i.e. 0 should be returned for them.
+  if (Context.getTargetInfo().getTriple().isAMDGCN() &&
+  Context.getLangOpts().OpenMPIsDevice &&
+  Context.BuiltinInfo.isPredefinedLibFunction(BuiltinID) &&
+  !(BuiltinID == Builtin::BIprintf || BuiltinID == Builtin::BImalloc))
+return 0;
+
   return BuiltinID;
 }
 

diff  --git a/clang/lib/Frontend/CompilerInvocation.cpp 
b/clang/lib/Frontend/CompilerInvocation.cpp
index f98490cd9a11..1d820090f810 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -3109,7 +3109,8 @@ static void ParseLangArgs(LangOptions &Opts, ArgList &Args, InputKind IK,
 
   // Set the flag to prevent the implementation from emitting device exception
   // handling code for those requiring so.
-  if ((Opts.OpenMPIsDevice && T.isNVPTX()) || Opts.OpenCLCPlusPlus) {
+  if ((Opts.OpenMPIsDevice && (T.isNVPTX() || T.isAMDGCN())) ||
+  Opts.OpenCLCPlusPlus) {
 Opts.Exceptions = 0;
 Opts.CXXExceptions = 0;
   }
@@ -3143,6 +3144,7 @@ static void ParseLangArgs(LangOptions &Opts, ArgList &Args, InputKind IK,
 TT.getArch() == llvm::Triple::ppc64le ||
 TT.getArch() == llvm::Triple::nvptx ||
 TT.getArch() == llvm::Triple::nvptx64 ||
+TT.getArch() == llvm::Triple::amdgcn ||
 TT.getArch() == llvm::Triple::x86 ||
 TT.getArch() == llvm::Triple::x86_64))
 Diags.Report(diag::err_drv_invalid_omp_target) << A->getValue(i);
@@ -3160,13 +3162,13 @@ static void ParseLangArgs(LangOptions &Opts, ArgList &Args, InputKind IK,
   << Opts.OMPHostIRFile;
   }
 
-  // Set CUDA mode for OpenMP target NVPTX if specified in options
-  Opts.OpenMPCUDAMode = Opts.OpenMPIsDevice && T.isNVPTX() &&
+  // Set CUDA mode for OpenMP target NVPTX/AMDGCN if specified in options
+  Opts.OpenMPCUDAMode = Opts.OpenMPIsDevice && (T.isNVPTX() || T.isAMDGCN()) &&
 Args.hasArg(options::OPT_fopenmp_cuda_mode);
 
-  // Set CUDA mode for OpenMP target NVPTX if specified in options
+  // Set CUDA mode for OpenMP target NVPTX/AMDGCN if specified in options
   Opts.OpenMPCUDAForceFullRuntime =
-  Opts.OpenMPIsDevice && T.isNVPTX() &&
+  Opts.OpenMPIsDevice && (T.isNVPTX() || T.isAMDGCN()) &&
   Args.hasArg(options::OPT_fopenmp_cuda_force_full_runtime);
 
   // Record whether the __DEPRECATED define was requested.

diff  --git a/clang/test/Driver/openmp-offload-gpu.c 
b/clang/test/Driver/openmp-offload-gpu.c
index dc4dbd1f37c9..6415f1d61b72 100644
--- a/clang/test/Driver/openmp-offload-gpu.c
+++ b/clang/test/Driver/openmp-offload-gpu.c
@@ -6,6 +6,7 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: powerpc-registered-target
 // REQUIRES: nvptx-registered-target
+// REQUIRES: amdgpu-registered-target
 
 /// ###
 
@@ -254,24 +255,40 @@
 // RUN:   | FileCheck -check-prefix=CUDA_MODE %s
 // RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp 
-fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 %s 
-fno-openmp-cuda-mode -fopenmp-cuda-mode 2>&1 \
 // RUN:   | FileCheck -check-prefix=CUDA_MODE %s
-// CUDA_MODE: