[PATCH] D142569: [OpenMP] Introduce kernel environment

2023-08-08 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

The changes in this patch do not work correctly for kernels that genuinely 
need to run in Generic mode. The ExecMode value recovered in the 
kmpc_kernel_init function is 3 in a case where it should be 1.

The problem lies in the OpenMPOpt changes, since disabling them makes 
everything work again.
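
For reference, a minimal sketch of how that value decodes, assuming the 
exec-mode global is the usual bitmask (enumerator names and header are from 
memory, so double-check them against llvm/Frontend/OpenMP/OMPConstants.h):

// Sketch only -- verify against the actual LLVM headers before relying on it.
enum OMPTgtExecModeFlags : signed char {
  OMP_TGT_EXEC_MODE_GENERIC = 1 << 0, // 1: kernel must run in Generic mode
  OMP_TGT_EXEC_MODE_SPMD = 1 << 1,    // 2: kernel runs in SPMD mode
  // 3: Generic kernel that OpenMPOpt has decided may also execute as SPMD;
  // seeing 3 for a kernel that truly requires Generic mode points at the
  // SPMD-ization decision made by OpenMPOpt.
  OMP_TGT_EXEC_MODE_GENERIC_SPMD =
      OMP_TGT_EXEC_MODE_GENERIC | OMP_TGT_EXEC_MODE_SPMD,
};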


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142569/new/

https://reviews.llvm.org/D142569

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154568: [Clang][OpenMP] GPU simd directive code generation

2023-07-28 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

In D154568#4528274, @efwright wrote:

> Dropping off a simple test case. If this looks about what you would expect 
> for the tests, I have a couple of more involved ones that I can repurpose and 
> add in. For more complex tests, we have a couple of the benchmark codes from 
> ICPP that were working.
>
> Some cleanup of the codegen is coming; I will be on travel tomorrow, so this 
> might take a day or two.

If you want to do this properly, you need tests for the various ways in which 
the simd directive can be combined with other directives; a sketch of a few 
such combinations is given below. I am actually surprised you haven't had to 
modify any of the existing lit tests that involve the simd directive. Do you 
know why that is? Is it because you're not doing anything other than running 
sequentially?
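
For illustration only (none of this is taken from the patch), the kind of 
combinations I have in mind looks roughly like this:

// Hypothetical coverage sketch -- directive spellings worth exercising, not
// code from the patch under review.
void combos(int n, float *a, float *b) {
  // simd on its own inside a target region.
  #pragma omp target map(tofrom: a[:n]) map(to: b[:n])
  #pragma omp simd
  for (int i = 0; i < n; i++)
    a[i] += b[i];

  // simd composed with teams and worksharing constructs.
  #pragma omp target teams distribute parallel for simd map(tofrom: a[:n])
  for (int i = 0; i < n; i++)
    a[i] *= 2.0f;

  // simd nested below an explicit parallel for.
  #pragma omp target parallel for map(tofrom: a[:n]) map(to: b[:n])
  for (int i = 0; i < n; i++) {
    #pragma omp simd
    for (int j = 0; j < 4; j++)
      a[i] += b[i] * j;
  }
}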


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154568/new/

https://reviews.llvm.org/D154568

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-07-06 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 closed this revision.
doru1004 added a comment.

Commit: 1370e568dea84c4ea65fe5c01ef4f4ccc751 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-07-06 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

@ABataev thank you for the review! I have now fixed the last nit and will 
commit the patch soon!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-07-06 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 537706.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,1260 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo4() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  int N = 10;
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3() + foo4();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] = add nuw i64 [[TMP4]], 3
+// CHECK-NEXT:[[TMP6:%.*]] = udiv i64 [[TMP5]], 4
+// CHECK-NEXT:[[TMP7:%.*]] = mul nuw i64 [[TMP6]], 4
+// CHECK-NEXT:[[A:%.*]] = call align 4 ptr @__kmpc_alloc_shared(i64 [[TMP7]])
+// CHECK-NEXT:store i64 [[TMP3]], ptr [[__VLA_EXPR0_ASCAST]], align 8
+// 

[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-07-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 537498.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,1260 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo4() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  int N = 10;
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3() + foo4();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] = add nuw i64 [[TMP4]], 3
+// CHECK-NEXT:[[TMP6:%.*]] = udiv i64 [[TMP5]], 4
+// CHECK-NEXT:[[TMP7:%.*]] = mul nuw i64 [[TMP6]], 4
+// CHECK-NEXT:[[A:%.*]] = call align 4 ptr @__kmpc_alloc_shared(i64 [[TMP7]])
+// CHECK-NEXT:store i64 [[TMP3]], ptr [[__VLA_EXPR0_ASCAST]], align 8
+// 

[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-07-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntime.h:699-710
+  /// Get call to __kmpc_alloc_shared
+  virtual std::pair<llvm::Value *, llvm::Value *>
+  getKmpcAllocShared(CodeGenFunction &CGF, const VarDecl *VD) {
+    llvm_unreachable("not implemented");
+  }
+
+  /// Get call to __kmpc_free_shared

@ABataev I have added the interface entries here.
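
For context, the device runtime entry points that this interface wraps look 
roughly as follows (signatures recalled from the DeviceRTL, so treat this as a 
sketch and verify against openmp/libomptarget/DeviceRTL):

#include <cstdint>

// Rough shape of the DeviceRTL entry points behind getKmpcAllocShared and
// getKmpcFreeShared (from memory; verify before relying on it).
extern "C" {
// Allocates Bytes of team-shared storage and returns a pointer to it.
void *__kmpc_alloc_shared(uint64_t Bytes);
// Releases storage previously obtained from __kmpc_alloc_shared.
void __kmpc_free_shared(void *Ptr, uint64_t Bytes);
}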


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-07-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGDecl.cpp:1606
+  CGOpenMPRuntimeGPU &RT =
+      *(static_cast<CGOpenMPRuntimeGPU *>(&CGM.getOpenMPRuntime()));
+  if (RT.isDelayedVariableLengthDecl(*this, &D)) {

ABataev wrote:
> ABataev wrote:
> > 1. use `static_cast<CGOpenMPRuntimeGPU &>(CGM.getOpenMPRuntime())`
> > 2. It will crash if your device is not GPU. Better to make 
> > `getKmpcAllocShared` and `getKmpcFreeShared` virtual (just like 
> > `isDelayedVariableLengthDecl`) in base CGOpenMPRuntime, since it may be 
> > required not only for GPU-based devices.
> Check the second item, please: better to make all the new member functions 
> virtual and handle them for non-GPU devices too.
The support I am adding is only meant for GPUs. I am not sure why we need to 
consider non-GPUs: VLA handling for non-GPUs already exists, and that is what 
should be used.
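
To keep the thread concrete, here is a minimal, self-contained sketch of the 
dispatch pattern being discussed; the stand-in type and class names are mine, 
not the patch's:

#include <cstdio>
#include <cstdlib>
#include <utility>

// Stand-ins so the sketch compiles on its own; in Clang these would be
// llvm::Value, CodeGenFunction, and VarDecl.
struct Value {};
struct CodeGenFunction {};
struct VarDecl {};

// Base runtime: the hook is virtual and traps, mirroring the
// llvm_unreachable("not implemented") in CGOpenMPRuntime, so devices that
// never delay VLA emission are unaffected.
struct RuntimeBase {
  virtual std::pair<Value *, Value *>
  getKmpcAllocShared(CodeGenFunction &CGF, const VarDecl *VD) {
    std::fprintf(stderr, "not implemented\n");
    std::abort();
  }
  virtual ~RuntimeBase() = default;
};

// GPU runtime: overrides the hook; the real override would emit the
// __kmpc_alloc_shared call and return the address together with the size, so
// the matching __kmpc_free_shared can be emitted later.
struct RuntimeGPU : RuntimeBase {
  std::pair<Value *, Value *>
  getKmpcAllocShared(CodeGenFunction &, const VarDecl *) override {
    return {nullptr, nullptr}; // placeholder for the emitted call and size
  }
};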


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-07-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 537485.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,1260 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo4() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  int N = 10;
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3() + foo4();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] = add nuw i64 [[TMP4]], 3
+// CHECK-NEXT:[[TMP6:%.*]] = udiv i64 [[TMP5]], 4
+// CHECK-NEXT:[[TMP7:%.*]] = mul nuw i64 [[TMP6]], 4
+// CHECK-NEXT:[[A:%.*]] = call align 4 ptr @__kmpc_alloc_shared(i64 [[TMP7]])
+// CHECK-NEXT:store i64 [[TMP3]], 

[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-07-05 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 537478.
doru1004 marked an inline comment as done.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,1260 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo4() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  int N = 10;
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3() + foo4();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] = add nuw i64 [[TMP4]], 3
+// CHECK-NEXT:[[TMP6:%.*]] = udiv i64 [[TMP5]], 4
+// CHECK-NEXT:[[TMP7:%.*]] = mul nuw i64 [[TMP6]], 4
+// CHECK-NEXT:[[A:%.*]] = call align 4 ptr @__kmpc_alloc_shared(i64 

[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

In any case, the patch is good to go. It no longer relies on VLA size checks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

@ABataev This is as close as I could get it to what you wanted. I don't know 
how to get hold of the target directive so late in the emission process, i.e. 
in the markAsEscaped function. The target directive isn't visited when the 
variable is checked for escaped vars, so I cannot get the list of captures 
from it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 536489.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,1260 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo4() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  int N = 10;
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3() + foo4();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] = add nuw i64 [[TMP4]], 3
+// CHECK-NEXT:[[TMP6:%.*]] = udiv i64 [[TMP5]], 4
+// CHECK-NEXT:[[TMP7:%.*]] = mul nuw i64 [[TMP6]], 4
+// CHECK-NEXT:[[A:%.*]] = call align 4 ptr @__kmpc_alloc_shared(i64 [[TMP7]])
+// CHECK-NEXT:store i64 [[TMP3]], ptr [[__VLA_EXPR0_ASCAST]], align 8
+// 

[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CodeGenFunction.h:2806
+  /// Return true if all the emissions for the VLA size have occurred.
+  bool hasVLASize(const VariableArrayType *type);
+

ABataev wrote:
> doru1004 wrote:
> > doru1004 wrote:
> > > ABataev wrote:
> > > > doru1004 wrote:
> > > > > doru1004 wrote:
> > > > > > doru1004 wrote:
> > > > > > > ABataev wrote:
> > > > > > > > doru1004 wrote:
> > > > > > > > > ABataev wrote:
> > > > > > > > > > doru1004 wrote:
> > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > doru1004 wrote:
> > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > 1. Is it possible that VariableArrayType does not 
> > > > > > > > > > > > > > have VLA size?
> > > > > > > > > > > > > > 2. Fix param name
> > > > > > > > > > > > > @ABataev How would point 1 happen?
> > > > > > > > > > > > You're adding a function that checks if VLA type has 
> > > > > > > > > > > > VLA size. I'm asking, if it is possible for VLA type to 
> > > > > > > > > > > > not have VLA size at all? Why do you need this function?
> > > > > > > > > > > This function checks if the expression of the size of the 
> > > > > > > > > > > VLA has already been emitted and can be used.
> > > > > > > > > > Why the emission of VLA size can be delayed?
> > > > > > > > > Because the size of the VLA is emitted in the user code and 
> > > > > > > > > the prolog of the function happens before that. The emission 
> > > > > > > > > of the VLA needs to be delayed until its size has been 
> > > > > > > > > emitted in the user code.
> > > > > > > > This is very fragile approach. Can you try instead try to 
> > > > > > > > improve markAsEscaped function and fix insertion of VD to 
> > > > > > > > EscapedVariableLengthDecls and if the declaration is internal 
> > > > > > > > for the target region, insert it to DelayedVariableLengthDecls?
> > > > > > > I am not sure what the condition would be, at that point, to 
> > > > > > > choose between one list or the other. I'm not sure what you mean 
> > > > > > > by the declaration being internal to the target region.
> > > > > > Any thoughts? As far as I can tell all VLAs that reach that point 
> > > > > > belong in `DelayedVariableLengthDecls`
> > > > > @ABataev I cannot think of a condition to use for the distinction in 
> > > > > markedAsEscaped(). Could you please explain in more detail what you 
> > > > > want me to check? I can make the rest of the changes happen no 
> > > > > problem but I don't know what the condition is. Unless you tell me 
> > > > > otherwise, I think the best condition is to check whether the VLA 
> > > > > size has been emitted (i.e. that is is part of the VLASize list) in 
> > > > > which case the code as is now is fine.
> > > > Can you check that the declaration is not captured in the target 
> > > > context? If it is not captured, it is declared in the target region and 
> > > > should be emitted as delayed.
> > > How do I check that? There doesn't seem to be a list of captured 
> > > variables available at that point in the code.
> > > 
> > So the complication is that the same declaration is captured and not 
> > captured at the same time. It can be declared inside a teams distribute 
> > (not captured) but captured by an inner parallel for (captured). I think I 
> > can come up with something though.
> Need to check the captures in the target regions only
I cannot get a handle on the target directive in the markAsEscaped function in 
order to look at its captures.
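
To make the captured/not-captured distinction concrete, the two shapes are 
already present in the new test; condensed (and renamed) here:

// Condensed from the foo1/foo4 tests in the diff: the difference is where the
// VLA bound comes from.
int region_local_size() {
  int sum = 0;
  #pragma omp target map(tofrom: sum)
  {
    // N is declared inside the target region (not captured), so the
    // __kmpc_alloc_shared for A has to wait until the store to N has been
    // emitted in the user code.
    int N = 10;
    int A[N];
    for (int i = 0; i < N; i++)
      sum += (A[i] = i);
  }
  return sum;
}

int captured_size() {
  int sum = 0;
  int N = 10;
  #pragma omp target map(tofrom: sum)
  {
    // N is captured by the target region, so its value is available when the
    // kernel prologue is emitted; a VLA whose bound is captured does not need
    // the delayed emission.
    int A[N];
    for (int i = 0; i < N; i++)
      sum += (A[i] = i);
  }
  return sum;
}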


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CodeGenFunction.h:2806
+  /// Return true if all the emissions for the VLA size have occurred.
+  bool hasVLASize(const VariableArrayType *type);
+

doru1004 wrote:
> ABataev wrote:
> > doru1004 wrote:
> > > doru1004 wrote:
> > > > doru1004 wrote:
> > > > > ABataev wrote:
> > > > > > doru1004 wrote:
> > > > > > > ABataev wrote:
> > > > > > > > doru1004 wrote:
> > > > > > > > > ABataev wrote:
> > > > > > > > > > doru1004 wrote:
> > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > 1. Is it possible that VariableArrayType does not have 
> > > > > > > > > > > > VLA size?
> > > > > > > > > > > > 2. Fix param name
> > > > > > > > > > > @ABataev How would point 1 happen?
> > > > > > > > > > You're adding a function that checks if VLA type has VLA 
> > > > > > > > > > size. I'm asking, if it is possible for VLA type to not 
> > > > > > > > > > have VLA size at all? Why do you need this function?
> > > > > > > > > This function checks if the expression of the size of the VLA 
> > > > > > > > > has already been emitted and can be used.
> > > > > > > > Why the emission of VLA size can be delayed?
> > > > > > > Because the size of the VLA is emitted in the user code and the 
> > > > > > > prolog of the function happens before that. The emission of the 
> > > > > > > VLA needs to be delayed until its size has been emitted in the 
> > > > > > > user code.
> > > > > > This is very fragile approach. Can you try instead try to improve 
> > > > > > markAsEscaped function and fix insertion of VD to 
> > > > > > EscapedVariableLengthDecls and if the declaration is internal for 
> > > > > > the target region, insert it to DelayedVariableLengthDecls?
> > > > > I am not sure what the condition would be, at that point, to choose 
> > > > > between one list or the other. I'm not sure what you mean by the 
> > > > > declaration being internal to the target region.
> > > > Any thoughts? As far as I can tell all VLAs that reach that point 
> > > > belong in `DelayedVariableLengthDecls`
> > > @ABataev I cannot think of a condition to use for the distinction in 
> > > markedAsEscaped(). Could you please explain in more detail what you want 
> > > me to check? I can make the rest of the changes happen no problem but I 
> > > don't know what the condition is. Unless you tell me otherwise, I think 
> > > the best condition is to check whether the VLA size has been emitted 
> > > (i.e. that is is part of the VLASize list) in which case the code as is 
> > > now is fine.
> > Can you check that the declaration is not captured in the target context? 
> > If it is not captured, it is declared in the target region and should be 
> > emitted as delayed.
> How do I check that? There doesn't seem to be a list of captured variables 
> available at that point in the code.
> 
So the complication is that the same declaration is captured and not captured 
at the same time. It can be declared inside a teams distribute (not captured) 
but captured by an inner parallel for (captured). I think I can come up with 
something though.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CodeGenFunction.h:2806
+  /// Return true if all the emissions for the VLA size have occurred.
+  bool hasVLASize(const VariableArrayType *type);
+

ABataev wrote:
> doru1004 wrote:
> > doru1004 wrote:
> > > doru1004 wrote:
> > > > ABataev wrote:
> > > > > doru1004 wrote:
> > > > > > ABataev wrote:
> > > > > > > doru1004 wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > doru1004 wrote:
> > > > > > > > > > ABataev wrote:
> > > > > > > > > > > 1. Is it possible that VariableArrayType does not have 
> > > > > > > > > > > VLA size?
> > > > > > > > > > > 2. Fix param name
> > > > > > > > > > @ABataev How would point 1 happen?
> > > > > > > > > You're adding a function that checks if VLA type has VLA 
> > > > > > > > > size. I'm asking, if it is possible for VLA type to not have 
> > > > > > > > > VLA size at all? Why do you need this function?
> > > > > > > > This function checks if the expression of the size of the VLA 
> > > > > > > > has already been emitted and can be used.
> > > > > > > Why the emission of VLA size can be delayed?
> > > > > > Because the size of the VLA is emitted in the user code and the 
> > > > > > prolog of the function happens before that. The emission of the VLA 
> > > > > > needs to be delayed until its size has been emitted in the user 
> > > > > > code.
> > > > > This is very fragile approach. Can you try instead try to improve 
> > > > > markAsEscaped function and fix insertion of VD to 
> > > > > EscapedVariableLengthDecls and if the declaration is internal for the 
> > > > > target region, insert it to DelayedVariableLengthDecls?
> > > > I am not sure what the condition would be, at that point, to choose 
> > > > between one list or the other. I'm not sure what you mean by the 
> > > > declaration being internal to the target region.
> > > Any thoughts? As far as I can tell all VLAs that reach that point belong 
> > > in `DelayedVariableLengthDecls`
> > @ABataev I cannot think of a condition to use for the distinction in 
> > markedAsEscaped(). Could you please explain in more detail what you want me 
> > to check? I can make the rest of the changes happen no problem but I don't 
> > know what the condition is. Unless you tell me otherwise, I think the best 
> > condition is to check whether the VLA size has been emitted (i.e. that is 
> > is part of the VLASize list) in which case the code as is now is fine.
> Can you check that the declaration is not captured in the target context? If 
> it is not captured, it is declared in the target region and should be emitted 
> as delayed.
How do I check that? There doesn't seem to be a list of captured variables 
available at that point in the code.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CodeGenFunction.h:2806
+  /// Return true if all the emissions for the VLA size have occurred.
+  bool hasVLASize(const VariableArrayType *type);
+

doru1004 wrote:
> doru1004 wrote:
> > ABataev wrote:
> > > doru1004 wrote:
> > > > ABataev wrote:
> > > > > doru1004 wrote:
> > > > > > ABataev wrote:
> > > > > > > doru1004 wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > 1. Is it possible that VariableArrayType does not have VLA 
> > > > > > > > > size?
> > > > > > > > > 2. Fix param name
> > > > > > > > @ABataev How would point 1 happen?
> > > > > > > You're adding a function that checks if VLA type has VLA size. 
> > > > > > > I'm asking, if it is possible for VLA type to not have VLA size 
> > > > > > > at all? Why do you need this function?
> > > > > > This function checks if the expression of the size of the VLA has 
> > > > > > already been emitted and can be used.
> > > > > Why the emission of VLA size can be delayed?
> > > > Because the size of the VLA is emitted in the user code and the prolog 
> > > > of the function happens before that. The emission of the VLA needs to 
> > > > be delayed until its size has been emitted in the user code.
> > > This is very fragile approach. Can you try instead try to improve 
> > > markAsEscaped function and fix insertion of VD to 
> > > EscapedVariableLengthDecls and if the declaration is internal for the 
> > > target region, insert it to DelayedVariableLengthDecls?
> > I am not sure what the condition would be, at that point, to choose between 
> > one list or the other. I'm not sure what you mean by the declaration being 
> > internal to the target region.
> Any thoughts? As far as I can tell all VLAs that reach that point belong in 
> `DelayedVariableLengthDecls`
@ABataev I cannot think of a condition to use for the distinction in 
markAsEscaped(). Could you please explain in more detail what you want me to 
check? I can make the rest of the changes happen, no problem, but I don't know 
what the condition is. Unless you tell me otherwise, I think the best condition 
is to check whether the VLA size has been emitted (i.e. that it is part of the 
VLASize list), in which case the code as it is now is fine.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CodeGenFunction.h:2806
+  /// Return true if all the emissions for the VLA size have occurred.
+  bool hasVLASize(const VariableArrayType *type);
+

doru1004 wrote:
> ABataev wrote:
> > doru1004 wrote:
> > > ABataev wrote:
> > > > doru1004 wrote:
> > > > > ABataev wrote:
> > > > > > doru1004 wrote:
> > > > > > > ABataev wrote:
> > > > > > > > 1. Is it possible that VariableArrayType does not have VLA size?
> > > > > > > > 2. Fix param name
> > > > > > > @ABataev How would point 1 happen?
> > > > > > You're adding a function that checks if VLA type has VLA size. I'm 
> > > > > > asking, if it is possible for VLA type to not have VLA size at all? 
> > > > > > Why do you need this function?
> > > > > This function checks if the expression of the size of the VLA has 
> > > > > already been emitted and can be used.
> > > > Why the emission of VLA size can be delayed?
> > > Because the size of the VLA is emitted in the user code and the prolog of 
> > > the function happens before that. The emission of the VLA needs to be 
> > > delayed until its size has been emitted in the user code.
> > This is very fragile approach. Can you try instead try to improve 
> > markAsEscaped function and fix insertion of VD to 
> > EscapedVariableLengthDecls and if the declaration is internal for the 
> > target region, insert it to DelayedVariableLengthDecls?
> I am not sure what the condition would be, at that point, to choose between 
> one list or the other. I'm not sure what you mean by the declaration being 
> internal to the target region.
Any thoughts? As far as I can tell, all VLAs that reach that point belong in 
`DelayedVariableLengthDecls`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CodeGenFunction.h:2806
+  /// Return true if all the emissions for the VLA size have occurred.
+  bool hasVLASize(const VariableArrayType *type);
+

ABataev wrote:
> doru1004 wrote:
> > ABataev wrote:
> > > doru1004 wrote:
> > > > ABataev wrote:
> > > > > doru1004 wrote:
> > > > > > ABataev wrote:
> > > > > > > 1. Is it possible that VariableArrayType does not have VLA size?
> > > > > > > 2. Fix param name
> > > > > > @ABataev How would point 1 happen?
> > > > > You're adding a function that checks if VLA type has VLA size. I'm 
> > > > > asking, if it is possible for VLA type to not have VLA size at all? 
> > > > > Why do you need this function?
> > > > This function checks if the expression of the size of the VLA has 
> > > > already been emitted and can be used.
> > > Why the emission of VLA size can be delayed?
> > Because the size of the VLA is emitted in the user code and the prolog of 
> > the function happens before that. The emission of the VLA needs to be 
> > delayed until its size has been emitted in the user code.
> This is a very fragile approach. Can you instead try to improve the 
> markAsEscaped function and fix the insertion of VD into 
> EscapedVariableLengthDecls: if the declaration is internal to the target 
> region, insert it into DelayedVariableLengthDecls?
I am not sure what the condition would be, at that point, to choose between 
one list and the other. I'm not sure what you mean by the declaration being 
internal to the target region.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 536326.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,1260 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo4() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  int N = 10;
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3() + foo4();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] = add nuw i64 [[TMP4]], 3
+// CHECK-NEXT:[[TMP6:%.*]] = udiv i64 [[TMP5]], 4
+// CHECK-NEXT:[[TMP7:%.*]] = mul nuw i64 [[TMP6]], 4
+// CHECK-NEXT:[[A:%.*]] = call align 4 ptr @__kmpc_alloc_shared(i64 [[TMP7]])

[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CodeGenFunction.h:2806
+  /// Return true if all the emissions for the VLA size have occurred.
+  bool hasVLASize(const VariableArrayType *type);
+

ABataev wrote:
> doru1004 wrote:
> > ABataev wrote:
> > > doru1004 wrote:
> > > > ABataev wrote:
> > > > > 1. Is it possible that VariableArrayType does not have VLA size?
> > > > > 2. Fix param name
> > > > @ABataev How would point 1 happen?
> > > You're adding a function that checks if VLA type has VLA size. I'm 
> > > asking, if it is possible for VLA type to not have VLA size at all? Why 
> > > do you need this function?
> > This function checks if the expression of the size of the VLA has already 
> > been emitted and can be used.
> Why the emission of VLA size can be delayed?
Because the size of the VLA is emitted in the user code and the prolog of the 
function happens before that. The emission of the VLA needs to be delayed until 
its size has been emitted in the user code.
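
A minimal sketch of what that check could look like, assuming the existing 
VLASizeMap in CodeGenFunction is the record of already-emitted sizes (the 
patch may do this differently):

// Hedged sketch -- not necessarily the exact code in the patch.  A VLA can
// only be materialized once its size expression has been evaluated and cached.
bool CodeGenFunction::hasVLASize(const VariableArrayType *VAT) {
  // VLASizeMap is populated when the size expression is emitted in the user
  // code; before that point the allocation cannot be sized.
  return VLASizeMap.count(VAT->getSizeExpr()) != 0;
}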


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 536322.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,1260 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo4() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  int N = 10;
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3() + foo4();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] = add nuw i64 [[TMP4]], 3
+// CHECK-NEXT:[[TMP6:%.*]] = udiv i64 [[TMP5]], 4
+// CHECK-NEXT:[[TMP7:%.*]] = mul nuw i64 [[TMP6]], 4
+// CHECK-NEXT:[[A:%.*]] = call align 4 ptr @__kmpc_alloc_shared(i64 [[TMP7]])

[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGDecl.cpp:1605-1606
+if (getLangOpts().OpenMPIsDevice) {
+  CGOpenMPRuntimeGPU  =
+  *(static_cast(()));
+  if (RT.isDelayedVariableLengthDecl(*this, )) {

ABataev wrote:
> No need to cast to CGOpenMPRuntimeGPU since isDelayedVariableLengthDecl is a 
> member of CGOpenMPRuntime.
RT is also used further down to call getKmpcAllocShared().



Comment at: clang/lib/CodeGen/CodeGenFunction.h:2806
+  /// Return true if all the emissions for the VLA size have occured.
+  bool hasVLASize(const VariableArrayType *type);
+

ABataev wrote:
> doru1004 wrote:
> > ABataev wrote:
> > > 1. Is it possible that VariableArrayType does not have VLA size?
> > > 2. Fix param name
> > @ABataev How would point 1 happen?
> You're adding a function that checks if VLA type has VLA size. I'm asking, if 
> it is possible for VLA type to not have VLA size at all? Why do you need this 
> function?
This function checks if the expression of the size of the VLA has already been 
emitted and can be used.
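
As a rough sketch of what such a check can look like (assuming it simply consults CodeGenFunction's VLASizeMap; the actual implementation is the one in the patch):

  // Sketch only: the size value appears in VLASizeMap once user code has
  // emitted it (via EmitVariablyModifiedType); before that the lookup fails.
  bool CodeGenFunction::hasVLASize(const VariableArrayType *VAT) {
    return VLASizeMap.count(VAT->getSizeExpr()) != 0;
  }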


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 536321.
doru1004 marked 4 inline comments as done.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,1260 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo4() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  int N = 10;
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3() + foo4();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] = add nuw i64 [[TMP4]], 3
+// CHECK-NEXT:[[TMP6:%.*]] = udiv i64 [[TMP5]], 4
+// CHECK-NEXT:[[TMP7:%.*]] = mul nuw i64 [[TMP6]], 4
+// CHECK-NEXT:[[A:%.*]] = call align 

[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CodeGenFunction.h:2806
+  /// Return true if all the emissions for the VLA size have occured.
+  bool hasVLASize(const VariableArrayType *type);
+

ABataev wrote:
> 1. Is it possible that VariableArrayType does not have VLA size?
> 2. Fix param name
@ABataev How would point 1 happen?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 marked 3 inline comments as done.
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGDecl.cpp:1605-1609
+(CGM.getContext().getTargetInfo().getTriple().isAMDGPU() ||
+ CGM.getContext().getTargetInfo().getTriple().isNVPTX())) {
+  CGOpenMPRuntimeGPU  =
+  *(static_cast(()));
+  if (RT.isDelayedVariableLengthDecl(*this, )) {

ABataev wrote:
> doru1004 wrote:
> > ABataev wrote:
> > > I think you can drop triple checks and rely completely on 
> > > RT.isDelayedVariableLengthDecl(*this, ) result here
> > I tried it but there is a lit test (which I cannot identify) that hangs 
> > when offloading to the host (I think) so it has to be an actual GPU. Any 
> > ideas?
> Make isDelayedVariableLengthDecl virtual in base OpenMPRuntime and make it 
> return false by default, and true in base implementation for GPU. This should 
> fix the problem, I hope
It worked, thank you for the suggestion!
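
For readers of the archive, a minimal sketch of the suggested layering (signatures assumed, not copied from the final patch):

  // Base runtime: nothing is ever delayed on the host.
  class CGOpenMPRuntime {
  public:
    virtual bool isDelayedVariableLengthDecl(CodeGenFunction &CGF,
                                             const VarDecl *VD) const {
      return false;
    }
  };

  // GPU runtime: reports the escaped VLAs whose __kmpc_alloc_shared emission
  // had to be postponed until their size expression is available.
  class CGOpenMPRuntimeGPU : public CGOpenMPRuntime {
  public:
    bool isDelayedVariableLengthDecl(CodeGenFunction &CGF,
                                     const VarDecl *VD) const override;
  };

With this layering the caller in CGDecl.cpp no longer needs to check the target triple; the host runtime simply answers false.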


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 536288.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,1260 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo4() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  int N = 10;
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3() + foo4();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] = add nuw i64 [[TMP4]], 3
+// CHECK-NEXT:[[TMP6:%.*]] = udiv i64 [[TMP5]], 4
+// CHECK-NEXT:[[TMP7:%.*]] = mul nuw i64 [[TMP6]], 4
+// CHECK-NEXT:[[A:%.*]] = call align 4 ptr @__kmpc_alloc_shared(i64 [[TMP7]])

[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-30 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGDecl.cpp:1605-1609
+(CGM.getContext().getTargetInfo().getTriple().isAMDGPU() ||
+ CGM.getContext().getTargetInfo().getTriple().isNVPTX())) {
+  CGOpenMPRuntimeGPU  =
+  *(static_cast(()));
+  if (RT.isDelayedVariableLengthDecl(*this, )) {

ABataev wrote:
> I think you can drop triple checks and rely completely on 
> RT.isDelayedVariableLengthDecl(*this, ) result here
I tried it but there is a lit test (which I cannot identify) that hangs when 
offloading to the host (I think) so it has to be an actual GPU. Any ideas?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-29 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

I have modified the patch to do only one thing rather than several, as the previous version did. This patch now only handles the delayed emission of the __kmpc_alloc_shared for the VLAs which could not be emitted in the prolog of the function. It is now very precise about which VLAs it will transform into __kmpc_alloc_shared, i.e. only the ones that were previously attempted in the prolog and could not be emitted because their size was missing (had not been emitted yet).

I have dropped the previous intention of emitting __kmpc_alloc_shared for thread-local variables which have a dynamic size. I am emitting dynamic allocas (as the test shows), which will fail in the backend as expected. This behavior needs to be resolved separately in the backend according to @arsenm, and any workaround in the frontend would have to live in a standalone patch that can be reverted when a fix to the backend is performed.
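
As an illustration of the distinction drawn above (hypothetical code, not taken from the test suite):

  #pragma omp target teams distribute
  for (int i = 0; i < 12; i++) {
    int N = 10;
    int A[N];   // escapes into the parallel region below: handled by the
                // delayed __kmpc_alloc_shared / __kmpc_free_shared pair
    #pragma omp parallel for
    for (int j = 0; j < N; j++)
      A[j] = j;

    int B[N];   // does not escape: stays a dynamic alloca, to be handled
                // (or rejected) by the backend as discussed above
    B[0] = A[0];
  }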


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs

2023-06-29 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 536059.
doru1004 retitled this revision from "[Clang][OpenMP] Enable use of 
__kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions" to 
"[Clang][OpenMP] Delay emission of __kmpc_alloc_shared for escaped VLAs ".
doru1004 edited the summary of this revision.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,1260 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo4() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  int N = 10;
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3() + foo4();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] 

[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions

2023-06-28 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGDecl.cpp:1603
+// deallocation call of __kmpc_free_shared() is emitted later.
+if (getLangOpts().OpenMP && getTarget().getTriple().isAMDGCN()) {
+  // Emit call to __kmpc_alloc_shared() instead of the alloca.

arsenm wrote:
> ABataev wrote:
> > doru1004 wrote:
> > > jhuber6 wrote:
> > > > ABataev wrote:
> > > > > OpenMPIsDevice?
> > > > Does NVPTX handle this already? If not, is there a compelling reason to 
> > > > exclude NVPTX? Otherwise we should check if we are the OpenMP device.
> > > Does NVPTX support dynamic allocas?
> > It does not matter here, it depends on the runtime library implementations. 
> > The compiler just shall provide proper runtime calls emission, everything 
> > else is part of the runtime support.
> I think I heard recent PTX introduced new instructions for it. amdgpu codegen 
> just happens to be broken because we don't properly restore the stack 
> afterwards. When I added the support we had no way of testing (and still 
> don't really, __builtin_alloca doesn't handle non-0 stack address space 
> correctly)
If NVPTX supports that then there is no reason to have NVPTX avoid emitting 
allocas (i.e. the condition stays as it is right now) but I am willing to reach 
a consensus so please let me know what you would all prefer.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions

2023-06-27 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGDecl.cpp:1603
+// deallocation call of __kmpc_free_shared() is emitted later.
+if (getLangOpts().OpenMP && getTarget().getTriple().isAMDGCN()) {
+  // Emit call to __kmpc_alloc_shared() instead of the alloca.

jhuber6 wrote:
> ABataev wrote:
> > OpenMPIsDevice?
> Does NVPTX handle this already? If not, is there a compelling reason to 
> exclude NVPTX? Otherwise we should check if we are the OpenMP device.
Does NVPTX support dynamic allocas?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions

2023-06-27 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 535186.
doru1004 marked 3 inline comments as done.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,1258 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo4() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  int N = 10;
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3() + foo4();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] = add nuw i64 [[TMP4]], 3
+// CHECK-NEXT:[[TMP6:%.*]] = udiv i64 [[TMP5]], 4
+// CHECK-NEXT:[[TMP7:%.*]] = mul nuw i64 [[TMP6]], 4
+// CHECK-NEXT:[[A:%.*]] = call align 4 ptr @__kmpc_alloc_shared(i64 

[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions

2023-06-27 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:1085
   }
-  for (const auto *VD : I->getSecond().EscapedVariableLengthDecls) {
-// Use actual memory size of the VLA object including the padding

doru1004 wrote:
> ABataev wrote:
> > jhuber6 wrote:
> > > doru1004 wrote:
> > > > ABataev wrote:
> > > > > Why this code is removed?
> > > > I could not understand why this code is here in the first place since 
> > > > it doesn't seem that it could ever work correctly (and it doesn't seem 
> > > > to be covered by any existing tests). Maybe I'm wrong but that was my 
> > > > understanding of it. So what seems to happen is that this code attempts 
> > > > to emit a kmpc_alloc_shared before the actual size calculation is 
> > > > emitted. So if the VLA size is something that the user defines such as 
> > > > `int N = 10;` then that code will not have been emitted at this point. 
> > > > When the expression computing the size of the VLA uses `N`, the code 
> > > > that is deleted here would just fail to find the VLA size in the 
> > > > attempt to emit the kmpc_alloc_shared. The emission of the VLA as 
> > > > kmpc_alloc_shared needs to happen after the expression of the size is 
> > > > emitted.
> > > I'm pretty sure I was the one that wrote this code, and at the time I 
> > > don't recall it really working. I remember there was something else that 
> > > expected this to be here, but for what utility I do not recall. VLAs were 
> > > never tested or used.
> > They are tested, check 
> > test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp for 
> > example, where it captures VLA implicitly. I assume this should not be 
> > AMDGCN specific.
> Oh I see so this code path would cover the case when the VLA is defined 
> outside the target region? I'm surprised I haven't seen any lit test fails 
> for AMD GPUs, maybe this kind of test only exists for NVPTX. I'll add a test 
> for AMD GPUs in that case.
Edit: the VLA is defined outside the target region => the VLA //size// is 
defined outside the target region
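
For reference, that case looks like foo4 in the added test: the size is defined before the target region and captured implicitly, so it is already available when the kernel prolog is emitted.

  int M = 12, N = 10;   // N is defined outside the target region
  #pragma omp target teams distribute
  for (int i = 0; i < M; i++) {
    int A[N];           // VLA whose size comes from the implicitly captured N
    #pragma omp parallel for
    for (int j = 0; j < N; j++)
      A[j] = j;
  }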


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions

2023-06-27 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:1085
   }
-  for (const auto *VD : I->getSecond().EscapedVariableLengthDecls) {
-// Use actual memory size of the VLA object including the padding

ABataev wrote:
> jhuber6 wrote:
> > doru1004 wrote:
> > > ABataev wrote:
> > > > Why this code is removed?
> > > I could not understand why this code is here in the first place since it 
> > > doesn't seem that it could ever work correctly (and it doesn't seem to be 
> > > covered by any existing tests). Maybe I'm wrong but that was my 
> > > understanding of it. So what seems to happen is that this code attempts 
> > > to emit a kmpc_alloc_shared before the actual size calculation is 
> > > emitted. So if the VLA size is something that the user defines such as 
> > > `int N = 10;` then that code will not have been emitted at this point. 
> > > When the expression computing the size of the VLA uses `N`, the code that 
> > > is deleted here would just fail to find the VLA size in the attempt to 
> > > emit the kmpc_alloc_shared. The emission of the VLA as kmpc_alloc_shared 
> > > needs to happen after the expression of the size is emitted.
> > I'm pretty sure I was the one that wrote this code, and at the time I don't 
> > recall it really working. I remember there was something else that expected 
> > this to be here, but for what utility I do not recall. VLAs were never 
> > tested or used.
> They are tested, check 
> test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp for 
> example, where it captures VLA implicitly. I assume this should not be AMDGCN 
> specific.
Oh I see so this code path would cover the case when the VLA is defined outside 
the target region? I'm surprised I haven't seen any lit test fails for AMD 
GPUs, maybe this kind of test only exists for NVPTX. I'll add a test for AMD 
GPUs in that case.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions

2023-06-27 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:1085
   }
-  for (const auto *VD : I->getSecond().EscapedVariableLengthDecls) {
-// Use actual memory size of the VLA object including the padding

ABataev wrote:
> Why this code is removed?
I could not understand why this code is here in the first place since it 
doesn't seem that it could ever work correctly (and it doesn't seem to be 
covered by any existing tests). Maybe I'm wrong but that was my understanding 
of it. So what seems to happen is that this code attempts to emit a 
kmpc_alloc_shared before the actual size calculation is emitted. So if the VLA 
size is something that the user defines such as `int N = 10;` then that code 
will not have been emitted at this point. When the expression computing the 
size of the VLA uses `N`, the code that is deleted here would just fail to find 
the VLA size in the attempt to emit the kmpc_alloc_shared. The emission of the 
VLA as kmpc_alloc_shared needs to happen after the expression of the size is 
emitted.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153883/new/

https://reviews.llvm.org/D153883

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153883: [Clang][OpenMP] Enable use of __kmpc_alloc_shared for VLAs defined in AMD GPU offloaded regions

2023-06-27 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 created this revision.
doru1004 added reviewers: ronlieb, gregrodgers, carlo.bertolli, arsenm, 
jdoerfert, dhruvachak, ABataev.
doru1004 added a project: OpenMP.
Herald added subscribers: sunshaoce, guansong, yaxunl, jvesely.
Herald added a project: All.
doru1004 requested review of this revision.
Herald added subscribers: cfe-commits, jplehr, sstefan1, wdng.
Herald added a project: clang.

This patch enables the use of `___kmpc_alloc_shared` to allocate dynamically 
sized allocation on AMD GPUs. For example:

  #pragma omp target
  {
int N = 10;
double A[N];
...
  }

This will generate a pair of `__kmpc_alloc_shared / __kmpc_free_shared` to 
handle the allocation and deallocation of `A` inside the target region.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D153883

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/amdgcn_target_device_vla.cpp

Index: clang/test/OpenMP/amdgcn_target_device_vla.cpp
===
--- /dev/null
+++ clang/test/OpenMP/amdgcn_target_device_vla.cpp
@@ -0,0 +1,869 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int foo1() {
+  int sum = 0.0;
+  #pragma omp target map(tofrom: sum)
+  {
+int N = 10;
+int A[N];
+
+for (int i = 0; i < N; i++)
+  A[i] = i;
+
+for (int i = 0; i < N; i++)
+  sum += A[i];
+  }
+  return sum;
+}
+
+int foo2() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute parallel for map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int foo3() {
+  int sum = 0.0;
+  int M = 12;
+  int result[M];
+  #pragma omp target teams distribute map(from: result[:M])
+  for (int i = 0; i < M; i++) {
+int N = 10;
+int A[N];
+result[i] = i;
+
+#pragma omp parallel for
+for (int j = 0; j < N; j++)
+  A[j] = j;
+
+for (int j = 0; j < N; j++)
+  result[i] += A[j];
+  }
+
+  for (int i = 0; i < M; i++)
+sum += result[i];
+  return sum;
+}
+
+int main() {
+  return foo1() + foo2() + foo3();
+}
+
+#endif
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4foo1v_l12
+// CHECK-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[SUM:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[SUM_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT:[[N:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__VLA_EXPR0:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[I1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[SUM_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[SUM_ADDR]] to ptr
+// CHECK-NEXT:[[N_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N]] to ptr
+// CHECK-NEXT:[[__VLA_EXPR0_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[__VLA_EXPR0]] to ptr
+// CHECK-NEXT:[[I_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I]] to ptr
+// CHECK-NEXT:[[I1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I1]] to ptr
+// CHECK-NEXT:store ptr [[SUM]], ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[SUM_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:store i32 10, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ASCAST]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+// CHECK-NEXT:[[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+// CHECK-NEXT:[[TMP5:%.*]] = add nuw i64 [[TMP4]], 3
+// CHECK-NEXT:[[TMP6:%.*]] = udiv i64 [[TMP5]], 4
+// CHECK-NEXT:[[TMP7:%.*]] 

[PATCH] D148849: [OpenMP-OPT] Remove limit for heap to stack conversions of __kmpc_alloc_shared allocations

2023-04-21 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 closed this revision.
doru1004 added a comment.

Commit: 1a58c3d601b4c982afeb714c3a6c4be4d787cbf1 



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148849/new/

https://reviews.llvm.org/D148849

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D148849: [OpenMP-OPT] Remove limit for heap to stack conversions of __kmpc_alloc_shared allocations

2023-04-20 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

In D148849#4285236, @jdoerfert wrote:

> Make a test for the attributor/openmp-opt, also don't use O2 in tests, the IR only test is sufficient.

I removed the clang test since it wasn't testing anything new.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148849/new/

https://reviews.llvm.org/D148849

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D148849: [OpenMP-OPT] Remove limit for heap to stack conversions of __kmpc_alloc_shared allocations

2023-04-20 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 515516.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148849/new/

https://reviews.llvm.org/D148849

Files:
  llvm/lib/Transforms/IPO/AttributorAttributes.cpp
  llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll


Index: llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll
===
--- llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll
+++ llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll
@@ -741,6 +741,21 @@
   ret void
 }
 
+define void @convert_large_kmpc_alloc_shared() {
+; CHECK-LABEL: define {{[^@]+}}@convert_large_kmpc_alloc_shared() {
+; CHECK-NEXT:  bb:
+; CHECK-NEXT:[[I_H2S:%.*]] = alloca i8, i64 256, align 1, addrspace(5)
+; CHECK-NEXT:[[MALLOC_CAST:%.*]] = addrspacecast ptr addrspace(5) [[I_H2S]] to ptr
+; CHECK-NEXT:tail call void @usei8(ptr noalias nocapture nofree [[MALLOC_CAST]]) #[[ATTR7]]
+; CHECK-NEXT:ret void
+;
+bb:
+  %i = tail call noalias ptr @__kmpc_alloc_shared(i64 256)
+  tail call void @usei8(ptr nocapture nofree %i) nosync nounwind willreturn
+  tail call void @__kmpc_free_shared(ptr %i, i64 256)
+  ret void
+}
+
 
 ;.
 ; CHECK: attributes #[[ATTR0:[0-9]+]] = { nounwind willreturn }
Index: llvm/lib/Transforms/IPO/AttributorAttributes.cpp
===
--- llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -7180,7 +7180,8 @@
 }
 
 std::optional Size = getSize(A, *this, AI);
-if (MaxHeapToStackSize != -1) {
+if (AI.LibraryFunctionId != LibFunc___kmpc_alloc_shared &&
+MaxHeapToStackSize != -1) {
   if (!Size || Size->ugt(MaxHeapToStackSize)) {
 LLVM_DEBUG({
   if (!Size)


Index: llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll
===
--- llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll
+++ llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll
@@ -741,6 +741,21 @@
   ret void
 }
 
+define void @convert_large_kmpc_alloc_shared() {
+; CHECK-LABEL: define {{[^@]+}}@convert_large_kmpc_alloc_shared() {
+; CHECK-NEXT:  bb:
+; CHECK-NEXT:[[I_H2S:%.*]] = alloca i8, i64 256, align 1, addrspace(5)
+; CHECK-NEXT:[[MALLOC_CAST:%.*]] = addrspacecast ptr addrspace(5) [[I_H2S]] to ptr
+; CHECK-NEXT:tail call void @usei8(ptr noalias nocapture nofree [[MALLOC_CAST]]) #[[ATTR7]]
+; CHECK-NEXT:ret void
+;
+bb:
+  %i = tail call noalias ptr @__kmpc_alloc_shared(i64 256)
+  tail call void @usei8(ptr nocapture nofree %i) nosync nounwind willreturn
+  tail call void @__kmpc_free_shared(ptr %i, i64 256)
+  ret void
+}
+
 
 ;.
 ; CHECK: attributes #[[ATTR0:[0-9]+]] = { nounwind willreturn }
Index: llvm/lib/Transforms/IPO/AttributorAttributes.cpp
===
--- llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -7180,7 +7180,8 @@
 }
 
 std::optional Size = getSize(A, *this, AI);
-if (MaxHeapToStackSize != -1) {
+if (AI.LibraryFunctionId != LibFunc___kmpc_alloc_shared &&
+MaxHeapToStackSize != -1) {
   if (!Size || Size->ugt(MaxHeapToStackSize)) {
 LLVM_DEBUG({
   if (!Size)
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D147572: [Clang][OpenMP] Fix failure with team-wide allocated variable

2023-04-20 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 closed this revision.
doru1004 added a comment.

Commit: 01910787d386584ea5a3d5dc317a908423ba39ed 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147572/new/

https://reviews.llvm.org/D147572

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D148805: [Clang][OpenMP] Avoid emitting a __kmpc_alloc_shared for implicit casts which do not have their address taken

2023-04-20 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:448
   return;
-if (E->getCastKind() == CK_ArrayToPointerDecay) {
-  const bool SavedAllEscaped = AllEscaped;

ABataev wrote:
> I think you need to check that the array is allocated in the parallel 
> context, otherwise there might be a crash, if it is allocated in the target 
> context and many threads would like to access it.
I believe this is how the condition got here in the first place: since we are inside a function, the enclosing target/parallel construct is not visible, so that aspect cannot be checked and the variable is conservatively emitted via kmpc_alloc_shared. The more I think about it, the more I believe we should leave this as is and not change it. The solution might be to improve the optimization of these cases rather than the emission itself.
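
For illustration, the conservative case under discussion is an array whose address escapes through a call made from a plain function, much like emits_alloc_shared in the test added by this patch (sketch only):

  void foo(int *p);        // opaque callee, assumed to be declared elsewhere

  void escapes_address(int *res) {
    int stack[64];         // the address escapes via the call below, so on the
                           // device the frontend conservatively allocates it
                           // with __kmpc_alloc_shared
    foo(&stack[2]);
    *res = stack[0];
  }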


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148805/new/

https://reviews.llvm.org/D148805

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D148805: [Clang][OpenMP] Avoid emitting a __kmpc_alloc_shared for implicit casts which do not have their address taken

2023-04-20 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 created this revision.
doru1004 added reviewers: ronl, jdoerfert, jhuber6, carlo.bertolli, 
JonChesterfield, dhruvachak, gregrodgers, ABataev.
doru1004 added a project: OpenMP.
Herald added subscribers: sunshaoce, guansong, yaxunl.
Herald added a project: All.
doru1004 requested review of this revision.
Herald added subscribers: cfe-commits, jplehr, sstefan1.
Herald added a project: clang.

This patch avoids emitting `__kmpc_alloc_shared` allocation calls for variables that are implicitly cast via `CK_ArrayToPointerDecay` but do not have their address taken explicitly.

Note: if the condition should be refined instead of removed, I am looking for suggestions on how to keep the check for CK_ArrayToPointerDecay but restrict its applicability with further conditions. It is not clear to me what those conditions could be, hence the complete removal of the condition. So far none of the existing lit tests needed to be changed as a consequence of this change, and no LLVM/OpenMP tests have failed.

OpenMP-Opt is usually able to transform the `__kmpc_alloc_shared` calls emitted this way back into allocas, except that in this case the size of the allocated local array (256) prevents that from happening (the limit is 128).
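
To spell out the size arithmetic (illustrative only, assuming the limit is measured in bytes):

  int stack[64];   // 64 * sizeof(int) == 256 bytes; the heap-to-stack
                   // conversion in OpenMP-Opt skips it because 256 > 128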


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D148805

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/target_alloc_shared_emission.cpp

Index: clang/test/OpenMP/target_alloc_shared_emission.cpp
===
--- /dev/null
+++ clang/test/OpenMP/target_alloc_shared_emission.cpp
@@ -0,0 +1,827 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-amd.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-target-debug -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-amd.bc -o - | FileCheck %s --check-prefix=CHECK-AMD
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host-nvidia.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-unknown-unknown -emit-llvm %s -fopenmp-target-debug -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-nvidia.bc -o - | FileCheck %s --check-prefix=CHECK-NVIDIA
+
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+void foo(int *stack);
+
+void emits_alloc_shared(const int *localPadding , int *res)
+{
+int stack[64];
+int stackptr = 0;
+stack[stackptr++] = -1;
+*res = 0;
+
+do
+{
+  if(localPadding[0] > 0)
+stack[stackptr++] = 0;
+  *res = stack[--stackptr];
+  foo([2]);
+} while (*res > 0);
+}
+
+void does_not_emit_alloc_shared(const int *localPadding , int *res)
+{
+int stack[64];
+int stackptr = 0;
+stack[stackptr++] = -1;
+*res = 0;
+
+do
+{
+  if(localPadding[0] > 0)
+stack[stackptr++] = 0;
+  *res = stack[--stackptr];
+} while (*res > 0);
+}
+
+#define N 1000
+
+int main() {
+const int maz = 1;
+const int may = 2;
+const int max = 3;
+int res;
+int localPadding[N];
+#pragma omp target teams distribute parallel for map(tofrom: localPadding[:N],maz, may, max)
+
+for (int pi = 0; pi < N; pi++)
+{
+for (int hz = 0; hz <= maz; hz++)
+for (int hy = 0; hy <= may; hy++)
+for (int hx = 0; hx <= max; hx++) {
+emits_alloc_shared(localPadding, );
+does_not_emit_alloc_shared(localPadding, );
+}
+localPadding[pi] = res;
+}
+return 0;
+}
+
+#endif
+// CHECK-AMD-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l58
+// CHECK-AMD-SAME: (ptr noundef nonnull align 4 dereferenceable(4000) [[LOCALPADDING:%.*]], i64 noundef [[RES:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-AMD-NEXT:  entry:
+// CHECK-AMD-NEXT:[[LOCALPADDING_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-AMD-NEXT:[[RES_ADDR:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-AMD-NEXT:[[RES_CASTED:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-AMD-NEXT:[[DOTZERO_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-AMD-NEXT:[[DOTTHREADID_TEMP_:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-AMD-NEXT:[[LOCALPADDING_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[LOCALPADDING_ADDR]] to ptr
+// CHECK-AMD-NEXT:

[PATCH] D147572: [Clang][OpenMP] Fix failure with team-wide allocated variable

2023-04-11 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

ping


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147572/new/

https://reviews.llvm.org/D147572

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D147572: [Clang][OpenMP] Fix failure with team-wide allocated variable

2023-04-06 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 511444.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147572/new/

https://reviews.llvm.org/D147572

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/target_team_variable_codegen.cpp


Index: clang/test/OpenMP/target_team_variable_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/target_team_variable_codegen.cpp
@@ -0,0 +1,57 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --prefix-filecheck-ir-name _ --global-value-regex "llvm.compiler.used" "_[0-9a-zA-Z]+A[0-9a-zA-Z]+pi[0-9a-zA-Z]+" "_[0-9a-zA-Z]+anotherPi" --version 2
+// REQUIRES: amdgpu-registered-target
+
+
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-amd.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-target-debug -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-amd.bc -o - | FileCheck %s --check-prefix=CHECK-AMD
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host-nvidia.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-unknown-unknown -emit-llvm %s -fopenmp-target-debug -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-nvidia.bc -o - | FileCheck %s --check-prefix=CHECK-NVIDIA
+
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+typedef enum omp_allocator_handle_t {
+  omp_null_allocator = 0,
+  omp_default_mem_alloc = 1,
+  omp_large_cap_mem_alloc = 2,
+  omp_const_mem_alloc = 3,
+  omp_high_bw_mem_alloc = 4,
+  omp_low_lat_mem_alloc = 5,
+  omp_cgroup_mem_alloc = 6,
+  omp_pteam_mem_alloc = 7,
+  omp_thread_mem_alloc = 8,
+  KMP_ALLOCATOR_MAX_HANDLE = __UINTPTR_MAX__
+} omp_allocator_handle_t;
+
+//.
+// CHECK-AMD: @local_a = internal addrspace(3) global [10 x i32] poison, align 4
+//.
+// CHECK-NVIDIA: @local_a = internal addrspace(3) global [10 x i32] poison, align 4
+//.
+int main()
+{
+   int N = 1;
+   int *a = new int[N];
+#pragma omp target data map(tofrom:a[:N])
+   {
+#pragma omp target teams distribute parallel for
+for(int i = 0; i < N; i++)
+{
+  int local_a[10];
+#pragma omp allocate(local_a) allocator(omp_pteam_mem_alloc)
+  for(int j = 0; j < 10; j++)
+   local_a[j] = a[(i + j) % N];
+  a[i] = local_a[0];
+}
+   }
+  return a[17];
+}
+
+#endif
+ NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// CHECK-AMD: {{.*}}
+// CHECK-NVIDIA: {{.*}}
Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -3351,7 +3351,7 @@
 llvm::Type *VarTy = CGF.ConvertTypeForMem(VD->getType());
 auto *GV = new llvm::GlobalVariable(
 CGM.getModule(), VarTy, /*isConstant=*/false,
-llvm::GlobalValue::InternalLinkage, llvm::Constant::getNullValue(VarTy),
+llvm::GlobalValue::InternalLinkage, llvm::PoisonValue::get(VarTy),
 VD->getName(),
 /*InsertBefore=*/nullptr, llvm::GlobalValue::NotThreadLocal,
 CGM.getContext().getTargetAddressSpace(AS));


Index: clang/test/OpenMP/target_team_variable_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/target_team_variable_codegen.cpp
@@ -0,0 +1,57 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --prefix-filecheck-ir-name _ --global-value-regex "llvm.compiler.used" "_[0-9a-zA-Z]+A[0-9a-zA-Z]+pi[0-9a-zA-Z]+" "_[0-9a-zA-Z]+anotherPi" --version 2
+// REQUIRES: amdgpu-registered-target
+
+
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-amd.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-target-debug -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-amd.bc -o - | FileCheck %s --check-prefix=CHECK-AMD
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host-nvidia.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-unknown-unknown -emit-llvm %s -fopenmp-target-debug -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-nvidia.bc -o - | FileCheck %s 

[PATCH] D147572: [Clang][OpenMP] Fix failure with team-wide allocated variable

2023-04-06 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 511436.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147572/new/

https://reviews.llvm.org/D147572

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/target_team_variable_codegen.cpp


Index: clang/test/OpenMP/target_team_variable_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/target_team_variable_codegen.cpp
@@ -0,0 +1,57 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --check-globals --prefix-filecheck-ir-name _ --global-value-regex 
"llvm.compiler.used" "_[0-9a-zA-Z]+A[0-9a-zA-Z]+pi[0-9a-zA-Z]+" 
"_[0-9a-zA-Z]+anotherPi" --version 2
+// REQUIRES: amdgpu-registered-target
+
+
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-amd.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-target-debug 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-amd.bc -o - | 
FileCheck %s --check-prefix=CHECK-AMD
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host-nvidia.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown 
-fopenmp-targets=nvptx64-unknown-unknown -emit-llvm %s -fopenmp-target-debug 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-nvidia.bc -o - | 
FileCheck %s --check-prefix=CHECK-NVIDIA
+
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+typedef enum omp_allocator_handle_t {
+  omp_null_allocator = 0,
+  omp_default_mem_alloc = 1,
+  omp_large_cap_mem_alloc = 2,
+  omp_const_mem_alloc = 3,
+  omp_high_bw_mem_alloc = 4,
+  omp_low_lat_mem_alloc = 5,
+  omp_cgroup_mem_alloc = 6,
+  omp_pteam_mem_alloc = 7,
+  omp_thread_mem_alloc = 8,
+  KMP_ALLOCATOR_MAX_HANDLE = __UINTPTR_MAX__
+} omp_allocator_handle_t;
+
+//.
+// CHECK-AMD: @local_a = internal addrspace(3) global [10 x i32] poison, align 
4
+//.
+// CHECK-NVIDIA: @local_a = internal addrspace(3) global [10 x i32] 
zeroinitializer, align 4
+//.
+int main()
+{
+   int N = 1;
+   int *a = new int[N];
+#pragma omp target data map(tofrom:a[:N])
+   {
+#pragma omp target teams distribute parallel for
+for(int i = 0; i < N; i++)
+{
+  int local_a[10];
+#pragma omp allocate(local_a) allocator(omp_pteam_mem_alloc)
+  for(int j = 0; j < 10; j++)
+   local_a[j] = a[(i + j) % N];
+  a[i] = local_a[0];
+}
+   }
+  return a[17];
+}
+
+#endif
+ NOTE: These prefixes are unused and the list is autogenerated. Do not add 
tests below this line:
+// CHECK-AMD: {{.*}}
+// CHECK-NVIDIA: {{.*}}
Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -3351,7 +3351,7 @@
 llvm::Type *VarTy = CGF.ConvertTypeForMem(VD->getType());
 auto *GV = new llvm::GlobalVariable(
 CGM.getModule(), VarTy, /*isConstant=*/false,
-llvm::GlobalValue::InternalLinkage, 
llvm::Constant::getNullValue(VarTy),
+llvm::GlobalValue::InternalLinkage, llvm::PoisonValue::get(VarTy),
 VD->getName(),
 /*InsertBefore=*/nullptr, llvm::GlobalValue::NotThreadLocal,
 CGM.getContext().getTargetAddressSpace(AS));


Index: clang/test/OpenMP/target_team_variable_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/target_team_variable_codegen.cpp
@@ -0,0 +1,57 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --prefix-filecheck-ir-name _ --global-value-regex "llvm.compiler.used" "_[0-9a-zA-Z]+A[0-9a-zA-Z]+pi[0-9a-zA-Z]+" "_[0-9a-zA-Z]+anotherPi" --version 2
+// REQUIRES: amdgpu-registered-target
+
+
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-amd.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-target-debug -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-amd.bc -o - | FileCheck %s --check-prefix=CHECK-AMD
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host-nvidia.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-unknown-unknown -emit-llvm %s -fopenmp-target-debug -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-nvidia.bc -o - | FileCheck %s 

[PATCH] D147572: [Clang][OpenMP] Fix failure with team-wide allocated variable

2023-04-06 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/test/OpenMP/target_team_variable_codegen.cpp:33
+//.
+// CHECK-NVIDIA: @local_a = internal addrspace(3) global [10 x i32] 
zeroinitializer, align 4
+//.

jhuber6 wrote:
> jdoerfert wrote:
> > doru1004 wrote:
> > > jhuber6 wrote:
> > > > Shouldn't the Nvidia version also be undefined? Not sure why this 
> > > > should vary depending on the target.
> > > Perhaps the NVIDIA code path can tolerate a zeroinitializer? I don't want to 
> > > change it if it's not needed. I am basing this check on the code path for 
> > > AMD GPUs and the initial bug that was reported.
> > for AS 3 we should make it always poison.
> We should probably change this in `HeapToShared` in `OpenMPOpt` as well.
Happy to remove the guard and have it always use poison for both NVIDIA and AMD.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147572/new/

https://reviews.llvm.org/D147572

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D147572: [Clang][OpenMP] Fix failure with team-wide allocated variable

2023-04-06 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/test/OpenMP/target_team_variable_codegen.cpp:33
+//.
+// CHECK-NVIDIA: @local_a = internal addrspace(3) global [10 x i32] 
zeroinitializer, align 4
+//.

jhuber6 wrote:
> Shouldn't the Nvidia version also be undefined? Not sure why this should vary 
> depending on the target.
Perhaps the NVIDIA code path can tolerate a zeroinitializer? I don't want to change 
it if it's not needed. I am basing this check on the code path for AMD GPUs and 
the initial bug that was reported.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147572/new/

https://reviews.llvm.org/D147572

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D147572: [Clang][OpenMP] Fix failure with team-wide allocated variable

2023-04-04 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 510943.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147572/new/

https://reviews.llvm.org/D147572

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/target_team_variable_codegen.cpp


Index: clang/test/OpenMP/target_team_variable_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/target_team_variable_codegen.cpp
@@ -0,0 +1,57 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --check-globals --prefix-filecheck-ir-name _ --global-value-regex 
"llvm.compiler.used" "_[0-9a-zA-Z]+A[0-9a-zA-Z]+pi[0-9a-zA-Z]+" 
"_[0-9a-zA-Z]+anotherPi" --version 2
+// REQUIRES: amdgpu-registered-target
+
+
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-amd.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-target-debug 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-amd.bc -o - | 
FileCheck %s --check-prefix=CHECK-AMD
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host-nvidia.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown 
-fopenmp-targets=nvptx64-unknown-unknown -emit-llvm %s -fopenmp-target-debug 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-nvidia.bc -o - | 
FileCheck %s --check-prefix=CHECK-NVIDIA
+
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+typedef enum omp_allocator_handle_t {
+  omp_null_allocator = 0,
+  omp_default_mem_alloc = 1,
+  omp_large_cap_mem_alloc = 2,
+  omp_const_mem_alloc = 3,
+  omp_high_bw_mem_alloc = 4,
+  omp_low_lat_mem_alloc = 5,
+  omp_cgroup_mem_alloc = 6,
+  omp_pteam_mem_alloc = 7,
+  omp_thread_mem_alloc = 8,
+  KMP_ALLOCATOR_MAX_HANDLE = __UINTPTR_MAX__
+} omp_allocator_handle_t;
+
+//.
+// CHECK-AMD: @local_a = internal addrspace(3) global [10 x i32] poison, align 
4
+//.
+// CHECK-NVIDIA: @local_a = internal addrspace(3) global [10 x i32] 
zeroinitializer, align 4
+//.
+int main()
+{
+   int N = 1;
+   int *a = new int[N];
+#pragma omp target data map(tofrom:a[:N])
+   {
+#pragma omp target teams distribute parallel for
+for(int i = 0; i < N; i++)
+{
+  int local_a[10];
+#pragma omp allocate(local_a) allocator(omp_pteam_mem_alloc)
+  for(int j = 0; j < 10; j++)
+   local_a[j] = a[(i + j) % N];
+  a[i] = local_a[0];
+}
+   }
+  return a[17];
+}
+
+#endif
+ NOTE: These prefixes are unused and the list is autogenerated. Do not add 
tests below this line:
+// CHECK-AMD: {{.*}}
+// CHECK-NVIDIA: {{.*}}
Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -3351,7 +3351,9 @@
 llvm::Type *VarTy = CGF.ConvertTypeForMem(VD->getType());
 auto *GV = new llvm::GlobalVariable(
 CGM.getModule(), VarTy, /*isConstant=*/false,
-llvm::GlobalValue::InternalLinkage, 
llvm::Constant::getNullValue(VarTy),
+llvm::GlobalValue::InternalLinkage,
+CGM.getTriple().isAMDGCN() ? llvm::PoisonValue::get(VarTy)
+   : llvm::Constant::getNullValue(VarTy),
 VD->getName(),
 /*InsertBefore=*/nullptr, llvm::GlobalValue::NotThreadLocal,
 CGM.getContext().getTargetAddressSpace(AS));


Index: clang/test/OpenMP/target_team_variable_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/target_team_variable_codegen.cpp
@@ -0,0 +1,57 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --prefix-filecheck-ir-name _ --global-value-regex "llvm.compiler.used" "_[0-9a-zA-Z]+A[0-9a-zA-Z]+pi[0-9a-zA-Z]+" "_[0-9a-zA-Z]+anotherPi" --version 2
+// REQUIRES: amdgpu-registered-target
+
+
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-amd.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-target-debug -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-amd.bc -o - | FileCheck %s --check-prefix=CHECK-AMD
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host-nvidia.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown 

[PATCH] D147572: [Clang][OpenMP] Fix failure with team-wide allocated variable

2023-04-04 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:303
+if (GV->hasInitializer() && !(isa(GV->getInitializer()) ||
+  isa(GV->getInitializer( {
   OutContext.reportError({},

arsenm wrote:
> Isa covers PoisonValue already 
Perfect! I'll revert this part.
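For reference, a minimal sketch (not from the patch; the helper name is illustrative) of why the extra clause is redundant: PoisonValue derives from UndefValue in LLVM's constant hierarchy, so a single isa<UndefValue> check already accepts poison initializers.

  // Minimal sketch, assuming the in-tree LLVM C++ API; because PoisonValue is
  // a subclass of UndefValue, isa<UndefValue> covers both kinds of initializer
  // and no separate isa<PoisonValue> clause is needed.
  #include "llvm/IR/Constants.h"
  #include "llvm/IR/GlobalVariable.h"

  static bool hasUndefOrPoisonInit(const llvm::GlobalVariable &GV) {
    return GV.hasInitializer() &&
           llvm::isa<llvm::UndefValue>(GV.getInitializer());
  }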


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147572/new/

https://reviews.llvm.org/D147572

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D147572: [Clang][OpenMP] Fix failure with team-wide allocated variable

2023-04-04 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 marked an inline comment as done.
doru1004 added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:3355
+llvm::GlobalValue::InternalLinkage,
+CGM.getTriple().isAMDGCN() ? llvm::UndefValue::get(VarTy)
+   : llvm::Constant::getNullValue(VarTy),

nlopes wrote:
> Please use poison instead of undef wherever possible as we are trying to 
> remove undef. The replacement is usually safe when you just need a 
> placeholder.
> Thank you!
I've made the change as requested; this also means that I had to add another 
check in `AMDGPUAsmPrinter.cpp`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147572/new/

https://reviews.llvm.org/D147572

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D147572: [Clang][OpenMP] Fix failure with team-wide allocated variable

2023-04-04 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 510934.
Herald added subscribers: llvm-commits, kosarev, foad, kerbowa, hiraditya, 
jvesely, arsenm.
Herald added a project: LLVM.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147572/new/

https://reviews.llvm.org/D147572

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/target_team_variable_codegen.cpp
  llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp


Index: llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
===
--- llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -299,7 +299,8 @@
 
 void AMDGPUAsmPrinter::emitGlobalVariable(const GlobalVariable *GV) {
   if (GV->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) {
-if (GV->hasInitializer() && !isa(GV->getInitializer())) {
+if (GV->hasInitializer() && !(isa(GV->getInitializer()) ||
+  isa(GV->getInitializer( {
   OutContext.reportError({},
  Twine(GV->getName()) +
  ": unsupported initializer for address 
space");
Index: clang/test/OpenMP/target_team_variable_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/target_team_variable_codegen.cpp
@@ -0,0 +1,57 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --check-globals --prefix-filecheck-ir-name _ --global-value-regex 
"llvm.compiler.used" "_[0-9a-zA-Z]+A[0-9a-zA-Z]+pi[0-9a-zA-Z]+" 
"_[0-9a-zA-Z]+anotherPi" --version 2
+// REQUIRES: amdgpu-registered-target
+
+
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-amd.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-target-debug 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-amd.bc -o - | 
FileCheck %s --check-prefix=CHECK-AMD
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host-nvidia.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown 
-fopenmp-targets=nvptx64-unknown-unknown -emit-llvm %s -fopenmp-target-debug 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-nvidia.bc -o - | 
FileCheck %s --check-prefix=CHECK-NVIDIA
+
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+typedef enum omp_allocator_handle_t {
+  omp_null_allocator = 0,
+  omp_default_mem_alloc = 1,
+  omp_large_cap_mem_alloc = 2,
+  omp_const_mem_alloc = 3,
+  omp_high_bw_mem_alloc = 4,
+  omp_low_lat_mem_alloc = 5,
+  omp_cgroup_mem_alloc = 6,
+  omp_pteam_mem_alloc = 7,
+  omp_thread_mem_alloc = 8,
+  KMP_ALLOCATOR_MAX_HANDLE = __UINTPTR_MAX__
+} omp_allocator_handle_t;
+
+//.
+// CHECK-AMD: @local_a = internal addrspace(3) global [10 x i32] poison, align 
4
+//.
+// CHECK-NVIDIA: @local_a = internal addrspace(3) global [10 x i32] 
zeroinitializer, align 4
+//.
+int main()
+{
+   int N = 1;
+   int *a = new int[N];
+#pragma omp target data map(tofrom:a[:N])
+   {
+#pragma omp target teams distribute parallel for
+for(int i = 0; i < N; i++)
+{
+  int local_a[10];
+#pragma omp allocate(local_a) allocator(omp_pteam_mem_alloc)
+  for(int j = 0; j < 10; j++)
+   local_a[j] = a[(i + j) % N];
+  a[i] = local_a[0];
+}
+   }
+  return a[17];
+}
+
+#endif
+ NOTE: These prefixes are unused and the list is autogenerated. Do not add 
tests below this line:
+// CHECK-AMD: {{.*}}
+// CHECK-NVIDIA: {{.*}}
Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -3351,7 +3351,9 @@
 llvm::Type *VarTy = CGF.ConvertTypeForMem(VD->getType());
 auto *GV = new llvm::GlobalVariable(
 CGM.getModule(), VarTy, /*isConstant=*/false,
-llvm::GlobalValue::InternalLinkage, 
llvm::Constant::getNullValue(VarTy),
+llvm::GlobalValue::InternalLinkage,
+CGM.getTriple().isAMDGCN() ? llvm::PoisonValue::get(VarTy)
+   : llvm::Constant::getNullValue(VarTy),
 VD->getName(),
 /*InsertBefore=*/nullptr, llvm::GlobalValue::NotThreadLocal,
 CGM.getContext().getTargetAddressSpace(AS));


Index: llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
===
--- llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -299,7 +299,8 @@
 
 void AMDGPUAsmPrinter::emitGlobalVariable(const GlobalVariable *GV) {
   if (GV->getAddressSpace() 

[PATCH] D147572: [Clang][OpenMP] Fix failure with team-wide allocated variable

2023-04-04 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 created this revision.
doru1004 added reviewers: ronl, carlo.bertolli, jhuber6, jdoerfert, dhruvachak, 
gregrodgers.
doru1004 added a project: OpenMP.
Herald added subscribers: sunshaoce, nlopes, guansong, arichardson, yaxunl.
Herald added a project: All.
doru1004 requested review of this revision.
Herald added subscribers: cfe-commits, jplehr, sstefan1.
Herald added a project: clang.

This patch aims to resolve issue: 
https://github.com/llvm/llvm-project/issues/60345

The following code:

  #include 
  #include 
  #include 
  
  
  int main()
  {
int N =1<<30;
int *a = new int[N];
  #pragma omp target data map(tofrom:a[:N])
{
 #pragma omp target teams distribute parallel for
  for(int i = 0; i < N; i++)
  {
   int local_a[10];
 #pragma omp allocate(local_a) allocator(omp_pteam_mem_alloc)
for(int j = 0; j < 10; j++)
local_a[j] = a[(i+j)%N];
a[i] = local_a[0];
}
}
  std::cout << a[0] << "\n";
  }

Fails with the following linker errors:

  clang-linker-wrapper: error: :0: local_a: unsupported initializer 
for address space
  
  clang-linker-wrapper: error: Errors encountered inside the LTO pipeline.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D147572

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/target_team_variable_codegen.cpp


Index: clang/test/OpenMP/target_team_variable_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/target_team_variable_codegen.cpp
@@ -0,0 +1,57 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --check-globals --prefix-filecheck-ir-name _ --global-value-regex 
"llvm.compiler.used" "_[0-9a-zA-Z]+A[0-9a-zA-Z]+pi[0-9a-zA-Z]+" 
"_[0-9a-zA-Z]+anotherPi" --version 2
+// REQUIRES: amdgpu-registered-target
+
+
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-amd.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-target-debug 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-amd.bc -o - | 
FileCheck %s --check-prefix=CHECK-AMD
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host-nvidia.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown 
-fopenmp-targets=nvptx64-unknown-unknown -emit-llvm %s -fopenmp-target-debug 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-nvidia.bc -o - | 
FileCheck %s --check-prefix=CHECK-NVIDIA
+
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+typedef enum omp_allocator_handle_t {
+  omp_null_allocator = 0,
+  omp_default_mem_alloc = 1,
+  omp_large_cap_mem_alloc = 2,
+  omp_const_mem_alloc = 3,
+  omp_high_bw_mem_alloc = 4,
+  omp_low_lat_mem_alloc = 5,
+  omp_cgroup_mem_alloc = 6,
+  omp_pteam_mem_alloc = 7,
+  omp_thread_mem_alloc = 8,
+  KMP_ALLOCATOR_MAX_HANDLE = __UINTPTR_MAX__
+} omp_allocator_handle_t;
+
+//.
+// CHECK-AMD: @local_a = internal addrspace(3) global [10 x i32] undef, align 4
+//.
+// CHECK-NVIDIA: @local_a = internal addrspace(3) global [10 x i32] 
zeroinitializer, align 4
+//.
+int main()
+{
+   int N = 1;
+   int *a = new int[N];
+#pragma omp target data map(tofrom:a[:N])
+   {
+#pragma omp target teams distribute parallel for
+for(int i = 0; i < N; i++)
+{
+  int local_a[10];
+#pragma omp allocate(local_a) allocator(omp_pteam_mem_alloc)
+  for(int j = 0; j < 10; j++)
+   local_a[j] = a[(i + j) % N];
+  a[i] = local_a[0];
+}
+   }
+  return a[17];
+}
+
+#endif
+ NOTE: These prefixes are unused and the list is autogenerated. Do not add 
tests below this line:
+// CHECK-AMD: {{.*}}
+// CHECK-NVIDIA: {{.*}}
Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -3351,7 +3351,9 @@
 llvm::Type *VarTy = CGF.ConvertTypeForMem(VD->getType());
 auto *GV = new llvm::GlobalVariable(
 CGM.getModule(), VarTy, /*isConstant=*/false,
-llvm::GlobalValue::InternalLinkage, 
llvm::Constant::getNullValue(VarTy),
+llvm::GlobalValue::InternalLinkage,
+CGM.getTriple().isAMDGCN() ? llvm::UndefValue::get(VarTy)
+   : llvm::Constant::getNullValue(VarTy),
 VD->getName(),
 /*InsertBefore=*/nullptr, llvm::GlobalValue::NotThreadLocal,
 CGM.getContext().getTargetAddressSpace(AS));


Index: clang/test/OpenMP/target_team_variable_codegen.cpp

[PATCH] D146552: [Clang][OpenMP] Enable device-mapped constexpr class members to not be optimized out

2023-03-23 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 closed this revision.
doru1004 added a comment.

Commit: 0eabf59528f3c3f64923900cae740d9f26c45ae8 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146552/new/

https://reviews.llvm.org/D146552

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146552: [Clang][OpenMP] Enable device-mapped constexpr class members to not be optimized out

2023-03-22 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 507485.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146552/new/

https://reviews.llvm.org/D146552

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/declare_target_constexpr_codegen.cpp
  openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp


Index: openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
===
--- /dev/null
+++ openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
@@ -0,0 +1,34 @@
+// RUN: %libomptarget-compileoptxx-run-and-check-generic
+
+#include 
+#include 
+
+#pragma omp declare target
+class A {
+public:
+  constexpr static double pi = 3.141592653589793116;
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+int main() {
+  double a[2];
+#pragma omp target map(tofrom : a[:2])
+  {
+a[0] = A::pi;
+a[1] = anotherPi;
+  }
+
+  // CHECK: pi = 3.141592653589793116
+  printf("pi = %.18f\n", a[0]);
+
+  // CHECK: anotherPi = 3.14
+  printf("anotherPi = %.2f\n", a[1]);
+
+  return 0;
+}
Index: clang/test/OpenMP/declare_target_constexpr_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/declare_target_constexpr_codegen.cpp
@@ -0,0 +1,40 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --check-globals --prefix-filecheck-ir-name _ --global-value-regex 
"llvm.compiler.used" "_[0-9a-zA-Z]+A[0-9a-zA-Z]+pi[0-9a-zA-Z]+" 
"_[0-9a-zA-Z]+anotherPi" --version 2
+// REQUIRES: amdgpu-registered-target
+
+
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown 
-fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-target-debug 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck 
%s --check-prefix=CHECK
+
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+#pragma omp declare target
+class A {
+public:
+  static constexpr double pi = 3.141592653589793116;
+//.
+// CHECK: @_ZN1A2piE = linkonce_odr constant double 0x400921FB54442D18, 
comdat, align 8
+// CHECK: @_ZL9anotherPi = internal constant double 3.14e+00, align 8
+// CHECK: @llvm.compiler.used = appending global [2 x ptr] [ptr 
@"__ZN1A2piE$ref", ptr @"__ZL9anotherPi$ref"], section "llvm.metadata"
+//.
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+void F(const double &);
+void Test() { F(A::pi); }
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+#endif
+
+
+//
+ NOTE: These prefixes are unused and the list is autogenerated. Do not add 
tests below this line:
+// CHECK: {{.*}}
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -10387,7 +10387,9 @@
 }
 Linkage = CGM.getLLVMLinkageVarDefinition(VD, /*IsConstant=*/false);
 // Temp solution to prevent optimizations of the internal variables.
-if (CGM.getLangOpts().OpenMPIsDevice && !VD->isExternallyVisible()) {
+if (CGM.getLangOpts().OpenMPIsDevice &&
+(!VD->isExternallyVisible() ||
+ Linkage == llvm::GlobalValue::LinkOnceODRLinkage)) {
   // Do not create a "ref-variable" if the original is not also available
   // on the host.
   if (!OffloadEntriesInfoManager.hasDeviceGlobalVarEntryInfo(VarName))


Index: openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
===
--- /dev/null
+++ openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
@@ -0,0 +1,34 @@
+// RUN: %libomptarget-compileoptxx-run-and-check-generic
+
+#include 
+#include 
+
+#pragma omp declare target
+class A {
+public:
+  constexpr static double pi = 3.141592653589793116;
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+int main() {
+  double a[2];
+#pragma omp target map(tofrom : a[:2])
+  {
+a[0] = A::pi;
+a[1] = anotherPi;
+  }
+
+  // CHECK: pi = 3.141592653589793116
+  printf("pi = %.18f\n", a[0]);
+
+  // CHECK: anotherPi = 3.14
+  printf("anotherPi = %.2f\n", a[1]);
+
+  return 0;
+}
Index: clang/test/OpenMP/declare_target_constexpr_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/declare_target_constexpr_codegen.cpp
@@ -0,0 +1,40 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 

[PATCH] D146552: [Clang][OpenMP] Enable device-mapped constexpr class members to not be optimized out

2023-03-22 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 507483.
doru1004 added a comment.

Updated lit test to show variable added to compiler used vars.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146552/new/

https://reviews.llvm.org/D146552

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/declare_target_constexpr_codegen.cpp
  openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp


Index: openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
===
--- /dev/null
+++ openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
@@ -0,0 +1,34 @@
+// RUN: %libomptarget-compileoptxx-run-and-check-generic
+
+#include 
+#include 
+
+#pragma omp declare target
+class A {
+public:
+  constexpr static double pi = 3.141592653589793116;
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+int main() {
+  double a[2];
+#pragma omp target map(tofrom : a[:2])
+  {
+a[0] = A::pi;
+a[1] = anotherPi;
+  }
+
+  // CHECK: pi = 3.141592653589793116
+  printf("pi = %.18f\n", a[0]);
+
+  // CHECK: anotherPi = 3.14
+  printf("anotherPi = %.2f\n", a[1]);
+
+  return 0;
+}
Index: clang/test/OpenMP/declare_target_constexpr_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/declare_target_constexpr_codegen.cpp
@@ -0,0 +1,40 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --check-globals --prefix-filecheck-ir-name _ --global-value-regex 
"llvm.compiler.used" "_[0-9a-zA-Z]+A[0-9a-zA-Z]+pi[0-9a-zA-Z]+" 
"_[0-9a-zA-Z]+A[0-9a-zA-Z]+anotherPi[0-9a-zA-Z]+" --version 2
+// REQUIRES: amdgpu-registered-target
+
+
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown 
-fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown 
-fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-target-debug 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck 
%s --check-prefix=CHECK
+
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+#pragma omp declare target
+class A {
+public:
+  static constexpr double pi = 3.141592653589793116;
+//.
+// CHECK: @_ZN1A2piE = linkonce_odr constant double 0x400921FB54442D18, 
comdat, align 8
+// CHECK: @_ZL9anotherPi = internal constant double 3.14e+00, align 8
+// CHECK: @llvm.compiler.used = appending global [2 x ptr] [ptr 
@"__ZN1A2piE$ref", ptr @"__ZL9anotherPi$ref"], section "llvm.metadata"
+//.
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+void F(const double &);
+void Test() { F(A::pi); }
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+#endif
+
+
+//
+ NOTE: These prefixes are unused and the list is autogenerated. Do not add 
tests below this line:
+// CHECK: {{.*}}
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -10387,7 +10387,9 @@
 }
 Linkage = CGM.getLLVMLinkageVarDefinition(VD, /*IsConstant=*/false);
 // Temp solution to prevent optimizations of the internal variables.
-if (CGM.getLangOpts().OpenMPIsDevice && !VD->isExternallyVisible()) {
+if (CGM.getLangOpts().OpenMPIsDevice &&
+(!VD->isExternallyVisible() ||
+ Linkage == llvm::GlobalValue::LinkOnceODRLinkage)) {
   // Do not create a "ref-variable" if the original is not also available
   // on the host.
   if (!OffloadEntriesInfoManager.hasDeviceGlobalVarEntryInfo(VarName))


Index: openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
===
--- /dev/null
+++ openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
@@ -0,0 +1,34 @@
+// RUN: %libomptarget-compileoptxx-run-and-check-generic
+
+#include 
+#include 
+
+#pragma omp declare target
+class A {
+public:
+  constexpr static double pi = 3.141592653589793116;
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+int main() {
+  double a[2];
+#pragma omp target map(tofrom : a[:2])
+  {
+a[0] = A::pi;
+a[1] = anotherPi;
+  }
+
+  // CHECK: pi = 3.141592653589793116
+  printf("pi = %.18f\n", a[0]);
+
+  // CHECK: anotherPi = 3.14
+  printf("anotherPi = %.2f\n", a[1]);
+
+  return 0;
+}
Index: clang/test/OpenMP/declare_target_constexpr_codegen.cpp
===
--- /dev/null

[PATCH] D144569: [Clang][OpenMP] Fix accessing of aligned arrays in offloaded target regions

2023-03-22 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 closed this revision.
doru1004 added a comment.

Commit: 65a0d669b4625c34775436a6d3643d15bbc2465a 



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144569/new/

https://reviews.llvm.org/D144569

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146552: [Clang][OpenMP] Enable device-mapped constexpr class members to not be optimized out

2023-03-21 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 507190.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146552/new/

https://reviews.llvm.org/D146552

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/declare_target_constexpr_codegen.cpp
  openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp

Index: openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
===
--- /dev/null
+++ openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
@@ -0,0 +1,34 @@
+// RUN: %libomptarget-compileoptxx-run-and-check-generic
+
+#include 
+#include 
+
+#pragma omp declare target
+class A {
+public:
+  constexpr static double pi = 3.141592653589793116;
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+int main() {
+  double a[2];
+#pragma omp target map(tofrom : a[:2])
+  {
+a[0] = A::pi;
+a[1] = anotherPi;
+  }
+
+  // CHECK: pi = 3.141592653589793116
+  printf("pi = %.18f\n", a[0]);
+
+  // CHECK: anotherPi = 3.14
+  printf("anotherPi = %.2f\n", a[1]);
+
+  return 0;
+}
Index: clang/test/OpenMP/declare_target_constexpr_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/declare_target_constexpr_codegen.cpp
@@ -0,0 +1,79 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-globals --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK1
+// expected-no-diagnostics
+
+#pragma omp declare target
+class A {
+public:
+  constexpr static double pi = 3.141592653589793116;
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+int main() {
+  double a[2];
+#pragma omp target map(tofrom : a[:2])
+  {
+a[0] = A::pi;
+a[1] = anotherPi;
+  }
+
+  return (int)(a[0] + a[1]);
+}
+
+//.
+// CHECK1: @__omp_rtl_debug_kind = weak_odr hidden addrspace(1) constant i32 0
+// CHECK1: @__omp_rtl_assume_teams_oversubscription = weak_odr hidden addrspace(1) constant i32 0
+// CHECK1: @__omp_rtl_assume_threads_oversubscription = weak_odr hidden addrspace(1) constant i32 0
+// CHECK1: @__omp_rtl_assume_no_thread_state = weak_odr hidden addrspace(1) constant i32 0
+// CHECK1: @__omp_rtl_assume_no_nested_parallelism = weak_odr hidden addrspace(1) constant i32 0
+// CHECK1: @_ZN1A2piE = available_externally addrspace(4) constant double 0x400921FB54442D18, align 8
+// CHECK1: @0 = private unnamed_addr constant [23 x i8] c"
+// CHECK1: @1 = private unnamed_addr addrspace(1) constant %struct.ident_t { i32 0, i32 2, i32 0, i32 22, ptr @0 }, align 8
+// CHECK1: @__omp_offloading_fd00_240171e_main_l24_exec_mode = weak protected addrspace(1) constant i8 1
+// CHECK1: @_ZL9anotherPi = internal addrspace(4) constant double 3.14e+00, align 8
+// CHECK1: @"__ZL9anotherPi$ref" = internal constant ptr addrspace(4) @_ZL9anotherPi
+// CHECK1: @llvm.compiler.used = appending addrspace(1) global [2 x ptr] [ptr addrspacecast (ptr addrspace(1) @__omp_offloading_fd00_240171e_main_l24_exec_mode to ptr), ptr @"__ZL9anotherPi$ref"], section "llvm.metadata"
+//.
+// CHECK1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l24
+// CHECK1-SAME: (ptr noundef nonnull align 8 dereferenceable(16) [[A:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK1-NEXT:  entry:
+// CHECK1-NEXT:[[A_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK1-NEXT:[[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr
+// CHECK1-NEXT:store ptr [[A]], ptr [[A_ADDR_ASCAST]], align 8
+// CHECK1-NEXT:[[TMP0:%.*]] = load ptr, ptr [[A_ADDR_ASCAST]], align 8
+// CHECK1-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK1-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK1-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK1:   user_code.entry:
+// CHECK1-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds [2 x double], ptr [[TMP0]], i64 0, i64 0
+// CHECK1-NEXT:store double 0x400921FB54442D18, ptr [[ARRAYIDX]], align 8
+// CHECK1-NEXT:[[ARRAYIDX1:%.*]] = getelementptr 

[PATCH] D146552: [Clang][OpenMP] Enable device-mapped constexpr class members to not be optimized out

2023-03-21 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 507114.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146552/new/

https://reviews.llvm.org/D146552

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/declare_target_constexpr_codegen.cpp
  openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp

Index: openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
===
--- /dev/null
+++ openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
@@ -0,0 +1,34 @@
+// RUN: %libomptarget-compileoptxx-run-and-check-generic
+
+#include 
+#include 
+
+#pragma omp declare target
+class A {
+public:
+  constexpr static double pi = 3.141592653589793116;
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+int main() {
+  double a[2];
+#pragma omp target map(tofrom : a[:2])
+  {
+a[0] = A::pi;
+a[1] = anotherPi;
+  }
+
+  // CHECK: pi = 3.141592653589793116
+  printf("pi = %.18f\n", a[0]);
+
+  // CHECK: anotherPi = 3.14
+  printf("anotherPi = %.2f\n", a[1]);
+
+  return 0;
+}
Index: clang/test/OpenMP/declare_target_constexpr_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/declare_target_constexpr_codegen.cpp
@@ -0,0 +1,76 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-globals --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK1
+// expected-no-diagnostics
+
+#pragma omp declare target
+class A {
+public:
+  constexpr static double pi = 3.141592653589793116;
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+int main() {
+  double a[2];
+#pragma omp target map(tofrom : a[:2])
+  {
+a[0] = A::pi;
+a[1] = anotherPi;
+  }
+
+  return (int)(a[0] + a[1]);
+}
+
+//.
+// CHECK1: @__omp_rtl_debug_kind = weak_odr hidden addrspace(1) constant i32 0
+// CHECK1: @__omp_rtl_assume_teams_oversubscription = weak_odr hidden addrspace(1) constant i32 0
+// CHECK1: @__omp_rtl_assume_threads_oversubscription = weak_odr hidden addrspace(1) constant i32 0
+// CHECK1: @__omp_rtl_assume_no_thread_state = weak_odr hidden addrspace(1) constant i32 0
+// CHECK1: @__omp_rtl_assume_no_nested_parallelism = weak_odr hidden addrspace(1) constant i32 0
+// CHECK1: @_ZN1A2piE = available_externally addrspace(4) constant double 0x400921FB54442D18, align 8
+// CHECK1: @0 = private unnamed_addr constant [23 x i8] c"
+// CHECK1: @1 = private unnamed_addr addrspace(1) constant %struct.ident_t { i32 0, i32 2, i32 0, i32 22, ptr @0 }, align 8
+// CHECK1: @__omp_offloading_fd00_240171e_main_l21_exec_mode = weak protected addrspace(1) constant i8 1
+// CHECK1: @_ZL9anotherPi = internal addrspace(4) constant double 3.14e+00, align 8
+// CHECK1: @"__ZL9anotherPi$ref" = internal constant ptr addrspace(4) @_ZL9anotherPi
+// CHECK1: @llvm.compiler.used = appending addrspace(1) global [2 x ptr] [ptr addrspacecast (ptr addrspace(1) @__omp_offloading_fd00_240171e_main_l21_exec_mode to ptr), ptr @"__ZL9anotherPi$ref"], section "llvm.metadata"
+//.
+// CHECK1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l21
+// CHECK1-SAME: (ptr noundef nonnull align 8 dereferenceable(16) [[A:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK1-NEXT:  entry:
+// CHECK1-NEXT:[[A_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK1-NEXT:[[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr
+// CHECK1-NEXT:store ptr [[A]], ptr [[A_ADDR_ASCAST]], align 8
+// CHECK1-NEXT:[[TMP0:%.*]] = load ptr, ptr [[A_ADDR_ASCAST]], align 8
+// CHECK1-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK1-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// CHECK1-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK1:   user_code.entry:
+// CHECK1-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds [2 x double], ptr [[TMP0]], i64 0, i64 0
+// CHECK1-NEXT:store double 0x400921FB54442D18, ptr [[ARRAYIDX]], align 8
+// CHECK1-NEXT:[[ARRAYIDX1:%.*]] = getelementptr 

[PATCH] D146552: [Clang][OpenMP] Enable device-mapped constexpr class members to not be optimized out

2023-03-21 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

In D146552#4210757, @jhuber6 wrote:

> We should have a clang test as well

Agreed, working on one currently.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146552/new/

https://reviews.llvm.org/D146552

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D146552: [Clang][OpenMP] Enable device-mapped constexpr class members to not be optimized out

2023-03-21 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 created this revision.
doru1004 added reviewers: ronl, carlo.bertolli, jhuber6, jdoerfert, 
gregrodgers, dhruvachak.
doru1004 added a project: OpenMP.
Herald added subscribers: sunshaoce, guansong, yaxunl.
Herald added a project: All.
doru1004 requested review of this revision.
Herald added subscribers: openmp-commits, cfe-commits, jplehr, sstefan1.
Herald added a project: clang.

This patch fixes an issue whereby a constexpr class member that is mapped to
the device is optimized out, leading to a runtime error:

  Libomptarget error: Unable to generate entries table for device id 0.
  Libomptarget error: Failed to init globals on device 0

This is due to the optimized-out variable not being present when host entry 
table values are matched with their device counterparts.

A currently failing example is included in the runtime test added by this patch.
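
For reference, a minimal sketch of the affected pattern (the function name is illustrative, not from the patch; the runtime test in this patch is the authoritative reproducer), assuming offloading to a GPU target with -fopenmp:

  #pragma omp declare target
  constexpr static double anotherPi = 3.14;
  #pragma omp end declare target

  // The constexpr global only has internal/linkonce_odr linkage, so without
  // the fix its device-side definition can be dropped before the host and
  // device offload entry tables are matched.
  double readOnDevice() {
    double v = 0.0;
  #pragma omp target map(from : v)
    v = anotherPi;
    return v;
  }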


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D146552

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp


Index: openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
===
--- /dev/null
+++ openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
@@ -0,0 +1,34 @@
+// RUN: %libomptarget-compileoptxx-run-and-check-generic
+
+#include 
+#include 
+
+#pragma omp declare target
+class A {
+public:
+  constexpr static double pi = 3.141592653589793116;
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+int main() {
+  double a[2];
+#pragma omp target map(tofrom : a[:2])
+  {
+a[0] = A::pi;
+a[1] = anotherPi;
+  }
+
+  // CHECK: pi = 3.141592653589793116
+  printf("pi = %.18f\n", a[0]);
+
+  // CHECK: anotherPi = 3.14
+  printf("anotherPi = %.2f\n", a[1]);
+
+  return 0;
+}
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -10377,7 +10377,9 @@
 }
 Linkage = CGM.getLLVMLinkageVarDefinition(VD, /*IsConstant=*/false);
 // Temp solution to prevent optimizations of the internal variables.
-if (CGM.getLangOpts().OpenMPIsDevice && !VD->isExternallyVisible()) {
+if (CGM.getLangOpts().OpenMPIsDevice &&
+(!VD->isExternallyVisible() ||
+ Linkage == llvm::GlobalValue::LinkOnceODRLinkage)) {
   // Do not create a "ref-variable" if the original is not also available
   // on the host.
   if (!OffloadEntriesInfoManager.hasDeviceGlobalVarEntryInfo(VarName))


Index: openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
===
--- /dev/null
+++ openmp/libomptarget/test/offloading/target_constexpr_mapping.cpp
@@ -0,0 +1,34 @@
+// RUN: %libomptarget-compileoptxx-run-and-check-generic
+
+#include 
+#include 
+
+#pragma omp declare target
+class A {
+public:
+  constexpr static double pi = 3.141592653589793116;
+  A() { ; }
+  ~A() { ; }
+};
+#pragma omp end declare target
+
+#pragma omp declare target
+constexpr static double anotherPi = 3.14;
+#pragma omp end declare target
+
+int main() {
+  double a[2];
+#pragma omp target map(tofrom : a[:2])
+  {
+a[0] = A::pi;
+a[1] = anotherPi;
+  }
+
+  // CHECK: pi = 3.141592653589793116
+  printf("pi = %.18f\n", a[0]);
+
+  // CHECK: anotherPi = 3.14
+  printf("anotherPi = %.2f\n", a[1]);
+
+  return 0;
+}
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -10377,7 +10377,9 @@
 }
 Linkage = CGM.getLLVMLinkageVarDefinition(VD, /*IsConstant=*/false);
 // Temp solution to prevent optimizations of the internal variables.
-if (CGM.getLangOpts().OpenMPIsDevice && !VD->isExternallyVisible()) {
+if (CGM.getLangOpts().OpenMPIsDevice &&
+(!VD->isExternallyVisible() ||
+ Linkage == llvm::GlobalValue::LinkOnceODRLinkage)) {
   // Do not create a "ref-variable" if the original is not also available
   // on the host.
   if (!OffloadEntriesInfoManager.hasDeviceGlobalVarEntryInfo(VarName))
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144569: [Clang][OpenMP] Fix accessing of aligned arrays in offloaded target regions

2023-03-14 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 505157.
Herald added a subscriber: sunshaoce.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144569/new/

https://reviews.llvm.org/D144569

Files:
  clang/lib/Sema/SemaOpenMP.cpp
  clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c
  clang/test/OpenMP/parallel_firstprivate_codegen.cpp
  clang/test/OpenMP/parallel_master_taskloop_firstprivate_codegen.cpp
  clang/test/OpenMP/parallel_master_taskloop_simd_firstprivate_codegen.cpp
  clang/test/OpenMP/target_firstprivate_codegen.cpp
  clang/test/OpenMP/target_is_device_ptr_codegen.cpp
  clang/test/OpenMP/teams_firstprivate_codegen.cpp

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144569: [Clang][OpenMP] Fix accessing of aligned arrays in offloaded target regions

2023-02-24 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/Sema/SemaOpenMP.cpp:2274
+  if (!IsByRef) {
+if ((Ctx.getTargetInfo().getTriple().isAMDGCN()) ||
+(Ctx.getTargetInfo().getTriple().isNVPTX())) {

jhuber6 wrote:
> doru1004 wrote:
> > jhuber6 wrote:
> > > Why does this handling need to be different between CPU and GPU 
> > > offloading? Strictly speaking, I'm not sure why we need the alignment 
> > > type here since we'd only get improper alignment on primitive types. So I 
> > > figured that it should only care about the alignment of the type itself 
> > > in all cases. Maybe someone can correct me on that.
> > Are you saying that the previous check was not correct?
> This is the first I've looked at this code, so I don't know what the 
> intention was. But I would assume it's just making sure that the alignment of 
> the `uintptr_t` is sufficient to contain the by-value copy without causing an 
> addressing error. By that logic I figured it would only care about the 
> alignment of the type, not the declaration itself.
Assuming that what was there before was correct, then you're saying that the 
Decl type is always the same as Ty. Is that the case?
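
To make the distinction concrete, a hedged sketch (illustrative only, not the actual SemaOpenMP code): the declaration can carry a stricter alignment than its type via __attribute__((aligned(...))), which is why the two queries can disagree.

  #include "clang/AST/ASTContext.h"
  #include "clang/AST/Decl.h"

  // Illustrative only: compare the alignment of the variable's type with the
  // alignment of the declaration itself; an aligned(64) attribute raises the
  // latter but not the former.
  void compareAlignments(clang::ASTContext &Ctx, const clang::VarDecl *VD) {
    clang::QualType Ty = VD->getType();
    clang::CharUnits TypeAlign = Ctx.getTypeAlignInChars(Ty); // e.g. 8 for 'int *'
    clang::CharUnits DeclAlign = Ctx.getDeclAlign(VD);        // e.g. 64 with aligned(64)
    (void)TypeAlign;
    (void)DeclAlign;
  }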


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144569/new/

https://reviews.llvm.org/D144569

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144569: [Clang][OpenMP] Fix accessing of aligned arrays in offloaded target regions

2023-02-24 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/Sema/SemaOpenMP.cpp:2274
+  if (!IsByRef) {
+if ((Ctx.getTargetInfo().getTriple().isAMDGCN()) ||
+(Ctx.getTargetInfo().getTriple().isNVPTX())) {

jhuber6 wrote:
> Why does this handling need to be different between CPU and GPU offloading? 
> Strictly speaking, I'm not sure why we need the alignment type here since 
> we'd only get improper alignment on primitive types. So I figured that it 
> should only care about the alignment of the type itself in all cases. Maybe 
> someone can correct me on that.
Are you saying that the previous check was not correct?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144569/new/

https://reviews.llvm.org/D144569

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144569: [Clang][OpenMP] Fix accessing of aligned arrays in offloaded target regions

2023-02-22 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 created this revision.
doru1004 added reviewers: jdoerfert, jhuber6, ronl, carlo.bertolli, arsenm, 
gregrodgers, ABataev.
doru1004 added a project: OpenMP.
Herald added subscribers: kosarev, kerbowa, guansong, yaxunl, jvesely.
Herald added a project: All.
doru1004 requested review of this revision.
Herald added subscribers: cfe-commits, sstefan1, wdng.
Herald added a project: clang.

This patch fixes a memory error that occurs when we access an aligned array on 
the device:

  void write_index(int*a, int N) {
  int *aptr __attribute__ ((aligned(64))) = a; // This failed but is fixed 
by this patch.
  #pragma omp target teams distribute parallel for map(tofrom: aptr[0:N])
  for(int i=0;i<N;i++) {
    aptr[i] = i;
  }
  }

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D144569

Files:
  clang/lib/Sema/SemaOpenMP.cpp
  clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c

Index: clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c
===
--- /dev/null
+++ clang/test/OpenMP/amdgpu_target_with_aligned_attribute.c
@@ -0,0 +1,305 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// REQUIRES: amdgpu-registered-target
+
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+// RUN: %clang_cc1 -verify -fopenmp -x c -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-amd.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host-amd.bc -o - | FileCheck %s --check-prefix=CHECK-AMD
+
+
+void write_to_aligned_array(int *a, int N) {
+  int *aptr __attribute__ ((aligned(64))) = a;
+  #pragma omp target teams distribute parallel for map(tofrom: aptr[0:N])
+  for(int i = 0; i < N; i++) {
+aptr[i] = i;
+  }
+}
+
+#endif
+// CHECK-AMD-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_write_to_aligned_array_l14
+// CHECK-AMD-SAME: (i64 noundef [[N:%.*]], ptr noundef [[APTR:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-AMD-NEXT:  entry:
+// CHECK-AMD-NEXT:[[N_ADDR:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-AMD-NEXT:[[APTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-AMD-NEXT:[[N_CASTED:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-AMD-NEXT:[[DOTZERO_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-AMD-NEXT:[[DOTTHREADID_TEMP_:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-AMD-NEXT:[[N_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N_ADDR]] to ptr
+// CHECK-AMD-NEXT:[[APTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[APTR_ADDR]] to ptr
+// CHECK-AMD-NEXT:[[N_CASTED_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[N_CASTED]] to ptr
+// CHECK-AMD-NEXT:[[DOTZERO_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTZERO_ADDR]] to ptr
+// CHECK-AMD-NEXT:[[DOTTHREADID_TEMP__ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTTHREADID_TEMP_]] to ptr
+// CHECK-AMD-NEXT:store i64 [[N]], ptr [[N_ADDR_ASCAST]], align 8
+// CHECK-AMD-NEXT:store ptr [[APTR]], ptr [[APTR_ADDR_ASCAST]], align 8
+// CHECK-AMD-NEXT:[[TMP0:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 2, i1 false)
+// CHECK-AMD-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP0]], -1
+// CHECK-AMD-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK-AMD:   user_code.entry:
+// CHECK-AMD-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr))
+// CHECK-AMD-NEXT:[[TMP2:%.*]] = load i32, ptr [[N_ADDR_ASCAST]], align 4
+// CHECK-AMD-NEXT:store i32 [[TMP2]], ptr [[N_CASTED_ASCAST]], align 4
+// CHECK-AMD-NEXT:[[TMP3:%.*]] = load i64, ptr [[N_CASTED_ASCAST]], align 8
+// CHECK-AMD-NEXT:[[TMP4:%.*]] = load ptr, ptr [[APTR_ADDR_ASCAST]], align 8
+// CHECK-AMD-NEXT:store i32 0, ptr [[DOTZERO_ADDR_ASCAST]], align 4
+// CHECK-AMD-NEXT:store i32 [[TMP1]], ptr [[DOTTHREADID_TEMP__ASCAST]], align 4
+// CHECK-AMD-NEXT:call void @__omp_outlined__(ptr [[DOTTHREADID_TEMP__ASCAST]], ptr [[DOTZERO_ADDR_ASCAST]], i64 [[TMP3]], ptr [[TMP4]]) #[[ATTR2:[0-9]+]]
+// CHECK-AMD-NEXT:call void @__kmpc_target_deinit(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i8 2)
+// CHECK-AMD-NEXT:ret void
+// CHECK-AMD:   worker.exit:
+// CHECK-AMD-NEXT:ret void
+//
+//
+// CHECK-AMD-LABEL: define {{[^@]+}}@__omp_outlined__
+// CHECK-AMD-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], i64 noundef [[N:%.*]], ptr noundef [[APTR:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-AMD-NEXT:  entry:
+// CHECK-AMD-NEXT:

[PATCH] D141528: [Clang][OpenMP] Fix loop directive nested inside a parallel

2023-01-20 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 closed this revision.
doru1004 added a comment.

Commit: 1407dbeabcfed114f0918b022365d03713dac028 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141528/new/

https://reviews.llvm.org/D141528

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-20 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 closed this revision.
doru1004 added a comment.

Commit 49d47c4d2f280d15d1de94c53b72b6ab3c127b35 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-19 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 490659.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

Files:
  clang/include/clang/AST/OpenMPClause.h
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/include/clang/Basic/OpenMPKinds.def
  clang/include/clang/Basic/OpenMPKinds.h
  clang/include/clang/Sema/Sema.h
  clang/lib/AST/OpenMPClause.cpp
  clang/lib/Parse/ParseOpenMP.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaOpenMP.cpp
  clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
  clang/lib/Sema/TreeTransform.h
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/OpenMP/declare_mapper_ast_print.c
  clang/test/OpenMP/declare_mapper_messages.c
  clang/test/OpenMP/target_ast_print.cpp
  clang/test/OpenMP/target_map_messages.cpp

Index: clang/test/OpenMP/target_map_messages.cpp
===
--- clang/test/OpenMP/target_map_messages.cpp
+++ clang/test/OpenMP/target_map_messages.cpp
@@ -4,6 +4,7 @@
 // RUN: %clang_cc1 -verify=expected,lt50,lt51,omp,lt51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=45 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -verify=expected,ge50,lt51,omp,lt51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=50 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -verify=expected,ge50,ge51,omp,ge51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=51 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
+// RUN: %clang_cc1 -verify=expected,ge50,ge51,ge52,omp,ge52-omp -fopenmp -fno-openmp-extensions -fopenmp-version=52 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -DCCODE -verify -fopenmp -fno-openmp-extensions -ferror-limit 300 -x c %s -Wno-openmp -Wuninitialized
 
 // -fopenmp-simd, -fno-openmp-extensions
@@ -158,23 +159,28 @@
 // expected-error@+1 {{use of undeclared identifier 'present'}}
 #pragma omp target map(present)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c,f)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c[1:2],f)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c,f[1:2])
 {}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // expected-error@+3 {{section length is unspecified and cannot be inferred because subscripted value is not an array}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c[:],f)
 {}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // expected-error@+3 {{section length is unspecified and cannot be inferred because subscripted value is not an array}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
@@ -191,11 +197,15 @@
 // lt51-error@+1 2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(present, present, tofrom: a)
 {}
+// ge52-omp-error@+5 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ompx-error@+3 {{same map type modifier has been specified more than once}}
 // ge51-omp-error@+2 2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 2 

[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-19 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/test/OpenMP/declare_mapper_messages.c:33-36
+#pragma omp declare mapper(id2: struct vec vvec) 
map(iterator(it=0:vvec.len:2), tofrom:vvec.data[it])
+int var; // expected-note {{'var' declared here}}
+// expected-error@+1 {{only variable 'vvec' is allowed in map clauses of this 
'omp declare mapper' directive}}
+#pragma omp declare mapper(id3: struct vec vvec) 
map(iterator(it=0:vvec.len:2), tofrom:vvec.data[var])

ABataev wrote:
> doru1004 wrote:
> > Here we have both the positive and the negative declare mapper cases. 
> > Please let me know if you meant something different.
> Would be good to have ast print case and codegen
I can add ast print! Code gen for this map modifier is going to be done in a 
separate patch.
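
For the ast-print side, a minimal check could look roughly like the following 
(a sketch only: the RUN line and the exact printed spelling of the mapper are 
assumptions modeled on the target_ast_print.cpp changes in this patch):

```
// RUN: %clang_cc1 -fopenmp -fopenmp-version=52 -ast-print %s | FileCheck %s
struct vec { int len; double *data; };
#pragma omp declare mapper(id: struct vec vvec) map(iterator(it=0:vvec.len:2), tofrom: vvec.data[it])
// CHECK: #pragma omp declare mapper (id : struct vec vvec) map(iterator(int it = 0:vvec.len:2),tofrom: vvec.data[it])
```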


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-19 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/Sema/SemaExpr.cpp:5423-5426
+
+/// Act on the iterator variable declaration.
+ActOnOpenMPIteratorVarDecl(VD);
+

ABataev wrote:
> Can we register this variable only in declare mapper context, i.e. add a 
> check that we add it only for declare mapper?
Happy to do that!
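
A minimal sketch of the restriction being suggested (the declare-mapper-context 
query is an assumed placeholder; the actual patch may express it differently):

```
// Sketch: only remember iterator variables for the declare mapper checks,
// instead of registering them unconditionally from SemaExpr.cpp.
if (getLangOpts().OpenMP && InOpenMPDeclareMapperContext /* assumed query */)
  ActOnOpenMPIteratorVarDecl(VD);
```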


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-19 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/test/OpenMP/declare_mapper_messages.c:33-36
+#pragma omp declare mapper(id2: struct vec vvec) 
map(iterator(it=0:vvec.len:2), tofrom:vvec.data[it])
+int var; // expected-note {{'var' declared here}}
+// expected-error@+1 {{only variable 'vvec' is allowed in map clauses of this 
'omp declare mapper' directive}}
+#pragma omp declare mapper(id3: struct vec vvec) 
map(iterator(it=0:vvec.len:2), tofrom:vvec.data[var])

Here we have both the positive and the negative declare mapper cases. Please 
let me know if you meant something different.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-18 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/Sema/SemaOpenMP.cpp:22399-22412
 bool Sema::isOpenMPDeclareMapperVarDeclAllowed(const VarDecl *VD) const {
   assert(LangOpts.OpenMP && "Expected OpenMP mode.");
   const Expr *Ref = DSAStack->getDeclareMapperVarRef();
   if (const auto *DRE = cast_or_null<DeclRefExpr>(Ref)) {
 if (VD->getCanonicalDecl() == DRE->getDecl()->getCanonicalDecl())
   return true;
 if (VD->isUsableInConstantExpressions(Context))

The input to this function is the VD variable I've been talking about. If you 
print it out, it's just a simple VarDecl:

if you do `VD->dump()`:
```
VarDecl 0x55b57f81dec8  col:52 implicit 
used it 'int'
```

if you do `VD->getType()->dump()`:
```
BuiltinType 0x55b57f6d2560 'int'
```
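
Putting the pieces quoted in this review together, the extra case would sit 
next to the existing checks above, roughly like this (a sketch; 
`isIteratorVarDecl` is an assumed name for a DSAStackTy helper wrapping the 
`llvm::any_of` over `IteratorVarDecls` quoted elsewhere in this review):

```
// Sketch of the addition, alongside the existing `return true` checks above:
if (DSAStack->isIteratorVarDecl(VD)) // assumed helper name
  return true;
```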


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-18 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/Sema/SemaOpenMP.cpp:1166-1168
+return llvm::any_of(Top->IteratorVarDecls, [VD](const VarDecl *IteratorVD) 
{
+  return IteratorVD == VD->getCanonicalDecl();
+});

doru1004 wrote:
> doru1004 wrote:
> > ABataev wrote:
> > > doru1004 wrote:
> > > > doru1004 wrote:
> > > > > ABataev wrote:
> > > > > > doru1004 wrote:
> > > > > > > ABataev wrote:
> > > > > > > > doru1004 wrote:
> > > > > > > > > ABataev wrote:
> > > > > > > > > > Do you really need to store the variable in the stack, is 
> > > > > > > > > > not it enough just to check that the type of this variable 
> > > > > > > > > > is BuiltinType::OMPIterator?
> > > > > > > > > I'm happy to replace this if you think it will work. Is there 
> > > > > > > > > an example somewhere in the code where I can get from the 
> > > > > > > > > VarDecl to the build in type you mention?
> > > > > > > > You have already a check 
> > > > > > > > IteratorModifier->getType()->isSpecificBuiltinType(BuiltinType::OMPIterator),
> > > > > > > >  you can you something similar for the variable
> > > > > > > This didn't work and I had to revert to using the stack!
> > > > > > Why?
> > > > > I checked the output of the check and it was false when it should 
> > > > > have been true! If you check the latest test that I added it 
> > > > > showcases the source code and in the case of OpenMP 5.2 you can see 
> > > > > that the message "only variable 'vvec' is allowed in map clauses of 
> > > > > this 'omp declare mapper' directive" doesn't appear when a legal 
> > > > > iteration variable is used.
> > > > > If I used the check you suggested then the error message appears.
> > > > > In the example you pasted the check is performed on a `Expr *`. In 
> > > > > the case here, we only have VD which is a VarDecl.
> > > > I am not sure how I can force it to have that type when it just 
> > > > doesn't. Do you have any suggestions?
> > > Did not get it. It still shall be of type builtintype::OMPIterator.
> > The VD that we are checking for this builtin is coming from somewhere else 
> > in the code, it is passed into the `Sema::DiagnoseUseOfDecl(` function. 
> > It's not a VarDecl that is under the control of anything added in this 
> > patch.
> This implementation is in line with the current checks for the declaration of 
> the mapper variable. You store the declaration onto the stack so that you can 
> compare it with the incoming VarDecl passed to the diagnose function.
Some debug printouts regarding VD:

> VD->dump();
```
VarDecl 0x55b57f81dec8  col:52 implicit 
used it 'int'
```

This is the type of the variable if you do `VD->getType()->dump()`:
```
BuiltinType 0x55b57f6d2560 'int'
```


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-18 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/Sema/SemaOpenMP.cpp:1166-1168
+return llvm::any_of(Top->IteratorVarDecls, [VD](const VarDecl *IteratorVD) 
{
+  return IteratorVD == VD->getCanonicalDecl();
+});

doru1004 wrote:
> ABataev wrote:
> > doru1004 wrote:
> > > doru1004 wrote:
> > > > ABataev wrote:
> > > > > doru1004 wrote:
> > > > > > ABataev wrote:
> > > > > > > doru1004 wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > Do you really need to store the variable in the stack, is not 
> > > > > > > > > it enough just to check that the type of this variable is 
> > > > > > > > > BuiltinType::OMPIterator?
> > > > > > > > I'm happy to replace this if you think it will work. Is there 
> > > > > > > > an example somewhere in the code where I can get from the 
> > > > > > > > VarDecl to the build in type you mention?
> > > > > > > You have already a check 
> > > > > > > IteratorModifier->getType()->isSpecificBuiltinType(BuiltinType::OMPIterator),
> > > > > > >  you can you something similar for the variable
> > > > > > This didn't work and I had to revert to using the stack!
> > > > > Why?
> > > > I checked the output of the check and it was false when it should have 
> > > > been true! If you check the latest test that I added it showcases the 
> > > > source code and in the case of OpenMP 5.2 you can see that the message 
> > > > "only variable 'vvec' is allowed in map clauses of this 'omp declare 
> > > > mapper' directive" doesn't appear when a legal iteration variable is 
> > > > used.
> > > > If I used the check you suggested then the error message appears.
> > > > In the example you pasted the check is performed on a `Expr *`. In the 
> > > > case here, we only have VD which is a VarDecl.
> > > I am not sure how I can force it to have that type when it just doesn't. 
> > > Do you have any suggestions?
> > Did not get it. It still shall be of type builtintype::OMPIterator.
> The VD that we are checking for this builtin is coming from somewhere else in 
> the code, it is passed into the `Sema::DiagnoseUseOfDecl(` function. It's not 
> a VarDecl that is under the control of anything added in this patch.
This implementation is in line with the current checks for the declaration of 
the mapper variable. You store the declaration onto the stack so that you can 
compare it with the incoming VarDecl passed to the diagnose function.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-18 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/Sema/SemaOpenMP.cpp:1166-1168
+return llvm::any_of(Top->IteratorVarDecls, [VD](const VarDecl *IteratorVD) 
{
+  return IteratorVD == VD->getCanonicalDecl();
+});

ABataev wrote:
> doru1004 wrote:
> > doru1004 wrote:
> > > ABataev wrote:
> > > > doru1004 wrote:
> > > > > ABataev wrote:
> > > > > > doru1004 wrote:
> > > > > > > ABataev wrote:
> > > > > > > > Do you really need to store the variable in the stack, is not 
> > > > > > > > it enough just to check that the type of this variable is 
> > > > > > > > BuiltinType::OMPIterator?
> > > > > > > I'm happy to replace this if you think it will work. Is there an 
> > > > > > > example somewhere in the code where I can get from the VarDecl to 
> > > > > > > the build in type you mention?
> > > > > > You have already a check 
> > > > > > IteratorModifier->getType()->isSpecificBuiltinType(BuiltinType::OMPIterator),
> > > > > >  you can you something similar for the variable
> > > > > This didn't work and I had to revert to using the stack!
> > > > Why?
> > > I checked the output of the check and it was false when it should have 
> > > been true! If you check the latest test that I added it showcases the 
> > > source code and in the case of OpenMP 5.2 you can see that the message 
> > > "only variable 'vvec' is allowed in map clauses of this 'omp declare 
> > > mapper' directive" doesn't appear when a legal iteration variable is used.
> > > If I used the check you suggested then the error message appears.
> > > In the example you pasted the check is performed on a `Expr *`. In the 
> > > case here, we only have VD which is a VarDecl.
> > I am not sure how I can force it to have that type when it just doesn't. Do 
> > you have any suggestions?
> Did not get it. It still shall be of type builtintype::OMPIterator.
The VD that we are checking for this builtin comes from somewhere else in the 
code; it is passed into the `Sema::DiagnoseUseOfDecl()` function. It's not a 
VarDecl that is under the control of anything added in this patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-18 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/Sema/SemaOpenMP.cpp:1166-1168
+return llvm::any_of(Top->IteratorVarDecls, [VD](const VarDecl *IteratorVD) 
{
+  return IteratorVD == VD->getCanonicalDecl();
+});

doru1004 wrote:
> ABataev wrote:
> > doru1004 wrote:
> > > ABataev wrote:
> > > > doru1004 wrote:
> > > > > ABataev wrote:
> > > > > > Do you really need to store the variable in the stack, is not it 
> > > > > > enough just to check that the type of this variable is 
> > > > > > BuiltinType::OMPIterator?
> > > > > I'm happy to replace this if you think it will work. Is there an 
> > > > > example somewhere in the code where I can get from the VarDecl to the 
> > > > > build in type you mention?
> > > > You have already a check 
> > > > IteratorModifier->getType()->isSpecificBuiltinType(BuiltinType::OMPIterator),
> > > >  you can you something similar for the variable
> > > This didn't work and I had to revert to using the stack!
> > Why?
> I checked the output of the check and it was false when it should have been 
> true! If you check the latest test that I added it showcases the source code 
> and in the case of OpenMP 5.2 you can see that the message "only variable 
> 'vvec' is allowed in map clauses of this 'omp declare mapper' directive" 
> doesn't appear when a legal iteration variable is used.
> If I used the check you suggested then the error message appears.
> In the example you pasted the check is performed on a `Expr *`. In the case 
> here, we only have VD which is a VarDecl.
I am not sure how I can force it to have that type when it just doesn't. Do you 
have any suggestions?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-18 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/Sema/SemaOpenMP.cpp:1166-1168
+return llvm::any_of(Top->IteratorVarDecls, [VD](const VarDecl *IteratorVD) 
{
+  return IteratorVD == VD->getCanonicalDecl();
+});

ABataev wrote:
> doru1004 wrote:
> > ABataev wrote:
> > > doru1004 wrote:
> > > > ABataev wrote:
> > > > > Do you really need to store the variable in the stack, is not it 
> > > > > enough just to check that the type of this variable is 
> > > > > BuiltinType::OMPIterator?
> > > > I'm happy to replace this if you think it will work. Is there an 
> > > > example somewhere in the code where I can get from the VarDecl to the 
> > > > build in type you mention?
> > > You have already a check 
> > > IteratorModifier->getType()->isSpecificBuiltinType(BuiltinType::OMPIterator),
> > >  you can you something similar for the variable
> > This didn't work and I had to revert to using the stack!
> Why?
I checked the output of the check and it was false when it should have been 
true! If you check the latest test that I added, it showcases the source code, 
and in the OpenMP 5.2 case you can see that the message "only variable 'vvec' 
is allowed in map clauses of this 'omp declare mapper' directive" doesn't 
appear when a legal iteration variable is used.
If I use the check you suggested, then the error message appears.
In the example you pasted, the check is performed on an `Expr *`. In the case 
here, we only have VD, which is a VarDecl.
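
To spell out the difference (both lines are sketches): the first is the 
existing check applied to the map clause's iterator modifier expression, the 
second is the same check applied to the VarDecl of a single iterator variable, 
which is why it comes back false here:

```
// On the iterator modifier expression -- its type is the OMPIterator builtin:
IteratorModifier->getType()->isSpecificBuiltinType(BuiltinType::OMPIterator); // true
// On the VarDecl of one iterator variable -- its declared type is plain 'int'
// (see the VD->getType()->dump() output in this thread):
VD->getType()->isSpecificBuiltinType(BuiltinType::OMPIterator); // false
```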


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-18 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/lib/Sema/SemaOpenMP.cpp:1166-1168
+return llvm::any_of(Top->IteratorVarDecls, [VD](const VarDecl *IteratorVD) 
{
+  return IteratorVD == VD->getCanonicalDecl();
+});

ABataev wrote:
> doru1004 wrote:
> > ABataev wrote:
> > > Do you really need to store the variable in the stack, is not it enough 
> > > just to check that the type of this variable is BuiltinType::OMPIterator?
> > I'm happy to replace this if you think it will work. Is there an example 
> > somewhere in the code where I can get from the VarDecl to the build in type 
> > you mention?
> You have already a check 
> IteratorModifier->getType()->isSpecificBuiltinType(BuiltinType::OMPIterator), 
> you can you something similar for the variable
This didn't work and I had to revert to using the stack!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-18 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 490277.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

Files:
  clang/include/clang/AST/OpenMPClause.h
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/include/clang/Basic/OpenMPKinds.def
  clang/include/clang/Basic/OpenMPKinds.h
  clang/include/clang/Sema/Sema.h
  clang/lib/AST/OpenMPClause.cpp
  clang/lib/Parse/ParseOpenMP.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaOpenMP.cpp
  clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
  clang/lib/Sema/TreeTransform.h
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/OpenMP/declare_mapper_messages.c
  clang/test/OpenMP/target_ast_print.cpp
  clang/test/OpenMP/target_map_messages.cpp

Index: clang/test/OpenMP/target_map_messages.cpp
===
--- clang/test/OpenMP/target_map_messages.cpp
+++ clang/test/OpenMP/target_map_messages.cpp
@@ -4,6 +4,7 @@
 // RUN: %clang_cc1 -verify=expected,lt50,lt51,omp,lt51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=45 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -verify=expected,ge50,lt51,omp,lt51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=50 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -verify=expected,ge50,ge51,omp,ge51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=51 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
+// RUN: %clang_cc1 -verify=expected,ge50,ge51,ge52,omp,ge52-omp -fopenmp -fno-openmp-extensions -fopenmp-version=52 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -DCCODE -verify -fopenmp -fno-openmp-extensions -ferror-limit 300 -x c %s -Wno-openmp -Wuninitialized
 
 // -fopenmp-simd, -fno-openmp-extensions
@@ -158,23 +159,28 @@
 // expected-error@+1 {{use of undeclared identifier 'present'}}
 #pragma omp target map(present)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c,f)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c[1:2],f)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c,f[1:2])
 {}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // expected-error@+3 {{section length is unspecified and cannot be inferred because subscripted value is not an array}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c[:],f)
 {}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // expected-error@+3 {{section length is unspecified and cannot be inferred because subscripted value is not an array}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
@@ -191,11 +197,15 @@
 // lt51-error@+1 2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(present, present, tofrom: a)
 {}
+// ge52-omp-error@+5 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ompx-error@+3 {{same map type modifier has been specified more than once}}
 // ge51-omp-error@+2 2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 2 {{incorrect map type modifier, expected one of: 

[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-18 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/test/OpenMP/target_map_messages.cpp:970-979
+  // ompx-error@+8 {{use of undeclared identifier 'itt'; did you mean 'it'?}}
+  // ompx-note@+7 {{'it' declared here}}
+  // omp-error@+6 {{use of undeclared identifier 'itt'; did you mean 'it'?}}
+  // omp-note@+5 {{'it' declared here}}
+  // ge51-ompx-error@+4 {{incorrect map type modifier, expected one of: 
'always', 'close', 'mapper', 'present', 'ompx_hold'}}
+  // lt51-ompx-error@+3 {{incorrect map type modifier, expected one of: 
'always', 'close', 'mapper', 'ompx_hold'}}
+  // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 
'always', 'close', 'mapper', 'present'}}

ABataev wrote:
> doru1004 wrote:
> > ABataev wrote:
> > > Test cases for wrong variables in mappers?
> > You mean as part of the iterator ? like iterator(it = 0:UndefVar) ?
> I mean, you have a check that in mappers only iterator vars are allowed. Can 
> you add a check for this?
As far as I understand it, the additional check I added to the existing mapper 
checks is needed because of the way the mapper check was written. The mapper 
check looks at declarations and, if a mapper clause exists, it assumes that the 
declaration must be coming from that mapper clause. This used to hold in the 
past since that was the only way you could have a declaration. It is no longer 
true, since we can now have declarations coming from both the mapper and the 
iterator modifier. I'll add a test to showcase this.
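
A small example of the situation described above, adapted from the 
declare_mapper_messages.c test in this patch (the struct layout is assumed):

```
struct vec { int len; double *data; };
// Two declaration sources now reach the mapper check:
//   'vvec' -- the declare mapper variable (always allowed),
//   'it'   -- introduced by the iterator map-type modifier (newly allowed).
#pragma omp declare mapper(id: struct vec vvec) map(iterator(it=0:vvec.len:2), tofrom: vvec.data[it])
```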


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-18 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/test/OpenMP/target_map_messages.cpp:970-979
+  // ompx-error@+8 {{use of undeclared identifier 'itt'; did you mean 'it'?}}
+  // ompx-note@+7 {{'it' declared here}}
+  // omp-error@+6 {{use of undeclared identifier 'itt'; did you mean 'it'?}}
+  // omp-note@+5 {{'it' declared here}}
+  // ge51-ompx-error@+4 {{incorrect map type modifier, expected one of: 
'always', 'close', 'mapper', 'present', 'ompx_hold'}}
+  // lt51-ompx-error@+3 {{incorrect map type modifier, expected one of: 
'always', 'close', 'mapper', 'ompx_hold'}}
+  // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 
'always', 'close', 'mapper', 'present'}}

ABataev wrote:
> Test cases for wrong variables in mappers?
You mean as part of the iterator? Like iterator(it = 0:UndefVar)?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-18 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 490188.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

Files:
  clang/include/clang/AST/OpenMPClause.h
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/include/clang/Basic/OpenMPKinds.def
  clang/include/clang/Basic/OpenMPKinds.h
  clang/include/clang/Sema/Sema.h
  clang/lib/AST/OpenMPClause.cpp
  clang/lib/Parse/ParseOpenMP.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaOpenMP.cpp
  clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
  clang/lib/Sema/TreeTransform.h
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/OpenMP/target_ast_print.cpp
  clang/test/OpenMP/target_map_messages.cpp

Index: clang/test/OpenMP/target_map_messages.cpp
===
--- clang/test/OpenMP/target_map_messages.cpp
+++ clang/test/OpenMP/target_map_messages.cpp
@@ -4,6 +4,7 @@
 // RUN: %clang_cc1 -verify=expected,lt50,lt51,omp,lt51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=45 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -verify=expected,ge50,lt51,omp,lt51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=50 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -verify=expected,ge50,ge51,omp,ge51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=51 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
+// RUN: %clang_cc1 -verify=expected,ge50,ge51,ge52,omp,ge52-omp -fopenmp -fno-openmp-extensions -fopenmp-version=52 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -DCCODE -verify -fopenmp -fno-openmp-extensions -ferror-limit 300 -x c %s -Wno-openmp -Wuninitialized
 
 // -fopenmp-simd, -fno-openmp-extensions
@@ -158,23 +159,28 @@
 // expected-error@+1 {{use of undeclared identifier 'present'}}
 #pragma omp target map(present)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c,f)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c[1:2],f)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c,f[1:2])
 {}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // expected-error@+3 {{section length is unspecified and cannot be inferred because subscripted value is not an array}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c[:],f)
 {}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // expected-error@+3 {{section length is unspecified and cannot be inferred because subscripted value is not an array}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
@@ -191,11 +197,15 @@
 // lt51-error@+1 2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(present, present, tofrom: a)
 {}
+// ge52-omp-error@+5 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ompx-error@+3 {{same map type modifier has been specified more than once}}
 // ge51-omp-error@+2 2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma 

[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-17 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/include/clang/Parse/Parser.h:3474-3475
   bool parseMapperModifier(Sema::OpenMPVarListDataTy &Data);
+  /// Parses the iterator modifier in map clause.
+  bool parseIteratorModifier(Sema::OpenMPVarListDataTy &Data);
   /// Parses map-type-modifiers in map clause.

ABataev wrote:
> Where is it defined, cannot find it in the patch?
Leftover, removed it.



Comment at: clang/lib/Sema/SemaOpenMP.cpp:1166-1168
+return llvm::any_of(Top->IteratorVarDecls, [VD](const VarDecl *IteratorVD) 
{
+  return IteratorVD == VD->getCanonicalDecl();
+});

ABataev wrote:
> Do you really need to store the variable in the stack, is not it enough just 
> to check that the type of this variable is BuiltinType::OMPIterator?
I'm happy to replace this if you think it will work. Is there an example 
somewhere in the code where I can get from the VarDecl to the built-in type 
you mention?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-17 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 489942.
doru1004 marked 3 inline comments as done.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

Files:
  clang/include/clang/AST/OpenMPClause.h
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/include/clang/Basic/OpenMPKinds.def
  clang/include/clang/Basic/OpenMPKinds.h
  clang/include/clang/Sema/Sema.h
  clang/lib/AST/OpenMPClause.cpp
  clang/lib/Parse/ParseOpenMP.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaOpenMP.cpp
  clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
  clang/lib/Sema/TreeTransform.h
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/OpenMP/target_ast_print.cpp
  clang/test/OpenMP/target_map_messages.cpp

Index: clang/test/OpenMP/target_map_messages.cpp
===
--- clang/test/OpenMP/target_map_messages.cpp
+++ clang/test/OpenMP/target_map_messages.cpp
@@ -4,6 +4,7 @@
 // RUN: %clang_cc1 -verify=expected,lt50,lt51,omp,lt51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=45 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -verify=expected,ge50,lt51,omp,lt51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=50 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -verify=expected,ge50,ge51,omp,ge51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=51 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
+// RUN: %clang_cc1 -verify=expected,ge50,ge51,ge52,omp,ge52-omp -fopenmp -fno-openmp-extensions -fopenmp-version=52 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -DCCODE -verify -fopenmp -fno-openmp-extensions -ferror-limit 300 -x c %s -Wno-openmp -Wuninitialized
 
 // -fopenmp-simd, -fno-openmp-extensions
@@ -158,23 +159,28 @@
 // expected-error@+1 {{use of undeclared identifier 'present'}}
 #pragma omp target map(present)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c,f)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c[1:2],f)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c,f[1:2])
 {}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // expected-error@+3 {{section length is unspecified and cannot be inferred because subscripted value is not an array}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c[:],f)
 {}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // expected-error@+3 {{section length is unspecified and cannot be inferred because subscripted value is not an array}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
@@ -191,11 +197,15 @@
 // lt51-error@+1 2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(present, present, tofrom: a)
 {}
+// ge52-omp-error@+5 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ompx-error@+3 {{same map type modifier has been specified more than once}}
 // ge51-omp-error@+2 2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 2 {{incorrect map type modifier, expected one of: 

[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-17 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 489917.
doru1004 marked 3 inline comments as done.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

Files:
  clang/include/clang/AST/OpenMPClause.h
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/include/clang/Basic/OpenMPKinds.def
  clang/include/clang/Basic/OpenMPKinds.h
  clang/include/clang/Parse/Parser.h
  clang/include/clang/Sema/Sema.h
  clang/lib/AST/OpenMPClause.cpp
  clang/lib/Parse/ParseOpenMP.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaOpenMP.cpp
  clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
  clang/lib/Sema/TreeTransform.h
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/OpenMP/target_ast_print.cpp
  clang/test/OpenMP/target_map_messages.cpp

Index: clang/test/OpenMP/target_map_messages.cpp
===
--- clang/test/OpenMP/target_map_messages.cpp
+++ clang/test/OpenMP/target_map_messages.cpp
@@ -4,6 +4,7 @@
 // RUN: %clang_cc1 -verify=expected,lt50,lt51,omp,lt51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=45 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -verify=expected,ge50,lt51,omp,lt51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=50 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -verify=expected,ge50,ge51,omp,ge51-omp -fopenmp -fno-openmp-extensions -fopenmp-version=51 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
+// RUN: %clang_cc1 -verify=expected,ge50,ge51,ge52,omp,ge52-omp -fopenmp -fno-openmp-extensions -fopenmp-version=52 -ferror-limit 300 %s -Wno-openmp-target -Wuninitialized
 // RUN: %clang_cc1 -DCCODE -verify -fopenmp -fno-openmp-extensions -ferror-limit 300 -x c %s -Wno-openmp -Wuninitialized
 
 // -fopenmp-simd, -fno-openmp-extensions
@@ -158,23 +159,28 @@
 // expected-error@+1 {{use of undeclared identifier 'present'}}
 #pragma omp target map(present)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c,f)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c[1:2],f)
 {}
+// ge52-omp-error@+3 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c,f[1:2])
 {}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // expected-error@+3 {{section length is unspecified and cannot be inferred because subscripted value is not an array}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(ompx_hold, tofrom: c[:],f)
 {}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // expected-error@+3 {{section length is unspecified and cannot be inferred because subscripted value is not an array}}
 // ge51-omp-error@+2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
@@ -191,11 +197,15 @@
 // lt51-error@+1 2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper'}}
 #pragma omp target map(present, present, tofrom: a)
 {}
+// ge52-omp-error@+5 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
+// ge52-omp-error@+4 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present', 'iterator'}}
 // ompx-error@+3 {{same map type modifier has been specified more than once}}
 // ge51-omp-error@+2 2 {{incorrect map type modifier, expected one of: 'always', 'close', 'mapper', 'present'}}
 // lt51-omp-error@+1 2 {{incorrect 

[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-17 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 489787.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141871/new/

https://reviews.llvm.org/D141871

Files:
  clang/include/clang/AST/OpenMPClause.h
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/include/clang/Basic/OpenMPKinds.def
  clang/include/clang/Basic/OpenMPKinds.h
  clang/include/clang/Parse/Parser.h
  clang/include/clang/Sema/Sema.h
  clang/lib/AST/OpenMPClause.cpp
  clang/lib/Parse/ParseOpenMP.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaOpenMP.cpp
  clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
  clang/lib/Sema/TreeTransform.h
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/OpenMP/target_ast_print.cpp

Index: clang/test/OpenMP/target_ast_print.cpp
===
--- clang/test/OpenMP/target_ast_print.cpp
+++ clang/test/OpenMP/target_ast_print.cpp
@@ -1139,6 +1139,60 @@
 }
 #endif // OMP51
 
+#ifdef OMP52
+
+///==///
+// RUN: %clang_cc1 -DOMP52 -verify -fopenmp -fopenmp-version=52 -ast-print %s | FileCheck %s --check-prefix OMP52
+// RUN: %clang_cc1 -DOMP52 -fopenmp -fopenmp-version=52 -x c++ -std=c++11 -emit-pch -o %t %s
+// RUN: %clang_cc1 -DOMP52 -fopenmp -fopenmp-version=52 -std=c++11 -include-pch %t -fsyntax-only -verify %s -ast-print | FileCheck %s --check-prefix OMP52
+
+// RUN: %clang_cc1 -DOMP52 -verify -fopenmp-simd -fopenmp-version=52 -ast-print %s | FileCheck %s --check-prefix OMP52
+// RUN: %clang_cc1 -DOMP52 -fopenmp-simd -fopenmp-version=52 -x c++ -std=c++11 -emit-pch -o %t %s
+// RUN: %clang_cc1 -DOMP52 -fopenmp-simd -fopenmp-version=52 -std=c++11 -include-pch %t -fsyntax-only -verify %s -ast-print | FileCheck %s --check-prefix OMP52
+
+void foo() {}
+
+template <typename T>
+T tmain(T argc, T *argv) {
+  int N = 100;
+  int v[N];
+  #pragma omp target map(iterator(it = 0:N:2), to: v[it])
+  foo();
+  #pragma omp target map(iterator(it = 0:N:4), from: v[it])
+  foo();
+
+  return 0;
+}
+
+// OMP52: template <typename T> T tmain(T argc, T *argv) {
+// OMP52-NEXT: int N = 100;
+// OMP52-NEXT: int v[N];
+// OMP52-NEXT: #pragma omp target map(iterator(int it = 0:N:2),to: v[it])
+// OMP52-NEXT: foo()
+// OMP52-NEXT: #pragma omp target map(iterator(int it = 0:N:4),from: v[it])
+// OMP52-NEXT: foo()
+
+// OMP52-LABEL: int main(int argc, char **argv) {
+int main (int argc, char **argv) {
+  int i, j, a[20], always, close;
+// OMP52-NEXT: int i, j, a[20]
+#pragma omp target
+// OMP52-NEXT: #pragma omp target
+  foo();
+// OMP52-NEXT: foo();
+#pragma omp target map(iterator(it = 0:20:2), to: a[it])
+// OMP52-NEXT: #pragma omp target map(iterator(int it = 0:20:2),to: a[it])
+  foo();
+// OMP52-NEXT: foo();
+#pragma omp target map(iterator(it = 0:20:4), from: a[it])
+// OMP52-NEXT: #pragma omp target map(iterator(int it = 0:20:4),from: a[it])
+foo();
+// OMP52-NEXT: foo();
+
+  return tmain(argc, &argc) + tmain(argv[0][0], argv[0]);
+}
+#endif // OMP52
+
 #ifdef OMPX
 
 // RUN: %clang_cc1 -DOMPX -verify -fopenmp -fopenmp-extensions -ast-print %s | FileCheck %s --check-prefix=OMPX
Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -6792,6 +6792,8 @@
   for (unsigned I = 0; I < NumberOfOMPMapClauseModifiers; ++I) {
 Record.push_back(C->getMapTypeModifier(I));
 Record.AddSourceLocation(C->getMapTypeModifierLoc(I));
+if (C->getMapTypeModifier(I) == OMPC_MAP_MODIFIER_iterator)
+  Record.AddStmt(C->getIteratorModifier());
   }
   Record.AddNestedNameSpecifierLoc(C->getMapperQualifierLoc());
   Record.AddDeclarationNameInfo(C->getMapperIdInfo());
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -10773,6 +10773,8 @@
 C->setMapTypeModifier(
 I, static_cast<OpenMPMapModifierKind>(Record.readInt()));
 C->setMapTypeModifierLoc(I, Record.readSourceLocation());
+if (C->getMapTypeModifier(I) == OMPC_MAP_MODIFIER_iterator)
+  C->setIteratorModifier(Record.readSubExpr());
   }
   C->setMapperQualifierLoc(Record.readNestedNameSpecifierLoc());
   C->setMapperIdInfo(Record.readDeclarationNameInfo());
Index: clang/lib/Sema/TreeTransform.h
===
--- clang/lib/Sema/TreeTransform.h
+++ clang/lib/Sema/TreeTransform.h
@@ -1988,15 +1988,16 @@
   /// By default, performs semantic analysis to build the new OpenMP clause.
   /// Subclasses may override this routine to provide different behavior.
   OMPClause *RebuildOMPMapClause(
-  ArrayRef<OpenMPMapModifierKind> MapTypeModifiers,
+  Expr *IteratorModifier, ArrayRef<OpenMPMapModifierKind> MapTypeModifiers,
   ArrayRef<SourceLocation> MapTypeModifiersLoc,
   CXXScopeSpec 

[PATCH] D141871: [Clang][OpenMP] Add parse and sema for iterator map modifier

2023-01-16 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 created this revision.
doru1004 added reviewers: jdoerfert, ABataev, carlo.bertolli, ronl, 
gregrodgers, jhuber6.
doru1004 added a project: OpenMP.
Herald added subscribers: guansong, yaxunl.
Herald added a project: All.
doru1004 requested review of this revision.
Herald added subscribers: cfe-commits, sstefan1.
Herald added a project: clang.

This patch adds parse and sema support for the iterator map modifier.
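
For reference, the map-clause syntax this enables looks like the following 
(taken from the target_ast_print.cpp test added below):

```
int v[N];
#pragma omp target map(iterator(it = 0:N:2), to: v[it])
foo();
```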


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D141871

Files:
  clang/include/clang/AST/OpenMPClause.h
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/include/clang/Basic/OpenMPKinds.def
  clang/include/clang/Basic/OpenMPKinds.h
  clang/include/clang/Parse/Parser.h
  clang/include/clang/Sema/Sema.h
  clang/lib/AST/OpenMPClause.cpp
  clang/lib/Parse/ParseOpenMP.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaOpenMP.cpp
  clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
  clang/lib/Sema/TreeTransform.h
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/OpenMP/target_ast_print.cpp

Index: clang/test/OpenMP/target_ast_print.cpp
===
--- clang/test/OpenMP/target_ast_print.cpp
+++ clang/test/OpenMP/target_ast_print.cpp
@@ -1139,6 +1139,60 @@
 }
 #endif // OMP51
 
+#ifdef OMP52
+
+///==///
+// RUN: %clang_cc1 -DOMP52 -verify -fopenmp -fopenmp-version=52 -ast-print %s | FileCheck %s --check-prefix OMP52
+// RUN: %clang_cc1 -DOMP52 -fopenmp -fopenmp-version=52 -x c++ -std=c++11 -emit-pch -o %t %s
+// RUN: %clang_cc1 -DOMP52 -fopenmp -fopenmp-version=52 -std=c++11 -include-pch %t -fsyntax-only -verify %s -ast-print | FileCheck %s --check-prefix OMP52
+
+// RUN: %clang_cc1 -DOMP52 -verify -fopenmp-simd -fopenmp-version=52 -ast-print %s | FileCheck %s --check-prefix OMP52
+// RUN: %clang_cc1 -DOMP52 -fopenmp-simd -fopenmp-version=52 -x c++ -std=c++11 -emit-pch -o %t %s
+// RUN: %clang_cc1 -DOMP52 -fopenmp-simd -fopenmp-version=52 -std=c++11 -include-pch %t -fsyntax-only -verify %s -ast-print | FileCheck %s --check-prefix OMP52
+
+void foo() {}
+
+template <typename T>
+T tmain(T argc, T *argv) {
+  int N = 100;
+  int v[N];
+  #pragma omp target map(iterator(it = 0:N:2), to: v[it])
+  foo();
+  #pragma omp target map(iterator(it = 0:N:4), from: v[it])
+  foo();
+
+  return 0;
+}
+
+// OMP52: template <typename T> T tmain(T argc, T *argv) {
+// OMP52-NEXT: int N = 100;
+// OMP52-NEXT: int v[N];
+// OMP52-NEXT: #pragma omp target map(iterator(int it = 0:N:2),to: v[it])
+// OMP52-NEXT: foo()
+// OMP52-NEXT: #pragma omp target map(iterator(int it = 0:N:4),from: v[it])
+// OMP52-NEXT: foo()
+
+// OMP52-LABEL: int main(int argc, char **argv) {
+int main (int argc, char **argv) {
+  int i, j, a[20], always, close;
+// OMP52-NEXT: int i, j, a[20]
+#pragma omp target
+// OMP52-NEXT: #pragma omp target
+  foo();
+// OMP52-NEXT: foo();
+#pragma omp target map(iterator(it = 0:20:2), to: a[it])
+// OMP52-NEXT: #pragma omp target map(iterator(int it = 0:20:2),to: a[it])
+  foo();
+// OMP52-NEXT: foo();
+#pragma omp target map(iterator(it = 0:20:4), from: a[it])
+// OMP52-NEXT: #pragma omp target map(iterator(int it = 0:20:4),from: a[it])
+foo();
+// OMP52-NEXT: foo();
+
+  return tmain(argc, &argc) + tmain(argv[0][0], argv[0]);
+}
+#endif // OMP52
+
 #ifdef OMPX
 
 // RUN: %clang_cc1 -DOMPX -verify -fopenmp -fopenmp-extensions -ast-print %s | FileCheck %s --check-prefix=OMPX
Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -6792,6 +6792,8 @@
   for (unsigned I = 0; I < NumberOfOMPMapClauseModifiers; ++I) {
 Record.push_back(C->getMapTypeModifier(I));
 Record.AddSourceLocation(C->getMapTypeModifierLoc(I));
+if (C->getMapTypeModifier(I) == OMPC_MAP_MODIFIER_iterator)
+  Record.AddStmt(C->getIteratorModifier());
   }
   Record.AddNestedNameSpecifierLoc(C->getMapperQualifierLoc());
   Record.AddDeclarationNameInfo(C->getMapperIdInfo());
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -10773,6 +10773,8 @@
 C->setMapTypeModifier(
        I, static_cast<OpenMPMapModifierKind>(Record.readInt()));
 C->setMapTypeModifierLoc(I, Record.readSourceLocation());
+if (C->getMapTypeModifier(I) == OMPC_MAP_MODIFIER_iterator)
+  C->setIteratorModifier(Record.readSubExpr());
   }
   C->setMapperQualifierLoc(Record.readNestedNameSpecifierLoc());
   C->setMapperIdInfo(Record.readDeclarationNameInfo());
Index: clang/lib/Sema/TreeTransform.h
===
--- clang/lib/Sema/TreeTransform.h
+++ clang/lib/Sema/TreeTransform.h
@@ -1988,15 +1988,16 @@
   

[PATCH] D141528: [Clang][OpenMP] Fix loop directive nested inside a parallel

2023-01-16 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

ping


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141528/new/

https://reviews.llvm.org/D141528

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141528: [Clang][OpenMP] Fix loop directive nested inside a parallel

2023-01-13 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 489105.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141528/new/

https://reviews.llvm.org/D141528

Files:
  clang/lib/CodeGen/CGStmtOpenMP.cpp
  clang/test/OpenMP/nested_loop_codegen.cpp

Index: clang/test/OpenMP/nested_loop_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/nested_loop_codegen.cpp
@@ -0,0 +1,950 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix=CHECK1
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK2
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-enable-irbuilder -DIRBUILDER -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix=CHECK3
+// RUN: %clang_cc1 -fopenmp -fopenmp-enable-irbuilder -DIRBUILDER -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-enable-irbuilder -DIRBUILDER -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -gno-column-info -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK4
+
+// RUN: %clang_cc1 -verify -fopenmp-simd -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// RUN: %clang_cc1 -fopenmp-simd -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp-simd -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-enable-irbuilder -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// RUN: %clang_cc1 -fopenmp-simd -fopenmp-enable-irbuilder -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp-simd -fopenmp-enable-irbuilder -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int outline_decl() {
+  int i, k;
+  #pragma omp parallel
+  for(i=0; i<10; i++) {
+#pragma omp loop
+for(k=0; k<5; k++) {
+  k++;
+}
+  }
+  return k;
+}
+
+int inline_decl() {
+  int i, res;
+  #pragma omp parallel
+  for(i=0; i<10; i++) {
+#pragma omp loop
+for(int k=0; k<5; k++) {
+  res++;
+}
+  }
+  return res;
+}
+
+#endif
+// CHECK1-LABEL: define {{[^@]+}}@_Z12outline_declv
+// CHECK1-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK1-NEXT:  entry:
+// CHECK1-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:[[K:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB1:[0-9]+]], i32 1, ptr @.omp_outlined., ptr [[I]])
+// CHECK1-NEXT:[[TMP0:%.*]] = load i32, ptr [[K]], align 4
+// CHECK1-NEXT:ret i32 [[TMP0]]
+//
+//
+// CHECK1-LABEL: define {{[^@]+}}@.omp_outlined.
+// CHECK1-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[I:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK1-NEXT:  entry:
+// CHECK1-NEXT:[[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
+// CHECK1-NEXT:[[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+// CHECK1-NEXT:[[I_ADDR:%.*]] = alloca ptr, align 8
+// CHECK1-NEXT:[[K:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR]], align 8
+// CHECK1-NEXT:store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR]], align 8
+// CHECK1-NEXT:store ptr [[I]], ptr [[I_ADDR]], align 8
+// CHECK1-NEXT:[[TMP0:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// CHECK1-NEXT:store i32 0, ptr [[TMP0]], align 4
+// CHECK1-NEXT:br label [[FOR_COND:%.*]]
+// CHECK1:   for.cond:
+// CHECK1-NEXT:[[TMP1:%.*]] = load i32, ptr [[TMP0]], align 4
+// CHECK1-NEXT:[[CMP:%.*]] = icmp slt i32 [[TMP1]], 10
+// CHECK1-NEXT:br i1 

[PATCH] D141528: [Clang][OpenMP] Fix loop directive nested inside a parallel

2023-01-13 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 489035.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141528/new/

https://reviews.llvm.org/D141528

Files:
  clang/lib/CodeGen/CGStmtOpenMP.cpp
  clang/test/OpenMP/nested_loop_codegen.cpp

Index: clang/test/OpenMP/nested_loop_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/nested_loop_codegen.cpp
@@ -0,0 +1,950 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix=CHECK1
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK2
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-enable-irbuilder -DIRBUILDER -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix=CHECK3
+// RUN: %clang_cc1 -fopenmp -fopenmp-enable-irbuilder -DIRBUILDER -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-enable-irbuilder -DIRBUILDER -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -gno-column-info -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK4
+
+// RUN: %clang_cc1 -verify -fopenmp-simd -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// RUN: %clang_cc1 -fopenmp-simd -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp-simd -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-enable-irbuilder -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// RUN: %clang_cc1 -fopenmp-simd -fopenmp-enable-irbuilder -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp-simd -fopenmp-enable-irbuilder -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int outline_decl() {
+  int i, k;
+  #pragma omp parallel
+  for(i=0; i<10; i++) {
+#pragma omp loop
+for(k=0; k<5; k++) {
+  k++;
+}
+  }
+  return k;
+}
+
+int inline_decl() {
+  int i, res;
+  #pragma omp parallel
+  for(i=0; i<10; i++) {
+#pragma omp loop
+for(int k=0; k<5; k++) {
+  res++;
+}
+  }
+  return res;
+}
+
+#endif
+// CHECK1-LABEL: define {{[^@]+}}@_Z12outline_declv
+// CHECK1-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK1-NEXT:  entry:
+// CHECK1-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:[[K:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB1:[0-9]+]], i32 1, ptr @.omp_outlined., ptr [[I]])
+// CHECK1-NEXT:[[TMP0:%.*]] = load i32, ptr [[K]], align 4
+// CHECK1-NEXT:ret i32 [[TMP0]]
+//
+//
+// CHECK1-LABEL: define {{[^@]+}}@.omp_outlined.
+// CHECK1-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[I:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK1-NEXT:  entry:
+// CHECK1-NEXT:[[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
+// CHECK1-NEXT:[[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+// CHECK1-NEXT:[[I_ADDR:%.*]] = alloca ptr, align 8
+// CHECK1-NEXT:[[K:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR]], align 8
+// CHECK1-NEXT:store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR]], align 8
+// CHECK1-NEXT:store ptr [[I]], ptr [[I_ADDR]], align 8
+// CHECK1-NEXT:[[TMP0:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// CHECK1-NEXT:store i32 0, ptr [[TMP0]], align 4
+// CHECK1-NEXT:br label [[FOR_COND:%.*]]
+// CHECK1:   for.cond:
+// CHECK1-NEXT:[[TMP1:%.*]] = load i32, ptr [[TMP0]], align 4
+// CHECK1-NEXT:[[CMP:%.*]] = icmp slt i32 [[TMP1]], 10
+// CHECK1-NEXT:br i1 

[PATCH] D141528: [Clang][OpenMP] Fix loop directive nested inside a parallel

2023-01-11 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 created this revision.
doru1004 added reviewers: ronl, carlo.bertolli, ABataev, jdoerfert, jhuber6, 
gregrodgers.
doru1004 added a project: OpenMP.
Herald added subscribers: guansong, yaxunl.
Herald added a project: All.
doru1004 requested review of this revision.
Herald added subscribers: cfe-commits, sstefan1.
Herald added a project: clang.

This patch fixes the case in which the loop directive is nested within a 
parallel directive, by ensuring that the iteration variable has a valid 
variable declaration in the local declaration map when the for statement is 
emitted.
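
A minimal sketch of the affected construct (taken from the test below):

  int i, k;
  #pragma omp parallel
  for (i = 0; i < 10; i++) {
    // The loop directive is nested inside the parallel region and its
    // iteration variable k is declared outside of it, so k must be found
    // in the local declaration map when the inner for statement is emitted.
    #pragma omp loop
    for (k = 0; k < 5; k++) {
      k++;
    }
  }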


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D141528

Files:
  clang/lib/CodeGen/CGStmtOpenMP.cpp
  clang/test/OpenMP/nested_loop_codegen.cpp

Index: clang/test/OpenMP/nested_loop_codegen.cpp
===
--- /dev/null
+++ clang/test/OpenMP/nested_loop_codegen.cpp
@@ -0,0 +1,950 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix=CHECK1
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK2
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-enable-irbuilder -DIRBUILDER -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefix=CHECK3
+// RUN: %clang_cc1 -fopenmp -fopenmp-enable-irbuilder -DIRBUILDER -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-enable-irbuilder -DIRBUILDER -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -gno-column-info -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK4
+
+// RUN: %clang_cc1 -verify -fopenmp-simd -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// RUN: %clang_cc1 -fopenmp-simd -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp-simd -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-enable-irbuilder -x c++ -emit-llvm %s -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// RUN: %clang_cc1 -fopenmp-simd -fopenmp-enable-irbuilder -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp-simd -fopenmp-enable-irbuilder -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --implicit-check-not="{{__kmpc|__tgt}}"
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+int external_decl() {
+  int i, k;
+  #pragma omp parallel
+  for(i=0; i<10; i++) {
+#pragma omp loop
+for(k=0; k<5; k++) {
+  k++;
+}
+  }
+  return k;
+}
+
+int internal_decl() {
+  int i, res;
+  #pragma omp parallel
+  for(i=0; i<10; i++) {
+#pragma omp loop
+for(int k=0; k<5; k++) {
+  res++;
+}
+  }
+  return res;
+}
+
+#endif
+// CHECK1-LABEL: define {{[^@]+}}@_Z13external_declv
+// CHECK1-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK1-NEXT:  entry:
+// CHECK1-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:[[K:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB1:[0-9]+]], i32 1, ptr @.omp_outlined., ptr [[I]])
+// CHECK1-NEXT:[[TMP0:%.*]] = load i32, ptr [[K]], align 4
+// CHECK1-NEXT:ret i32 [[TMP0]]
+//
+//
+// CHECK1-LABEL: define {{[^@]+}}@.omp_outlined.
+// CHECK1-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[I:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK1-NEXT:  entry:
+// CHECK1-NEXT:[[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
+// CHECK1-NEXT:[[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+// CHECK1-NEXT:[[I_ADDR:%.*]] = alloca ptr, align 8
+// CHECK1-NEXT:[[K:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:store ptr [[DOTGLOBAL_TID_]], ptr 

[PATCH] D140155: [Clang][OpenMP] Allow host call to nohost function with host variant

2023-01-04 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

In D140155#4016333 , @jhuber6 wrote:

> In D140155#4016274 , @mgorny wrote:
>
>> In D140155#4004505 , @doru1004 
>> wrote:
>>
>>> Commit 658ed9547cdd6657895339a6c390c31aa77a5698 
>>> 
>>
>> The added test fails on 32-bit platforms:
>>
>>   FAIL: Clang :: OpenMP/declare_target_nohost_variant_messages.cpp (10230 of 
>> 16135)
>>    TEST 'Clang :: 
>> OpenMP/declare_target_nohost_variant_messages.cpp' FAILED 
>> 
>>   Script:
>>   --
>>   : 'RUN: at line 3';   
>> /var/tmp/portage/sys-devel/clang-16.0.0_pre20221225/work/x/y/clang-abi_x86_32.x86/bin/clang
>>  -cc1 -internal-isystem 
>> /var/tmp/portage/sys-devel/clang-16.0.0_pre20221225/work/x/y/clang-abi_x86_32.x86/bin/../../../../lib/clang/16/include
>>  -nostdsysteminc -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa 
>> -fopenmp-version=52 -DVERBOSE_MODE=1 -verify=omp52 -fnoopenmp-use-tls 
>> -ferror-limit 100 -fopenmp-targets=amdgcn-amd-amdhsa -o - 
>> /var/tmp/portage/sys-devel/clang-16.0.0_pre20221225/work/clang/test/OpenMP/declare_target_nohost_variant_messages.cpp
>>   --
>>   Exit Code: 1
>>   
>>   Command Output (stderr):
>>   --
>>   + : 'RUN: at line 3'
>>   + 
>> /var/tmp/portage/sys-devel/clang-16.0.0_pre20221225/work/x/y/clang-abi_x86_32.x86/bin/clang
>>  -cc1 -internal-isystem 
>> /var/tmp/portage/sys-devel/clang-16.0.0_pre20221225/work/x/y/clang-abi_x86_32.x86/bin/../../../../lib/clang/16/include
>>  -nostdsysteminc -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa 
>> -fopenmp-version=52 -DVERBOSE_MODE=1 -verify=omp52 -fnoopenmp-use-tls 
>> -ferror-limit 100 -fopenmp-targets=amdgcn-amd-amdhsa -o - 
>> /var/tmp/portage/sys-devel/clang-16.0.0_pre20221225/work/clang/test/OpenMP/declare_target_nohost_variant_messages.cpp
>>   error: 'error' diagnostics seen but not expected: 
>> (frontend): OpenMP target architecture 'amdgcn-amd-amdhsa' pointer size 
>> is incompatible with host 'i686-pc-linux-gnu'
>>   
>>   --
>>   
>>   
>>
>> Please fix, or ideally revert, fix and then commit properly linking to the 
>> diff.
>
> Should be fixed in rGf74e3d2f81d2 
> .

Thank you for pushing a fix!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140155/new/

https://reviews.llvm.org/D140155

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139723: [OpenMP][AMDGPU] Enable use of abs labs and llabs math functions in C code

2023-01-04 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added a comment.

In D139723#4016685 , @mgorny wrote:

> I've pushed a fix in dab67c66932b9149842f7c8431e951f952125fc0 
> , based 
> on @jhuber6's fix from the other diff.

Thank you for pushing a fix! :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139723/new/

https://reviews.llvm.org/D139723

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140295: [Fix][OpenMP] Fix commit for nohost variant.

2022-12-19 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 closed this revision.
doru1004 added a comment.

Commit b5c809acd34c2489679300eb0b8a8b824aeb 



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140295/new/

https://reviews.llvm.org/D140295

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140295: [Fix][OpenMP] Fix commit for nohost variant.

2022-12-19 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 created this revision.
doru1004 added reviewers: ABataev, carlo.bertolli, ronl.
doru1004 added a project: OpenMP.
Herald added subscribers: guansong, yaxunl.
Herald added a project: All.
doru1004 requested review of this revision.
Herald added a reviewer: jdoerfert.
Herald added subscribers: cfe-commits, sstefan1.
Herald added a project: clang.

This fixes the previous commit: 658ed9547cdd6657895339a6c390c31aa77a5698 



Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D140295

Files:
  clang/test/OpenMP/declare_target_messages.cpp
  clang/test/OpenMP/declare_target_nohost_variant_messages.cpp


Index: clang/test/OpenMP/declare_target_nohost_variant_messages.cpp
===
--- clang/test/OpenMP/declare_target_nohost_variant_messages.cpp
+++ clang/test/OpenMP/declare_target_nohost_variant_messages.cpp
@@ -1,21 +1,31 @@
+// REQUIRES: amdgpu-registered-target
+
 // RUN: %clang_cc1 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa 
-fopenmp-version=52 -DVERBOSE_MODE=1 -verify=omp52 -fnoopenmp-use-tls 
-ferror-limit 100 -fopenmp-targets=amdgcn-amd-amdhsa -o - %s
 
 void fun();
+void host_function();
+#pragma omp declare target enter(fun) device_type(nohost)
+#pragma omp declare variant(host_function) match(device={kind(host)})
+void fun() {}
+void host_function() {}
+void call_host_function() { fun(); }
+
+void fun1();
 void not_a_host_function();
-#pragma omp declare target enter(fun) device_type(nohost) // omp52-note 
{{marked as 'device_type(nohost)' here}}
+#pragma omp declare target enter(fun1) device_type(nohost) // omp52-note 
{{marked as 'device_type(nohost)' here}}
 #pragma omp declare variant(not_a_host_function) match(device={kind(host)}) // 
omp52-error {{function with 'device_type(nohost)' is not available on host}}
-void fun() {}
+void fun1() {}
 #pragma omp begin declare target device_type(nohost) // omp52-note {{marked as 
'device_type(nohost)' here}}
 void not_a_host_function() {}
 #pragma omp end declare target
-void failed_call_to_host_function() { fun(); } // omp52-error {{function with 
'device_type(nohost)' is not available on host}}
+void failed_call_to_host_function() { fun1(); } // omp52-error {{function with 
'device_type(nohost)' is not available on host}}
 
 void fun2();
-void host_function();
+void host_function2();
 #pragma omp declare target enter(fun2) device_type(nohost)
-#pragma omp declare variant(host_function) match(device={kind(host)})
+#pragma omp declare variant(host_function2) match(device={kind(host)})
 void fun2() {}
 #pragma omp begin declare target device_type(host)
-void host_function() {}
+void host_function2() {}
 #pragma omp end declare target
 void call_to_host_function() { fun2(); }
Index: clang/test/OpenMP/declare_target_messages.cpp
===
--- clang/test/OpenMP/declare_target_messages.cpp
+++ clang/test/OpenMP/declare_target_messages.cpp
@@ -11,7 +11,7 @@
 // RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp51 
-fopenmp-version=51 -fopenmp-simd -fnoopenmp-use-tls -ferror-limit 100 -o - %s
 // RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp51 
-fopenmp-version=51 -fopenmp-simd -fnoopenmp-use-tls -ferror-limit 100 -o - %s
 
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa 
-fopenmp-version=52 -DVERBOSE_MODE=1 -verify=expected,omp52 -fnoopenmp-use-tls 
-ferror-limit 100 -o - %s
+// RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp52 
-fopenmp -fopenmp-version=52 -DVERBOSE_MODE=1 -fnoopenmp-use-tls -ferror-limit 
100 -o - %s
 
 // RUN: %clang_cc1 -triple x86_64-apple-macos10.7.0 -verify=expected,omp5 
-fopenmp -fnoopenmp-use-tls -ferror-limit 100 -o - %s
 #pragma omp end declare target // expected-error {{unexpected OpenMP directive 
'#pragma omp end declare target'}}
@@ -242,11 +242,3 @@
 // expected-warning@+1 {{expected '#pragma omp end declare target' at end of 
file to match '#pragma omp begin declare target'}}
 #pragma omp begin declare target
 #endif
-
-void fun();
-void host_function();
-#pragma omp declare target enter(fun) device_type(nohost) // omp45-error 
{{unexpected 'enter' clause, use 'to' instead}} omp45-error {{expected at least 
one 'to' or 'link' clause}} omp5-error {{unexpected 'enter' clause, use 'to' 
instead}} omp5-error {{expected at least one 'to' or 'link' clause}} 
omp51-error {{expected at least one 'to', 'link' or 'indirect' clause}} 
omp51-error {{unexpected 'enter' clause, use 'to' instead}}
-#pragma omp declare variant(host_function) match(device={kind(host)})
-void fun() {}
-void host_function() {}
-void call_host_function() { fun(); }


Index: clang/test/OpenMP/declare_target_nohost_variant_messages.cpp
===
--- clang/test/OpenMP/declare_target_nohost_variant_messages.cpp
+++ 

[PATCH] D139723: [OpenMP][AMDGPU] Enable use of abs labs and llabs math functions in C code

2022-12-19 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 closed this revision.
doru1004 added a comment.

Commit: 07ff3c5ccce68aed6c1a270b3f89ea14de7aa250 



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139723/new/

https://reviews.llvm.org/D139723

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140155: [Clang][OpenMP] Allow host call to nohost function with host variant

2022-12-19 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 closed this revision.
doru1004 added a comment.

Commit 658ed9547cdd6657895339a6c390c31aa77a5698 



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140155/new/

https://reviews.llvm.org/D140155

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139723: [OpenMP][AMDGPU] Enable use of abs labs and llabs math functions in C code

2022-12-16 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 updated this revision to Diff 483659.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139723/new/

https://reviews.llvm.org/D139723

Files:
  clang/lib/Headers/CMakeLists.txt
  clang/lib/Headers/__clang_hip_runtime_wrapper.h
  clang/lib/Headers/__clang_hip_stdlib.h
  clang/lib/Headers/openmp_wrappers/stdlib.h
  clang/test/Headers/Inputs/include/stdlib.h
  clang/test/Headers/amdgcn_openmp_device_math_c.c
  llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn

Index: llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn
===
--- llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn
+++ llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn
@@ -85,6 +85,7 @@
 "__clang_hip_cmath.h",
 "__clang_hip_libdevice_declares.h",
 "__clang_hip_math.h",
+"__clang_hip_stdlib.h",
 "__clang_hip_runtime_wrapper.h",
 "__stddef_max_align_t.h",
 "__wmmintrin_aes.h",
@@ -192,6 +193,7 @@
 "openmp_wrappers/complex.h",
 "openmp_wrappers/complex_cmath.h",
 "openmp_wrappers/math.h",
+"openmp_wrappers/stdlib.h",
 "pconfigintrin.h",
 "pkuintrin.h",
 "pmmintrin.h",
Index: clang/test/Headers/amdgcn_openmp_device_math_c.c
===
--- /dev/null
+++ clang/test/Headers/amdgcn_openmp_device_math_c.c
@@ -0,0 +1,131 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]"
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include -x c -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -D__OFFLOAD_ARCH_gfx90a__ -emit-llvm-bc %s -o %t-host.bc
+// RUN: %clang_cc1 -include __clang_hip_runtime_wrapper.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -x c -fopenmp -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -o - | FileCheck %s --check-prefixes=CHECK
+// REQUIRES: amdgpu-registered-target
+
+#include <stdlib.h>
+
+void test_math_int(int x) {
+#pragma omp target
+  {
+int l1 = abs(x);
+  }
+}
+
+void test_math_long(long x) {
+#pragma omp target
+  {
+long l1 = labs(x);
+  }
+}
+
+void test_math_long_long(long long x) {
+#pragma omp target
+  {
+long long l1 = llabs(x);
+  }
+}
+// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_test_math_int_l9
+// CHECK-SAME: (i64 noundef [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[RETVAL_I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__X_ADDR_I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[__SGN_I:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[X_ADDR:%.*]] = alloca i64, align 8, addrspace(5)
+// CHECK-NEXT:[[L1:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:[[X_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[X_ADDR]] to ptr
+// CHECK-NEXT:[[L1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[L1]] to ptr
+// CHECK-NEXT:store i64 [[X]], ptr [[X_ADDR_ASCAST]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true)
+// CHECK-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP0]], -1
+// CHECK-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// CHECK:   user_code.entry:
+// CHECK-NEXT:[[TMP1:%.*]] = load i32, ptr [[X_ADDR_ASCAST]], align 4
+// CHECK-NEXT:[[RETVAL_ASCAST_I:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL_I]] to ptr
+// CHECK-NEXT:[[__X_ADDR_ASCAST_I:%.*]] = addrspacecast ptr addrspace(5) [[__X_ADDR_I]] to ptr
+// CHECK-NEXT:[[__SGN_ASCAST_I:%.*]] = addrspacecast ptr addrspace(5) [[__SGN_I]] to ptr
+// CHECK-NEXT:store i32 [[TMP1]], ptr [[__X_ADDR_ASCAST_I]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = load i32, ptr [[__X_ADDR_ASCAST_I]], align 4
+// CHECK-NEXT:[[SHR_I:%.*]] = ashr i32 [[TMP2]], 31
+// CHECK-NEXT:store i32 [[SHR_I]], ptr [[__SGN_ASCAST_I]], align 4
+// CHECK-NEXT:[[TMP3:%.*]] = load i32, ptr [[__X_ADDR_ASCAST_I]], align 4
+// CHECK-NEXT:[[TMP4:%.*]] = load i32, ptr [[__SGN_ASCAST_I]], align 4
+// CHECK-NEXT:[[XOR_I:%.*]] = xor i32 [[TMP3]], [[TMP4]]
+// CHECK-NEXT:[[TMP5:%.*]] = load i32, ptr [[__SGN_ASCAST_I]], align 4
+// CHECK-NEXT:[[SUB_I:%.*]] = sub nsw i32 [[XOR_I]], [[TMP5]]
+// CHECK-NEXT:store i32 [[SUB_I]], ptr [[L1_ASCAST]], align 4
+// CHECK-NEXT:call void @__kmpc_target_deinit(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i8 1)
+// CHECK-NEXT:ret void
+// CHECK:   worker.exit:
+// 

[PATCH] D140155: [Clang][OpenMP] Allow host call to nohost function with host variant

2022-12-16 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/test/OpenMP/declare_target_nohost_variant_messages.cpp:16
+#pragma omp declare target enter(fun2) device_type(nohost)
+#pragma omp declare variant(host_function) match(device={kind(host)})
+void fun2() {}

doru1004 wrote:
> ABataev wrote:
> > You mean this test case? But it still has kind(host).
> The condition checks the attribute of the `host_function` which in this case 
> is `host`. In the test above the condition is false because the 
> `not_a_host_function` has a `nohost` attribute.
It should always have host there because we are trying to fix the case where we 
have a nohost function that needs a host variant.
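
A minimal sketch of the pattern being discussed here (a nohost function paired
with a host variant, as in declare_target_nohost_variant_messages.cpp):

  void fun();
  void host_function();
  #pragma omp declare target enter(fun) device_type(nohost)
  #pragma omp declare variant(host_function) match(device={kind(host)})
  void fun() {}
  void host_function() {}
  // Valid on the host: the call resolves to host_function via declare variant.
  void call_host_function() { fun(); }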


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140155/new/

https://reviews.llvm.org/D140155

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140155: [Clang][OpenMP] Allow host call to nohost function with host variant

2022-12-16 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
doru1004 added inline comments.



Comment at: clang/test/OpenMP/declare_target_nohost_variant_messages.cpp:16
+#pragma omp declare target enter(fun2) device_type(nohost)
+#pragma omp declare variant(host_function) match(device={kind(host)})
+void fun2() {}

ABataev wrote:
> You mean this test case? But it still has kind(host).
The condition checks the attribute of the `host_function` which in this case is 
`host`. In the test above the condition is false because the 
`not_a_host_function` has a `nohost` attribute.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140155/new/

https://reviews.llvm.org/D140155

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

