[PATCH] D89994: [libomptarget][nvptx] Undef, internal shared variables

2020-10-27 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield abandoned this revision.
JonChesterfield added a comment.

The diff doesn't look right here. I can't tell if that's a quirk of the phab 
gui or indicates a bad merge, going to recreate.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89994/new/

https://reviews.llvm.org/D89994

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89994: [libomptarget][nvptx] Undef, internal shared variables

2020-10-27 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield updated this revision to Diff 301038.
JonChesterfield added a comment.

- prefer weak, update tests


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89994/new/

https://reviews.llvm.org/D89994

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/nvptx_teams_codegen.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp

Index: clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
===
--- clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
+++ clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
@@ -24,7 +24,7 @@
 // SEQ-DAG: [[KERNEL_SIZE2:@.+]] = internal unnamed_addr constant i{{64|32}} 16
 
 // Check for the data transfer medium in shared memory to transfer the reduction list to the first warp.
-// CHECK-DAG: [[TRANSFER_STORAGE:@.+]] = internal addrspace([[SHARED_ADDRSPACE:[0-9]+]]) global [32 x i32]
+// CHECK-DAG: [[TRANSFER_STORAGE:@.+]] = weak addrspace([[SHARED_ADDRSPACE:[0-9]+]]) global [32 x i32]
 
 // Check that the execution mode of 2 target regions is set to Non-SPMD and the 3rd is in SPMD.
 // CHECK-DAG: {{@__omp_offloading_.+l44}}_exec_mode = weak constant i8 1
@@ -220,7 +220,7 @@
   // CHECK: [[BASE_ELT:%.+]] = bitcast i8* [[ELT_VOID]] to i32*
   // CHECK: [[ELT:%.+]] = getelementptr i32, i32* [[BASE_ELT]], i32 [[CNT]]
   //
-  // CHECK: [[MEDIUM_ELT:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE:@.+]], i64 0, i32 [[WARPID]]
+  // CHECK: [[MEDIUM_ELT:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE]], i64 0, i32 [[WARPID]]
   // CHECK: [[ELT_VAL:%.+]] = load i32, i32* [[ELT]],
   // CHECK: store volatile i32 [[ELT_VAL]], i32 addrspace([[SHARED_ADDRSPACE]])* [[MEDIUM_ELT]],
   // CHECK: br label {{%?}}[[COPY_CONT:.+]]
@@ -238,7 +238,7 @@
   // CHECK: br i1 [[IS_W0_ACTIVE_THREAD]], label {{%?}}[[DO_READ:.+]], label {{%?}}[[READ_ELSE:.+]]
   //
   // CHECK: [[DO_READ]]
-  // CHECK: [[MEDIUM_ELT:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE:@.+]], i64 0, i32 [[TID]]
+  // CHECK: [[MEDIUM_ELT:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE]], i64 0, i32 [[TID]]
   // CHECK: [[ELT_REF:%.+]] = getelementptr inbounds [1 x i8*], [1 x i8*]* [[RED_LIST:%.+]], i{{32|64}} 0, i{{32|64}} 0
   // CHECK: [[ELT_VOID:%.+]] = load i8*, i8** [[ELT_REF]],
   // CHECK: [[ELT_BASE:%.+]] = bitcast i8* [[ELT_VOID]] to i32*
@@ -531,7 +531,7 @@
   // CHECK: [[ELT_REF:%.+]] = getelementptr inbounds [2 x i8*], [2 x i8*]* [[RED_LIST]], i{{32|64}} 0, i{{32|64}} 0
   // CHECK: [[ELT_VOID:%.+]] = load i8*, i8** [[ELT_REF]],
   //
-  // CHECK: [[MEDIUM_ELT64:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE:@.+]], i64 0, i32 [[WARPID]]
+  // CHECK: [[MEDIUM_ELT64:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE]], i64 0, i32 [[WARPID]]
   // CHECK: [[MEDIUM_ELT:%.+]] = bitcast i32 addrspace([[SHARED_ADDRSPACE]])* [[MEDIUM_ELT64]] to i8 addrspace([[SHARED_ADDRSPACE]])*
   // CHECK: [[ELT_VAL:%.+]] = load i8, i8* [[ELT_VOID]], align
   // CHECK: store volatile i8 [[ELT_VAL]], i8 addrspace([[SHARED_ADDRSPACE]])* [[MEDIUM_ELT]], align
@@ -550,7 +550,7 @@
   // CHECK: br i1 [[IS_W0_ACTIVE_THREAD]], label {{%?}}[[DO_READ:.+]], label {{%?}}[[READ_ELSE:.+]]
   //
   // CHECK: [[DO_READ]]
-  // CHECK: [[MEDIUM_ELT32:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE:@.+]], i64 0, i32 [[TID]]
+  // CHECK: [[MEDIUM_ELT32:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE]], i64 0, i32 [[TID]]
   // CHECK: [[MEDIUM_ELT:%.+]] = bitcast i32 addrspace([[SHARED_ADDRSPACE]])* [[MEDIUM_ELT32]] to i8 addrspace([[SHARED_ADDRSPACE]])*
   // CHECK: [[ELT_REF:%.+]] = getelementptr inbounds [2 x i8*], [2 x i8*]* [[RED_LIST:%.+]], i{{32|64}} 0, i{{32|64}} 0
   // CHECK: [[ELT_VOID:%.+]] = load i8*, i8** [[ELT_REF]],
@@ -571,7 +571,7 @@
   // CHECK: [[ELT_VOID:%.+]] = load i8*, i8** [[ELT_REF]],
   // CHECK: [[ELT:%.+]] = bitcast i8* [[ELT_VOID]] to i32*
   //
-  // CHECK: [[MEDIUM_ELT:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE:@.+]], i64 0, i32 [[WARPID]]
+  // CHECK: 

[PATCH] D89994: [libomptarget][nvptx] Undef, internal shared variables

2020-10-23 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:2858
 TransferMedium = new llvm::GlobalVariable(
-M, Ty, /*isConstant=*/false, llvm::GlobalVariable::CommonLinkage,
-llvm::Constant::getNullValue(Ty), TransferMediumName,
+M, Ty, /*isConstant=*/false, llvm::GlobalVariable::InternalLinkage,
+llvm::UndefValue::get(Ty), TransferMediumName,

jdoerfert wrote:
> ABataev wrote:
> > "Internalization" is not the best option, it increases mem pressure. Common 
> > linkage is a better choice, allows to "squash" the same objects, defined in 
> > different units. Make it arch dependable, maybe?
> > For NVPTX zero initialization is not a problem, it is resolved when PTX is 
> > generated.
> FWIW, if we do not depend on the zero initialization, we should go with undef.
> 
Sure.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89994/new/

https://reviews.llvm.org/D89994

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89994: [libomptarget][nvptx] Undef, internal shared variables

2020-10-23 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:2858
 TransferMedium = new llvm::GlobalVariable(
-M, Ty, /*isConstant=*/false, llvm::GlobalVariable::CommonLinkage,
-llvm::Constant::getNullValue(Ty), TransferMediumName,
+M, Ty, /*isConstant=*/false, llvm::GlobalVariable::InternalLinkage,
+llvm::UndefValue::get(Ty), TransferMediumName,

ABataev wrote:
> "Internalization" is not the best option, it increases mem pressure. Common 
> linkage is a better choice, allows to "squash" the same objects, defined in 
> different units. Make it arch dependable, maybe?
> For NVPTX zero initialization is not a problem, it is resolved when PTX is 
> generated.
FWIW, if we do not depend on the zero initialization, we should go with undef.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89994/new/

https://reviews.llvm.org/D89994

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89994: [libomptarget][nvptx] Undef, internal shared variables

2020-10-23 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:2858
 TransferMedium = new llvm::GlobalVariable(
-M, Ty, /*isConstant=*/false, llvm::GlobalVariable::CommonLinkage,
-llvm::Constant::getNullValue(Ty), TransferMediumName,
+M, Ty, /*isConstant=*/false, llvm::GlobalVariable::InternalLinkage,
+llvm::UndefValue::get(Ty), TransferMediumName,

"Internalization" is not the best option, it increases mem pressure. Common 
linkage is a better choice, allows to "squash" the same objects, defined in 
different units. Make it arch dependable, maybe?
For NVPTX zero initialization is not a problem, it is resolved when PTX is 
generated.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:4794
   CGM.getModule(), LLVMStaticTy,
-  /*isConstant=*/false, llvm::GlobalValue::CommonLinkage,
-  llvm::Constant::getNullValue(LLVMStaticTy),
+  /*isConstant=*/false, llvm::GlobalValue::InternalLinkage,
+  llvm::UndefValue::get(LLVMStaticTy),

JonChesterfield wrote:
> Perhaps weak_any + undef?
> 
> Could use internal for symbols that may vary in size and weak_any for those 
> that don't.
Yeah, it is a good idea, I think.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89994/new/

https://reviews.llvm.org/D89994

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89994: [libomptarget][nvptx] Undef, internal shared variables

2020-10-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:4794
   CGM.getModule(), LLVMStaticTy,
-  /*isConstant=*/false, llvm::GlobalValue::CommonLinkage,
-  llvm::Constant::getNullValue(LLVMStaticTy),
+  /*isConstant=*/false, llvm::GlobalValue::InternalLinkage,
+  llvm::UndefValue::get(LLVMStaticTy),

Perhaps weak_any + undef?

Could use internal for symbols that may vary in size and weak_any for those 
that don't.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89994/new/

https://reviews.llvm.org/D89994

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89994: [libomptarget][nvptx] Undef, internal shared variables

2020-10-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D89994#2348656 , @ABataev wrote:

> In D89994#2348655 , @JonChesterfield 
> wrote:
>
>> The nvptx back end accepts common + zero + shared, but not common + undef + 
>> shared. I think weak_odr is conceptually right here, but given the warning 
>> that nvlink doesn't support weak symbols, internal also seems fine. Can 
>> someone see an advantage to weak over internal? It could be arch specific at 
>> the risk of a lot of test duplication.
>
> IIRC, it supports weak symbols, but does not support weak symbols of 
> different sizes.

That seems a reasonable restriction. Linkers sometimes pick the first weak 
symbol they see. Comdat might mean pick the biggest one, but that's probably 
not a good thing to rely on.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89994/new/

https://reviews.llvm.org/D89994

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89994: [libomptarget][nvptx] Undef, internal shared variables

2020-10-22 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added a comment.

In D89994#2348655 , @JonChesterfield 
wrote:

> The nvptx back end accepts common + zero + shared, but not common + undef + 
> shared. I think weak_odr is conceptually right here, but given the warning 
> that nvlink doesn't support weak symbols, internal also seems fine. Can 
> someone see an advantage to weak over internal? It could be arch specific at 
> the risk of a lot of test duplication.

IIRC, it supports weak symbols, but does not support weak symbols of different 
sizes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89994/new/

https://reviews.llvm.org/D89994

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89994: [libomptarget][nvptx] Undef, internal shared variables

2020-10-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

The nvptx back end accepts common + zero + shared, but not common + undef + 
shared. I think weak_odr is conceptually right here, but given the warning that 
nvlink doesn't support weak symbols, internal also seems fine. Can someone see 
an advantage to weak over internal? It could be arch specific at the risk of a 
lot of test duplication.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89994/new/

https://reviews.llvm.org/D89994

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89994: [libomptarget][nvptx] Undef, internal shared variables

2020-10-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield created this revision.
JonChesterfield added reviewers: jdoerfert, ABataev, grokos, tianshilei1992, 
ye-luo.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.
JonChesterfield requested review of this revision.

[libomptarget][nvptx] Undef, internal shared variables

Shared variables on nvptx, and LDS on amdgcn, are uninitialized at
the start of kernel execution. Therefore create the variables with
undef instead of zeros, motivated in part by the amdgcn back end
rejecting LDS+initializer.

Common is zero initialized, which seems incompatible with shared. Thus
change them to internal, following the direction of
https://reviews.llvm.org/rG7b3eabdcd215

WIP, other tests need to be updated if direction is good


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D89994

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp

Index: clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
===
--- clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
+++ clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
@@ -17,14 +17,14 @@
 // CHECK-DAG: [[TEAMS_REDUCE_UNION_TY:%.+]] = type { [[TEAM1_REDUCE_TY]] }
 // SEQ-DAG: [[MAP_TY:%.+]] = type { [128 x i8] }
 
-// SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* null
+// SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* undef
 // SEQ-DAG: [[KERNEL_SHARED1:@.+]] = internal unnamed_addr constant i16 1
 // SEQ-DAG: [[KERNEL_SHARED2:@.+]] = internal unnamed_addr constant i16 1
 // SEQ-DAG: [[KERNEL_SIZE1:@.+]] = internal unnamed_addr constant i{{64|32}} {{16|8}}
 // SEQ-DAG: [[KERNEL_SIZE2:@.+]] = internal unnamed_addr constant i{{64|32}} 16
 
 // Check for the data transfer medium in shared memory to transfer the reduction list to the first warp.
-// CHECK-DAG: [[TRANSFER_STORAGE:@.+]] = common addrspace([[SHARED_ADDRSPACE:[0-9]+]]) global [32 x i32]
+// CHECK-DAG: [[TRANSFER_STORAGE:@.+]] = internal addrspace([[SHARED_ADDRSPACE:[0-9]+]]) global [32 x i32]
 
 // Check that the execution mode of 2 target regions is set to Non-SPMD and the 3rd is in SPMD.
 // CHECK-DAG: {{@__omp_offloading_.+l44}}_exec_mode = weak constant i8 1
@@ -220,7 +220,7 @@
   // CHECK: [[BASE_ELT:%.+]] = bitcast i8* [[ELT_VOID]] to i32*
   // CHECK: [[ELT:%.+]] = getelementptr i32, i32* [[BASE_ELT]], i32 [[CNT]]
   //
-  // CHECK: [[MEDIUM_ELT:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE]], i64 0, i32 [[WARPID]]
+  // CHECK: [[MEDIUM_ELT:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE:@.+]], i64 0, i32 [[WARPID]]
   // CHECK: [[ELT_VAL:%.+]] = load i32, i32* [[ELT]],
   // CHECK: store volatile i32 [[ELT_VAL]], i32 addrspace([[SHARED_ADDRSPACE]])* [[MEDIUM_ELT]],
   // CHECK: br label {{%?}}[[COPY_CONT:.+]]
@@ -238,7 +238,7 @@
   // CHECK: br i1 [[IS_W0_ACTIVE_THREAD]], label {{%?}}[[DO_READ:.+]], label {{%?}}[[READ_ELSE:.+]]
   //
   // CHECK: [[DO_READ]]
-  // CHECK: [[MEDIUM_ELT:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE]], i64 0, i32 [[TID]]
+  // CHECK: [[MEDIUM_ELT:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE:@.+]], i64 0, i32 [[TID]]
   // CHECK: [[ELT_REF:%.+]] = getelementptr inbounds [1 x i8*], [1 x i8*]* [[RED_LIST:%.+]], i{{32|64}} 0, i{{32|64}} 0
   // CHECK: [[ELT_VOID:%.+]] = load i8*, i8** [[ELT_REF]],
   // CHECK: [[ELT_BASE:%.+]] = bitcast i8* [[ELT_VOID]] to i32*
@@ -531,7 +531,7 @@
   // CHECK: [[ELT_REF:%.+]] = getelementptr inbounds [2 x i8*], [2 x i8*]* [[RED_LIST]], i{{32|64}} 0, i{{32|64}} 0
   // CHECK: [[ELT_VOID:%.+]] = load i8*, i8** [[ELT_REF]],
   //
-  // CHECK: [[MEDIUM_ELT64:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE]], i64 0, i32 [[WARPID]]
+  // CHECK: [[MEDIUM_ELT64:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE:@.+]], i64 0, i32 [[WARPID]]
   // CHECK: [[MEDIUM_ELT:%.+]] = bitcast i32 addrspace([[SHARED_ADDRSPACE]])* [[MEDIUM_ELT64]] to i8 addrspace([[SHARED_ADDRSPACE]])*
   // CHECK: [[ELT_VAL:%.+]] = load i8, i8* [[ELT_VOID]], align
   // CHECK: store volatile i8 [[ELT_VAL]], i8 addrspace([[SHARED_ADDRSPACE]])* [[MEDIUM_ELT]], align
@@ -550,7 +550,7 @@
   // CHECK: br i1 [[IS_W0_ACTIVE_THREAD]], label {{%?}}[[DO_READ:.+]], label {{%?}}[[READ_ELSE:.+]]
   //
   // CHECK: [[DO_READ]]
-  // CHECK: [[MEDIUM_ELT32:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE]], i64 0, i32 [[TID]]
+  // CHECK: [[MEDIUM_ELT32:%.+]] = getelementptr inbounds [32 x i32], [32 x i32] addrspace([[SHARED_ADDRSPACE]])* [[TRANSFER_STORAGE:@.+]], i64 0, i32 [[TID]]
   // CHECK: [[MEDIUM_ELT:%.+]] =