[PATCH] D99762: [OPENMP]Fix PR49777: Clang should not try to specialize orphaned directives in device codegen.

2021-04-27 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added a comment.

The bug has already been fixed by D95976; I'll update the bug report now.
Also, the tracking of Generic/SPMD mode in clang is about to be removed
for good, so new code that depends on it will be short-lived.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99762/new/

https://reviews.llvm.org/D99762

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99762: [OPENMP]Fix PR49777: Clang should not try to specialize orphaned directives in device codegen.

2021-04-27 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev updated this revision to Diff 340909.
ABataev added a comment.

Rebase


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99762/new/

https://reviews.llvm.org/D99762

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/test/OpenMP/declare_target_codegen_globalization.cpp
  clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_nested_parallel_codegen.cpp

Index: clang/test/OpenMP/nvptx_nested_parallel_codegen.cpp
===
--- clang/test/OpenMP/nvptx_nested_parallel_codegen.cpp
+++ clang/test/OpenMP/nvptx_nested_parallel_codegen.cpp
@@ -54,7 +54,7 @@
 // CHECK1-NEXT:[[IS_ACTIVE:%.*]] = icmp ne i8 [[TMP3]], 0
 // CHECK1-NEXT:br i1 [[IS_ACTIVE]], label [[DOTEXECUTE_PARALLEL:%.*]], label [[DOTBARRIER_PARALLEL:%.*]]
 // CHECK1:   .execute.parallel:
-// CHECK1-NEXT:[[TMP4:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @[[GLOB1:[0-9]+]])
+// CHECK1-NEXT:[[TMP4:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @[[GLOB2:[0-9]+]])
 // CHECK1-NEXT:[[TMP5:%.*]] = load i8*, i8** [[WORK_FN]], align 8
 // CHECK1-NEXT:[[WORK_MATCH:%.*]] = icmp eq i8* [[TMP5]], bitcast (void (i16, i32)* @__omp_outlined___wrapper to i8*)
 // CHECK1-NEXT:br i1 [[WORK_MATCH]], label [[DOTEXECUTE_FN:%.*]], label [[DOTCHECK_NEXT:%.*]]
@@ -114,14 +114,14 @@
 // CHECK1-NEXT:[[THREAD_LIMIT6:%.*]] = sub nuw i32 [[NVPTX_NUM_THREADS4]], [[NVPTX_WARP_SIZE5]]
 // CHECK1-NEXT:call void @__kmpc_kernel_init(i32 [[THREAD_LIMIT6]], i16 1)
 // CHECK1-NEXT:call void @__kmpc_data_sharing_init_stack()
-// CHECK1-NEXT:[[TMP6:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @[[GLOB1]])
+// CHECK1-NEXT:[[TMP6:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @[[GLOB2]])
 // CHECK1-NEXT:call void @_Z3usePi(i32* [[TMP0]]) #[[ATTR7:[0-9]+]]
-// CHECK1-NEXT:call void @__kmpc_push_num_threads(%struct.ident_t* @[[GLOB1]], i32 [[TMP6]], i32 2)
+// CHECK1-NEXT:call void @__kmpc_push_num_threads(%struct.ident_t* @[[GLOB2]], i32 [[TMP6]], i32 2)
 // CHECK1-NEXT:[[TMP7:%.*]] = getelementptr inbounds [1 x i8*], [1 x i8*]* [[CAPTURED_VARS_ADDRS]], i64 0, i64 0
 // CHECK1-NEXT:[[TMP8:%.*]] = bitcast i32* [[TMP0]] to i8*
 // CHECK1-NEXT:store i8* [[TMP8]], i8** [[TMP7]], align 8
 // CHECK1-NEXT:[[TMP9:%.*]] = bitcast [1 x i8*]* [[CAPTURED_VARS_ADDRS]] to i8**
-// CHECK1-NEXT:call void @__kmpc_parallel_51(%struct.ident_t* @[[GLOB1]], i32 [[TMP6]], i32 1, i32 -1, i32 -1, i8* bitcast (void (i32*, i32*, i32*)* @__omp_outlined__1 to i8*), i8* bitcast (void (i16, i32)* @__omp_outlined__1_wrapper to i8*), i8** [[TMP9]], i64 1)
+// CHECK1-NEXT:call void @__kmpc_parallel_51(%struct.ident_t* @[[GLOB2]], i32 [[TMP6]], i32 1, i32 -1, i32 -1, i8* bitcast (void (i32*, i32*, i32*)* @__omp_outlined__1 to i8*), i8* bitcast (void (i16, i32)* @__omp_outlined__1_wrapper to i8*), i8** [[TMP9]], i64 1)
 // CHECK1-NEXT:br label [[DOTTERMINATION_NOTIFIER:%.*]]
 // CHECK1:   .termination.notifier:
 // CHECK1-NEXT:call void @__kmpc_kernel_deinit(i16 1)
@@ -136,7 +136,7 @@
 // CHECK1-NEXT:  entry:
 // CHECK1-NEXT:[[C_ADDR:%.*]] = alloca i32*, align 8
 // CHECK1-NEXT:[[CAPTURED_VARS_ADDRS:%.*]] = alloca [1 x i8*], align 8
-// CHECK1-NEXT:[[TMP0:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @[[GLOB1]])
+// CHECK1-NEXT:[[TMP0:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @[[GLOB1:[0-9]+]])
 // CHECK1-NEXT:store i32* [[C]], i32** [[C_ADDR]], align 8
 // CHECK1-NEXT:call void @__kmpc_push_num_threads(%struct.ident_t* @[[GLOB1]], i32 [[TMP0]], i32 2)
 // CHECK1-NEXT:[[TMP1:%.*]] = getelementptr inbounds [1 x i8*], [1 x i8*]* [[CAPTURED_VARS_ADDRS]], i64 0, i64 0
@@ -260,7 +260,7 @@
 // CHECK2-NEXT:[[IS_ACTIVE:%.*]] = icmp ne i8 [[TMP3]], 0
 // CHECK2-NEXT:br i1 [[IS_ACTIVE]], label [[DOTEXECUTE_PARALLEL:%.*]], label [[DOTBARRIER_PARALLEL:%.*]]
 // CHECK2:   .execute.parallel:
-// CHECK2-NEXT:[[TMP4:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @[[GLOB1:[0-9]+]])
+// CHECK2-NEXT:[[TMP4:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @[[GLOB2:[0-9]+]])
 // CHECK2-NEXT:[[TMP5:%.*]] = load i8*, i8** [[WORK_FN]], align 4
 // CHECK2-NEXT:[[WORK_MATCH:%.*]] = icmp eq i8* [[TMP5]], bitcast (void (i16, i32)* @__omp_outlined___wrapper to i8*)
 // CHECK2-NEXT:br i1 [[WORK_MATCH]], label [[DOTEXECUTE_FN:%.*]], label [[DOTCHECK_NEXT:%.*]]
@@ -320,14 +320,14 @@
 // CHECK2-NEXT:[[THREAD_LIMIT6:%.*]] = sub nuw i32 [[NVPTX_NUM_THREADS4]], [[NVPTX_WARP_SIZE5]]
 // CHECK2-NEXT:call void @__kmpc_kernel_init(i32 [[THREAD_LIMIT6]], i16 1)
 // CHECK2-NEXT:call void @__kmpc_data_sharing_init_stack()
-// CHECK2-NEXT:[[TMP6:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* 

[PATCH] D99762: [OPENMP]Fix PR49777: Clang should not try to specialize orphaned directives in device codegen.

2021-04-16 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev updated this revision to Diff 338122.
ABataev added a comment.

Rebase


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99762/new/

https://reviews.llvm.org/D99762

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/test/OpenMP/declare_target_codegen_globalization.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c

Index: clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
===
--- clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
+++ clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
@@ -44,4 +44,4 @@
 }
 
 // expected-remark@* {{OpenMP runtime call __kmpc_global_thread_num moved to}}
-// expected-remark@* {{OpenMP runtime call __kmpc_global_thread_num deduplicated}}
+// expected-remark@* 2 {{OpenMP runtime call __kmpc_global_thread_num deduplicated}}
Index: clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
===
--- clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
+++ clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
@@ -98,5 +98,5 @@
   }
 }
 
-// all-remark@* 3 {{OpenMP runtime call __kmpc_global_thread_num moved to}}
-// all-remark@* 3 {{OpenMP runtime call __kmpc_global_thread_num deduplicated}}
+// all-remark@* 5 {{OpenMP runtime call __kmpc_global_thread_num moved to}}
+// all-remark@* 12 {{OpenMP runtime call __kmpc_global_thread_num deduplicated}}
Index: clang/test/OpenMP/declare_target_codegen_globalization.cpp
===
--- clang/test/OpenMP/declare_target_codegen_globalization.cpp
+++ clang/test/OpenMP/declare_target_codegen_globalization.cpp
@@ -37,11 +37,14 @@
 // CHECK: define {{.*}}[[BAR]]()
 // CHECK: alloca i32,
 // CHECK: [[A_LOCAL_ADDR:%.+]] = alloca i32,
+// CHECK: [[PL:%.+]] = call i16 @__kmpc_parallel_level(
+// CHECK: [[IS_IN_PARALLEL:%.+]] = icmp eq i16 [[PL]], 0
 // CHECK: [[RES:%.+]] = call i8 @__kmpc_is_spmd_exec_mode()
 // CHECK: [[IS_SPMD:%.+]] = icmp ne i8 [[RES]], 0
 // CHECK: br i1 [[IS_SPMD]], label
 // CHECK: br label
-// CHECK: [[RES:%.+]] = call i8* @__kmpc_data_sharing_coalesced_push_stack(i64 128, i16 0)
+// CHECK: [[SZ:%.+]] = select i1 [[IS_IN_PARALLEL]], i64 4, i64 128
+// CHECK: [[RES:%.+]] = call i8* @__kmpc_data_sharing_coalesced_push_stack(i64 [[SZ]], i16 0)
 // CHECK: [[GLOBALS:%.+]] = bitcast i8* [[RES]] to [[GLOBAL_ST:%.+]]*
 // CHECK: br label
 // CHECK: [[ITEMS:%.+]] = phi [[GLOBAL_ST]]* [ null, {{.+}} ], [ [[GLOBALS]], {{.+}} ]
@@ -49,7 +52,9 @@
 // CHECK: [[TID:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 // CHECK: [[LID:%.+]] = and i32 [[TID]], 31
 // CHECK: [[A_GLOBAL_ADDR:%.+]] = getelementptr inbounds [32 x i32], [32 x i32]* [[A_ADDR]], i32 0, i32 [[LID]]
-// CHECK: [[A_ADDR:%.+]] = select i1 [[IS_SPMD]], i32* [[A_LOCAL_ADDR]], i32* [[A_GLOBAL_ADDR]]
+// CHECK: [[A_GLOBAL_PARALLEL_ADDR:%.+]] = getelementptr inbounds %{{.+}}, %{{.+}}* %{{.+}}, i32 0, i32 0
+// CHECK: [[A_PARALLEL_ADDR:%.+]] = select i1 [[IS_IN_PARALLEL]], i32* [[A_GLOBAL_PARALLEL_ADDR]], i32* [[A_GLOBAL_ADDR]]
+// CHECK: [[A_ADDR:%.+]] = select i1 [[IS_SPMD]], i32* [[A_LOCAL_ADDR]], i32* [[A_PARALLEL_ADDR]]
 // CHECK: call {{.*}}[[FOO]](i32* nonnull align {{[0-9]+}} dereferenceable{{.*}} [[A_ADDR]])
 // CHECK: br i1 [[IS_SPMD]], label
 // CHECK: [[BC:%.+]] = bitcast [[GLOBAL_ST]]* [[ITEMS]] to i8*
Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
@@ -427,6 +427,20 @@
   /// true if we're definitely in the parallel region.
   bool IsInParallelRegion = false;
 
+  struct StateMode {
+    ExecutionMode SavedExecutionMode = EM_Unknown;
+    bool SavedIsInTargetMasterThreadRegion = false;
+    bool SavedIsInTTDRegion = false;
+    bool SavedIsInParallelRegion = false;
+    StateMode(ExecutionMode SavedExecutionMode,
+              bool SavedIsInTargetMasterThreadRegion, bool SavedIsInTTDRegion,
+              bool SavedIsInParallelRegion)
+        : SavedExecutionMode(SavedExecutionMode),
+          SavedIsInTargetMasterThreadRegion(SavedIsInTargetMasterThreadRegion),
+          SavedIsInTTDRegion(SavedIsInTTDRegion),
+          SavedIsInParallelRegion(SavedIsInParallelRegion) {}
+  };
+  llvm::DenseMap, StateMode> SavedExecutionModes;
   /// Map between an outlined function and its wrapper.
   llvm::DenseMap WrapperFunctionsMap;
 
Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -4311,9 +4311,6 @@
 
 void 

[PATCH] D99762: [OPENMP]Fix PR49777: Clang should not try to specialize orphaned directives in device codegen.

2021-04-01 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev created this revision.
ABataev added a reviewer: jdoerfert.
Herald added subscribers: guansong, yaxunl.
ABataev requested review of this revision.
Herald added a subscriber: sstefan1.
Herald added a project: clang.

The compiler supports generic code emission, but in some cases it may
erroneously treat the function context as an SPMD context or a Non-SPMD
parallel context. The context needs to be cleared on function entry and
restored on function exit to avoid incorrect codegen.
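
As a rough illustration of the clear-on-entry / restore-on-exit idea described
above, here is a minimal standalone sketch. It is not the code from this
revision: the patch itself keeps a StateMode record in a SavedExecutionModes
map, while the sketch swaps in a plain RAII guard to show the same principle.
CodegenState, FunctionStateScope and emitFunctionBody are hypothetical names
made up for the example; only the flag names mirror the fields in the diff.

```cpp
#include <cassert>

// Hypothetical stand-in for the execution mode tracked during device codegen.
enum ExecutionMode { EM_SPMD, EM_NonSPMD, EM_Unknown };

// Hypothetical bundle of the flags that the patch saves per function.
struct CodegenState {
  ExecutionMode Mode = EM_Unknown;
  bool IsInTargetMasterThreadRegion = false;
  bool IsInTTDRegion = false;
  bool IsInParallelRegion = false;
};

// RAII guard (an assumption of this sketch, not the patch's mechanism):
// on construction it remembers the caller's state and resets the live state
// to "unknown"; on destruction it puts the caller's state back.
class FunctionStateScope {
  CodegenState &Current;
  CodegenState Saved;

public:
  explicit FunctionStateScope(CodegenState &S) : Current(S), Saved(S) {
    Current = CodegenState();
    assert(Current.Mode == EM_Unknown && "state must be cleared on entry");
  }
  ~FunctionStateScope() { Current = Saved; }

  FunctionStateScope(const FunctionStateScope &) = delete;
  FunctionStateScope &operator=(const FunctionStateScope &) = delete;
};

// Simulates emitting a function body and reports the mode the body observes.
ExecutionMode emitFunctionBody(CodegenState &S) {
  FunctionStateScope Scope(S);
  return S.Mode; // the body sees the cleared state, not the caller's
}
```

With a guard like this, a function emitted while the surrounding state says
"SPMD" or "in parallel region" would not inherit those assumptions, and the
surrounding state is intact again once emission of the function finishes.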


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D99762

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/test/OpenMP/declare_target_codegen_globalization.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c

Index: clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
===
--- clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
+++ clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
@@ -44,4 +44,4 @@
 }
 
 // expected-remark@* {{OpenMP runtime call __kmpc_global_thread_num moved to}}
-// expected-remark@* {{OpenMP runtime call __kmpc_global_thread_num deduplicated}}
+// expected-remark@* 2 {{OpenMP runtime call __kmpc_global_thread_num deduplicated}}
Index: clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
===
--- clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
+++ clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
@@ -98,5 +98,5 @@
   }
 }
 
-// all-remark@* 3 {{OpenMP runtime call __kmpc_global_thread_num moved to}}
-// all-remark@* 3 {{OpenMP runtime call __kmpc_global_thread_num deduplicated}}
+// all-remark@* 5 {{OpenMP runtime call __kmpc_global_thread_num moved to}}
+// all-remark@* 12 {{OpenMP runtime call __kmpc_global_thread_num deduplicated}}
Index: clang/test/OpenMP/declare_target_codegen_globalization.cpp
===
--- clang/test/OpenMP/declare_target_codegen_globalization.cpp
+++ clang/test/OpenMP/declare_target_codegen_globalization.cpp
@@ -37,11 +37,14 @@
 // CHECK: define {{.*}}[[BAR]]()
 // CHECK: alloca i32,
 // CHECK: [[A_LOCAL_ADDR:%.+]] = alloca i32,
+// CHECK: [[PL:%.+]] = call i16 @__kmpc_parallel_level(
+// CHECK: [[IS_IN_PARALLEL:%.+]] = icmp eq i16 [[PL]], 0
 // CHECK: [[RES:%.+]] = call i8 @__kmpc_is_spmd_exec_mode()
 // CHECK: [[IS_SPMD:%.+]] = icmp ne i8 [[RES]], 0
 // CHECK: br i1 [[IS_SPMD]], label
 // CHECK: br label
-// CHECK: [[RES:%.+]] = call i8* @__kmpc_data_sharing_coalesced_push_stack(i64 128, i16 0)
+// CHECK: [[SZ:%.+]] = select i1 [[IS_IN_PARALLEL]], i64 4, i64 128
+// CHECK: [[RES:%.+]] = call i8* @__kmpc_data_sharing_coalesced_push_stack(i64 [[SZ]], i16 0)
 // CHECK: [[GLOBALS:%.+]] = bitcast i8* [[RES]] to [[GLOBAL_ST:%.+]]*
 // CHECK: br label
 // CHECK: [[ITEMS:%.+]] = phi [[GLOBAL_ST]]* [ null, {{.+}} ], [ [[GLOBALS]], {{.+}} ]
@@ -49,7 +52,9 @@
 // CHECK: [[TID:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 // CHECK: [[LID:%.+]] = and i32 [[TID]], 31
 // CHECK: [[A_GLOBAL_ADDR:%.+]] = getelementptr inbounds [32 x i32], [32 x i32]* [[A_ADDR]], i32 0, i32 [[LID]]
-// CHECK: [[A_ADDR:%.+]] = select i1 [[IS_SPMD]], i32* [[A_LOCAL_ADDR]], i32* [[A_GLOBAL_ADDR]]
+// CHECK: [[A_GLOBAL_PARALLEL_ADDR:%.+]] = getelementptr inbounds %{{.+}}, %{{.+}}* %{{.+}}, i32 0, i32 0
+// CHECK: [[A_PARALLEL_ADDR:%.+]] = select i1 [[IS_IN_PARALLEL]], i32* [[A_GLOBAL_PARALLEL_ADDR]], i32* [[A_GLOBAL_ADDR]]
+// CHECK: [[A_ADDR:%.+]] = select i1 [[IS_SPMD]], i32* [[A_LOCAL_ADDR]], i32* [[A_PARALLEL_ADDR]]
 // CHECK: call {{.*}}[[FOO]](i32* nonnull align {{[0-9]+}} dereferenceable{{.*}} [[A_ADDR]])
 // CHECK: br i1 [[IS_SPMD]], label
 // CHECK: [[BC:%.+]] = bitcast [[GLOBAL_ST]]* [[ITEMS]] to i8*
Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
@@ -427,6 +427,20 @@
   /// true if we're definitely in the parallel region.
   bool IsInParallelRegion = false;
 
+  struct StateMode {
+    ExecutionMode SavedExecutionMode = EM_Unknown;
+    bool SavedIsInTargetMasterThreadRegion = false;
+    bool SavedIsInTTDRegion = false;
+    bool SavedIsInParallelRegion = false;
+    StateMode(ExecutionMode SavedExecutionMode,
+              bool SavedIsInTargetMasterThreadRegion, bool SavedIsInTTDRegion,
+              bool SavedIsInParallelRegion)
+        : SavedExecutionMode(SavedExecutionMode),
+          SavedIsInTargetMasterThreadRegion(SavedIsInTargetMasterThreadRegion),
+          SavedIsInTTDRegion(SavedIsInTTDRegion),
+          SavedIsInParallelRegion(SavedIsInParallelRegion) {}
+  };
+  llvm::DenseMap, StateMode> SavedExecutionModes;
   /// Map between an