[llvm-branch-commits] [flang] [WIP][flang] Introduce HLFIR lowerings to omp.workshare_loop_nest (PR #104748)

2024-10-04 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/104748

>From c5b5369be3d0db31d9ded0eeeb8e28e03d25bd9e Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Fri, 4 Oct 2024 22:45:09 +0900
Subject: [PATCH 1/6] Fix bug and add better clarification comments

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 28 ---
 .../lower-workshare-correct-parallelize.mlir  | 16 +++
 2 files changed, 40 insertions(+), 4 deletions(-)
 create mode 100644 
flang/test/Transforms/OpenMP/lower-workshare-correct-parallelize.mlir

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp 
b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 4d8e2a9a067141..84cf5e82167987 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -188,14 +189,19 @@ static bool isTransitivelyUsedOutside(Value v, 
SingleRegion sr) {
 if (isUserOutsideSR(user, parentOp, sr))
   return true;
 
-// Results of nested users cannot be used outside of the SR
+// Now we know user is inside `sr`.
+
+// Results of nested users cannot be used outside of `sr`.
 if (user->getBlock() != srBlock)
   continue;
 
-// A non-safe to parallelize operation will be handled separately
+// A non-safe to parallelize operation will be checked for uses outside
+// separately.
 if (!isSafeToParallelize(user))
   continue;
 
+// For safe to parallelize operations, we need to check if there is a
+// transitive use of `v` through them.
 for (auto res : user->getResults())
   if (isTransitivelyUsedOutside(res, sr))
 return true;
@@ -242,7 +248,21 @@ static void parallelizeRegion(Region &sourceRegion, Region 
&targetRegion,
 for (Operation &op : llvm::make_range(sr.begin, sr.end)) {
   if (isSafeToParallelize(&op)) {
 singleBuilder.clone(op, singleMapping);
-parallelBuilder.clone(op, rootMapping);
+if (llvm::all_of(op.getOperands(), [&](Value opr) {
+  return rootMapping.contains(opr);
+})) {
+  // Safe to parallelize operations which have all operands available 
in
+  // the root parallel block can be executed there.
+  parallelBuilder.clone(op, rootMapping);
+} else {
+  // If any operand was not available, it means that there was no
+  // transitive use of a non-safe-to-parallelize operation outside 
`sr`.
+  // This means that there should be no transitive uses outside `sr` of
+  // `op`.
+  assert(llvm::all_of(op.getResults(), [&](Value v) {
+return !isTransitivelyUsedOutside(v, sr);
+  }));
+}
   } else if (auto alloca = dyn_cast(&op)) {
 auto hoisted =
 cast(allocaBuilder.clone(*alloca, singleMapping));
@@ -252,7 +272,7 @@ static void parallelizeRegion(Region &sourceRegion, Region 
&targetRegion,
   } else {
 singleBuilder.clone(op, singleMapping);
 // Prepare reloaded values for results of operations that cannot be
-// safely parallelized and which are used after the region `sr`
+// safely parallelized and which are used after the region `sr`.
 for (auto res : op.getResults()) {
   if (isTransitivelyUsedOutside(res, sr)) {
 auto alloc = mapReloadedValue(res, allocaBuilder, singleBuilder,
diff --git 
a/flang/test/Transforms/OpenMP/lower-workshare-correct-parallelize.mlir 
b/flang/test/Transforms/OpenMP/lower-workshare-correct-parallelize.mlir
new file mode 100644
index 00..99ca4fe5a0e212
--- /dev/null
+++ b/flang/test/Transforms/OpenMP/lower-workshare-correct-parallelize.mlir
@@ -0,0 +1,16 @@
+// RUN: fir-opt --lower-workshare --allow-unregistered-dialect %s | FileCheck 
%s
+
+// Check that the safe to parallelize `fir.declare` op will not be parallelized
+// due to its operand %alloc not being reloaded outside the omp.single.
+
+func.func @foo() {
+  %c0 = arith.constant 0 : index
+  omp.workshare {
+%alloc = fir.allocmem !fir.array, %c0 {bindc_name = ".tmp.forall", 
uniq_name = ""}
+%shape = fir.shape %c0 : (index) -> !fir.shape<1>
+%declare = fir.declare %alloc(%shape) {uniq_name = ".tmp.forall"} : 
(!fir.heap>, !fir.shape<1>) -> !fir.heap>
+fir.freemem %alloc : !fir.heap>
+omp.terminator
+  }
+  return
+}
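
The operand-availability rule that this patch adds to `parallelizeRegion` can be sketched outside of MLIR as follows. This is only an illustrative model, not the pass itself: `Op`, `selectParallelizable`, and the string-named values are hypothetical stand-ins, and `available` plays the role that `rootMapping` plays in the patch. It assumes every op in the list is already known to be safe to parallelize.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an operation: one result and its operand names.
struct Op {
  std::string result;
  std::vector<std::string> operands;
};

// Returns the results of the ops that may be cloned into the parallel block.
// `available` models the set of values already mapped in the parallel block.
std::vector<std::string> selectParallelizable(const std::vector<Op> &ops,
                                              std::set<std::string> available) {
  std::vector<std::string> cloned;
  for (const Op &op : ops) {
    bool allMapped =
        std::all_of(op.operands.begin(), op.operands.end(),
                    [&](const std::string &v) { return available.count(v) != 0; });
    if (!allMapped)
      continue; // the op stays only in the single region
    cloned.push_back(op.result);
    available.insert(op.result); // cloning makes the result available too
  }
  return cloned;
}
```

With the values from the new test, `shape` (which only uses `c0`) would be selected, while `declare` would not, because its operand `alloc` was never made available outside the single region.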

>From 33d6674ca8dfc1adf3b02f45317a7f068a7f7cb3 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Sun, 4 Aug 2024 17:33:52 +0900
Subject: [PATCH 2/6] Add workshare loop wrapper lowerings

Bufferize test

Bufferize test

Bufferize test

Add test for should use workshare lowering
---
 .../HLFIR/Transforms/BufferizeHLFIR.cpp   |   4 +-
 .../Transforms/OptimizedBufferization.cpp |  10 +-
 flang/test/HLFIR/bufferize-workshare.fir  |  58 
 .../OpenMP/should-use-workshare-lowering.mlir | 140 +++

[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)

2024-10-04 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/101446

>From e56dbd6a0625890fd9a3d6a62675e864ca94a8f5 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Sun, 4 Aug 2024 22:06:55 +0900
Subject: [PATCH 01/10] [flang] Lower omp.workshare to other omp constructs

Change to workshare loop wrapper op

Move single op declaration

Schedule pass properly

Correctly handle nested loop nests to be parallelized by workshare

Leave comments for shouldUseWorkshareLowering

Use copyprivate to scatter val from omp.single

TODO still need to implement copy function
TODO transitive check for usage outside of omp.single not implemented yet

Transitively check for users outside of single op

TODO need to implement copy func
TODO need to hoist allocas outside of single regions

Add tests

Hoist allocas

More tests

Emit body for copy func

Test the tmp storing logic

Clean up trivially dead ops

Only handle single-block regions for now

Fix tests for custom assembly for loop wrapper

Only run the lower workshare pass if openmp is enabled

Implement some missing functionality

Fix tests

Fix test

Iterate backwards to find all trivially dead ops

Add explanation comment for createCopyFun

Update test
---
 flang/include/flang/Optimizer/OpenMP/Passes.h |   5 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |   5 +
 flang/include/flang/Tools/CLOptions.inc   |   6 +-
 flang/include/flang/Tools/CrossToolHelpers.h  |   1 +
 flang/lib/Frontend/FrontendActions.cpp|  10 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 446 ++
 flang/test/Fir/basic-program.fir  |   1 +
 .../Transforms/OpenMP/lower-workshare.mlir| 189 
 .../Transforms/OpenMP/lower-workshare2.mlir   |  23 +
 .../Transforms/OpenMP/lower-workshare3.mlir   |  74 +++
 .../Transforms/OpenMP/lower-workshare4.mlir   |  59 +++
 .../Transforms/OpenMP/lower-workshare5.mlir   |  42 ++
 .../Transforms/OpenMP/lower-workshare6.mlir   |  51 ++
 flang/tools/bbc/bbc.cpp   |   5 +-
 flang/tools/tco/tco.cpp   |   1 +
 16 files changed, 915 insertions(+), 4 deletions(-)
 create mode 100644 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare2.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare3.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare4.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare5.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare6.mlir

diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.h 
b/flang/include/flang/Optimizer/OpenMP/Passes.h
index 403d79667bf448..feb395f1a12dbd 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.h
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.h
@@ -25,6 +25,11 @@ namespace flangomp {
 #define GEN_PASS_REGISTRATION
 #include "flang/Optimizer/OpenMP/Passes.h.inc"
 
+/// Implements the logic specified in the 2.8.3 workshare Construct section of
+/// the OpenMP standard which specifies what statements or constructs shall be
+/// divided into units of work.
+bool shouldUseWorkshareLowering(mlir::Operation *op);
+
 } // namespace flangomp
 
 #endif // FORTRAN_OPTIMIZER_OPENMP_PASSES_H
diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td 
b/flang/include/flang/Optimizer/OpenMP/Passes.td
index 395178e26a5762..041240cad12eb3 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.td
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.td
@@ -37,4 +37,9 @@ def FunctionFiltering : Pass<"omp-function-filtering"> {
   ];
 }
 
+// Needs to be scheduled on Module as we create functions in it
+def LowerWorkshare : Pass<"lower-workshare", "::mlir::ModuleOp"> {
+  let summary = "Lower workshare construct";
+}
+
 #endif //FORTRAN_OPTIMIZER_OPENMP_PASSES
diff --git a/flang/include/flang/Tools/CLOptions.inc 
b/flang/include/flang/Tools/CLOptions.inc
index 1881e23b00045a..bb00e079008a0b 100644
--- a/flang/include/flang/Tools/CLOptions.inc
+++ b/flang/include/flang/Tools/CLOptions.inc
@@ -337,7 +337,7 @@ inline void createDefaultFIROptimizerPassPipeline(
 /// \param optLevel - optimization level used for creating FIR optimization
 ///   passes pipeline
 inline void createHLFIRToFIRPassPipeline(
-mlir::PassManager &pm, llvm::OptimizationLevel optLevel = defaultOptLevel) 
{
+mlir::PassManager &pm, bool enableOpenMP, llvm::OptimizationLevel optLevel 
= defaultOptLevel) {
   if (optLevel.isOptimizingForSpeed()) {
 addCanonicalizerPassWithoutRegionSimplification(pm);
 addNestedPassToAllTopLevelOperations(
@@ -354,6 +354,8 @@ inline void createHLFIRToFIRPassPipeline(
   pm.addPass(hlfir::createLowerHLFIRIntrinsics());
   pm.addPass(hlfir::createBufferizeHLFIR());
   pm.addPass(hlfir::createConvertHLFIRtoFIR());
+  if (enableOpenMP)
+pm.a

[llvm-branch-commits] [flang] [WIP][flang] Introduce HLFIR lowerings to omp.workshare_loop_nest (PR #104748)

2024-10-04 Thread Ivan R. Ivanov via llvm-branch-commits

ivanradanov wrote:

@Thirumalai-Shaktivel Thank you very much. Fixed. 

`forall` is actually a case which we do not handle yet. You can give it a shot 
if you would like.

https://github.com/llvm/llvm-project/pull/104748
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)

2024-10-04 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/101446

>From e56dbd6a0625890fd9a3d6a62675e864ca94a8f5 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Sun, 4 Aug 2024 22:06:55 +0900
Subject: [PATCH 01/11] [flang] Lower omp.workshare to other omp constructs

Change to workshare loop wrapper op

Move single op declaration

Schedule pass properly

Correctly handle nested loop nests to be parallelized by workshare

Leave comments for shouldUseWorkshareLowering

Use copyprivate to scatter val from omp.single

TODO still need to implement copy function
TODO transitive check for usage outside of omp.single not implemented yet

Transitively check for users outside of single op

TODO need to implement copy func
TODO need to hoist allocas outside of single regions

Add tests

Hoist allocas

More tests

Emit body for copy func

Test the tmp storing logic

Clean up trivially dead ops

Only handle single-block regions for now

Fix tests for custom assembly for loop wrapper

Only run the lower workshare pass if openmp is enabled

Implement some missing functionality

Fix tests

Fix test

Iterate backwards to find all trivially dead ops

Add explanation comment for createCopyFun

Update test
---
 flang/include/flang/Optimizer/OpenMP/Passes.h |   5 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |   5 +
 flang/include/flang/Tools/CLOptions.inc   |   6 +-
 flang/include/flang/Tools/CrossToolHelpers.h  |   1 +
 flang/lib/Frontend/FrontendActions.cpp|  10 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 446 ++
 flang/test/Fir/basic-program.fir  |   1 +
 .../Transforms/OpenMP/lower-workshare.mlir| 189 
 .../Transforms/OpenMP/lower-workshare2.mlir   |  23 +
 .../Transforms/OpenMP/lower-workshare3.mlir   |  74 +++
 .../Transforms/OpenMP/lower-workshare4.mlir   |  59 +++
 .../Transforms/OpenMP/lower-workshare5.mlir   |  42 ++
 .../Transforms/OpenMP/lower-workshare6.mlir   |  51 ++
 flang/tools/bbc/bbc.cpp   |   5 +-
 flang/tools/tco/tco.cpp   |   1 +
 16 files changed, 915 insertions(+), 4 deletions(-)
 create mode 100644 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare2.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare3.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare4.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare5.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare6.mlir

diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.h 
b/flang/include/flang/Optimizer/OpenMP/Passes.h
index 403d79667bf448..feb395f1a12dbd 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.h
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.h
@@ -25,6 +25,11 @@ namespace flangomp {
 #define GEN_PASS_REGISTRATION
 #include "flang/Optimizer/OpenMP/Passes.h.inc"
 
+/// Implements the logic specified in the 2.8.3 workshare Construct section of
+/// the OpenMP standard which specifies what statements or constructs shall be
+/// divided into units of work.
+bool shouldUseWorkshareLowering(mlir::Operation *op);
+
 } // namespace flangomp
 
 #endif // FORTRAN_OPTIMIZER_OPENMP_PASSES_H
diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td 
b/flang/include/flang/Optimizer/OpenMP/Passes.td
index 395178e26a5762..041240cad12eb3 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.td
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.td
@@ -37,4 +37,9 @@ def FunctionFiltering : Pass<"omp-function-filtering"> {
   ];
 }
 
+// Needs to be scheduled on Module as we create functions in it
+def LowerWorkshare : Pass<"lower-workshare", "::mlir::ModuleOp"> {
+  let summary = "Lower workshare construct";
+}
+
 #endif //FORTRAN_OPTIMIZER_OPENMP_PASSES
diff --git a/flang/include/flang/Tools/CLOptions.inc 
b/flang/include/flang/Tools/CLOptions.inc
index 1881e23b00045a..bb00e079008a0b 100644
--- a/flang/include/flang/Tools/CLOptions.inc
+++ b/flang/include/flang/Tools/CLOptions.inc
@@ -337,7 +337,7 @@ inline void createDefaultFIROptimizerPassPipeline(
 /// \param optLevel - optimization level used for creating FIR optimization
 ///   passes pipeline
 inline void createHLFIRToFIRPassPipeline(
-mlir::PassManager &pm, llvm::OptimizationLevel optLevel = defaultOptLevel) 
{
+mlir::PassManager &pm, bool enableOpenMP, llvm::OptimizationLevel optLevel 
= defaultOptLevel) {
   if (optLevel.isOptimizingForSpeed()) {
 addCanonicalizerPassWithoutRegionSimplification(pm);
 addNestedPassToAllTopLevelOperations(
@@ -354,6 +354,8 @@ inline void createHLFIRToFIRPassPipeline(
   pm.addPass(hlfir::createLowerHLFIRIntrinsics());
   pm.addPass(hlfir::createBufferizeHLFIR());
   pm.addPass(hlfir::createConvertHLFIRtoFIR());
+  if (enableOpenMP)
+pm.a

[llvm-branch-commits] [flang] [WIP][flang] Introduce HLFIR lowerings to omp.workshare_loop_nest (PR #104748)

2024-10-04 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/104748

>From 8d0651ff644fa6821e0d0fbc4c47fee36802a15c Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Fri, 4 Oct 2024 22:48:42 +0900
Subject: [PATCH 1/6] Fix message

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp 
b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 84cf5e82167987..a91f64f04a30aa 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -466,8 +466,9 @@ LogicalResult lowerWorkshare(mlir::omp::WorkshareOp wsOp, 
DominanceInfo &di) {
   } else {
 // Otherwise just change the operation to an omp.single.
 
-wsOp->emitWarning("omp workshare with unstructured control flow currently "
-  "unsupported and will be serialized.");
+wsOp->emitWarning(
+"omp workshare with unstructured control flow is currently "
+"unsupported and will be serialized.");
 
 // `shouldUseWorkshareLowering` should have guaranteed that there are no
 // omp.workshare_loop_wrapper's that bind to this omp.workshare.

>From 881067963fea3ce7fa912692e0cca46a68288e85 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Sun, 4 Aug 2024 17:33:52 +0900
Subject: [PATCH 2/6] Add workshare loop wrapper lowerings

Bufferize test

Bufferize test

Bufferize test

Add test for should use workshare lowering
---
 .../HLFIR/Transforms/BufferizeHLFIR.cpp   |   4 +-
 .../Transforms/OptimizedBufferization.cpp |  10 +-
 flang/test/HLFIR/bufferize-workshare.fir  |  58 
 .../OpenMP/should-use-workshare-lowering.mlir | 140 ++
 4 files changed, 208 insertions(+), 4 deletions(-)
 create mode 100644 flang/test/HLFIR/bufferize-workshare.fir
 create mode 100644 
flang/test/Transforms/OpenMP/should-use-workshare-lowering.mlir

diff --git a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp 
b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
index 07794828fce267..1848dbe2c7a2c2 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
@@ -26,6 +26,7 @@
 #include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Optimizer/HLFIR/Passes.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/IR/Dominance.h"
 #include "mlir/IR/PatternMatch.h"
@@ -792,7 +793,8 @@ struct ElementalOpConversion
 // Generate a loop nest looping around the fir.elemental shape and clone
 // fir.elemental region inside the inner loop.
 hlfir::LoopNest loopNest =
-hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered());
+hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered(),
+   flangomp::shouldUseWorkshareLowering(elemental));
 auto insPt = builder.saveInsertionPoint();
 builder.setInsertionPointToStart(loopNest.body);
 auto yield = hlfir::inlineElementalOp(loc, builder, elemental,
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp 
b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
index 3a0a98dc594463..f014724861e333 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
@@ -20,6 +20,7 @@
 #include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Optimizer/HLFIR/Passes.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
 #include "flang/Optimizer/Transforms/Utils.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/IR/Dominance.h"
@@ -482,7 +483,8 @@ llvm::LogicalResult 
ElementalAssignBufferization::matchAndRewrite(
   // Generate a loop nest looping around the hlfir.elemental shape and clone
   // hlfir.elemental region inside the inner loop
   hlfir::LoopNest loopNest =
-  hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered());
+  hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered(),
+ flangomp::shouldUseWorkshareLowering(elemental));
   builder.setInsertionPointToStart(loopNest.body);
   auto yield = hlfir::inlineElementalOp(loc, builder, elemental,
 loopNest.oneBasedIndices);
@@ -553,7 +555,8 @@ llvm::LogicalResult 
BroadcastAssignBufferization::matchAndRewrite(
   llvm::SmallVector extents =
   hlfir::getIndexExtents(loc, builder, shape);
   hlfir::LoopNest loopNest =
-  hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true);
+  hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true,
+ flangomp::shouldUseWorkshareLowering(assign));
   builder.setInsertionPointToStart(loopNest.body);
   auto arrayElement =
 

[llvm-branch-commits] [llvm] [AMDGPU] Serialize WWM_REG vreg flag (PR #110229)

2024-10-04 Thread Matt Arsenault via llvm-branch-commits


@@ -578,3 +578,18 @@ body: |
 SI_RETURN
 
 ...
+---

arsenm wrote:

Should add an error test where the flag name is unrecognized 

https://github.com/llvm/llvm-project/pull/110229


[llvm-branch-commits] [flang] [WIP][flang] Introduce HLFIR lowerings to omp.workshare_loop_nest (PR #104748)

2024-10-04 Thread Thirumalai Shaktivel via llvm-branch-commits

Thirumalai-Shaktivel wrote:

Thanks for the quick fix. Yes, it works fine.

Here is another MRE that crashes:
```fortran
subroutine test_workshare_02()
real :: x(10)
integer :: i
call random_number(x)
x = 2
!$omp workshare
forall(i=1: 10) x(i) = x(i) * 2
!$omp end workshare
end subroutine test_workshare_02
```

The same crash happens for the where statement as well.

https://github.com/llvm/llvm-project/pull/104748


[llvm-branch-commits] [llvm] 92b4baa - Revert "[CFIFixup] Factor CFI remember/restore insertion into a helper (NFC) …"

2024-10-04 Thread via llvm-branch-commits

Author: Daniel Hoekwater
Date: 2024-10-04T10:32:49-04:00
New Revision: 92b4baaa5d8ce3aad74c070ae7f1bdc393fae751

URL: 
https://github.com/llvm/llvm-project/commit/92b4baaa5d8ce3aad74c070ae7f1bdc393fae751
DIFF: 
https://github.com/llvm/llvm-project/commit/92b4baaa5d8ce3aad74c070ae7f1bdc393fae751.diff

LOG: Revert "[CFIFixup] Factor CFI remember/restore insertion into a helper 
(NFC) …"

This reverts commit 47c8b95daeec8e6cb012344ed037024528a73295.

Added: 


Modified: 
llvm/lib/CodeGen/CFIFixup.cpp

Removed: 




diff --git a/llvm/lib/CodeGen/CFIFixup.cpp b/llvm/lib/CodeGen/CFIFixup.cpp
index e881a62522f0cb..61888a42666524 100644
--- a/llvm/lib/CodeGen/CFIFixup.cpp
+++ b/llvm/lib/CodeGen/CFIFixup.cpp
@@ -116,32 +116,6 @@ findPrologueEnd(MachineFunction &MF, 
MachineBasicBlock::iterator &PrologueEnd) {
   return nullptr;
 }
 
-// Inserts a `.cfi_remember_state` instruction before PrologueEnd and a
-// `.cfi_restore_state` instruction before DstInsertPt. Returns an iterator
-// to the first instruction after the inserted `.cfi_restore_state` 
instruction.
-static MachineBasicBlock::iterator
-insertRememberRestorePair(MachineBasicBlock::iterator RememberInsertPt,
-  MachineBasicBlock::iterator RestoreInsertPt) {
-  MachineBasicBlock *RememberMBB = RememberInsertPt->getParent();
-  MachineBasicBlock *RestoreMBB = RestoreInsertPt->getParent();
-  MachineFunction &MF = *RememberMBB->getParent();
-  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
-
-  // Insert the `.cfi_remember_state` instruction.
-  unsigned CFIIndex =
-  MF.addFrameInst(MCCFIInstruction::createRememberState(nullptr));
-  BuildMI(*RememberMBB, RememberInsertPt, DebugLoc(),
-  TII.get(TargetOpcode::CFI_INSTRUCTION))
-  .addCFIIndex(CFIIndex);
-
-  // Insert the `.cfi_restore_state` instruction.
-  CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestoreState(nullptr));
-  BuildMI(*RestoreMBB, RestoreInsertPt, DebugLoc(),
-  TII.get(TargetOpcode::CFI_INSTRUCTION))
-  .addCFIIndex(CFIIndex);
-  return RestoreInsertPt;
-}
-
 bool CFIFixup::runOnMachineFunction(MachineFunction &MF) {
   const TargetFrameLowering &TFL = *MF.getSubtarget().getFrameLowering();
   if (!TFL.enableCFIFixup(MF))
@@ -200,10 +174,12 @@ bool CFIFixup::runOnMachineFunction(MachineFunction &MF) {
   // Every block inherits the frame state (as recorded in the unwind tables)
   // of the previous block. If the intended frame state is different, insert
   // compensating CFI instructions.
+  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
   bool Change = false;
   // `InsertPt` always points to the point in a preceding block where we have 
to
   // insert a `.cfi_remember_state`, in the case that the current block needs a
   // `.cfi_restore_state`.
+  MachineBasicBlock *InsertMBB = PrologueBlock;
   MachineBasicBlock::iterator InsertPt = PrologueEnd;
 
   assert(InsertPt != PrologueBlock->begin() &&
@@ -234,10 +210,20 @@ bool CFIFixup::runOnMachineFunction(MachineFunction &MF) {
 if (!Info.StrongNoFrameOnEntry && Info.HasFrameOnEntry && !HasFrame) {
   // Reset to the "after prologue" state.
 
-  // There's an earlier block known to have a stack frame. Insert a
-  // `.cfi_remember_state` instruction into that block and a
-  // `.cfi_restore_state` instruction at the beginning of the current 
block.
-  InsertPt = insertRememberRestorePair(InsertPt, CurrBB->begin());
+  // Insert a `.cfi_remember_state` into the last block known to have a
+  // stack frame.
+  unsigned CFIIndex =
+  MF.addFrameInst(MCCFIInstruction::createRememberState(nullptr));
+  BuildMI(*InsertMBB, InsertPt, DebugLoc(),
+  TII.get(TargetOpcode::CFI_INSTRUCTION))
+  .addCFIIndex(CFIIndex);
+  // Insert a `.cfi_restore_state` at the beginning of the current block.
+  CFIIndex = 
MF.addFrameInst(MCCFIInstruction::createRestoreState(nullptr));
+  InsertPt = BuildMI(*CurrBB, CurrBB->begin(), DebugLoc(),
+ TII.get(TargetOpcode::CFI_INSTRUCTION))
+ .addCFIIndex(CFIIndex);
+  ++InsertPt;
+  InsertMBB = &*CurrBB;
   Change = true;
 } else if ((Info.StrongNoFrameOnEntry || !Info.HasFrameOnEntry) &&
HasFrame) {
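
The insert-point bookkeeping that this revert moves back inline can be modelled with a plain `std::list`. The sketch below is an assumption-laden illustration, not LLVM code: `Block` and `insertRememberRestore` are hypothetical names, and instructions are represented as strings. It shows that taking the iterator of the inserted restore and advancing it by one yields the first instruction after the restore, which becomes the next remember insert point.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical model of a basic block: an ordered list of instruction names.
using Block = std::list<std::string>;

// Inserts ".cfi_remember_state" before `rememberPt` in `rememberBlock` and
// ".cfi_restore_state" at the start of `currBlock`. Returns an iterator to the
// first instruction after the inserted restore.
Block::iterator insertRememberRestore(Block &rememberBlock,
                                      Block::iterator rememberPt,
                                      Block &currBlock) {
  rememberBlock.insert(rememberPt, std::string(".cfi_remember_state"));
  Block::iterator restore =
      currBlock.insert(currBlock.begin(), std::string(".cfi_restore_state"));
  return std::next(restore);
}
```

A `std::list` is used here because its iterators stay valid across insertions, which loosely mirrors how the pass keeps an insert point alive while it adds instructions.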





[llvm-branch-commits] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #111192)

2024-10-04 Thread via llvm-branch-commits

agozillon wrote:

This portion of the PR stack underwent the most changes, as it is the most 
"complex" part of the stack. The map lowering for member mapping has gained 
some additional complexity: it now attempts to incorporate bounds where 
possible, among other things. It also incorporates the last iteration of 
reviewer comments from the previous PR stack, alongside additional tests 
checking that the new changes work as expected (and continue to).

https://github.com/llvm/llvm-project/pull/92


[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #111191)

2024-10-04 Thread via llvm-branch-commits

agozillon wrote:

This portion of the PR stack has remained largely unchanged; the only 
alteration is incorporating the final set of comments from the last iteration 
of the PR stack. 

https://github.com/llvm/llvm-project/pull/91


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-10-04 Thread via llvm-branch-commits

https://github.com/agozillon closed 
https://github.com/llvm/llvm-project/pull/96266


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-10-04 Thread via llvm-branch-commits

agozillon wrote:

Going to close this current version of the PR stack and open a new one with the 
requested changes incorporated, alongside some newer additions that add support 
for indexing members at arbitrary depths and some other general fixes. 
Unfortunately, I lost track of what is in the PR stack and what isn't after 
working downstream on it for so long, so it is easier to start fresh to make 
sure nothing is missed. Incredibly sorry for the bother.

https://github.com/llvm/llvm-project/pull/96266


[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #111191)

2024-10-04 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-flang-openmp

Author: None (agozillon)


Changes

This is one of 3 PRs in a PR stack that aims to add support for explicit
mapping of allocatable members in derived types.

The primary changes in this PR are the OpenMPToLLVMIRTranslation.cpp changes,
which are small and seek to alter the current member mapping to add an
additional map insertion for pointers. Effectively, if the member is a pointer
(currently indicated by having a varPtrPtr field) we add an additional map for
the pointer and then alter the subsequent mapping of the member (the data)
to utilise the member rather than the parent's base pointer. This appears to be
necessary in certain cases when mapping pointer data within record types to
avoid segfaulting on device (due to incorrect data mapping). In general this
record type mapping may be simplifiable in the future.

There are also additions of tests which should help to showcase the effect
of the changes above.
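
As a rough illustration of the mapping rule described above, the following sketch models it with plain C++ data. It is not the code from OpenMPToLLVMIRTranslation.cpp: `Member`, `MapEntry`, `mapMembers`, and the `.ptr`/`.data` suffixes are hypothetical names chosen for the example, and `hasVarPtrPtr` stands in for the presence of the varPtrPtr field.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of a record member to be mapped.
struct Member {
  std::string name;
  bool hasVarPtrPtr; // assumed marker for "this member is a pointer"
};

// Hypothetical map entry: what is mapped, and relative to which base.
struct MapEntry {
  std::string base;
  std::string mapped;
};

std::vector<MapEntry> mapMembers(const std::string &parent,
                                 const std::vector<Member> &members) {
  std::vector<MapEntry> entries;
  for (const Member &m : members) {
    if (m.hasVarPtrPtr) {
      // Extra entry for the pointer itself, based on the parent.
      entries.push_back({parent, m.name + ".ptr"});
      // The data is then mapped using the member, not the parent, as base.
      entries.push_back({m.name + ".ptr", m.name + ".data"});
    } else {
      entries.push_back({parent, m.name});
    }
  }
  return entries;
}
```

Under these assumptions, a record with one plain member and one pointer member would produce three entries, with only the last one based on the member rather than the parent.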


---

Patch is 30.32 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/91.diff


6 Files Affected:

- (modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+1-1) 
- (modified) mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp (+19-39) 
- (modified) 
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+58-35) 
- (added) 
mlir/test/Target/LLVMIR/omptarget-fortran-allocatable-record-type-mapping-host.mlir
 (+66) 
- (modified) 
mlir/test/Target/LLVMIR/omptarget-fortran-allocatable-types-host.mlir (+41-31) 
- (modified) 
mlir/test/Target/LLVMIR/omptarget-nested-record-type-mapping-host.mlir (+1-1) 


``diff
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td 
b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index 66f63fc02fe2f3..60acf59a7d93e0 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -879,7 +879,7 @@ def MapInfoOp : OpenMP_Op<"map.info", 
[AttrSizedOperandSegments]> {
TypeAttr:$var_type,
Optional:$var_ptr_ptr,
Variadic:$members,
-   OptionalAttr:$members_index,
+   OptionalAttr:$members_index,
Variadic:$bounds, /* rank-0 to 
rank-{n-1} */
OptionalAttr:$map_type,
OptionalAttr:$map_capture_type,
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp 
b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index d516c8d9e0be6c..f9024dd93ae144 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -1392,16 +1392,15 @@ static void printMapClause(OpAsmPrinter &p, Operation 
*op,
 }
 
 static ParseResult parseMembersIndex(OpAsmParser &parser,
- DenseIntElementsAttr &membersIdx) {
-  SmallVector values;
+ ArrayAttr &membersIdx) {
+  SmallVector values, memberIdxs;
   int64_t value;
-  int64_t shape[2] = {0, 0};
-  unsigned shapeTmp = 0;
+
   auto parseIndices = [&]() -> ParseResult {
 if (parser.parseInteger(value))
   return failure();
-shapeTmp++;
-values.push_back(APInt(32, value));
+values.push_back(IntegerAttr::get(parser.getBuilder().getIntegerType(64),
+  mlir::APInt(64, value)));
 return success();
   };
 
@@ -1415,51 +1414,32 @@ static ParseResult parseMembersIndex(OpAsmParser 
&parser,
 if (failed(parser.parseRSquare()))
   return failure();
 
-// Only set once, if any indices are not the same size
-// we error out in the next check as that's unsupported
-if (shape[1] == 0)
-  shape[1] = shapeTmp;
-
-// Verify that the recently parsed list is equal to the
-// first one we parsed, they must be equal lengths to
-// keep the rectangular shape DenseIntElementsAttr
-// requires
-if (shapeTmp != shape[1])
-  return failure();
-
-shapeTmp = 0;
-shape[0]++;
+memberIdxs.push_back(ArrayAttr::get(parser.getContext(), values));
+values.clear();
   } while (succeeded(parser.parseOptionalComma()));
 
-  if (!values.empty()) {
-ShapedType valueType =
-VectorType::get(shape, IntegerType::get(parser.getContext(), 32));
-membersIdx = DenseIntElementsAttr::get(valueType, values);
-  }
+  if (!memberIdxs.empty())
+membersIdx = ArrayAttr::get(parser.getContext(), memberIdxs);
 
   return success();
 }
 
 static void printMembersIndex(OpAsmPrinter &p, MapInfoOp op,
-  DenseIntElementsAttr membersIdx) {
-  llvm::ArrayRef shape = membersIdx.getShapedType().getShape();
-  assert(shape.size() <= 2);
-
+  ArrayAttr membersIdx) {
   if (!membersIdx)
 return;
 
-  for (int i = 0; i < shape[0]; ++i) {
+  SmallVector idxs;
+  for (auto [i, v] : llvm::enumerate(membersIdx)) {
+auto memberIdx = mlir::

[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #111191)

2024-10-04 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mlir-llvm

Author: None (agozillon)


Changes

This is one of 3 PRs in a PR stack that aims to add support for explicit 
mapping of
allocatable members in derived types.

The primary changes in this PR are the OpenMPToLLVMIRTranslation.cpp changes,
which are small and seek to alter the current member mapping to add an
additional map insertion for pointers. Effectively, if the member is a pointer
(currently indicated by having a varPtrPtr field) we add an additional map for
the pointer and then alter the subsequent mapping of the member (the data)
to utilise the member rather than the parent's base pointer. This appears to be
necessary in certain cases when mapping pointer data within record types to
avoid segfaulting on device (due to incorrect data mapping). In general this
record type mapping may be simplifiable in the future.
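
As a rough illustration of the paragraph above, here is a toy model of the extra map insertion. It is only a sketch under assumptions: `Member`, `MapEntry`, and `buildMemberMaps` are hypothetical names invented for this example and are not part of OpenMPToLLVMIRTranslation.cpp; the `hasVarPtrPtr` flag stands in for "the member has a varPtrPtr field".

```cpp
#include <string>
#include <vector>

// Hypothetical model of one mapped member; not an MLIR or LLVM type.
struct Member {
  std::string name;
  bool hasVarPtrPtr; // stands in for "the member is a pointer"
};

// Hypothetical map entry: `base` is what the mapping is attached to,
// `mapped` is what gets transferred.
struct MapEntry {
  std::string base;
  std::string mapped;
};

// Builds the list of map entries for the members of `parent`.
std::vector<MapEntry> buildMemberMaps(const std::string &parent,
                                      const std::vector<Member> &members) {
  std::vector<MapEntry> maps;
  for (const Member &m : members) {
    // Every member gets an entry relative to the parent. For a pointer
    // member this is the additional map for the pointer itself.
    maps.push_back({parent, m.name});
    if (m.hasVarPtrPtr) {
      // The pointed-to data is then mapped relative to the member,
      // rather than relative to the parent's base pointer.
      maps.push_back({m.name, m.name + ".data"});
    }
  }
  return maps;
}
```

With a parent holding a plain member `a` and a pointer member `p`, this toy model yields three entries: `a` and `p` attached to the parent, and `p.data` attached to `p`.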

There are also additions of tests which should help to showcase the effect
of the changes above.


---

Patch is 30.32 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/91.diff


6 Files Affected:

- (modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+1-1) 
- (modified) mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp (+19-39) 
- (modified) 
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+58-35) 
- (added) 
mlir/test/Target/LLVMIR/omptarget-fortran-allocatable-record-type-mapping-host.mlir
 (+66) 
- (modified) 
mlir/test/Target/LLVMIR/omptarget-fortran-allocatable-types-host.mlir (+41-31) 
- (modified) 
mlir/test/Target/LLVMIR/omptarget-nested-record-type-mapping-host.mlir (+1-1) 


``diff
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td 
b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index 66f63fc02fe2f3..60acf59a7d93e0 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -879,7 +879,7 @@ def MapInfoOp : OpenMP_Op<"map.info", 
[AttrSizedOperandSegments]> {
TypeAttr:$var_type,
Optional:$var_ptr_ptr,
Variadic:$members,
-   OptionalAttr:$members_index,
+   OptionalAttr:$members_index,
Variadic:$bounds, /* rank-0 to 
rank-{n-1} */
OptionalAttr:$map_type,
OptionalAttr:$map_capture_type,
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp 
b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index d516c8d9e0be6c..f9024dd93ae144 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -1392,16 +1392,15 @@ static void printMapClause(OpAsmPrinter &p, Operation 
*op,
 }
 
 static ParseResult parseMembersIndex(OpAsmParser &parser,
- DenseIntElementsAttr &membersIdx) {
-  SmallVector values;
+ ArrayAttr &membersIdx) {
+  SmallVector values, memberIdxs;
   int64_t value;
-  int64_t shape[2] = {0, 0};
-  unsigned shapeTmp = 0;
+
   auto parseIndices = [&]() -> ParseResult {
 if (parser.parseInteger(value))
   return failure();
-shapeTmp++;
-values.push_back(APInt(32, value));
+values.push_back(IntegerAttr::get(parser.getBuilder().getIntegerType(64),
+  mlir::APInt(64, value)));
 return success();
   };
 
@@ -1415,51 +1414,32 @@ static ParseResult parseMembersIndex(OpAsmParser 
&parser,
 if (failed(parser.parseRSquare()))
   return failure();
 
-// Only set once, if any indices are not the same size
-// we error out in the next check as that's unsupported
-if (shape[1] == 0)
-  shape[1] = shapeTmp;
-
-// Verify that the recently parsed list is equal to the
-// first one we parsed, they must be equal lengths to
-// keep the rectangular shape DenseIntElementsAttr
-// requires
-if (shapeTmp != shape[1])
-  return failure();
-
-shapeTmp = 0;
-shape[0]++;
+memberIdxs.push_back(ArrayAttr::get(parser.getContext(), values));
+values.clear();
   } while (succeeded(parser.parseOptionalComma()));
 
-  if (!values.empty()) {
-ShapedType valueType =
-VectorType::get(shape, IntegerType::get(parser.getContext(), 32));
-membersIdx = DenseIntElementsAttr::get(valueType, values);
-  }
+  if (!memberIdxs.empty())
+membersIdx = ArrayAttr::get(parser.getContext(), memberIdxs);
 
   return success();
 }
 
 static void printMembersIndex(OpAsmPrinter &p, MapInfoOp op,
-  DenseIntElementsAttr membersIdx) {
-  llvm::ArrayRef shape = membersIdx.getShapedType().getShape();
-  assert(shape.size() <= 2);
-
+  ArrayAttr membersIdx) {
   if (!membersIdx)
 return;
 
-  for (int i = 0; i < shape[0]; ++i) {
+  SmallVector idxs;
+  for (auto [i, v] : llvm::enumerate(membersIdx)) {
+auto memberIdx = mlir::cast(v);
 p << "[";
-

[llvm-branch-commits] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #111192)

2024-10-04 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-flang-fir-hlfir

Author: None (agozillon)


Changes

This PR is one of 3 in a PR stack, this is the primary change set which seeks
to extend the current derived type explicit member mapping support to
handle descriptor member mapping at arbitrary levels of nesting. The PR
stack seems to do this reasonably (from testing so far) but as you can
create quite complex mappings with derived types (in particular when adding
allocatable derived types or arrays of allocatable derived types) I imagine
there will be hiccups, which I am more than happy to address. There will
also be further extensions to this work to handle the implicit auto-magical
mapping of descriptor members in derived types and a few other changes
planned for the future (with some ideas on optimizing things).

The changes in this PR primarily occur in the OpenMP lowering and
the OMPMapInfoFinalization pass.

In the OpenMP lowering several utility functions were added or extended
to support the generation of appropriate intermediate member mappings
which are currently required when the parent (or multiple parents) of a
mapped member are descriptor types. We need to map the entirety of
these types, or do a "deep copy" for lack of a better term, where we map
both the base address and the descriptor. Without the descriptor we lack
the information needed to access the member or attach the pointer's data
to the pointer, and without the base address we cannot map the chunk of
data. Currently we do not segment descriptor-based derived types as we do
with regular non-descriptor derived types; we effectively map their
entirety in all cases at the moment. I hope to address this at some point
in the future, as it adds a fair performance penalty to, for example,
nestings of allocatable derived types. The process of mapping all intermediate
descriptor members in a member's path only occurs if a member has
an allocatable or object parent in its symbol path or the member itself
is a member or allocatable. This occurs in the
createParentSymAndGenIntermediateMaps function, which will also
generate the appropriate address for the allocatable member
within the derived type to use as the varPtr field of the map (for
intermediate allocatable maps and final allocatable mappings). In
this case it's necessary as we can't utilise the usual Fortran::lower
functionality such as gatherDataOperandAddrAndBounds without
causing issues later in the lowering due to extra allocas being spawned
which seem to affect the pointer attachment (at least this is my
current assumption, it results in memory access errors on the device
due to incorrect map information generation). This is similar
to why we do not use the MLIR value generated for this and utilise
the original symbol provided when mapping descriptor types external
to derived types. Hopefully this can be rectified in the future so this
function can be simplified and more closely aligned to the other type
mappings. We also make use of fir::CoordinateOp as opposed to the
HLFIR version, as the HLFIR version doesn't currently support the
appropriate lowering to FIR. We also cannot use a single CoordinateOp
(similarly to a single GEP), as indexing through a descriptor operation
(BoxType) causes issues later in the lowering. In either case we need
access to intermediate descriptors, so individual CoordinateOps aid this
(although being able to compress them into fewer CoordinateOps may
simplify the IR and perhaps result in a better end product, something
to consider for the future).

The other large change area was in the OMPMapInfoFinalization pass,
where the pass had to be extended to support the expansion of box
types (or multiple nestings of box types) within derived types, or box
type derived types. This requires expanding each BoxType mapping
from one into two maps and then modifying all of the existing
member indices of the overarching parent mapping to account for
the addition of these new members alongside adjusting the existing
member indices to support the addition of these new maps which
extend the original member indices (as a base address of a box type
is currently considered a member of the box type at a position of
0 as when lowered to LLVM-IR it's a pointer contained at this position
in the descriptor type, however, this means extending mapped children
of this expanded descriptor type to additionally incorporate the new
member index in the correct location in its own index list). I believe
there is a reasonable amount of comments that should aid in
understanding this better, alongside the test alterations for the pass.
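
The index adjustment described above can be sketched with a toy model. This is an illustration under assumptions, not the pass itself: `IndexPath`, `isStrictPrefix`, and `expandDescriptor` are hypothetical names, and member indices are modelled as plain integer paths. It shows one reading of the description: when a descriptor at some path is expanded, its base address becomes a new member at position 0 of that descriptor, and every mapped child below the descriptor gains that 0 in its own index list.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical representation of a member's position: one index per
// nesting level, outermost first.
using IndexPath = std::vector<int64_t>;

// Returns true if `prefix` is a strict prefix of `path`.
bool isStrictPrefix(const IndexPath &prefix, const IndexPath &path) {
  return prefix.size() < path.size() &&
         std::equal(prefix.begin(), prefix.end(), path.begin());
}

// Expands the descriptor located at `box` within the parent's member list.
std::vector<IndexPath> expandDescriptor(std::vector<IndexPath> paths,
                                        const IndexPath &box) {
  // Children of the descriptor now sit behind its base address, so insert
  // index 0 right after the descriptor's own prefix.
  for (IndexPath &p : paths)
    if (isStrictPrefix(box, p))
      p.insert(p.begin() + static_cast<std::ptrdiff_t>(box.size()),
               int64_t{0});

  // The base address itself becomes a new member at position 0 of the
  // descriptor.
  IndexPath baseAddr = box;
  baseAddr.push_back(0);
  paths.push_back(baseAddr);
  return paths;
}
```

For example, with members at `[1]`, `[1, 2]` and `[3]`, expanding the descriptor at `[1]` leaves `[1]` and `[3]` alone, rewrites `[1, 2]` to `[1, 0, 2]`, and appends the base address member `[1, 0]`.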

A subset of the changes were also aimed at making some of the utilities
for packing and unpacking the DenseIntElementsAttr
containing the member indices shareable across the lowering and
OMPMapInfoFinalization, this required moving some functions to t

[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #111191)

2024-10-04 Thread via llvm-branch-commits

https://github.com/agozillon created 
https://github.com/llvm/llvm-project/pull/91

This is one of 3 PRs in a PR stack that aims to add support for explicit 
mapping of
allocatable members in derived types.

The primary changes in this PR are the OpenMPToLLVMIRTranslation.cpp changes,
which are small and seek to alter the current member mapping to add an
additional map insertion for pointers. Effectively, if the member is a pointer
(currently indicated by having a varPtrPtr field) we add an additional map for
the pointer and then alter the subsequent mapping of the member (the data)
to utilise the member rather than the parent's base pointer. This appears to be
necessary in certain cases when mapping pointer data within record types to
avoid segfaulting on device (due to incorrect data mapping). In general this
record type mapping may be simplifiable in the future.

There are also additions of tests which should help to showcase the effect
of the changes above.



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #111192)

2024-10-04 Thread via llvm-branch-commits

https://github.com/agozillon created 
https://github.com/llvm/llvm-project/pull/92

This PR is one of 3 in a PR stack, this is the primary change set which seeks
to extend the current derived type explicit member mapping support to
handle descriptor member mapping at arbitrary levels of nesting. The PR
stack seems to do this reasonably (from testing so far) but as you can
create quite complex mappings with derived types (in particular when adding
allocatable derived types or arrays of allocatable derived types) I imagine
there will be hiccups, which I am more than happy to address. There will
also be further extensions to this work to handle the implicit auto-magical
mapping of descriptor members in derived types and a few other changes
planned for the future (with some ideas on optimizing things).

The changes in this PR primarily occur in the OpenMP lowering and
the OMPMapInfoFinalization pass.

In the OpenMP lowering several utility functions were added or extended
to support the generation of appropriate intermediate member mappings
which are currently required when the parent (or multiple parents) of a
mapped member are descriptor types. We need to map the entirety of
these types, or do a "deep copy" for lack of a better term, where we map
both the base address and the descriptor. Without the descriptor we lack
the information needed to access the member or attach the pointer's data
to the pointer, and without the base address we cannot map the chunk of
data. Currently we do not segment descriptor-based derived types as we do
with regular non-descriptor derived types; we effectively map their
entirety in all cases at the moment. I hope to address this at some point
in the future, as it adds a fair performance penalty to, for example,
nestings of allocatable derived types. The process of mapping all intermediate
descriptor members in a member's path only occurs if a member has
an allocatable or object parent in its symbol path or the member itself
is a member or allocatable. This occurs in the
createParentSymAndGenIntermediateMaps function, which will also
generate the appropriate address for the allocatable member
within the derived type to use as the varPtr field of the map (for
intermediate allocatable maps and final allocatable mappings). In
this case it's necessary as we can't utilise the usual Fortran::lower
functionality such as gatherDataOperandAddrAndBounds without
causing issues later in the lowering due to extra allocas being spawned
which seem to affect the pointer attachment (at least this is my
current assumption, it results in memory access errors on the device
due to incorrect map information generation). This is similar
to why we do not use the MLIR value generated for this and utilise
the original symbol provided when mapping descriptor types external
to derived types. Hopefully this can be rectified in the future so this
function can be simplified and more closely aligned to the other type
mappings. We also make use of fir::CoordinateOp as opposed to the
HLFIR version, as the HLFIR version doesn't currently support the
appropriate lowering to FIR. We also cannot use a single CoordinateOp
(similarly to a single GEP), as indexing through a descriptor operation
(BoxType) causes issues later in the lowering. In either case we need
access to intermediate descriptors, so individual CoordinateOps aid this
(although being able to compress them into fewer CoordinateOps may
simplify the IR and perhaps result in a better end product, something
to consider for the future).

The other large change area was in the OMPMapInfoFinalization pass,
where the pass had to be extended to support the expansion of box
types (or multiple nestings of box types) within derived types, or box
type derived types. This requires expanding each BoxType mapping
from one into two maps and then modifying all of the existing
member indices of the overarching parent mapping to account for
the addition of these new members alongside adjusting the existing
member indices to support the addition of these new maps which
extend the original member indices (as a base address of a box type
is currently considered a member of the box type at a position of
0 as when lowered to LLVM-IR it's a pointer contained at this position
in the descriptor type, however, this means extending mapped children
of this expanded descriptor type to additionally incorporate the new
member index in the correct location in its own index list). I believe
there is a reasonable amount of comments that should aid in
understanding this better, alongside the test alterations for the pass.

A subset of the changes were also aimed at making some of the utilities
for packing and unpacking the DenseIntElementsAttr
containing the member indices shareable across the lowering and
OMPMapInfoFinalization, this required moving some functions to the
Lo

[llvm-branch-commits] [flang] [WIP][flang] Introduce HLFIR lowerings to omp.workshare_loop_nest (PR #104748)

2024-10-04 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/104748

>From b11ddd76c7fa12b071e0e6b0afd4c3ebbc9ee363 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Sat, 5 Oct 2024 12:57:48 +0900
Subject: [PATCH 1/6] Fix tests

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp| 7 ++-
 .../OpenMP/lower-workshare-correct-parallelize.mlir  | 9 +
 .../Transforms/OpenMP/lower-workshare-todo-cfg-dom.mlir  | 2 +-
 .../test/Transforms/OpenMP/lower-workshare-todo-cfg.mlir | 2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp 
b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index a91f64f04a30aa..aa4371b3af6f7d 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -249,7 +249,12 @@ static void parallelizeRegion(Region &sourceRegion, Region 
&targetRegion,
   if (isSafeToParallelize(&op)) {
 singleBuilder.clone(op, singleMapping);
 if (llvm::all_of(op.getOperands(), [&](Value opr) {
-  return rootMapping.contains(opr);
+  // Either we have already remapped it
+  bool remapped = rootMapping.contains(opr);
+  // Or it is available because it dominates `sr`
+  bool dominates =
+  di.properlyDominates(opr.getDefiningOp(), &*sr.begin);
+  return remapped || dominates;
 })) {
   // Safe to parallelize operations which have all operands available 
in
   // the root parallel block can be executed there.
diff --git 
a/flang/test/Transforms/OpenMP/lower-workshare-correct-parallelize.mlir 
b/flang/test/Transforms/OpenMP/lower-workshare-correct-parallelize.mlir
index 99ca4fe5a0e212..31db8213b5f001 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare-correct-parallelize.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare-correct-parallelize.mlir
@@ -14,3 +14,12 @@ func.func @foo() {
   }
   return
 }
+
+// CHECK:omp.single nowait
+// CHECK:  fir.allocmem
+// CHECK:  fir.shape
+// CHECK:  fir.declare
+// CHECK:  fir.freemem
+// CHECK:  omp.terminator
+// CHECK:}
+// CHECK:omp.barrier
diff --git a/flang/test/Transforms/OpenMP/lower-workshare-todo-cfg-dom.mlir 
b/flang/test/Transforms/OpenMP/lower-workshare-todo-cfg-dom.mlir
index 96dc878bed0c99..83c49cd635d082 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare-todo-cfg-dom.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare-todo-cfg-dom.mlir
@@ -1,6 +1,6 @@
 // RUN: fir-opt --lower-workshare --allow-unregistered-dialect %s 2>&1 | 
FileCheck %s
 
-// CHECK: warning: omp workshare with unstructured control flow currently 
unsupported and will be serialized.
+// CHECK: warning: omp workshare with unstructured control flow is currently 
unsupported and will be serialized.
 
 // CHECK: omp.parallel
 // CHECK-NEXT: omp.single
diff --git a/flang/test/Transforms/OpenMP/lower-workshare-todo-cfg.mlir 
b/flang/test/Transforms/OpenMP/lower-workshare-todo-cfg.mlir
index ce8a4eb96982be..a27cf880694014 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare-todo-cfg.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare-todo-cfg.mlir
@@ -1,6 +1,6 @@
 // RUN: fir-opt --lower-workshare --allow-unregistered-dialect %s 2>&1 | 
FileCheck %s
 
-// CHECK: warning: omp workshare with unstructured control flow currently 
unsupported and will be serialized.
+// CHECK: warning: omp workshare with unstructured control flow is currently 
unsupported and will be serialized.
 
 // CHECK: omp.parallel
 // CHECK-NEXT: omp.single

>From 1ce816ce0d6b56a132b56f6a1d25c91cefecfe57 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Sun, 4 Aug 2024 17:33:52 +0900
Subject: [PATCH 2/6] Add workshare loop wrapper lowerings

Bufferize test

Bufferize test

Bufferize test

Add test for should use workshare lowering
---
 .../HLFIR/Transforms/BufferizeHLFIR.cpp   |   4 +-
 .../Transforms/OptimizedBufferization.cpp |  10 +-
 flang/test/HLFIR/bufferize-workshare.fir  |  58 
 .../OpenMP/should-use-workshare-lowering.mlir | 140 ++
 4 files changed, 208 insertions(+), 4 deletions(-)
 create mode 100644 flang/test/HLFIR/bufferize-workshare.fir
 create mode 100644 
flang/test/Transforms/OpenMP/should-use-workshare-lowering.mlir

diff --git a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp 
b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
index 07794828fce267..1848dbe2c7a2c2 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
@@ -26,6 +26,7 @@
 #include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Optimizer/HLFIR/Passes.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/IR/Dominance.h"
 #include "mlir/IR/PatternMatch.

[llvm-branch-commits] [libcxx] [release/19.x][libc++] Follow-up to "Poison Pills are Too Toxic" (PR #109291)

2024-10-04 Thread A. Jiang via llvm-branch-commits

https://github.com/frederick-vs-ja approved this pull request.

Looks like we should merge this now.

https://github.com/llvm/llvm-project/pull/109291


[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)

2024-10-04 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/101446

>From e56dbd6a0625890fd9a3d6a62675e864ca94a8f5 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Sun, 4 Aug 2024 22:06:55 +0900
Subject: [PATCH 01/12] [flang] Lower omp.workshare to other omp constructs

Change to workshare loop wrapper op

Move single op declaration

Schedule pass properly

Correctly handle nested loop nests to be parallelized by workshare

Leave comments for shouldUseWorkshareLowering

Use copyprivate to scatter val from omp.single

TODO still need to implement copy function
TODO transitive check for usage outside of omp.single not implemented yet

Transitively check for users outside of single op

TODO need to implement copy func
TODO need to hoist allocas outside of single regions

Add tests

Hoist allocas

More tests

Emit body for copy func

Test the tmp storing logic

Clean up trivially dead ops

Only handle single-block regions for now

Fix tests for custom assembly for loop wrapper

Only run the lower workshare pass if openmp is enabled

Implement some missing functionality

Fix tests

Fix test

Iterate backwards to find all trivially dead ops

Add explanation comment for createCopyFun

Update test
---
 flang/include/flang/Optimizer/OpenMP/Passes.h |   5 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |   5 +
 flang/include/flang/Tools/CLOptions.inc   |   6 +-
 flang/include/flang/Tools/CrossToolHelpers.h  |   1 +
 flang/lib/Frontend/FrontendActions.cpp|  10 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 446 ++
 flang/test/Fir/basic-program.fir  |   1 +
 .../Transforms/OpenMP/lower-workshare.mlir| 189 
 .../Transforms/OpenMP/lower-workshare2.mlir   |  23 +
 .../Transforms/OpenMP/lower-workshare3.mlir   |  74 +++
 .../Transforms/OpenMP/lower-workshare4.mlir   |  59 +++
 .../Transforms/OpenMP/lower-workshare5.mlir   |  42 ++
 .../Transforms/OpenMP/lower-workshare6.mlir   |  51 ++
 flang/tools/bbc/bbc.cpp   |   5 +-
 flang/tools/tco/tco.cpp   |   1 +
 16 files changed, 915 insertions(+), 4 deletions(-)
 create mode 100644 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare2.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare3.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare4.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare5.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare6.mlir

diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.h 
b/flang/include/flang/Optimizer/OpenMP/Passes.h
index 403d79667bf448..feb395f1a12dbd 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.h
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.h
@@ -25,6 +25,11 @@ namespace flangomp {
 #define GEN_PASS_REGISTRATION
 #include "flang/Optimizer/OpenMP/Passes.h.inc"
 
+/// Implements the logic specified in the 2.8.3 workshare Construct section 
of
+/// the OpenMP standard which specifies what statements or constructs shall be
+/// divided into units of work.
+bool shouldUseWorkshareLowering(mlir::Operation *op);
+
 } // namespace flangomp
 
 #endif // FORTRAN_OPTIMIZER_OPENMP_PASSES_H
diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td 
b/flang/include/flang/Optimizer/OpenMP/Passes.td
index 395178e26a5762..041240cad12eb3 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.td
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.td
@@ -37,4 +37,9 @@ def FunctionFiltering : Pass<"omp-function-filtering"> {
   ];
 }
 
+// Needs to be scheduled on Module as we create functions in it
+def LowerWorkshare : Pass<"lower-workshare", "::mlir::ModuleOp"> {
+  let summary = "Lower workshare construct";
+}
+
 #endif //FORTRAN_OPTIMIZER_OPENMP_PASSES
diff --git a/flang/include/flang/Tools/CLOptions.inc 
b/flang/include/flang/Tools/CLOptions.inc
index 1881e23b00045a..bb00e079008a0b 100644
--- a/flang/include/flang/Tools/CLOptions.inc
+++ b/flang/include/flang/Tools/CLOptions.inc
@@ -337,7 +337,7 @@ inline void createDefaultFIROptimizerPassPipeline(
 /// \param optLevel - optimization level used for creating FIR optimization
 ///   passes pipeline
 inline void createHLFIRToFIRPassPipeline(
-mlir::PassManager &pm, llvm::OptimizationLevel optLevel = defaultOptLevel) 
{
+mlir::PassManager &pm, bool enableOpenMP, llvm::OptimizationLevel optLevel 
= defaultOptLevel) {
   if (optLevel.isOptimizingForSpeed()) {
 addCanonicalizerPassWithoutRegionSimplification(pm);
 addNestedPassToAllTopLevelOperations(
@@ -354,6 +354,8 @@ inline void createHLFIRToFIRPassPipeline(
   pm.addPass(hlfir::createLowerHLFIRIntrinsics());
   pm.addPass(hlfir::createBufferizeHLFIR());
   pm.addPass(hlfir::createConvertHLFIRtoFIR());
+  if (enableOpenMP)
+pm.a

[llvm-branch-commits] [compiler-rt] [llvm] [Coverage] Make SingleByteCoverage work consistent to merging (PR #110972)

2024-10-04 Thread NAKAMURA Takumi via llvm-branch-commits

https://github.com/chapuni updated 
https://github.com/llvm/llvm-project/pull/110972

>From aacb50ddf87d96b4a0644c7ef5d0a86dc94f069b Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi 
Date: Wed, 2 Oct 2024 23:25:52 +0900
Subject: [PATCH 1/2] [Coverage] Make SingleByteCoverage work consistent to
 merging

- Round `Counts` as 1/0
- Confirm both `ExecutionCount` and `AltExecutionCount` are in range.
---
 compiler-rt/test/profile/instrprof-block-coverage.c | 2 +-
 compiler-rt/test/profile/instrprof-entry-coverage.c | 2 +-
 llvm/include/llvm/ProfileData/InstrProf.h   | 5 -
 llvm/lib/ProfileData/Coverage/CoverageMapping.cpp   | 3 +++
 llvm/lib/ProfileData/InstrProf.cpp  | 2 +-
 llvm/lib/ProfileData/InstrProfReader.cpp| 1 +
 6 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/compiler-rt/test/profile/instrprof-block-coverage.c 
b/compiler-rt/test/profile/instrprof-block-coverage.c
index 829d5af8dc3f9e..8d924e1cac64d8 100644
--- a/compiler-rt/test/profile/instrprof-block-coverage.c
+++ b/compiler-rt/test/profile/instrprof-block-coverage.c
@@ -49,4 +49,4 @@ int main(int argc, char *argv[]) {
 
 // CHECK-ERROR-NOT: warning: {{.*}}: Found inconsistent block coverage
 
-// COUNTS: Maximum function count: 4
+// COUNTS: Maximum function count: 1
diff --git a/compiler-rt/test/profile/instrprof-entry-coverage.c 
b/compiler-rt/test/profile/instrprof-entry-coverage.c
index 1c6816ba01964b..b93a4e0c43ccd6 100644
--- a/compiler-rt/test/profile/instrprof-entry-coverage.c
+++ b/compiler-rt/test/profile/instrprof-entry-coverage.c
@@ -36,4 +36,4 @@ int main(int argc, char *argv[]) {
 // CHECK-DAG: foo
 // CHECK-DAG: bar
 
-// COUNTS: Maximum function count: 2
+// COUNTS: Maximum function count: 1
diff --git a/llvm/include/llvm/ProfileData/InstrProf.h 
b/llvm/include/llvm/ProfileData/InstrProf.h
index b0b2258735e2ae..df9e76966bf42b 100644
--- a/llvm/include/llvm/ProfileData/InstrProf.h
+++ b/llvm/include/llvm/ProfileData/InstrProf.h
@@ -830,6 +830,7 @@ struct InstrProfValueSiteRecord {
 /// Profiling information for a single function.
 struct InstrProfRecord {
   std::vector<uint64_t> Counts;
+  bool SingleByteCoverage = false;
   std::vector<uint8_t> BitmapBytes;
 
   InstrProfRecord() = default;
@@ -839,13 +840,15 @@ struct InstrProfRecord {
   : Counts(std::move(Counts)), BitmapBytes(std::move(BitmapBytes)) {}
   InstrProfRecord(InstrProfRecord &&) = default;
   InstrProfRecord(const InstrProfRecord &RHS)
-  : Counts(RHS.Counts), BitmapBytes(RHS.BitmapBytes),
+  : Counts(RHS.Counts), SingleByteCoverage(RHS.SingleByteCoverage),
+BitmapBytes(RHS.BitmapBytes),
 ValueData(RHS.ValueData
   ? std::make_unique<ValueProfData>(*RHS.ValueData)
   : nullptr) {}
   InstrProfRecord &operator=(InstrProfRecord &&) = default;
   InstrProfRecord &operator=(const InstrProfRecord &RHS) {
 Counts = RHS.Counts;
+SingleByteCoverage = RHS.SingleByteCoverage;
 BitmapBytes = RHS.BitmapBytes;
 if (!RHS.ValueData) {
   ValueData = nullptr;
diff --git a/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp 
b/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
index a02136d5b0386d..bc765c59381718 100644
--- a/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
+++ b/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
@@ -874,6 +874,9 @@ Error CoverageMapping::loadFunctionRecord(
   consumeError(std::move(E));
   return Error::success();
 }
+assert(!SingleByteCoverage ||
+   (0 <= *ExecutionCount && *ExecutionCount <= 1 &&
+0 <= *AltExecutionCount && *AltExecutionCount <= 1));
 Function.pushRegion(Region, *ExecutionCount, *AltExecutionCount);
 
 // Record ExpansionRegion.
diff --git a/llvm/lib/ProfileData/InstrProf.cpp 
b/llvm/lib/ProfileData/InstrProf.cpp
index b9937c9429b77d..0f6677b4d35718 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -952,7 +952,7 @@ void InstrProfRecord::merge(InstrProfRecord &Other, 
uint64_t Weight,
   Value = getInstrMaxCountValue();
   Overflowed = true;
 }
-Counts[I] = Value;
+Counts[I] = (SingleByteCoverage && Value != 0 ? 1 : Value);
 if (Overflowed)
   Warn(instrprof_error::counter_overflow);
   }
diff --git a/llvm/lib/ProfileData/InstrProfReader.cpp 
b/llvm/lib/ProfileData/InstrProfReader.cpp
index b90617c74f6d13..a07d7f573275ba 100644
--- a/llvm/lib/ProfileData/InstrProfReader.cpp
+++ b/llvm/lib/ProfileData/InstrProfReader.cpp
@@ -743,6 +743,7 @@ Error RawInstrProfReader::readRawCounts(
 
   Record.Counts.clear();
   Record.Counts.reserve(NumCounters);
+  Record.SingleByteCoverage = hasSingleByteCoverage();
   for (uint32_t I = 0; I < NumCounters; I++) {
 const char *Ptr =
 CountersStart + CounterBaseOffset + I * getCounterTypeSize();

>From b9bbc7cac3076594cd326ffa7f2d4fc4a92fabb9 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi 
Date: Sat, 5 Oct 2024 10:43:26 +0900
Subject: [PATCH 2/2] Rework. (Also revert
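
A minimal sketch of the merge behavior that the `InstrProf.cpp` hunk above appears to introduce: in single-byte coverage mode a merged counter is clamped to 0 or 1, which is the range the new assert in `CoverageMapping.cpp` checks. `Record`, `saturatingAdd`, and `mergeCounts` are hypothetical names used only for illustration, not LLVM APIs, and the `Weight` scaling of the real function is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a profile record; not the LLVM type.
struct Record {
  std::vector<uint64_t> Counts;
  bool SingleByteCoverage = false;
};

// Add two counters, saturating at the maximum and flagging overflow.
static uint64_t saturatingAdd(uint64_t A, uint64_t B, bool &Overflowed) {
  const uint64_t Max = std::numeric_limits<uint64_t>::max();
  if (A > Max - B) {
    Overflowed = true;
    return Max;
  }
  return A + B;
}

// Merge Other into Dst. Returns true if any counter overflowed.
bool mergeCounts(Record &Dst, const Record &Other) {
  assert(Dst.Counts.size() == Other.Counts.size() && "mismatched counters");
  bool AnyOverflow = false;
  for (std::size_t I = 0; I < Dst.Counts.size(); ++I) {
    bool Overflowed = false;
    uint64_t Value = saturatingAdd(Dst.Counts[I], Other.Counts[I], Overflowed);
    // In single-byte mode a counter only records "executed or not",
    // so any non-zero sum collapses to 1.
    Dst.Counts[I] = (Dst.SingleByteCoverage && Value != 0) ? 1 : Value;
    AnyOverflow |= Overflowed;
  }
  return AnyOverflow;
}
```

With the flag set, merging `{0, 1, 1}` with `{0, 1, 0}` leaves `{0, 1, 1}` rather than `{0, 2, 1}`; with the flag clear, the counters are summed as usual.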

[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)

2024-10-04 Thread Tom Eccles via llvm-branch-commits


@@ -96,6 +94,12 @@ bool shouldUseWorkshareLowering(Operation *op) {
   if (isNestedIn(parentWorkshare, op))
 return false;
 
+  if (parentWorkshare.getRegion().getBlocks().size() != 1) {
+parentWorkshare->emitWarning(
+"omp workshare with unstructured control flow currently unsupported.");

tblah wrote:

nit
```suggestion
"omp workshare with unstructured control flow is currently unsupported.");
```

https://github.com/llvm/llvm-project/pull/101446
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][PAC] Move emission of LR checks in tail calls to AsmPrinter (PR #110705)

2024-10-04 Thread Daniil Kovalev via llvm-branch-commits

kovdan01 wrote:

@atrosinenko With this patch applied, the following tests in test-suite, 
compiled in Release mode, fail with a segmentation fault:

1. SingleSource/Benchmarks/Misc-C++-EH/spirit.test
2. MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test
3. MultiSource/Benchmarks/Prolangs-C++/shapes/shapes.test
4. MultiSource/Benchmarks/Prolangs-C++/ocean/ocean.test
5. MultiSource/Benchmarks/Bullet/bullet.test
6. MultiSource/Applications/lambda-0.1.3/lambda.test
7. MultiSource/Applications/kimwitu++/kc.test

I'm now preparing a minimal reproducer for that. Stay tuned.



https://github.com/llvm/llvm-project/pull/110705


[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #96265)

2024-10-04 Thread via llvm-branch-commits

agozillon wrote:

I'm going to close the current version of this PR stack and open a new one. 
The new stack will incorporate the requested changes, along with some newer 
additions that add support for indexing members at arbitrary depths and some 
other general fixes. Unfortunately, I lost track of what is and isn't in the 
PR stack after working on it downstream for so long, so it's easier to start 
fresh and make sure nothing is missed. Incredibly sorry for the bother.

https://github.com/llvm/llvm-project/pull/96265


[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #96265)

2024-10-04 Thread via llvm-branch-commits

https://github.com/agozillon closed 
https://github.com/llvm/llvm-project/pull/96265


[llvm-branch-commits] [llvm] [x86][Windows] Fix chromium build break (PR #111218)

2024-10-04 Thread Farzon Lotfi via llvm-branch-commits


@@ -177,6 +177,107 @@ define float @tan(float %x) #0 {
   ret float %result
 }
 
+define float @acos(float %x) #0 {

farzonl wrote:

@efriedma-quic I'm looking at the commit history of this file:
https://github.com/llvm/llvm-project/commits/release/19.x/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll

The automatic cherry-pick failed because I added the MSVC test out of order in 
https://github.com/llvm/llvm-project/commit/378fe2fc23fa56181577d411fe6d51fa531cd860

That commit added some vectorizations, but nothing that should impact the fix.

https://github.com/llvm/llvm-project/pull/111218


[llvm-branch-commits] [llvm] [x86][Windows] Fix chromium build break (PR #111218)

2024-10-04 Thread Farzon Lotfi via llvm-branch-commits

https://github.com/farzonl created 
https://github.com/llvm/llvm-project/pull/111218

Windows does not support float C89 math functions like:
- acosf
- asinf
- atanf
- coshf
- sinhf
- tanhf

These 6 libfuncs need to be type promoted.

This PR fixes the bug introduced by 
https://github.com/llvm/llvm-project/pull/98949
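
As a rough illustration of what type promotion means for these functions, the sketch below widens the `float` argument to `double`, calls the double-precision routine, and rounds the result back to `float`. This mirrors the `flds`/`fstpl`, `calll _acos`, `fstps` sequence in the test's CHECK lines. `acosViaDouble` is a hypothetical helper name, and the code is a source-level analogy rather than what the backend literally emits.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: compute acos for a float by going through double,
// as a target without a float-precision acosf would have to.
float acosViaDouble(float X) {
  double Wide = static_cast<double>(X);  // extend f32 -> f64
  double Result = std::acos(Wide);       // call the double-precision routine
  return static_cast<float>(Result);     // round the result back to f32
}
```

The same pattern would apply to the other five functions by substituting `std::asin`, `std::atan`, `std::cosh`, `std::sinh`, or `std::tanh`.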

>From 558e053c74e1ffa0db0674ecaa500023296ccd46 Mon Sep 17 00:00:00 2001
From: Farzon Lotfi 
Date: Tue, 30 Jul 2024 19:53:07 -0400
Subject: [PATCH] [x86][Windows] Fix chromium build break Windows does not
 support float C89 math functions like: - acosf - asinf - atanf - coshf -
 sinhf - tanhf These 6 libfuncs need to be type promoted.

This PR fixes the bug introduced by 
https://github.com/llvm/llvm-project/pull/98949
---
 llvm/lib/Target/X86/X86ISelLowering.cpp   |  10 +-
 .../CodeGen/X86/fp-strict-libcalls-msvc32.ll  | 107 ++
 2 files changed, 115 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 45989bcd07d37e..10f269f8037784 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2475,8 +2475,12 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
   (Subtarget.isTargetWindowsMSVC() || Subtarget.isTargetWindowsItanium()))
 // clang-format off
for (ISD::NodeType Op :
- {ISD::FCEIL,  ISD::STRICT_FCEIL,
+ {ISD::FACOS,  ISD::STRICT_FACOS,
+  ISD::FASIN,  ISD::STRICT_FASIN,
+  ISD::FATAN,  ISD::STRICT_FATAN,
+  ISD::FCEIL,  ISD::STRICT_FCEIL,
   ISD::FCOS,   ISD::STRICT_FCOS,
+  ISD::FCOSH,  ISD::STRICT_FCOSH,
   ISD::FEXP,   ISD::STRICT_FEXP,
   ISD::FFLOOR, ISD::STRICT_FFLOOR,
   ISD::FREM,   ISD::STRICT_FREM,
@@ -2484,7 +2488,9 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
   ISD::FLOG10, ISD::STRICT_FLOG10,
   ISD::FPOW,   ISD::STRICT_FPOW,
   ISD::FSIN,   ISD::STRICT_FSIN,
-  ISD::FTAN,   ISD::STRICT_FTAN})
+  ISD::FSINH,  ISD::STRICT_FSINH,
+  ISD::FTAN,   ISD::STRICT_FTAN,
+  ISD::FTANH,  ISD::STRICT_FTANH})
   if (isOperationExpand(Op, MVT::f32))
 setOperationAction(Op, MVT::f32, Promote);
   // clang-format on
diff --git a/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll 
b/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
index cfec52c0e68863..5d4e86afc8aceb 100644
--- a/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
@@ -177,6 +177,107 @@ define float @tan(float %x) #0 {
   ret float %result
 }
 
+define float @acos(float %x) #0 {
+; CHECK-LABEL: acos:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:subl $12, %esp
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:fstpl (%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:calll _acos
+; CHECK-NEXT:fstps {{[0-9]+}}(%esp)
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:addl $12, %esp
+; CHECK-NEXT:retl
+  %result = call float @llvm.experimental.constrained.acos.f32(float %x, 
metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %result
+}
+
+define float @asin(float %x) #0 {
+; CHECK-LABEL: asin:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:subl $12, %esp
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:fstpl (%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:calll _asin
+; CHECK-NEXT:fstps {{[0-9]+}}(%esp)
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:addl $12, %esp
+; CHECK-NEXT:retl
+  %result = call float @llvm.experimental.constrained.asin.f32(float %x, 
metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %result
+}
+
+define float @atan(float %x) #0 {
+; CHECK-LABEL: atan:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:subl $12, %esp
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:fstpl (%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:calll _atan
+; CHECK-NEXT:fstps {{[0-9]+}}(%esp)
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:addl $12, %esp
+; CHECK-NEXT:retl
+  %result = call float @llvm.experimental.constrained.atan.f32(float %x, 
metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %result
+}
+
+define float @cosh(float %x) #0 {
+; CHECK-LABEL: cosh:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:subl $12, %esp
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:fstpl (%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:calll _cosh
+; CHECK-NEXT:fstps {{[0-9]+}}(%esp)
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:addl $12, %esp
+; CHECK-NEXT:retl
+  %result = call float @llvm.experimental.constrained.cosh.f32(float %x, 
metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %result
+}
+
+define float @sinh(float %x) #0 {
+; CHECK-LABEL: sinh:
+; CHECK:   # %bb.0:
+; CHECK

[llvm-branch-commits] [llvm] [x86][Windows] Fix chromium build break (PR #111218)

2024-10-04 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-x86

Author: Farzon Lotfi (farzonl)


Changes

Windows does not support float C89 math functions like:
- acosf
- asinf
- atanf
- coshf
- sinhf
- tanhf

These 6 libfuncs need to be type promoted.

This PR fixes the bug introduced by 
https://github.com/llvm/llvm-project/pull/98949

---
Full diff: https://github.com/llvm/llvm-project/pull/111218.diff


2 Files Affected:

- (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+8-2) 
- (modified) llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll (+107) 


``diff
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 45989bcd07d37e..10f269f8037784 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2475,8 +2475,12 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
   (Subtarget.isTargetWindowsMSVC() || Subtarget.isTargetWindowsItanium()))
 // clang-format off
for (ISD::NodeType Op :
- {ISD::FCEIL,  ISD::STRICT_FCEIL,
+ {ISD::FACOS,  ISD::STRICT_FACOS,
+  ISD::FASIN,  ISD::STRICT_FASIN,
+  ISD::FATAN,  ISD::STRICT_FATAN,
+  ISD::FCEIL,  ISD::STRICT_FCEIL,
   ISD::FCOS,   ISD::STRICT_FCOS,
+  ISD::FCOSH,  ISD::STRICT_FCOSH,
   ISD::FEXP,   ISD::STRICT_FEXP,
   ISD::FFLOOR, ISD::STRICT_FFLOOR,
   ISD::FREM,   ISD::STRICT_FREM,
@@ -2484,7 +2488,9 @@ X86TargetLowering::X86TargetLowering(const 
X86TargetMachine &TM,
   ISD::FLOG10, ISD::STRICT_FLOG10,
   ISD::FPOW,   ISD::STRICT_FPOW,
   ISD::FSIN,   ISD::STRICT_FSIN,
-  ISD::FTAN,   ISD::STRICT_FTAN})
+  ISD::FSINH,  ISD::STRICT_FSINH,
+  ISD::FTAN,   ISD::STRICT_FTAN,
+  ISD::FTANH,  ISD::STRICT_FTANH})
   if (isOperationExpand(Op, MVT::f32))
 setOperationAction(Op, MVT::f32, Promote);
   // clang-format on
diff --git a/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll 
b/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
index cfec52c0e68863..5d4e86afc8aceb 100644
--- a/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
@@ -177,6 +177,107 @@ define float @tan(float %x) #0 {
   ret float %result
 }
 
+define float @acos(float %x) #0 {
+; CHECK-LABEL: acos:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:subl $12, %esp
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:fstpl (%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:calll _acos
+; CHECK-NEXT:fstps {{[0-9]+}}(%esp)
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:addl $12, %esp
+; CHECK-NEXT:retl
+  %result = call float @llvm.experimental.constrained.acos.f32(float %x, 
metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %result
+}
+
+define float @asin(float %x) #0 {
+; CHECK-LABEL: asin:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:subl $12, %esp
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:fstpl (%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:calll _asin
+; CHECK-NEXT:fstps {{[0-9]+}}(%esp)
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:addl $12, %esp
+; CHECK-NEXT:retl
+  %result = call float @llvm.experimental.constrained.asin.f32(float %x, 
metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %result
+}
+
+define float @atan(float %x) #0 {
+; CHECK-LABEL: atan:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:subl $12, %esp
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:fstpl (%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:calll _atan
+; CHECK-NEXT:fstps {{[0-9]+}}(%esp)
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:addl $12, %esp
+; CHECK-NEXT:retl
+  %result = call float @llvm.experimental.constrained.atan.f32(float %x, 
metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %result
+}
+
+define float @cosh(float %x) #0 {
+; CHECK-LABEL: cosh:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:subl $12, %esp
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:fstpl (%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:calll _cosh
+; CHECK-NEXT:fstps {{[0-9]+}}(%esp)
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:addl $12, %esp
+; CHECK-NEXT:retl
+  %result = call float @llvm.experimental.constrained.cosh.f32(float %x, 
metadata !"round.dynamic", metadata !"fpexcept.strict") #0
+  ret float %result
+}
+
+define float @sinh(float %x) #0 {
+; CHECK-LABEL: sinh:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:subl $12, %esp
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:fstpl (%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:calll _sinh
+; CHECK-NEXT:fstps {{[0-9]+}}(%esp)
+; CHECK-NEXT:flds {{[0-9]+}}(%esp)
+; CHECK-NEXT:wait
+; CHECK-NEXT:addl $12, %esp
+; CHECK-NEXT:retl
+  %result = call float @llvm.experimental.constr

[llvm-branch-commits] [llvm] [x86][Windows] Fix chromium build break (PR #111218)

2024-10-04 Thread Eli Friedman via llvm-branch-commits

https://github.com/efriedma-quic approved this pull request.

LGTM.  Fixes a regression, and should be low-risk (the fix only affects the 
exact functions on the exact target with the issue).
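
The narrow scope described here can be pictured with a toy model of a per-operation action table. This is only a sketch under assumptions: `ToyLowering`, `Op`, `Action`, and `promoteMissingF32Libcalls` are hypothetical names, not the LLVM `TargetLowering` API, and the boolean parameter stands in for the subtarget check in the patch.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>

// Hypothetical, simplified model of per-operation legalization actions.
enum class Op { FACOS, FASIN, FATAN, FCOSH, FSINH, FTANH, FSQRT };
enum class Action { Legal, Expand, Promote };

struct ToyLowering {
  std::map<Op, Action> F32Actions;

  // Operations with no entry are treated as Legal.
  Action get(Op O) const {
    auto It = F32Actions.find(O);
    return It == F32Actions.end() ? Action::Legal : It->second;
  }
};

// Switch the six listed operations from Expand to Promote, but only when
// the target flag is set. Everything else is left untouched.
void promoteMissingF32Libcalls(ToyLowering &TL, bool TargetLacksF32Libcalls) {
  if (!TargetLacksF32Libcalls)
    return;
  for (Op O : {Op::FACOS, Op::FASIN, Op::FATAN,
               Op::FCOSH, Op::FSINH, Op::FTANH})
    if (TL.get(O) == Action::Expand)
      TL.F32Actions[O] = Action::Promote;
}
```

In this model, an operation outside the list (here `FSQRT`) keeps its action even on the affected target, and no action changes at all on other targets.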

https://github.com/llvm/llvm-project/pull/111218


[llvm-branch-commits] [llvm] [x86][Windows] Fix chromium build break (PR #111218)

2024-10-04 Thread Eli Friedman via llvm-branch-commits

https://github.com/efriedma-quic edited 
https://github.com/llvm/llvm-project/pull/111218


[llvm-branch-commits] [llvm] [x86][Windows] Fix chromium build break (PR #111218)

2024-10-04 Thread Eli Friedman via llvm-branch-commits


@@ -177,6 +177,107 @@ define float @tan(float %x) #0 {
   ret float %result
 }
 
+define float @acos(float %x) #0 {

efriedma-quic wrote:

Okay, that's fine.

https://github.com/llvm/llvm-project/pull/111218


[llvm-branch-commits] [llvm] [x86][Windows] Fix chromium build break (PR #111218)

2024-10-04 Thread Eli Friedman via llvm-branch-commits

https://github.com/efriedma-quic milestoned 
https://github.com/llvm/llvm-project/pull/111218