[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2026-03-24 Thread Andrei Elovikov via llvm-branch-commits

eas wrote:

Tiny bit of feedback, if there is still interest in this. I think this would 
benefit from a VPlan-dump-based test under `test/LoopVectorize/VPlan`, probably 
extending the printing of the recipe to indicate whether the store is 
compressed.

For negative tests, it would help to add a comment explaining whether it's 
fundamentally wrong to vectorize or just a TODO for a future improvement (and 
maybe some of the positive tests could benefit from extra comments as well). I 
also didn't see a negative test with an "invalid" extra use of a phi in the 
middle of the phi-chain (maybe I just missed it).

Another potentially interesting test would be

```
  if (condition) {
    a[idx] = ...;
    b[idx] = ...;  // multiple use
    idx += 1;
  }
```
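
As a rough source-level sketch of that pattern (not taken from the PR; the function name `compressTwo`, the `src[i] < c` condition, and the stored values are all hypothetical), a loop like the following would exercise two stores that share the same conditionally incremented index:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical source form of the suggested test: two stores use the same
// conditionally incremented index, so idx has more than one memory user.
std::size_t compressTwo(const int *src, int *a, int *b, std::size_t n, int c) {
  assert(src && a && b && "sketch expects valid buffers");
  std::size_t idx = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (src[i] < c) {
      a[idx] = src[i];
      b[idx] = src[i] + 1; // second use of idx
      idx += 1;
    }
  }
  return idx; // number of elements written to each output
}
```

Whether such a loop should vectorize or be rejected is exactly what the test would need to pin down; the sketch only fixes the scalar behaviour to compare against.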

GitHub shows "no conflicts with base branch", but that's probably because of 
the stacked PRs.


https://github.com/llvm/llvm-project/pull/140723
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-11 Thread Luke Lau via llvm-branch-commits


@@ -8430,6 +8479,46 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   // bring the VPlan to its final state.
   // ---
 
+  // Adjust the recipes for any monotonic phis.
+  for (VPRecipeBase &R : HeaderVPBB->phis()) {
+    auto *MonotonicPhi = dyn_cast<VPMonotonicPHIRecipe>(&R);

lukel97 wrote:

I was thinking of VPPhi, which is any VPInstruction with an opcode of 
Instruction::Phi and which you can create with VPBuilder::createScalarPhi. It 
should let you model everything in VPlan without needing to create a dedicated 
recipe.

I wouldn't worry about trying to derive from VPHeaderPHIRecipe; we use VPPhi 
for the EVL tail folding stuff even though it's placed in the header. Other 
transforms know how to handle VPPhi recipes, if that's what you're wondering.



[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-11 Thread Sergey Kachkov via llvm-branch-commits


@@ -8430,6 +8479,46 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   // bring the VPlan to its final state.
   // ---
 
+  // Adjust the recipes for any monotonic phis.
+  for (VPRecipeBase &R : HeaderVPBB->phis()) {
+    auto *MonotonicPhi = dyn_cast<VPMonotonicPHIRecipe>(&R);

skachkov-sc wrote:

Do you mean the VPIRPhi recipe? I think it's possible, but I'm slightly 
concerned that, semantically, VPMonotonicPHIRecipe models a loop header phi, so 
it should be derived from VPHeaderPHIRecipe. But yes, we can distinguish 
monotonic phis using LoopVectorizationLegality.



[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-11 Thread Sergey Kachkov via llvm-branch-commits

https://github.com/skachkov-sc edited 
https://github.com/llvm/llvm-project/pull/140723


[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-11 Thread Luke Lau via llvm-branch-commits


@@ -8430,6 +8479,46 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   // bring the VPlan to its final state.
   // ---
 
+  // Adjust the recipes for any monotonic phis.
+  for (VPRecipeBase &R : HeaderVPBB->phis()) {
+    auto *MonotonicPhi = dyn_cast<VPMonotonicPHIRecipe>(&R);

lukel97 wrote:

If you use a regular VPInstruction::PHI instead of VPMonotonicPHIRecipe and set 
the underlying value to the monotonic phi, could you avoid the need for an 
extra recipe?

Would you be able to detect the monotonic phis by just calling 
`Legal->getMonotonicPHIs().find(Phi->getUnderlyingValue())`?



[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread Sergey Kachkov via llvm-branch-commits


@@ -3193,6 +3239,9 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
   /// Whether the consecutive accessed addresses are in reverse order.
   bool Reverse;
 
+  /// Whether the consecutive accessed addresses are compressed with mask value.
+  bool Compressed;
+

skachkov-sc wrote:

The changes related to the extension of VPWidenMemoryRecipe have been split 
into a separate PR: https://github.com/llvm/llvm-project/pull/166956 (hopefully 
it will be easier to review).



[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread Sergey Kachkov via llvm-branch-commits

https://github.com/skachkov-sc edited 
https://github.com/llvm/llvm-project/pull/140723


[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread Sergey Kachkov via llvm-branch-commits

https://github.com/skachkov-sc updated 
https://github.com/llvm/llvm-project/pull/140723

>From 9697cd806947ab6ebd021cb7919acd62cc2e29a0 Mon Sep 17 00:00:00 2001
From: Sergey Kachkov 
Date: Fri, 7 Nov 2025 18:09:56 +0300
Subject: [PATCH 1/3] [VPlan] Implement compressed widening of memory
 instructions

---
 .../llvm/Analysis/TargetTransformInfo.h   |  1 +
 .../Transforms/Vectorize/LoopVectorize.cpp| 24 ++
 llvm/lib/Transforms/Vectorize/VPlan.h | 32 ---
 .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 23 +
 .../Transforms/Vectorize/VPlanTransforms.cpp  | 11 ---
 5 files changed, 61 insertions(+), 30 deletions(-)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 0f17312b03827..e8769f5860c77 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1442,6 +1442,7 @@ class TargetTransformInfo {
     Normal,        ///< The cast is used with a normal load/store.
     Masked,        ///< The cast is used with a masked load/store.
     GatherScatter, ///< The cast is used with a gather/scatter.
+    Compressed,    ///< The cast is used with an expand load/compress store.
     Interleave,    ///< The cast is used with an interleaved load/store.
     Reversed,      ///< The cast is used with a reversed load/store.
   };
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 914018591d832..25e8a63eae9cd 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1027,6 +1027,7 @@ class LoopVectorizationCostModel {
     CM_Widen_Reverse, // For consecutive accesses with stride -1.
     CM_Interleave,
     CM_GatherScatter,
+    CM_Compressed,
     CM_Scalarize,
     CM_VectorCall,
     CM_IntrinsicCall
@@ -3108,9 +3109,9 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
     if (IsUniformMemOpUse(I))
       return true;
 
-    return (WideningDecision == CM_Widen ||
-            WideningDecision == CM_Widen_Reverse ||
-            WideningDecision == CM_Interleave);
+    return (
+        WideningDecision == CM_Widen || WideningDecision == CM_Widen_Reverse ||
+        WideningDecision == CM_Interleave || WideningDecision == CM_Compressed);
   };
 
   // Returns true if Ptr is the pointer operand of a memory access instruction
@@ -5191,12 +5192,17 @@ InstructionCost LoopVectorizationCostModel::getConsecutiveMemOpCost(
     Instruction *I, ElementCount VF, InstWidening Decision) {
   Type *ValTy = getLoadStoreType(I);
   auto *VectorTy = cast<VectorType>(toVectorTy(ValTy, VF));
+  const Align Alignment = getLoadStoreAlignment(I);
   unsigned AS = getLoadStoreAddressSpace(I);
   enum TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
 
+  if (Decision == CM_Compressed)
+    return TTI.getExpandCompressMemoryOpCost(I->getOpcode(), VectorTy,
+                                             /*VariableMask*/ true, Alignment,
+                                             CostKind, I);
+
   assert((Decision == CM_Widen || Decision == CM_Widen_Reverse) &&
          "Expected widen decision.");
-  const Align Alignment = getLoadStoreAlignment(I);
   InstructionCost Cost = 0;
   if (Legal->isMaskRequired(I)) {
     Cost += TTI.getMaskedMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS,
@@ -6299,6 +6305,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,
   switch (getWideningDecision(I, VF)) {
   case LoopVectorizationCostModel::CM_GatherScatter:
     return TTI::CastContextHint::GatherScatter;
+  case LoopVectorizationCostModel::CM_Compressed:
+    return TTI::CastContextHint::Compressed;
   case LoopVectorizationCostModel::CM_Interleave:
     return TTI::CastContextHint::Interleave;
   case LoopVectorizationCostModel::CM_Scalarize:
@@ -7514,8 +7522,9 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
   LoopVectorizationCostModel::InstWidening Decision =
       CM.getWideningDecision(I, Range.Start);
   bool Reverse = Decision == LoopVectorizationCostModel::CM_Widen_Reverse;
+  bool Compressed = Decision == LoopVectorizationCostModel::CM_Compressed;
   bool Consecutive =
-      Reverse || Decision == LoopVectorizationCostModel::CM_Widen;
+      Reverse || Compressed || Decision == LoopVectorizationCostModel::CM_Widen;
 
   VPValue *Ptr = isa<LoadInst>(I) ? Operands[0] : Operands[1];
   if (Consecutive) {
@@ -7545,11 +7554,12 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
   }
   if (LoadInst *Load = dyn_cast<LoadInst>(I))
     return new VPWidenLoadRecipe(*Load, Ptr, Mask, Consecutive, Reverse,
-                                 VPIRMetadata(*Load, LVer), I->getDebugLoc());
+                                 Compressed, VPIRMetadata(*Load, LVer),
+                                 I->getDebugLoc());
 
   StoreInst *Stor

[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread Sergey Kachkov via llvm-branch-commits


@@ -4442,6 +4495,29 @@ void VPReductionPHIRecipe::print(raw_ostream &O, const Twine &Indent,
 }
 #endif
 
+void VPMonotonicPHIRecipe::execute(VPTransformState &State) {
+  assert(getParent()->getPlan()->getUF() == 1 && "Expected unroll factor 1.");
+  Value *Start = getStartValue()->getLiveInIRValue();
+  BasicBlock *VectorPH =
+      State.CFG.VPBB2IRBB.at(getParent()->getCFGPredecessor(0));
+  PHINode *MonotonicPHI =
+      State.Builder.CreatePHI(Start->getType(), 2, "monotonic.iv");
+  MonotonicPHI->addIncoming(Start, VectorPH);
+  MonotonicPHI->setDebugLoc(getDebugLoc());
+  State.set(this, MonotonicPHI, /*IsScalar=*/true);
+}

skachkov-sc wrote:

The only rationale for the new recipe was to simplify "adjusting" the VPlan: we 
need to insert VPInstruction::ComputeMonotonicResult at the backedge of such a 
phi (which will increment the phi value by ctpop(mask)). This looks similar to 
the handling of reductions: 
VPMonotonicPHIRecipe/VPInstruction::ComputeMonotonicResult is symmetric to 
VPReductionPHIRecipe/VPInstruction::ComputeReductionResult. The VPWidenPHI 
recipe could probably be used here, but there are some places in the code 
where we want to distinguish "monotonic" header phis from the others.
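
To make the ctpop(mask) update concrete, here is a minimal scalar emulation of one vector iteration per block of lanes. It is only a sketch, not how the recipe is implemented: `VF`, `compressScalar`, and `compressByBlocks` are names invented for this illustration, and the sketch assumes the trip count is a multiple of `VF`.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t VF = 4; // illustrative vector width (assumption)

// Scalar reference loop: dst[idx++] = src[i] for every non-zero element.
std::size_t compressScalar(const int *src, int *dst, std::size_t n) {
  std::size_t idx = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (src[i] != 0)
      dst[idx++] = src[i];
  return idx;
}

// Block-wise emulation: idx plays the role of the monotonic header phi and
// is advanced once per "vector iteration" by the number of set mask bits.
std::size_t compressByBlocks(const int *src, int *dst, std::size_t n) {
  assert(n % VF == 0 && "sketch handles whole blocks only");
  std::size_t idx = 0;
  for (std::size_t i = 0; i < n; i += VF) {
    std::bitset<VF> mask;
    for (std::size_t lane = 0; lane < VF; ++lane)
      mask[lane] = (src[i + lane] != 0);
    // Compressing store: active lanes land consecutively at dst + idx.
    std::size_t out = idx;
    for (std::size_t lane = 0; lane < VF; ++lane)
      if (mask[lane])
        dst[out++] = src[i + lane];
    // Backedge update of the phi: idx += ctpop(mask).
    idx += mask.count();
  }
  return idx;
}
```

Both functions should produce the same output buffer and final index for any input whose length is a multiple of `VF`, which is the invariant the backedge update has to preserve.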



[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread Sergey Kachkov via llvm-branch-commits


@@ -3193,6 +3239,9 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
   /// Whether the consecutive accessed addresses are in reverse order.
   bool Reverse;
 
+  /// Whether the consecutive accessed addresses are compressed with mask value.
+  bool Compressed;
+

skachkov-sc wrote:

There is no intrinsic before loop vectorization here; we have a plain 
load/store instruction that is placed under some predicate in the original 
loop, so it becomes masked in LoopVectorizer. The difference from ordinary 
masked loads/stores is that "compressed" loads/stores read or write memory 
consecutively (the number of elements accessed equals the number of set mask 
bits), with the elements mapped to the lanes whose mask bit is set.
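
A scalar reference model may help pin down these semantics. The sketch below is an illustration only; `compressStore` and `expandLoad` are hypothetical helper names chosen here, loosely mirroring the behaviour documented for LLVM's `llvm.masked.compressstore` and `llvm.masked.expandload` intrinsics, and they are not code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Writes the active lanes of `value` to consecutive slots starting at `ptr`.
// Returns how many elements were written (the number of set mask bits).
std::size_t compressStore(const std::vector<int> &value,
                          const std::vector<bool> &mask, int *ptr) {
  assert(value.size() == mask.size());
  std::size_t written = 0;
  for (std::size_t lane = 0; lane < mask.size(); ++lane)
    if (mask[lane])
      ptr[written++] = value[lane];
  return written;
}

// Reads consecutive elements from `ptr` and places them into the active
// lanes; inactive lanes keep the corresponding `passthru` element.
std::vector<int> expandLoad(const int *ptr, const std::vector<bool> &mask,
                            std::vector<int> passthru) {
  assert(passthru.size() == mask.size());
  std::size_t read = 0;
  for (std::size_t lane = 0; lane < mask.size(); ++lane)
    if (mask[lane])
      passthru[lane] = ptr[read++];
  return passthru;
}
```

An ordinary masked store would instead write lane `k` to `ptr[k]`, leaving holes at inactive lanes; the consecutive packing is what distinguishes the compressed form.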



[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread Sergey Kachkov via llvm-branch-commits

skachkov-sc wrote:

> I think you can probably make this independent of #140721 by first just 
> supporting cases where the compressed store does not alias any of the other 
> memory accesses?

Yes, the changes in LAA are fully independent, we can skip them for now.

> Curious if you already have any runtime performance numbers you could share?

We've benchmarked the following loop pattern:
```
// benchmark() is run 32 times

template <typename T>
void benchmark(T *dst, const T *src) {
  size_t idx = 0;
  for (size_t i = 0; i < 1024; ++i) {
    T cur = src[i];
    if (cur != static_cast<T>(0))
      dst[idx++] = cur;
  }
  dst[idx] = static_cast<T>(0);
}
```
On a SpacemiT-X60 core (RISC-V CPU with VLEN=256) the results are as follows:

| Type    | cycles (scalar) | cycles (vector) | speedup |
| ------- | --------------- | --------------- | ------- |
| int16_t | 189151          | 56795           | 3.33x   |
| int32_t | 205712          | 87196           | 2.36x   |
| int64_t | 205757          | 150115          | 1.37x   |

There were no branch mispredicts for the `if (cur != static_cast<T>(0))` branch 
in the scalar case here (due to the specifics of the data in the src array), so 
I think the speedup can be even bigger for more random inputs. We haven't 
observed any significant changes on SPEC, though.



[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread Florian Hahn via llvm-branch-commits


@@ -4442,6 +4495,29 @@ void VPReductionPHIRecipe::print(raw_ostream &O, const Twine &Indent,
 }
 #endif
 
+void VPMonotonicPHIRecipe::execute(VPTransformState &State) {
+  assert(getParent()->getPlan()->getUF() == 1 && "Expected unroll factor 1.");
+  Value *Start = getStartValue()->getLiveInIRValue();
+  BasicBlock *VectorPH =
+      State.CFG.VPBB2IRBB.at(getParent()->getCFGPredecessor(0));
+  PHINode *MonotonicPHI =
+      State.Builder.CreatePHI(Start->getType(), 2, "monotonic.iv");
+  MonotonicPHI->addIncoming(Start, VectorPH);
+  MonotonicPHI->setDebugLoc(getDebugLoc());
+  State.set(this, MonotonicPHI, /*IsScalar=*/true);
+}

fhahn wrote:

This looks just like a plain VPWidenPHI; can you use that here, or do we need a 
new recipe? If so, please document the rationale, probably good to do in the 
description.



[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn commented:

I think you can probably make this independent of 
https://github.com/llvm/llvm-project/pull/140721 by first just supporting cases 
where the compressed store does not alias any of the other memory accesses?

Curious if you already have any runtime performance numbers you could share?



[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread Florian Hahn via llvm-branch-commits


@@ -3193,6 +3239,9 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
   /// Whether the consecutive accessed addresses are in reverse order.
   bool Reverse;
 
+  /// Whether the consecutive accessed addresses are compressed with mask value.
+  bool Compressed;
+

fhahn wrote:

This corresponds 1-1 to an intrinsic, right? Can we just use 
VPWidenIntrinsicRecipe?



[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/140723


[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff origin/main HEAD --extensions cpp,h -- \
  llvm/include/llvm/Analysis/TargetTransformInfo.h \
  llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h \
  llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp \
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp \
  llvm/lib/Transforms/Vectorize/VPlan.cpp llvm/lib/Transforms/Vectorize/VPlan.h \
  llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp \
  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp \
  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp \
  llvm/lib/Transforms/Vectorize/VPlanValue.h \
  llvm/unittests/Transforms/Vectorize/VPlanTest.cpp --diff_from_common_commit
```

:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:

View the diff from clang-format here.

```diff
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 114b2dbe4..0346fc3b9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -4208,7 +4208,7 @@ narrowInterleaveGroupOp(VPValue *V, SmallPtrSetImpl<VPValue *> &NarrowedOps) {
     auto *LI = cast<LoadInst>(LoadGroup->getInterleaveGroup()->getInsertPos());
     auto *L = new VPWidenLoadRecipe(
         *LI, LoadGroup->getAddr(), LoadGroup->getMask(), /*Consecutive=*/true,
-        /*Reverse=*/false, /*Compressed*/false, {}, LoadGroup->getDebugLoc());
+        /*Reverse=*/false, /*Compressed*/ false, {}, LoadGroup->getDebugLoc());
     L->insertBefore(LoadGroup);
     NarrowedOps.insert(L);
     return L;

```






[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

2025-11-07 Thread Sergey Kachkov via llvm-branch-commits

https://github.com/skachkov-sc updated 
https://github.com/llvm/llvm-project/pull/140723

>From 2f9baaf83b414b8d2cad73a4ada7efe800a02809 Mon Sep 17 00:00:00 2001
From: Sergey Kachkov 
Date: Wed, 15 Jan 2025 16:09:16 +0300
Subject: [PATCH 1/2] [LoopVectorize][NFC] Add pre-commit tests

---
 .../LoopVectorize/compress-idioms.ll  | 480 ++
 1 file changed, 480 insertions(+)
 create mode 100644 llvm/test/Transforms/LoopVectorize/compress-idioms.ll

diff --git a/llvm/test/Transforms/LoopVectorize/compress-idioms.ll b/llvm/test/Transforms/LoopVectorize/compress-idioms.ll
new file mode 100644
index 0..1390092e40387
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/compress-idioms.ll
@@ -0,0 +1,480 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -mtriple=riscv64 -mattr=+v -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S 2>&1 | FileCheck %s
+
+define void @test_store_with_pointer(ptr writeonly %dst, ptr readonly %src, i32 %c, i32 %n) {
+; CHECK-LABEL: define void @test_store_with_pointer(
+; CHECK-SAME: ptr writeonly [[DST:%.*]], ptr readonly [[SRC:%.*]], i32 [[C:%.*]], i32 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[CMP8:%.*]] = icmp sgt i32 [[N]], 0
+; CHECK-NEXT:    br i1 [[CMP8]], label %[[FOR_BODY_PREHEADER:.*]], label %[[FOR_COND_CLEANUP:.*]]
+; CHECK:       [[FOR_BODY_PREHEADER]]:
+; CHECK-NEXT:    [[WIDE_TRIP_COUNT:%.*]] = zext nneg i32 [[N]] to i64
+; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK:       [[FOR_COND_CLEANUP_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND_CLEANUP]]
+; CHECK:       [[FOR_COND_CLEANUP]]:
+; CHECK-NEXT:    ret void
+; CHECK:       [[FOR_BODY]]:
+; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_INC:.*]] ]
+; CHECK-NEXT:    [[DST_ADDR_09:%.*]] = phi ptr [ [[DST]], %[[FOR_BODY_PREHEADER]] ], [ [[DST_ADDR_1:%.*]], %[[FOR_INC]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[INDVARS_IV]]
+; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp slt i32 [[TMP0]], [[C]]
+; CHECK-NEXT:    br i1 [[CMP1]], label %[[IF_THEN:.*]], label %[[FOR_INC]]
+; CHECK:       [[IF_THEN]]:
+; CHECK-NEXT:    [[INCDEC_PTR:%.*]] = getelementptr inbounds i8, ptr [[DST_ADDR_09]], i64 4
+; CHECK-NEXT:    store i32 [[TMP0]], ptr [[DST_ADDR_09]], align 4
+; CHECK-NEXT:    br label %[[FOR_INC]]
+; CHECK:       [[FOR_INC]]:
+; CHECK-NEXT:    [[DST_ADDR_1]] = phi ptr [ [[INCDEC_PTR]], %[[IF_THEN]] ], [ [[DST_ADDR_09]], %[[FOR_BODY]] ]
+; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP_LOOPEXIT]], label %[[FOR_BODY]]
+;
+entry:
+  %cmp8 = icmp sgt i32 %n, 0
+  br i1 %cmp8, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader:
+  %wide.trip.count = zext nneg i32 %n to i64
+  br label %for.body
+
+for.cond.cleanup.loopexit:
+  br label %for.cond.cleanup
+
+for.cond.cleanup:
+  ret void
+
+for.body:
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.inc ]
+  %dst.addr.09 = phi ptr [ %dst, %for.body.preheader ], [ %dst.addr.1, %for.inc ]
+  %arrayidx = getelementptr inbounds i32, ptr %src, i64 %indvars.iv
+  %0 = load i32, ptr %arrayidx, align 4
+  %cmp1 = icmp slt i32 %0, %c
+  br i1 %cmp1, label %if.then, label %for.inc
+
+if.then:
+  %incdec.ptr = getelementptr inbounds i8, ptr %dst.addr.09, i64 4
+  store i32 %0, ptr %dst.addr.09, align 4
+  br label %for.inc
+
+for.inc:
+  %dst.addr.1 = phi ptr [ %incdec.ptr, %if.then ], [ %dst.addr.09, %for.body ]
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
+  br i1 %exitcond.not, label %for.cond.cleanup.loopexit, label %for.body
+}
+
+define void @test_store_with_index(ptr writeonly %dst, ptr readonly %src, i32 %c, i32 %n) {
+; CHECK-LABEL: define void @test_store_with_index(
+; CHECK-SAME: ptr writeonly [[DST:%.*]], ptr readonly [[SRC:%.*]], i32 [[C:%.*]], i32 [[N:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[CMP11:%.*]] = icmp sgt i32 [[N]], 0
+; CHECK-NEXT:    br i1 [[CMP11]], label %[[FOR_BODY_PREHEADER:.*]], label %[[FOR_COND_CLEANUP:.*]]
+; CHECK:       [[FOR_BODY_PREHEADER]]:
+; CHECK-NEXT:    [[WIDE_TRIP_COUNT:%.*]] = zext nneg i32 [[N]] to i64
+; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK:       [[FOR_COND_CLEANUP_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND_CLEANUP]]
+; CHECK:       [[FOR_COND_CLEANUP]]:
+; CHECK-NEXT:    ret void
+; CHECK:       [[FOR_BODY]]:
+; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_INC:.*]] ]