[llvm-branch-commits] [mlir] [mlir][Interfaces][NFC] `ValueBoundsConstraintSet`: Delete dead code (PR #86098)
https://github.com/MacDue approved this pull request. LGTM :+1: https://github.com/llvm/llvm-project/pull/86098 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][Interfaces][NFC] `ValueBoundsConstraintSet`: Pass stop condition in the constructor (PR #86099)
@@ -316,6 +317,9 @@ class ValueBoundsConstraintSet { /// Builder for constructing affine expressions. Builder builder; + + /// The current stop condition function. + StopConditionFn stopCondition = nullptr; MacDue wrote: Just wondering if this should be a `std::function` instead? `function_ref`, being non-owning, could lead to some surprises, e.g. `ValueBoundsConstraintSet cstr(..., /*stopCondition=*/[&]{ ... })` would leave `stopCondition` referring to a destroyed temporary lambda (see the sketch below). https://github.com/llvm/llvm-project/pull/86099 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
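A minimal standalone sketch of the `function_ref` lifetime concern raised above. `ConstraintSetSketch`, its member names, and the `bool(int)` signature are invented for illustration; they are not taken from the patch.

```cpp
#include "llvm/ADT/STLExtras.h" // llvm::function_ref
#include <functional>

// Hypothetical stand-in for ValueBoundsConstraintSet: the stop condition is
// stored for later use, which is where a non-owning function_ref can bite.
struct ConstraintSetSketch {
  llvm::function_ref<bool(int)> StopCondition; // non-owning view of a callable
  // std::function<bool(int)> StopCondition;   // owning alternative

  explicit ConstraintSetSketch(llvm::function_ref<bool(int)> Fn)
      : StopCondition(Fn) {}

  bool reachedStop(int V) const { return StopCondition(V); }
};

int main() {
  int Limit = 5;
  // The temporary lambda passed here is destroyed at the end of this
  // statement, but the function_ref member keeps referring to it.
  ConstraintSetSketch Cstr([&](int V) { return V >= Limit; });
  // Cstr.reachedStop(7); // undefined behaviour with function_ref; safe if
                          // StopCondition were an owning std::function copy.
  (void)Cstr;
  return 0;
}
```

With a `std::function` member the temporary would be copied into the object, so a later call would be safe; the trade-off is the extra allocation/indirection.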
[llvm-branch-commits] [mlir] [mlir][Interfaces][NFC] `ValueBoundsConstraintSet`: Pass stop condition in the constructor (PR #86099)
https://github.com/MacDue approved this pull request. https://github.com/llvm/llvm-project/pull/86099 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][Interfaces] `ValueBoundsOpInterface`: Fix typo (PR #87976)
https://github.com/MacDue approved this pull request. https://github.com/llvm/llvm-project/pull/87976 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -1416,14 +1466,14 @@ void VPlanTransforms::addActiveLaneMask( auto *FoundWidenCanonicalIVUser = find_if(Plan.getCanonicalIV()->users(), [](VPUser *U) { return isa<VPWidenCanonicalIVRecipe>(U); }); - assert(FoundWidenCanonicalIVUser && + assert(FoundWidenCanonicalIVUser && *FoundWidenCanonicalIVUser && MacDue wrote: This looks a little odd. Doesn't `find_if` return an iterator? ```suggestion auto IVUsers = Plan.getCanonicalIV()->users(); /// ... assert(FoundWidenCanonicalIVUser != IVUsers.end() && "Must have widened canonical IV when tail folding!"); ``` https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
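For reference, a self-contained illustration of the iterator point (plain STL here, but `llvm::find_if` over a range behaves the same way):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

int main() {
  std::vector<int> Users = {1, 3, 4, 7};

  // find_if returns an iterator; the idiomatic "did we find it" check is a
  // comparison against end(), not a truthiness test on the iterator itself.
  auto It = std::find_if(Users.begin(), Users.end(),
                         [](int U) { return U % 2 == 0; });
  assert(It != Users.end() && "expected at least one even element");

  return *It == 4 ? 0 : 1;
}
```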
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -77,9 +77,13 @@ struct VPlanTransforms { /// creation) and instead it is handled using active-lane-mask. \p /// DataAndControlFlowWithoutRuntimeCheck implies \p /// UseActiveLaneMaskForControlFlow. + /// RTChecks refers to the pointer pairs that need aliasing elements to be + /// masked off each loop iteration. MacDue wrote: No docs for PSE? https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -3073,6 +3075,56 @@ struct VPWidenStoreEVLRecipe final : public VPWidenMemoryRecipe { } }; +// Given a pointer A that is being stored to, and pointer B that is being +// read from, both with unknown lengths, create a mask that disables +// elements which could overlap across a loop iteration. For example, if A +// is X and B is X + 2 with VF being 4, only the final two elements of the +// loaded vector can be stored since they don't overlap with the stored +// vector. %b.vec = load %b ; = [s, t, u, v] +// [...] +// store %a, %b.vec ; only u and v can be stored as their addresses don't +// overlap with %a + (VF - 1) MacDue wrote: This is specifically RAW? Of something like: ``` store A[x] load A[x + 2] ``` Perhaps I'm muddled on what "final two elements" means, but isn't it the first two elements of the store that are valid (so they won't overwrite the elements for the load)? https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -427,6 +428,29 @@ Value *VPInstruction::generate(VPTransformState &State) { {PredTy, ScalarTC->getType()}, {VIVElem0, ScalarTC}, nullptr, Name); } + // Count the number of bits set in each lane and reduce the result to a scalar + case VPInstruction::PopCount: { +Value *Op = State.get(getOperand(0)); +auto *VT = Op->getType(); MacDue wrote: nit: Spell out type if it's not present on the RHS. ```suggestion Type *VT = Op->getType(); ``` https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
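A plain-C++ model of what that doc comment describes — count the set bits in every lane, then reduce the per-lane counts to one scalar. Lanes are modelled as plain integers here; this is not the actual lowering.

```cpp
#include <bit>
#include <cstdint>
#include <vector>

// Scalar model of the PopCount VPInstruction: per-lane popcount followed by a
// horizontal add of the counts.
uint64_t popCountAndReduce(const std::vector<uint64_t> &Lanes) {
  uint64_t Total = 0;
  for (uint64_t Lane : Lanes)
    Total += std::popcount(Lane); // count the set bits in this lane
  return Total;                   // reduce the per-lane counts to a scalar
}
```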
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -1300,14 +1301,38 @@ static VPActiveLaneMaskPHIRecipe *addVPLaneMaskPhiAndUpdateExitBranch( cast(CanonicalIVPHI->getBackedgeValue()); // TODO: Check if dropping the flags is needed if // !DataAndControlFlowWithoutRuntimeCheck. + VPValue *IncVal = CanonicalIVIncrement->getOperand(1); + assert(IncVal != CanonicalIVPHI && "Unexpected operand order"); + CanonicalIVIncrement->dropPoisonGeneratingFlags(); DebugLoc DL = CanonicalIVIncrement->getDebugLoc(); + // We can't use StartV directly in the ActiveLaneMask VPInstruction, since // we have to take unrolling into account. Each part needs to start at // Part * VF auto *VecPreheader = Plan.getVectorPreheader(); VPBuilder Builder(VecPreheader); + // Create an alias mask for each possibly-aliasing pointer pair. If there + // are multiple they are combined together with ANDs. + VPValue *AliasMask = nullptr; + + for (auto C : RTChecks) { +// FIXME: How to pass this info back? +//HasAliasMask = true; MacDue wrote: This FIXME is a little unclear. Does it mean `HasAliasMask` should be set here but it's not? https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -195,6 +195,13 @@ enum class TailFoldingStyle { DataWithEVL, }; +enum class RTCheckStyle { + /// Branch to scalar loop if checks fails at runtime. + ScalarFallback, + /// Form a mask based on elements which won't be a WAR or RAW hazard MacDue wrote: ultra nit: One of these comments ends with a full-stop and the other does not. ```suggestion /// Branch to scalar loop if checks fails at runtime. ScalarFallback, /// Form a mask based on elements which won't be a WAR or RAW hazard. ``` https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
https://github.com/MacDue edited https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -1331,14 +1356,37 @@ static VPActiveLaneMaskPHIRecipe *addVPLaneMaskPhiAndUpdateExitBranch( "index.part.next"); // Create the active lane mask instruction in the VPlan preheader. - auto *EntryALM = + VPValue *Mask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {EntryIncrement, TC}, DL, "active.lane.mask.entry"); // Now create the ActiveLaneMaskPhi recipe in the main loop using the // preheader ActiveLaneMask instruction. - auto *LaneMaskPhi = new VPActiveLaneMaskPHIRecipe(EntryALM, DebugLoc()); + auto *LaneMaskPhi = new VPActiveLaneMaskPHIRecipe(Mask, DebugLoc()); LaneMaskPhi->insertAfter(CanonicalIVPHI); + VPValue *LaneMask = LaneMaskPhi; + if (AliasMask) { +// Increment phi by correct amount. +Builder.setInsertPoint(CanonicalIVIncrement); + +VPValue *IncrementBy = Builder.createNaryOp(VPInstruction::PopCount, +{AliasMask}, DL, "popcount"); +Type *IVType = CanonicalIVPHI->getScalarType(); + +if (IVType->getScalarSizeInBits() < 64) { + auto *Cast = + new VPScalarCastRecipe(Instruction::Trunc, IncrementBy, IVType); + Cast->insertAfter(IncrementBy->getDefiningRecipe()); + IncrementBy = Cast; +} +CanonicalIVIncrement->setOperand(1, IncrementBy); + +// And the alias mask so the iteration only processes non-aliasing lanes +Builder.setInsertPoint(CanonicalIVPHI->getParent(), + CanonicalIVPHI->getParent()->getFirstNonPhi()); +LaneMask = Builder.createNaryOp(Instruction::BinaryOps::And, +{LaneMaskPhi, AliasMask}, DL); MacDue wrote: Do we know this AND won't be all-false? https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
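A rough scalar model of the control flow this hunk sets up (my reading of the patch, with invented helpers — not code from it), which is also where the all-false question matters: if the alias mask could ever have no set lanes, the induction variable would stop advancing.

```cpp
#include <bit>
#include <cstdint>
#include <cstdio>

constexpr uint64_t VF = 4; // lanes kept in the low bits of a uint64_t

// Assumed alias mask: pretend the last lane conflicts every iteration.
uint64_t aliasLaneMask(uint64_t /*Index*/) { return 0b0111; }

// Active-lane mask for trip count TC starting at lane index Index.
uint64_t activeLaneMask(uint64_t Index, uint64_t TC) {
  uint64_t Mask = 0;
  for (uint64_t Lane = 0; Lane < VF; ++Lane)
    if (Index + Lane < TC)
      Mask |= uint64_t(1) << Lane;
  return Mask;
}

int main() {
  const uint64_t TC = 10;
  for (uint64_t Index = 0; Index < TC;) {
    uint64_t Alias = aliasLaneMask(Index);
    // The loop body is predicated on the lane mask ANDed with the alias mask.
    uint64_t Lanes = activeLaneMask(Index, TC) & Alias;
    std::printf("index %llu: lane mask 0x%llx\n",
                (unsigned long long)Index, (unsigned long long)Lanes);
    // The canonical IV advances by popcount(alias mask) rather than by VF;
    // an all-false alias mask would mean no forward progress here.
    Index += std::popcount(Alias);
  }
  return 0;
}
```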
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -3235,6 +3263,36 @@ void VPWidenPointerInductionRecipe::print(raw_ostream &O, const Twine &Indent, } #endif +void VPAliasLaneMaskRecipe::execute(VPTransformState &State) { + IRBuilderBase Builder = State.Builder; + Value *SinkValue = State.get(getSinkValue(), true); + Value *SourceValue = State.get(getSourceValue(), true); + + auto *Type = SinkValue->getType(); + Value *AliasMask = Builder.CreateIntrinsic( + Intrinsic::experimental_get_alias_lane_mask, + {VectorType::get(Builder.getInt1Ty(), State.VF), Type, + Builder.getInt64Ty()}, + {SourceValue, SinkValue, Builder.getInt64(getAccessedElementSize()), + Builder.getInt1(WriteAfterRead)}, + nullptr, "alias.lane.mask"); + State.set(this, AliasMask, /*IsScalar=*/false); +} + +#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) +void VPAliasLaneMaskRecipe::print(raw_ostream &O, const Twine &Indent, + VPSlotTracker &SlotTracker) const { + O << Indent << "EMIT "; + getVPSingleValue()->printAsOperand(O, SlotTracker); + O << " = alias lane mask "; MacDue wrote: nit: These seem more commonly printed in all caps with hyphens. ```suggestion O << " = ALIAS-LANE-MASK "; ``` https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -952,7 +952,6 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV, IRBuilder<> Builder(State.CFG.PrevBB->getTerminator()); // FIXME: Model VF * UF computation completely in VPlan. - assert(VFxUF.getNumUsers() && "VFxUF expected to always have users"); MacDue wrote: How does removing this assert relate to these changes? https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -3235,6 +3263,36 @@ void VPWidenPointerInductionRecipe::print(raw_ostream &O, const Twine &Indent, } #endif +void VPAliasLaneMaskRecipe::execute(VPTransformState &State) { + IRBuilderBase Builder = State.Builder; + Value *SinkValue = State.get(getSinkValue(), true); + Value *SourceValue = State.get(getSourceValue(), true); + + auto *Type = SinkValue->getType(); MacDue wrote: nit: ```suggestion Type *PtrType = SinkValue->getType(); ``` https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
https://github.com/MacDue commented: A bunch of little comments (mostly just nitpicks from a pass over the PR) :slightly_smiling_face: https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -418,7 +418,13 @@ class LoopVectorizationPlanner { /// Build VPlans for the specified \p UserVF and \p UserIC if they are /// non-zero or all applicable candidate VFs otherwise. If vectorization and /// interleaving should be avoided up-front, no plans are generated. - void plan(ElementCount UserVF, unsigned UserIC); + /// RTChecks is a list of pointer pairs that should be checked for aliasing, + /// setting HasAliasMask to true in the case that an alias mask is generated MacDue wrote: Outdated comment? Is this `DiffChecks` now? https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Spill p-regs as z-regs when streaming hazards are possible (PR #126503)
https://github.com/MacDue edited https://github.com/llvm/llvm-project/pull/126503 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Spill p-regs as z-regs when streaming hazards are possible (PR #126503)
https://github.com/MacDue edited https://github.com/llvm/llvm-project/pull/126503 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Spill p-regs as z-regs when streaming hazards are possible (PR #126503)
https://github.com/MacDue milestoned https://github.com/llvm/llvm-project/pull/126503 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Spill p-regs as z-regs when streaming hazards are possible (PR #126503)
https://github.com/MacDue edited https://github.com/llvm/llvm-project/pull/126503 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -253,38 +253,38 @@ define i64 @not_dotp_i8_to_i64_has_neon_dotprod(ptr readonly %a, ptr readonly %b ; CHECK-MAXBW-SAME: ptr readonly [[A:%.*]], ptr readonly [[B:%.*]]) #[[ATTR1:[0-9]+]] { ; CHECK-MAXBW-NEXT: entry: ; CHECK-MAXBW-NEXT:[[TMP0:%.*]] = call i64 @llvm.vscale.i64() -; CHECK-MAXBW-NEXT:[[TMP1:%.*]] = mul i64 [[TMP0]], 8 +; CHECK-MAXBW-NEXT:[[TMP1:%.*]] = mul i64 [[TMP0]], 16 ; CHECK-MAXBW-NEXT:br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] ; CHECK-MAXBW: vector.ph: ; CHECK-MAXBW-NEXT:[[TMP2:%.*]] = call i64 @llvm.vscale.i64() -; CHECK-MAXBW-NEXT:[[TMP3:%.*]] = mul i64 [[TMP2]], 8 +; CHECK-MAXBW-NEXT:[[TMP3:%.*]] = mul i64 [[TMP2]], 16 ; CHECK-MAXBW-NEXT:[[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]] ; CHECK-MAXBW-NEXT:[[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]] ; CHECK-MAXBW-NEXT:[[TMP4:%.*]] = call i64 @llvm.vscale.i64() -; CHECK-MAXBW-NEXT:[[TMP5:%.*]] = mul i64 [[TMP4]], 8 +; CHECK-MAXBW-NEXT:[[TMP5:%.*]] = mul i64 [[TMP4]], 16 ; CHECK-MAXBW-NEXT:[[TMP6:%.*]] = getelementptr i8, ptr [[A]], i64 [[N_VEC]] ; CHECK-MAXBW-NEXT:[[TMP7:%.*]] = getelementptr i8, ptr [[B]], i64 [[N_VEC]] ; CHECK-MAXBW-NEXT:br label [[VECTOR_BODY:%.*]] ; CHECK-MAXBW: vector.body: ; CHECK-MAXBW-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-MAXBW-NEXT:[[VEC_PHI:%.*]] = phi [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ] +; CHECK-MAXBW-NEXT:[[VEC_PHI:%.*]] = phi [ zeroinitializer, [[VECTOR_PH]] ], [ [[PARTIAL_REDUCE:%.*]], [[VECTOR_BODY]] ] ; CHECK-MAXBW-NEXT:[[TMP8:%.*]] = add i64 [[INDEX]], 0 ; CHECK-MAXBW-NEXT:[[NEXT_GEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP8]] ; CHECK-MAXBW-NEXT:[[TMP9:%.*]] = add i64 [[INDEX]], 0 ; CHECK-MAXBW-NEXT:[[NEXT_GEP1:%.*]] = getelementptr i8, ptr [[B]], i64 [[TMP9]] ; CHECK-MAXBW-NEXT:[[TMP10:%.*]] = getelementptr i8, ptr [[NEXT_GEP]], i32 0 -; CHECK-MAXBW-NEXT:[[WIDE_LOAD:%.*]] = load , ptr [[TMP10]], align 1 -; CHECK-MAXBW-NEXT:[[TMP11:%.*]] = zext [[WIDE_LOAD]] to +; CHECK-MAXBW-NEXT:[[WIDE_LOAD:%.*]] = load , ptr [[TMP10]], align 1 ; CHECK-MAXBW-NEXT:[[TMP12:%.*]] = getelementptr i8, ptr [[NEXT_GEP1]], i32 0 -; CHECK-MAXBW-NEXT:[[WIDE_LOAD2:%.*]] = load , ptr [[TMP12]], align 1 -; CHECK-MAXBW-NEXT:[[TMP13:%.*]] = zext [[WIDE_LOAD2]] to -; CHECK-MAXBW-NEXT:[[TMP14:%.*]] = mul nuw nsw [[TMP13]], [[TMP11]] -; CHECK-MAXBW-NEXT:[[TMP15]] = add [[TMP14]], [[VEC_PHI]] +; CHECK-MAXBW-NEXT:[[WIDE_LOAD2:%.*]] = load , ptr [[TMP12]], align 1 +; CHECK-MAXBW-NEXT:[[TMP15:%.*]] = zext [[WIDE_LOAD2]] to +; CHECK-MAXBW-NEXT:[[TMP13:%.*]] = zext [[WIDE_LOAD]] to +; CHECK-MAXBW-NEXT:[[TMP14:%.*]] = mul nuw nsw [[TMP15]], [[TMP13]] +; CHECK-MAXBW-NEXT:[[PARTIAL_REDUCE]] = call @llvm.experimental.vector.partial.reduce.add.nxv2i64.nxv16i64( [[VEC_PHI]], [[TMP14]]) MacDue wrote: This test is called "not_dotp" but now looks like it's dotp :slightly_smiling_face: IIRC this won't map directly a dot product instruction (as `nxv16i64` to `nxv2i64` is not supported at the moment). https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2376,6 +2327,59 @@ class VPReductionRecipe : public VPRecipeWithIRFlags { } }; +/// A recipe for forming partial reductions. In the loop, an accumulator and +/// vector operand are added together and passed to the next iteration as the +/// next accumulator. After the loop body, the accumulator is reduced to a +/// scalar value. +class VPPartialReductionRecipe : public VPReductionRecipe { MacDue wrote: Should the `classof` for `VPReductionRecipe` now include `VPPartialReductionRecipe`? https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
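A standalone sketch of the partial-reduction shape that doc comment describes — a narrow accumulator updated inside the loop, reduced to a scalar only afterwards. The 16-to-4 lane mapping below is just one valid choice; the real recipe/intrinsic only fixes the final sum.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Each 16-lane input chunk is folded into a 4-lane accumulator in the loop;
// the horizontal reduction to a scalar happens once, after the loop body.
int64_t partialReduceAdd(const std::vector<std::array<int32_t, 16>> &Chunks) {
  std::array<int64_t, 4> Acc = {0, 0, 0, 0};
  for (const auto &Chunk : Chunks)
    for (std::size_t Lane = 0; Lane < Chunk.size(); ++Lane)
      Acc[Lane % Acc.size()] += Chunk[Lane]; // one possible lane mapping
  return std::accumulate(Acc.begin(), Acc.end(), int64_t{0});
}
```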
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
https://github.com/MacDue edited https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Prevent spills of ZT0 when ZA is not enabled (PR #137683)
https://github.com/MacDue created https://github.com/llvm/llvm-project/pull/137683 This cherry-picks https://github.com/llvm/llvm-project/pull/132722 and https://github.com/llvm/llvm-project/pull/136726 (the latter is based on the former). These patches are needed to prevent invalid codegen as attempting to store ZT0 without ZA enabled results in a SIGILL. >From c2e81b014aebc262b4db59eb7fbdde2b1376a39a Mon Sep 17 00:00:00 2001 From: Benjamin Maxwell Date: Tue, 25 Mar 2025 10:09:25 + Subject: [PATCH 1/2] [AArch64][SME2] Don't preserve ZT0 around SME ABI routines (#132722) This caused ZT0 to be preserved around `__arm_tpidr2_save` in functions with "aarch64_new_zt0". The block in which `__arm_tpidr2_save` is called is added by the SMEABIPass and may be reachable in cases where ZA has not been enabled* (so using `str zt0` is invalid). * (when za_save_buffer is null and num_za_save_slices is zero) --- .../AArch64/Utils/AArch64SMEAttributes.h | 3 +- .../AArch64/sme-disable-gisel-fisel.ll| 9 +-- llvm/test/CodeGen/AArch64/sme-zt0-state.ll| 61 +-- 3 files changed, 46 insertions(+), 27 deletions(-) diff --git a/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h b/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h index fb093da70c46b..a3ebf764a6e0c 100644 --- a/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h +++ b/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h @@ -133,7 +133,8 @@ class SMEAttrs { bool hasZT0State() const { return isNewZT0() || sharesZT0(); } bool requiresPreservingZT0(const SMEAttrs &Callee) const { return hasZT0State() && !Callee.sharesZT0() && - !Callee.hasAgnosticZAInterface(); + !Callee.hasAgnosticZAInterface() && + !(Callee.Bitmask & SME_ABI_Routine); } bool requiresDisablingZABeforeCall(const SMEAttrs &Callee) const { return hasZT0State() && !hasZAState() && Callee.hasPrivateZAInterface() && diff --git a/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll b/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll index 33d08beae2ca7..4a52bf27a7591 100644 --- a/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll +++ b/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll @@ -475,16 +475,12 @@ declare double @zt0_shared_callee(double) "aarch64_inout_zt0" define double @zt0_new_caller_to_zt0_shared_callee(double %x) nounwind noinline optnone "aarch64_new_zt0" { ; CHECK-COMMON-LABEL: zt0_new_caller_to_zt0_shared_callee: ; CHECK-COMMON: // %bb.0: // %prelude -; CHECK-COMMON-NEXT:sub sp, sp, #80 -; CHECK-COMMON-NEXT:str x30, [sp, #64] // 8-byte Folded Spill +; CHECK-COMMON-NEXT:str x30, [sp, #-16]! 
// 8-byte Folded Spill ; CHECK-COMMON-NEXT:mrs x8, TPIDR2_EL0 ; CHECK-COMMON-NEXT:cbz x8, .LBB13_2 ; CHECK-COMMON-NEXT:b .LBB13_1 ; CHECK-COMMON-NEXT: .LBB13_1: // %save.za -; CHECK-COMMON-NEXT:mov x8, sp -; CHECK-COMMON-NEXT:str zt0, [x8] ; CHECK-COMMON-NEXT:bl __arm_tpidr2_save -; CHECK-COMMON-NEXT:ldr zt0, [x8] ; CHECK-COMMON-NEXT:msr TPIDR2_EL0, xzr ; CHECK-COMMON-NEXT:b .LBB13_2 ; CHECK-COMMON-NEXT: .LBB13_2: // %entry @@ -495,8 +491,7 @@ define double @zt0_new_caller_to_zt0_shared_callee(double %x) nounwind noinline ; CHECK-COMMON-NEXT:fmov d1, x8 ; CHECK-COMMON-NEXT:fadd d0, d0, d1 ; CHECK-COMMON-NEXT:smstop za -; CHECK-COMMON-NEXT:ldr x30, [sp, #64] // 8-byte Folded Reload -; CHECK-COMMON-NEXT:add sp, sp, #80 +; CHECK-COMMON-NEXT:ldr x30, [sp], #16 // 8-byte Folded Reload ; CHECK-COMMON-NEXT:ret entry: %call = call double @zt0_shared_callee(double %x) diff --git a/llvm/test/CodeGen/AArch64/sme-zt0-state.ll b/llvm/test/CodeGen/AArch64/sme-zt0-state.ll index 312537630e77a..500fff4eb20db 100644 --- a/llvm/test/CodeGen/AArch64/sme-zt0-state.ll +++ b/llvm/test/CodeGen/AArch64/sme-zt0-state.ll @@ -112,7 +112,7 @@ define void @za_zt0_shared_caller_za_zt0_shared_callee() "aarch64_inout_za" "aar ret void; } -; New-ZA Callee +; New-ZT0 Callee ; Expect spill & fill of ZT0 around call ; Expect smstop/smstart za around call @@ -134,6 +134,39 @@ define void @zt0_in_caller_zt0_new_callee() "aarch64_in_zt0" nounwind { ret void; } +; New-ZT0 Callee + +; Expect commit of lazy-save if ZA is dormant +; Expect smstart ZA & clear ZT0 +; Expect spill & fill of ZT0 around call +; Before return, expect smstop ZA +define void @zt0_new_caller_zt0_new_callee() "aarch64_new_zt0" nounwind { +; CHECK-LABEL: zt0_new_caller_zt0_new_callee: +; CHECK: // %bb.0: // %prelude +; CHECK-NEXT:sub sp, sp, #80 +; CHECK-NEXT:stp x30, x19, [sp, #64] // 16-byte Folded Spill +; CHECK-NEXT:mrs x8, TPIDR2_EL0 +; CHECK-NEXT:cbz x8, .LBB6_2 +; CHECK-NEXT: // %bb.1: // %save.za +; CHECK-NEXT:bl __arm_tpidr2_save +; CHECK-NEXT:msr TPIDR2_EL0, xzr +; CHECK-NEXT: .LBB6_2: +; CHECK-NEXT:smstart za +; CHECK-NEXT:zero { zt0 } +; CHECK-NEXT:mov x19, sp +; CHECK-NEXT:
[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Prevent spills of ZT0 when ZA is not enabled (PR #137683)
https://github.com/MacDue milestoned https://github.com/llvm/llvm-project/pull/137683 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [AArch64][SME] Prevent spills of ZT0 when ZA is not enabled (PR #137683)
MacDue wrote: @sdesmalen-arm What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/137683 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost( return Invalid; break; case 16: - if (AccumEVT == MVT::i64) -Cost *= 2; - else if (AccumEVT != MVT::i32) + if (AccumEVT != MVT::i32) MacDue wrote: If we allow this case, make sure to rename the test from "not_dotp" to "dotp". https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost( return Invalid; break; case 16: - if (AccumEVT == MVT::i64) -Cost *= 2; - else if (AccumEVT != MVT::i32) + if (AccumEVT != MVT::i32) MacDue wrote: It's due to: https://github.com/llvm/llvm-project/pull/136173#discussion_r2053920360 https://github.com/llvm/llvm-project/pull/136173 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [SDAG] Ensure load is included in output chain of sincos expansion (#140525) (PR #140703)
MacDue wrote: Not sure why the bot is asking me (I think it's fine, but I requested the backport). cc @arsenm, @RKSimon https://github.com/llvm/llvm-project/pull/140703 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits