[llvm-branch-commits] [llvm] release/19.x: [LSR] Fix matching vscale immediates (#100080) (PR #100359)

2024-07-24 Thread Paul Walker via llvm-branch-commits

https://github.com/paulwalker-arm approved this pull request.


https://github.com/llvm/llvm-project/pull/100359
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [AArch64] Avoid inlining if ZT0 needs preserving. (#101343) (PR #101932)

2024-08-05 Thread Paul Walker via llvm-branch-commits

https://github.com/paulwalker-arm approved this pull request.


https://github.com/llvm/llvm-project/pull/101932
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-21 Thread Paul Walker via llvm-branch-commits


@@ -869,6 +870,18 @@ TargetTransformInfo::getOperandInfo(const Value *V) {
   return {OpInfo, OpProps};
 }
 
+InstructionCost TargetTransformInfo::getVecLibCallCost(
+const int OpCode, const TargetLibraryInfo *TLI, VectorType *VecTy,
+TTI::TargetCostKind CostKind) {
+  Type *ScalarTy = VecTy->getScalarType();
+  LibFunc Func;
+  if (TLI->getLibFunc(OpCode, ScalarTy, Func) &&
+  TLI->isFunctionVectorizable(TLI->getName(Func), 
VecTy->getElementCount()))

paulwalker-arm wrote:

This seems to be surreptitiously adding another mechanism to check for the 
presence of a vector math routine. Under what circumstances do you need to 
check the cost of something that might not exist? I would expect TLI to be 
queried directly as part of a transformation, and once it has concluded a 
vector math call exists it would simply query the cost of the call.
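
As a rough sketch (reusing names from the patch, purely for illustration) the 
flow I'd expect looks like:

  // The transformation itself establishes that a suitable vector math
  // routine exists...
  LibFunc Func;
  if (TLI->getLibFunc(I->getOpcode(), ScalarTy, Func) &&
      TLI->isFunctionVectorizable(TLI->getName(Func),
                                  VecTy->getElementCount())) {
    // ...and only then asks TTI what the call it intends to emit will cost.
    // ArgTys here stands for the argument types of that intended call.
    InstructionCost Cost =
        TTI->getCallInstrCost(/*F=*/nullptr, VecTy, ArgTys, CostKind);
  }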

https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-21 Thread Paul Walker via llvm-branch-commits


@@ -869,6 +870,18 @@ TargetTransformInfo::getOperandInfo(const Value *V) {
   return {OpInfo, OpProps};
 }
 
+InstructionCost TargetTransformInfo::getVecLibCallCost(
+const int OpCode, const TargetLibraryInfo *TLI, VectorType *VecTy,
+TTI::TargetCostKind CostKind) {
+  Type *ScalarTy = VecTy->getScalarType();
+  LibFunc Func;
+  if (TLI->getLibFunc(OpCode, ScalarTy, Func) &&
+  TLI->isFunctionVectorizable(TLI->getName(Func), 
VecTy->getElementCount()))
+return getCallInstrCost(nullptr, VecTy, {ScalarTy, ScalarTy}, CostKind);

paulwalker-arm wrote:

This is hardwiring two scalar parameters when costing the call?
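
(For illustration only: the call being costed is the vector routine, so I'd 
expect its argument types to be vector types as well, i.e. something closer to 
`getCallInstrCost(nullptr, VecTy, {VecTy, VecTy}, CostKind)` for a two-operand 
operation like frem, assuming that is the intent.)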

https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-22 Thread Paul Walker via llvm-branch-commits


@@ -869,6 +870,18 @@ TargetTransformInfo::getOperandInfo(const Value *V) {
   return {OpInfo, OpProps};
 }
 
+InstructionCost TargetTransformInfo::getVecLibCallCost(
+const int OpCode, const TargetLibraryInfo *TLI, VectorType *VecTy,
+TTI::TargetCostKind CostKind) {
+  Type *ScalarTy = VecTy->getScalarType();
+  LibFunc Func;
+  if (TLI->getLibFunc(OpCode, ScalarTy, Func) &&
+  TLI->isFunctionVectorizable(TLI->getName(Func), 
VecTy->getElementCount()))

paulwalker-arm wrote:

TTI should be costing known entities and not trying to second-guess what a 
transformation is doing.  Whilst TLI provides a route for target specific 
queries relating to the availability of vector math routines, that doesn't mean 
those mappings will be used.  The only source of truth is the transformation 
pass itself, and thus it needs to ask TTI the correct question based on that 
source of truth (i.e. whether it intends to use the TLI mappings[1]). Both TTI 
and TLI abstract away the target specific queries relating to their design, so 
I don't understand what target specific "magic" you are worried about.

[1] This is very important: if the vector function is not used then a scalable 
vector FREM must have an invalid cost, because there is no code generation 
support for it.

https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-22 Thread Paul Walker via llvm-branch-commits


@@ -869,6 +870,18 @@ TargetTransformInfo::getOperandInfo(const Value *V) {
   return {OpInfo, OpProps};
 }
 
+InstructionCost TargetTransformInfo::getVecLibCallCost(
+const int OpCode, const TargetLibraryInfo *TLI, VectorType *VecTy,
+TTI::TargetCostKind CostKind) {
+  Type *ScalarTy = VecTy->getScalarType();
+  LibFunc Func;
+  if (TLI->getLibFunc(OpCode, ScalarTy, Func) &&
+  TLI->isFunctionVectorizable(TLI->getName(Func), 
VecTy->getElementCount()))

paulwalker-arm wrote:

OK, I see what you're saying.  I guess the problem here is that we're relying 
on a subsequent pass (ReplaceWithVecLib) to be present in the pipeline in order 
to correctly cost the FREM, with neither SLP nor TTI having this knowledge?

https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-22 Thread Paul Walker via llvm-branch-commits

https://github.com/paulwalker-arm edited 
https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-22 Thread Paul Walker via llvm-branch-commits


@@ -869,6 +870,18 @@ TargetTransformInfo::getOperandInfo(const Value *V) {
   return {OpInfo, OpProps};
 }
 
+InstructionCost TargetTransformInfo::getVecLibCallCost(
+const int OpCode, const TargetLibraryInfo *TLI, VectorType *VecTy,
+TTI::TargetCostKind CostKind) {
+  Type *ScalarTy = VecTy->getScalarType();
+  LibFunc Func;
+  if (TLI->getLibFunc(OpCode, ScalarTy, Func) &&
+  TLI->isFunctionVectorizable(TLI->getName(Func), 
VecTy->getElementCount()))

paulwalker-arm wrote:

I think it's more subtle than this.  Sure, there's some target specific 
behaviour, but that is hidden behind TLI as it should be, and I have no problem 
passing TLI into a TTI cost function.  The problem with TTI is that it needs to 
give a different answer based on whether it is called before the 
ReplaceWithVecLib pass or after (or when ReplaceWithVecLib is never run).  I 
don't think TTI has access to such information? Nor would I want this to be 
something all cost functions need to worry about.

Personally I'm happy with keeping this nuance outside of TTI, but if we really 
want it captured within TTI then I think it's time to break FREM into its own 
cost function (i.e. implement `getFRemInstrCost`).  That way 
getArithmeticInstrCost can work as it does today and the new function can be 
documented to highlight its assumption: if a TLI is passed in and a vector 
mapping is present, the returned cost is only valid on the assumption that 
vector FREM instructions will be transformed by a following transformation 
pass.  I prefer this to, say, adding TLI to getArithmeticInstrCost, because I'd 
rather users of `getFRemInstrCost` explicitly enter into this contract.
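
Purely as an illustration of the shape (a hypothetical signature, not a worked 
out interface), the hook could look something like:

  /// \returns the cost of an FRem instruction of type \p Ty.  When \p TLI is
  /// provided and has a vector mapping for \p Ty, the returned cost assumes
  /// the FRem will later be rewritten (e.g. by ReplaceWithVecLib) into a call
  /// to that vector math routine; callers opt into this contract explicitly.
  InstructionCost getFRemInstrCost(Type *Ty, TTI::TargetCostKind CostKind,
                                   const TargetLibraryInfo *TLI = nullptr) const;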


https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-22 Thread Paul Walker via llvm-branch-commits

https://github.com/paulwalker-arm commented:

I've reviewed the patch from both sides, so most of the comments will be void 
if you opt for the new TTI hook.  The advantage of the TTI hook is that, 
because it is specific to FREM, you can hardwire things like the number of 
operands, which should streamline the implementation.

https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-22 Thread Paul Walker via llvm-branch-commits


@@ -410,6 +410,14 @@ bool maskIsAllOneOrUndef(Value *Mask);
 /// for each lane which may be active.
 APInt possiblyDemandedEltsInMask(Value *Mask);
 
+/// Returns the cost of a call when a target has a vector library function for
+/// the given \p VecTy, otherwise an invalid cost.

paulwalker-arm wrote:

I think this misses a crucial point as to why it exists.  Perhaps something 
like:

"Returns the cost of a vector instruction based on the assumption that it'll be 
later transformed (by ReplaceWithVecLib) into a call to a platform specific 
vector math function.  Instructions unsupported by ReplaceWithVecLib will 
return InstructionCost::getInvalid()."

https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-22 Thread Paul Walker via llvm-branch-commits


@@ -1031,6 +1032,22 @@ APInt llvm::possiblyDemandedEltsInMask(Value *Mask) {
   return DemandedElts;
 }
 
+InstructionCost
+llvm::getVecLibCallCost(const Instruction *I, const TargetTransformInfo *TTI,
+const TargetLibraryInfo *TLI, VectorType *VecTy,
+TargetTransformInfo::TargetCostKind CostKind) {
+  SmallVector OpTypes;
+  for (auto &Op : I->operands())
+OpTypes.push_back(Op->getType());

paulwalker-arm wrote:

Should this be `VecTy`? I'm assuming `I` can be a scalar instruction?

https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-22 Thread Paul Walker via llvm-branch-commits


@@ -8362,9 +8362,12 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef VectorizedVals,
   unsigned OpIdx = isa(VL0) ? 0 : 1;
   TTI::OperandValueInfo Op1Info = getOperandInfo(E->getOperand(0));
   TTI::OperandValueInfo Op2Info = getOperandInfo(E->getOperand(OpIdx));
-  return TTI->getArithmeticInstrCost(ShuffleOrOp, VecTy, CostKind, Op1Info,
- Op2Info) +
- CommonCost;
+  InstructionCost VecInstrCost = TTI->getArithmeticInstrCost(
+  ShuffleOrOp, VecTy, CostKind, Op1Info, Op2Info);
+  // Some targets can replace frem with vector library calls.
+  InstructionCost VecCallCost =
+  getVecLibCallCost(VL0, TTI, TLI, VecTy, CostKind);
+  return std::min(VecInstrCost, VecCallCost) + CommonCost;

paulwalker-arm wrote:

This seems a little arbitrary/complex given we know only one of the costs is 
valid at any time.  Perhaps it suggests you really want to implement 
`getVecLibAwareArithmeticInstrCost`?
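
Sketching that alternative: given (as above) only one of the costs is expected 
to be valid, the choice could be made explicit rather than relying on std::min, 
e.g.

  InstructionCost VecCallCost =
      getVecLibCallCost(VL0, TTI, TLI, VecTy, CostKind);
  return (VecCallCost.isValid() ? VecCallCost : VecInstrCost) + CommonCost;

assuming the helper keeps returning InstructionCost::getInvalid() when no 
vector mapping exists.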

https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-22 Thread Paul Walker via llvm-branch-commits


@@ -1031,6 +1032,22 @@ APInt llvm::possiblyDemandedEltsInMask(Value *Mask) {
   return DemandedElts;
 }
 
+InstructionCost
+llvm::getVecLibCallCost(const Instruction *I, const TargetTransformInfo *TTI,
+const TargetLibraryInfo *TLI, VectorType *VecTy,
+TargetTransformInfo::TargetCostKind CostKind) {
+  SmallVector OpTypes;
+  for (auto &Op : I->operands())
+OpTypes.push_back(Op->getType());
+
+  LibFunc Func;
+  if (TLI->getLibFunc(I->getOpcode(), I->getType(), Func) &&
+  TLI->isFunctionVectorizable(TLI->getName(Func), 
VecTy->getElementCount()))
+return TTI->getCallInstrCost(nullptr, VecTy, OpTypes, CostKind);
+
+  return InstructionCost::getInvalid();

paulwalker-arm wrote:

Given you've pulled this from LoopVectorize so the "hack" is easier to track, I 
would rather you also update LoopVectorize to use it and thus verify that 
existing behaviour is honoured.

https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] SLP can vectorize frem (PR #82488)

2024-02-22 Thread Paul Walker via llvm-branch-commits

paulwalker-arm wrote:

Changing `getArithmeticInstrCost` is just too dangerous.  What if one opcode 
needs TLI for a different reason? All of a sudden, all existing callers are 
entered into the contract (that FREM is guaranteed to be transformed into a 
math call) without ensuring that's actually the case.

https://github.com/llvm/llvm-project/pull/82488
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 5148501 - [NFC][LLVM][SVE] Refactor predicate register ASM constraint parsing to use std::optional.

2023-11-03 Thread Paul Walker via llvm-branch-commits

Author: Paul Walker
Date: 2023-11-03T12:10:48Z
New Revision: 51485019fb34a48dc6226bfa42d7449091e3f03d

URL: 
https://github.com/llvm/llvm-project/commit/51485019fb34a48dc6226bfa42d7449091e3f03d
DIFF: 
https://github.com/llvm/llvm-project/commit/51485019fb34a48dc6226bfa42d7449091e3f03d.diff

LOG: [NFC][LLVM][SVE] Refactor predicate register ASM constraint parsing to use 
std::optional.

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 291f0c8c5d991c6..94901c2d1a65688 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -10163,14 +10163,15 @@ const char 
*AArch64TargetLowering::LowerXConstraint(EVT ConstraintVT) const {
   return "r";
 }
 
-enum PredicateConstraint { Uph, Upl, Upa, Invalid };
+enum class PredicateConstraint { Uph, Upl, Upa };
 
-static PredicateConstraint parsePredicateConstraint(StringRef Constraint) {
-  return StringSwitch(Constraint)
+static std::optional
+parsePredicateConstraint(StringRef Constraint) {
+  return StringSwitch>(Constraint)
   .Case("Uph", PredicateConstraint::Uph)
   .Case("Upl", PredicateConstraint::Upl)
   .Case("Upa", PredicateConstraint::Upa)
-  .Default(PredicateConstraint::Invalid);
+  .Default(std::nullopt);
 }
 
 static const TargetRegisterClass *
@@ -10180,8 +10181,6 @@ getPredicateRegisterClass(PredicateConstraint 
Constraint, EVT VT) {
 return nullptr;
 
   switch (Constraint) {
-  default:
-return nullptr;
   case PredicateConstraint::Uph:
 return VT == MVT::aarch64svcount ? &AArch64::PNR_p8to15RegClass
  : &AArch64::PPR_p8to15RegClass;
@@ -10192,6 +10191,8 @@ getPredicateRegisterClass(PredicateConstraint 
Constraint, EVT VT) {
 return VT == MVT::aarch64svcount ? &AArch64::PNRRegClass
  : &AArch64::PPRRegClass;
   }
+
+  llvm_unreachable("Missing PredicateConstraint!");
 }
 
 // The set of cc code supported is from
@@ -10289,9 +10290,8 @@ AArch64TargetLowering::getConstraintType(StringRef 
Constraint) const {
 case 'S': // A symbolic address
   return C_Other;
 }
-  } else if (parsePredicateConstraint(Constraint) !=
- PredicateConstraint::Invalid)
-  return C_RegisterClass;
+  } else if (parsePredicateConstraint(Constraint))
+return C_RegisterClass;
   else if (parseConstraintCode(Constraint) != AArch64CC::Invalid)
 return C_Other;
   return TargetLowering::getConstraintType(Constraint);
@@ -10325,7 +10325,7 @@ AArch64TargetLowering::getSingleConstraintMatchWeight(
 weight = CW_Constant;
 break;
   case 'U':
-if (parsePredicateConstraint(constraint) != PredicateConstraint::Invalid)
+if (parsePredicateConstraint(constraint))
   weight = CW_Register;
 break;
   }
@@ -10382,9 +10382,9 @@ AArch64TargetLowering::getRegForInlineAsmConstraint(
   break;
 }
   } else {
-PredicateConstraint PC = parsePredicateConstraint(Constraint);
-if (const TargetRegisterClass *RegClass = getPredicateRegisterClass(PC, 
VT))
-  return std::make_pair(0U, RegClass);
+if (const auto PC = parsePredicateConstraint(Constraint))
+  if (const auto *RegClass = getPredicateRegisterClass(*PC, VT))
+return std::make_pair(0U, RegClass);
   }
   if (StringRef("{cc}").equals_insensitive(Constraint) ||
   parseConstraintCode(Constraint) != AArch64CC::Invalid)



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] Add release note about ABI implementation changes for _BitInt on Arm (PR #105659)

2024-08-22 Thread Paul Walker via llvm-branch-commits

https://github.com/paulwalker-arm approved this pull request.


https://github.com/llvm/llvm-project/pull/105659
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)

2024-09-05 Thread Paul Walker via llvm-branch-commits

https://github.com/paulwalker-arm approved this pull request.


https://github.com/llvm/llvm-project/pull/107406
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 2b8db40 - [SVE] Restrict the usage of REINTERPRET_CAST.

2021-01-15 Thread Paul Walker via llvm-branch-commits

Author: Paul Walker
Date: 2021-01-15T11:32:13Z
New Revision: 2b8db40c92186731effd8948049919db8cf37dee

URL: 
https://github.com/llvm/llvm-project/commit/2b8db40c92186731effd8948049919db8cf37dee
DIFF: 
https://github.com/llvm/llvm-project/commit/2b8db40c92186731effd8948049919db8cf37dee.diff

LOG: [SVE] Restrict the usage of REINTERPRET_CAST.

In order to limit the number of combinations of REINTERPRET_CAST,
whilst at the same time preventing overlap with BITCAST, this patch
establishes the following rules:

1. The operand and result element types must be the same.
2. The operand and/or result type must be an unpacked type.
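
For example, under these rules a cast between nxv2f32 and nxv4f32 remains a
valid REINTERPRET_CAST (same f32 element type, with nxv2f32 being unpacked),
whereas a cast between nxv4f32 and nxv4i32 is left to BITCAST because the
element types differ.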

Differential Revision: https://reviews.llvm.org/D94593

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/lib/Target/AArch64/AArch64ISelLowering.h
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index adfe492d6181..d72eee5abc26 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -144,6 +144,25 @@ static inline EVT getPackedSVEVectorVT(EVT VT) {
 return MVT::nxv4f32;
   case MVT::f64:
 return MVT::nxv2f64;
+  case MVT::bf16:
+return MVT::nxv8bf16;
+  }
+}
+
+// NOTE: Currently there's only a need to return integer vector types. If this
+// changes then just add an extra "type" parameter.
+static inline EVT getPackedSVEVectorVT(ElementCount EC) {
+  switch (EC.getKnownMinValue()) {
+  default:
+llvm_unreachable("unexpected element count for vector");
+  case 16:
+return MVT::nxv16i8;
+  case 8:
+return MVT::nxv8i16;
+  case 4:
+return MVT::nxv4i32;
+  case 2:
+return MVT::nxv2i64;
   }
 }
 
@@ -3988,14 +4007,10 @@ SDValue AArch64TargetLowering::LowerMGATHER(SDValue Op,
   !static_cast(DAG.getSubtarget()).hasBF16())
 return SDValue();
 
-  // Handle FP data
+  // Handle FP data by using an integer gather and casting the result.
   if (VT.isFloatingPoint()) {
-ElementCount EC = VT.getVectorElementCount();
-auto ScalarIntVT =
-MVT::getIntegerVT(AArch64::SVEBitsPerBlock / EC.getKnownMinValue());
-PassThru = DAG.getNode(AArch64ISD::REINTERPRET_CAST, DL,
-   MVT::getVectorVT(ScalarIntVT, EC), PassThru);
-
+EVT PassThruVT = getPackedSVEVectorVT(VT.getVectorElementCount());
+PassThru = getSVESafeBitCast(PassThruVT, PassThru, DAG);
 InputVT = DAG.getValueType(MemVT.changeVectorElementTypeToInteger());
   }
 
@@ -4015,7 +4030,7 @@ SDValue AArch64TargetLowering::LowerMGATHER(SDValue Op,
   SDValue Gather = DAG.getNode(Opcode, DL, VTs, Ops);
 
   if (VT.isFloatingPoint()) {
-SDValue Cast = DAG.getNode(AArch64ISD::REINTERPRET_CAST, DL, VT, Gather);
+SDValue Cast = getSVESafeBitCast(VT, Gather, DAG);
 return DAG.getMergeValues({Cast, Gather}, DL);
   }
 
@@ -4052,15 +4067,10 @@ SDValue AArch64TargetLowering::LowerMSCATTER(SDValue Op,
   !static_cast(DAG.getSubtarget()).hasBF16())
 return SDValue();
 
-  // Handle FP data
+  // Handle FP data by casting the data so an integer scatter can be used.
   if (VT.isFloatingPoint()) {
-VT = VT.changeVectorElementTypeToInteger();
-ElementCount EC = VT.getVectorElementCount();
-auto ScalarIntVT =
-MVT::getIntegerVT(AArch64::SVEBitsPerBlock / EC.getKnownMinValue());
-StoreVal = DAG.getNode(AArch64ISD::REINTERPRET_CAST, DL,
-   MVT::getVectorVT(ScalarIntVT, EC), StoreVal);
-
+EVT StoreValVT = getPackedSVEVectorVT(VT.getVectorElementCount());
+StoreVal = getSVESafeBitCast(StoreValVT, StoreVal, DAG);
 InputVT = DAG.getValueType(MemVT.changeVectorElementTypeToInteger());
   }
 
@@ -17157,3 +17167,40 @@ SDValue 
AArch64TargetLowering::LowerFixedLengthVectorSetccToSVE(
   auto Promote = DAG.getBoolExtOrTrunc(Cmp, DL, PromoteVT, InVT);
   return convertFromScalableVector(DAG, Op.getValueType(), Promote);
 }
+
+SDValue AArch64TargetLowering::getSVESafeBitCast(EVT VT, SDValue Op,
+ SelectionDAG &DAG) const {
+  SDLoc DL(Op);
+  EVT InVT = Op.getValueType();
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+
+  assert(VT.isScalableVector() && TLI.isTypeLegal(VT) &&
+ InVT.isScalableVector() && TLI.isTypeLegal(InVT) &&
+ "Only expect to cast between legal scalable vector types!");
+  assert((VT.getVectorElementType() == MVT::i1) ==
+ (InVT.getVectorElementType() == MVT::i1) &&
+ "Cannot cast between data and predicate scalable vector types!");
+
+  if (InVT == VT)
+return Op;
+
+  if (VT.getVectorElementType() == MVT::i1)
+return DAG.getNode(AArch64ISD::REINTERPRET_CAST, DL, VT, Op);
+
+  EVT PackedVT = getPackedSVEVectorVT(VT.getVectorElementType());
+  EVT PackedInVT = getPackedSVEVectorVT(I

[llvm-branch-commits] [llvm] eba6dea - [SVE] Lower vector CTLZ, CTPOP and CTTZ operations.

2021-01-05 Thread Paul Walker via llvm-branch-commits

Author: Paul Walker
Date: 2021-01-05T10:42:35Z
New Revision: eba6deab22b576004a209b3f42ccc5e58f7603bf

URL: 
https://github.com/llvm/llvm-project/commit/eba6deab22b576004a209b3f42ccc5e58f7603bf
DIFF: 
https://github.com/llvm/llvm-project/commit/eba6deab22b576004a209b3f42ccc5e58f7603bf.diff

LOG: [SVE] Lower vector CTLZ, CTPOP and CTTZ operations.

CTLZ and CTPOP are lowered to CLZ and CNT instructions respectively.

CTTZ is not a native SVE operation but is instead lowered to:
  CTTZ(V) => CTLZ(BITREVERSE(V))
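
In instruction terms this means a vector cttz is emitted as a bit reverse
(rbit) followed by a count-leading-zeros (clz).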

In the case of fixed-length support using SVE we also lower CTTZ
operating on NEON-sized vectors because of its reliance on
BITREVERSE, which is also lowered to SVE instructions at these lengths.

Differential Revision: https://reviews.llvm.org/D93607

Added: 
llvm/test/CodeGen/AArch64/sve-bit-counting.ll
llvm/test/CodeGen/AArch64/sve-fixed-length-bit-counting.ll

Modified: 
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/lib/Target/AArch64/AArch64ISelLowering.h
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
llvm/lib/Target/AArch64/SVEInstrFormats.td

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 2012f1247a0f..faed7c64a15e 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -184,6 +184,8 @@ static bool isMergePassthruOpcode(unsigned Opc) {
 return false;
   case AArch64ISD::BITREVERSE_MERGE_PASSTHRU:
   case AArch64ISD::BSWAP_MERGE_PASSTHRU:
+  case AArch64ISD::CTLZ_MERGE_PASSTHRU:
+  case AArch64ISD::CTPOP_MERGE_PASSTHRU:
   case AArch64ISD::DUP_MERGE_PASSTHRU:
   case AArch64ISD::FNEG_MERGE_PASSTHRU:
   case AArch64ISD::SIGN_EXTEND_INREG_MERGE_PASSTHRU:
@@ -1070,6 +1072,9 @@ AArch64TargetLowering::AArch64TargetLowering(const 
TargetMachine &TM,
 for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {
   setOperationAction(ISD::BITREVERSE, VT, Custom);
   setOperationAction(ISD::BSWAP, VT, Custom);
+  setOperationAction(ISD::CTLZ, VT, Custom);
+  setOperationAction(ISD::CTPOP, VT, Custom);
+  setOperationAction(ISD::CTTZ, VT, Custom);
   setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
   setOperationAction(ISD::UINT_TO_FP, VT, Custom);
   setOperationAction(ISD::SINT_TO_FP, VT, Custom);
@@ -1188,6 +1193,9 @@ AArch64TargetLowering::AArch64TargetLowering(const 
TargetMachine &TM,
 
   // These operations are not supported on NEON but SVE can do them.
   setOperationAction(ISD::BITREVERSE, MVT::v1i64, Custom);
+  setOperationAction(ISD::CTLZ, MVT::v1i64, Custom);
+  setOperationAction(ISD::CTLZ, MVT::v2i64, Custom);
+  setOperationAction(ISD::CTTZ, MVT::v1i64, Custom);
   setOperationAction(ISD::MUL, MVT::v1i64, Custom);
   setOperationAction(ISD::MUL, MVT::v2i64, Custom);
   setOperationAction(ISD::SDIV, MVT::v8i8, Custom);
@@ -1223,6 +1231,7 @@ AArch64TargetLowering::AArch64TargetLowering(const 
TargetMachine &TM,
   for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
   MVT::v2i32, MVT::v4i32, MVT::v2i64}) {
 setOperationAction(ISD::BITREVERSE, VT, Custom);
+setOperationAction(ISD::CTTZ, VT, Custom);
 setOperationAction(ISD::VECREDUCE_AND, VT, Custom);
 setOperationAction(ISD::VECREDUCE_OR, VT, Custom);
 setOperationAction(ISD::VECREDUCE_XOR, VT, Custom);
@@ -1338,6 +1347,9 @@ void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT 
VT) {
   setOperationAction(ISD::ANY_EXTEND, VT, Custom);
   setOperationAction(ISD::BITREVERSE, VT, Custom);
   setOperationAction(ISD::BSWAP, VT, Custom);
+  setOperationAction(ISD::CTLZ, VT, Custom);
+  setOperationAction(ISD::CTPOP, VT, Custom);
+  setOperationAction(ISD::CTTZ, VT, Custom);
   setOperationAction(ISD::FADD, VT, Custom);
   setOperationAction(ISD::FCEIL, VT, Custom);
   setOperationAction(ISD::FDIV, VT, Custom);
@@ -1944,6 +1956,8 @@ const char 
*AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
 MAKE_CASE(AArch64ISD::STNP)
 MAKE_CASE(AArch64ISD::BITREVERSE_MERGE_PASSTHRU)
 MAKE_CASE(AArch64ISD::BSWAP_MERGE_PASSTHRU)
+MAKE_CASE(AArch64ISD::CTLZ_MERGE_PASSTHRU)
+MAKE_CASE(AArch64ISD::CTPOP_MERGE_PASSTHRU)
 MAKE_CASE(AArch64ISD::DUP_MERGE_PASSTHRU)
 MAKE_CASE(AArch64ISD::INDEX_VECTOR)
 MAKE_CASE(AArch64ISD::UABD)
@@ -3577,6 +3591,17 @@ SDValue 
AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
   case Intrinsic::aarch64_sve_ptrue:
 return DAG.getNode(AArch64ISD::PTRUE, dl, Op.getValueType(),
Op.getOperand(1));
+  case Intrinsic::aarch64_sve_clz:
+return DAG.getNode(AArch64ISD::CTLZ_MERGE_PASSTHRU, dl, Op.getValueType(),
+   Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));
+  case Intrinsic::aarch64_sve_cnt: {
+

[llvm-branch-commits] [llvm] 6d35bd1 - [CodeGenPrepare] Update optimizeGatherScatterInst for scalable vectors.

2020-12-15 Thread Paul Walker via llvm-branch-commits

Author: Paul Walker
Date: 2020-12-15T10:57:51Z
New Revision: 6d35bd1d48e9fdde38483e6b22a900daa7e3d46a

URL: 
https://github.com/llvm/llvm-project/commit/6d35bd1d48e9fdde38483e6b22a900daa7e3d46a
DIFF: 
https://github.com/llvm/llvm-project/commit/6d35bd1d48e9fdde38483e6b22a900daa7e3d46a.diff

LOG: [CodeGenPrepare] Update optimizeGatherScatterInst for scalable vectors.

optimizeGatherScatterInst does nothing specific to fixed length vectors
but uses FixedVectorType to extract the number of elements.  This patch
simply updates the code to use VectorType and getElementCount instead.

For testing I just copied Transforms/CodeGenPrepare/X86/gather-scatter-opt.ll
replacing `<4 x ` with `<vscale x 4 x `.

Differential Revision: https://reviews.llvm.org/D92572

Added: 
llvm/test/Transforms/CodeGenPrepare/AArch64/gather-scatter-opt.ll

Modified: 
llvm/lib/CodeGen/CodeGenPrepare.cpp

Removed: 




diff  --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp 
b/llvm/lib/CodeGen/CodeGenPrepare.cpp
index 9b44b30e7a1b..8fe5cb9faba4 100644
--- a/llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -5304,14 +5304,10 @@ bool CodeGenPrepare::optimizeMemoryInst(Instruction 
*MemoryInst, Value *Addr,
 ///
 /// If the final index isn't a vector or is a splat, we can emit a scalar GEP
 /// followed by a GEP with an all zeroes vector index. This will enable
-/// SelectionDAGBuilder to use a the scalar GEP as the uniform base and have a
+/// SelectionDAGBuilder to use the scalar GEP as the uniform base and have a
 /// zero index.
 bool CodeGenPrepare::optimizeGatherScatterInst(Instruction *MemoryInst,
Value *Ptr) {
-  // FIXME: Support scalable vectors.
-  if (isa(Ptr->getType()))
-return false;
-
   Value *NewAddr;
 
   if (const auto *GEP = dyn_cast(Ptr)) {
@@ -5370,7 +5366,7 @@ bool 
CodeGenPrepare::optimizeGatherScatterInst(Instruction *MemoryInst,
 if (!RewriteGEP && Ops.size() == 2)
   return false;
 
-unsigned NumElts = cast(Ptr->getType())->getNumElements();
+auto NumElts = cast(Ptr->getType())->getElementCount();
 
 IRBuilder<> Builder(MemoryInst);
 
@@ -5380,7 +5376,7 @@ bool 
CodeGenPrepare::optimizeGatherScatterInst(Instruction *MemoryInst,
 // and a vector GEP with all zeroes final index.
 if (!Ops[FinalIndex]->getType()->isVectorTy()) {
   NewAddr = Builder.CreateGEP(Ops[0], makeArrayRef(Ops).drop_front());
-  auto *IndexTy = FixedVectorType::get(ScalarIndexTy, NumElts);
+  auto *IndexTy = VectorType::get(ScalarIndexTy, NumElts);
   NewAddr = Builder.CreateGEP(NewAddr, Constant::getNullValue(IndexTy));
 } else {
   Value *Base = Ops[0];
@@ -5403,13 +5399,13 @@ bool 
CodeGenPrepare::optimizeGatherScatterInst(Instruction *MemoryInst,
 if (!V)
   return false;
 
-unsigned NumElts = cast(Ptr->getType())->getNumElements();
+auto NumElts = cast(Ptr->getType())->getElementCount();
 
 IRBuilder<> Builder(MemoryInst);
 
 // Emit a vector GEP with a scalar pointer and all 0s vector index.
 Type *ScalarIndexTy = DL->getIndexType(V->getType()->getScalarType());
-auto *IndexTy = FixedVectorType::get(ScalarIndexTy, NumElts);
+auto *IndexTy = VectorType::get(ScalarIndexTy, NumElts);
 NewAddr = Builder.CreateGEP(V, Constant::getNullValue(IndexTy));
   } else {
 // Constant, SelectionDAGBuilder knows to check if its a splat.

diff  --git a/llvm/test/Transforms/CodeGenPrepare/AArch64/gather-scatter-opt.ll 
b/llvm/test/Transforms/CodeGenPrepare/AArch64/gather-scatter-opt.ll
new file mode 100644
index ..08011b6b5b6a
--- /dev/null
+++ b/llvm/test/Transforms/CodeGenPrepare/AArch64/gather-scatter-opt.ll
@@ -0,0 +1,113 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -codegenprepare < %s | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+%struct.a = type { i32, i32 }
+@c = external dso_local global %struct.a, align 4
+@glob_array = internal unnamed_addr constant [16 x i32] [i32 1, i32 1, i32 2, 
i32 3, i32 5, i32 8, i32 13, i32 21, i32 34, i32 55, i32 89, i32 144, i32 233, 
i32 377, i32 610, i32 987], align 16
+
+define  @splat_base(i32* %base,  %index, 
 %mask) #0 {
+; CHECK-LABEL: @splat_base(
+; CHECK-NEXT:[[TMP1:%.*]] = getelementptr i32, i32* [[BASE:%.*]],  [[INDEX:%.*]]
+; CHECK-NEXT:[[RES:%.*]] = call  
@llvm.masked.gather.nxv4i32.nxv4p0i32( [[TMP1]], i32 4, 
 [[MASK:%.*]],  undef)
+; CHECK-NEXT:ret  [[RES]]
+;
+  %broadcast.splatinsert = insertelement  undef, i32* 
%base, i32 0
+  %broadcast.splat = shufflevector  %broadcast.splatinsert, 
 undef,  zeroinitializer
+  %gep = getelementptr i32,  %broadcast.splat,  %index
+  %res = call  @llvm.masked.gather.nxv4i32.nxv4p0i32( %gep, i32 4,  %mask,  undef)
+  ret  %res
+}
+
+define  @splat_struct(%struct.a* %base,  
%mask) #0 {
+; CHECK-LABEL: @splat_struct(
+; CHECK-NEXT:[[TMP1:%.*]] = getel

[llvm-branch-commits] [llvm] b74c4db - [SVE] Move INT_TO_FP i1 promotion into custom lowering.

2020-12-15 Thread Paul Walker via llvm-branch-commits

Author: Paul Walker
Date: 2020-12-15T11:57:07Z
New Revision: b74c4dbb9634f6210c6539fb4c09b0b68cb3cf0a

URL: 
https://github.com/llvm/llvm-project/commit/b74c4dbb9634f6210c6539fb4c09b0b68cb3cf0a
DIFF: 
https://github.com/llvm/llvm-project/commit/b74c4dbb9634f6210c6539fb4c09b0b68cb3cf0a.diff

LOG: [SVE] Move INT_TO_FP i1 promotion into custom lowering.

AddPromotedToType is being used to legalise INT_TO_FP operations
when the source is a predicate. The point where this introduces
vector extends might cause problems in the future so this patch
falls back to manual promotion within custom lowering.
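
For example, a SINT_TO_FP from nxv4i1 to nxv4f32 is now lowered by first
sign-extending the predicate to nxv4i32 and then converting that integer
vector to nxv4f32, rather than relying on AddPromotedToType to insert the
extend.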

Differential Revision: https://reviews.llvm.org/D90093

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 0d4dba6dcecf..f7ba135ad946 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -147,7 +147,7 @@ static inline EVT getPackedSVEVectorVT(EVT VT) {
   }
 }
 
-static inline MVT getPromotedVTForPredicate(MVT VT) {
+static inline EVT getPromotedVTForPredicate(EVT VT) {
   assert(VT.isScalableVector() && (VT.getVectorElementType() == MVT::i1) &&
  "Expected scalable predicate vector type!");
   switch (VT.getVectorMinNumElements()) {
@@ -1113,10 +1113,8 @@ AArch64TargetLowering::AArch64TargetLowering(const 
TargetMachine &TM,
 
   // There are no legal MVT::nxv16f## based types.
   if (VT != MVT::nxv16i1) {
-setOperationAction(ISD::SINT_TO_FP, VT, Promote);
-AddPromotedToType(ISD::SINT_TO_FP, VT, getPromotedVTForPredicate(VT));
-setOperationAction(ISD::UINT_TO_FP, VT, Promote);
-AddPromotedToType(ISD::UINT_TO_FP, VT, getPromotedVTForPredicate(VT));
+setOperationAction(ISD::SINT_TO_FP, VT, Custom);
+setOperationAction(ISD::UINT_TO_FP, VT, Custom);
   }
 }
 
@@ -3179,11 +3177,20 @@ SDValue 
AArch64TargetLowering::LowerVectorINT_TO_FP(SDValue Op,
   SDLoc dl(Op);
   SDValue In = Op.getOperand(0);
   EVT InVT = In.getValueType();
+  unsigned Opc = Op.getOpcode();
+  bool IsSigned = Opc == ISD::SINT_TO_FP || Opc == ISD::STRICT_SINT_TO_FP;
 
   if (VT.isScalableVector()) {
-unsigned Opcode = Op.getOpcode() == ISD::UINT_TO_FP
-  ? AArch64ISD::UINT_TO_FP_MERGE_PASSTHRU
-  : AArch64ISD::SINT_TO_FP_MERGE_PASSTHRU;
+if (InVT.getVectorElementType() == MVT::i1) {
+  // We can't directly extend an SVE predicate; extend it first.
+  unsigned CastOpc = IsSigned ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
+  EVT CastVT = getPromotedVTForPredicate(InVT);
+  In = DAG.getNode(CastOpc, dl, CastVT, In);
+  return DAG.getNode(Opc, dl, VT, In);
+}
+
+unsigned Opcode = IsSigned ? AArch64ISD::SINT_TO_FP_MERGE_PASSTHRU
+   : AArch64ISD::UINT_TO_FP_MERGE_PASSTHRU;
 return LowerToPredicatedOp(Op, DAG, Opcode);
   }
 
@@ -3193,16 +3200,15 @@ SDValue 
AArch64TargetLowering::LowerVectorINT_TO_FP(SDValue Op,
 MVT CastVT =
 MVT::getVectorVT(MVT::getFloatingPointVT(InVT.getScalarSizeInBits()),
  InVT.getVectorNumElements());
-In = DAG.getNode(Op.getOpcode(), dl, CastVT, In);
+In = DAG.getNode(Opc, dl, CastVT, In);
 return DAG.getNode(ISD::FP_ROUND, dl, VT, In, DAG.getIntPtrConstant(0, 
dl));
   }
 
   if (VTSize > InVTSize) {
-unsigned CastOpc =
-Op.getOpcode() == ISD::SINT_TO_FP ? ISD::SIGN_EXTEND : 
ISD::ZERO_EXTEND;
+unsigned CastOpc = IsSigned ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
 EVT CastVT = VT.changeVectorElementTypeToInteger();
 In = DAG.getNode(CastOpc, dl, CastVT, In);
-return DAG.getNode(Op.getOpcode(), dl, VT, In);
+return DAG.getNode(Opc, dl, VT, In);
   }
 
   return Op;



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 632f4d2 - [NFC] Fix a few SVEInstrInfo related stylistic issues.

2020-12-15 Thread Paul Walker via llvm-branch-commits

Author: Paul Walker
Date: 2020-12-15T16:10:38Z
New Revision: 632f4d2747f0777157d10456dd431d8f4cece845

URL: 
https://github.com/llvm/llvm-project/commit/632f4d2747f0777157d10456dd431d8f4cece845
DIFF: 
https://github.com/llvm/llvm-project/commit/632f4d2747f0777157d10456dd431d8f4cece845.diff

LOG: [NFC] Fix a few SVEInstrInfo related stylistic issues.

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64.td
llvm/lib/Target/AArch64/AArch64InstrFormats.td
llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
llvm/lib/Target/AArch64/SVEInstrFormats.td

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64.td 
b/llvm/lib/Target/AArch64/AArch64.td
index 41abfa32da62f..5bafe430a1b4c 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -437,7 +437,6 @@ def HasV8_5aOps : SubtargetFeature<
 
 def HasV8_6aOps : SubtargetFeature<
   "v8.6a", "HasV8_6aOps", "true", "Support ARM v8.6a instructions",
-
   [HasV8_5aOps, FeatureAMVS, FeatureBF16, FeatureFineGrainedTraps,
FeatureEnhancedCounterVirtualization, FeatureMatMulInt8]>;
 

diff  --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td 
b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
index 6d17b283231a2..8e01a8cf7beb9 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
@@ -325,7 +325,7 @@ def simm9 : Operand, ImmLeaf= 
-256 && Imm < 256; }]> {
 }
 
 def SImm8Operand : SImmOperand<8>;
-def simm8 : Operand, ImmLeaf= -128 && Imm < 127; }]> 
{
+def simm8 : Operand, ImmLeaf= -128 && Imm < 128; }]> 
{
   let ParserMatchClass = SImm8Operand;
   let DecoderMethod = "DecodeSImm<8>";
 }

diff  --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp 
b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index 22f78ce61128f..86cfdf8f7cf97 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -558,11 +558,11 @@ void AArch64RegisterInfo::resolveFrameIndex(MachineInstr 
&MI, Register BaseReg,
   StackOffset Off = StackOffset::getFixed(Offset);
 
   unsigned i = 0;
-
   while (!MI.getOperand(i).isFI()) {
 ++i;
 assert(i < MI.getNumOperands() && "Instr doesn't have FrameIndex 
operand!");
   }
+
   const MachineFunction *MF = MI.getParent()->getParent();
   const AArch64InstrInfo *TII =
   MF->getSubtarget().getInstrInfo();
@@ -604,7 +604,6 @@ void 
AArch64RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
   const AArch64InstrInfo *TII =
   MF.getSubtarget().getInstrInfo();
   const AArch64FrameLowering *TFI = getFrameLowering(MF);
-
   int FrameIndex = MI.getOperand(FIOperandNum).getIndex();
   bool Tagged =
   MI.getOperand(FIOperandNum).getTargetFlags() & AArch64II::MO_TAGGED;

diff  --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td 
b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index bdf5d1d771c79..adbace24ee6c5 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -575,7 +575,7 @@ let Predicates = [HasSVE] in {
   }
 
   // Select elements from either vector (predicated)
-  defm SEL_ZPZZ: sve_int_sel_vvv<"sel", vselect>;
+  defm SEL_ZPZZ   : sve_int_sel_vvv<"sel", vselect>;
 
   defm SPLICE_ZPZ : sve_int_perm_splice<"splice", int_aarch64_sve_splice>;
 
@@ -1062,7 +1062,7 @@ let Predicates = [HasSVE] in {
   def PRFS_PRR : sve_mem_prfm_ss<0b101, "prfw", GPR64NoXZRshifted32>;
   def PRFD_PRR : sve_mem_prfm_ss<0b111, "prfd", GPR64NoXZRshifted64>;
 
-multiclass sve_prefetch {
+  multiclass sve_prefetch {
 // reg + imm
 let AddedComplexity = 2 in {
   def _reg_imm : Pat<(prefetch (PredTy PPR_3b:$gp), (am_sve_indexed_s6 
GPR64sp:$base, simm6s1:$offset), (i32 sve_prfop:$prfop)),
@@ -1735,7 +1735,6 @@ multiclass sve_prefetch;
 def : Pat<(nxv2f64 (bitconvert (nxv8f16 ZPR:$src))), (nxv2f64 ZPR:$src)>;
 def : Pat<(nxv2f64 (bitconvert (nxv4f32 ZPR:$src))), (nxv2f64 ZPR:$src)>;
-
   }
 
   let Predicates = [IsLE, HasBF16, HasSVE] in {
@@ -2434,6 +2433,7 @@ let Predicates = [HasSVE2] in {
 (UMULH_ZZZ_S $Op1, $Op2)>;
   def : Pat<(nxv2i64 (int_aarch64_sve_umulh (nxv2i1 (AArch64ptrue 31)), 
nxv2i64:$Op1, nxv2i64:$Op2)),
 (UMULH_ZZZ_D $Op1, $Op2)>;
+
   // SVE2 complex integer dot product (indexed)
   defm CDOT_ZZZI : sve2_cintx_dot_by_indexed_elem<"cdot", 
int_aarch64_sve_cdot_lane>;
 

diff  --git a/llvm/lib/Target/AArch64/SVEInstrFormats.td 
b/llvm/lib/Target/AArch64/SVEInstrFormats.td
index c86b425422580..0db00247cd01b 100644
--- a/llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ b/llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -1012,8 +1012,8 @@ multiclass sve_int_perm_dup_i {
   (!cast(NAME # _Q) ZPR128:$Zd, FPR128asZPR:$Qn, 
0), 2>;
 }
 
-class sve_int_perm_tbl sz8_64, bits<2> opc, string asm,
-

[llvm-branch-commits] [llvm] c0bc169 - [NFC][SVE] Clean up bfloat isel patterns that emit non-bfloat instructions.

2020-12-18 Thread Paul Walker via llvm-branch-commits

Author: Paul Walker
Date: 2020-12-18T13:20:41Z
New Revision: c0bc169cb17397e981952dad7321b263756ddaa0

URL: 
https://github.com/llvm/llvm-project/commit/c0bc169cb17397e981952dad7321b263756ddaa0
DIFF: 
https://github.com/llvm/llvm-project/commit/c0bc169cb17397e981952dad7321b263756ddaa0.diff

LOG: [NFC][SVE] Clean up bfloat isel patterns that emit non-bfloat instructions.

During isel there's no need to protect illegal types. Patch also
adds a missing unit test for tbl2 intrinsic using bfloat types.

Differential Revision: https://reviews.llvm.org/D93404

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
llvm/lib/Target/AArch64/SVEInstrFormats.td
llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td 
b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index fbe24460d51f..f28c55ae22e6 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -373,10 +373,6 @@ let Predicates = [HasSVE] in {
   defm CLZ_ZPmZ  : sve_int_un_pred_arit_1<   0b001, "clz",  
int_aarch64_sve_clz>;
   defm CNT_ZPmZ  : sve_int_un_pred_arit_1<   0b010, "cnt",  
int_aarch64_sve_cnt>;
 
- let Predicates = [HasSVE, HasBF16] in {
-  def : SVE_3_Op_Pat(CNT_ZPmZ_H)>;
- }
-
   defm CNOT_ZPmZ : sve_int_un_pred_arit_1<   0b011, "cnot", 
int_aarch64_sve_cnot>;
   defm NOT_ZPmZ  : sve_int_un_pred_arit_1<   0b110, "not",  
int_aarch64_sve_not>;
   defm FABS_ZPmZ : sve_int_un_pred_arit_1_fp<0b100, "fabs", AArch64fabs_mt>;
@@ -514,11 +510,6 @@ let Predicates = [HasSVE] in {
   defm CPY_ZPmR : sve_int_perm_cpy_r<"cpy", AArch64dup_mt>;
   defm CPY_ZPmV : sve_int_perm_cpy_v<"cpy", AArch64dup_mt>;
 
-  let Predicates = [HasSVE, HasBF16] in {
-def : Pat<(nxv8bf16 (AArch64dup_mt nxv8i1:$pg, bf16:$splat, 
nxv8bf16:$passthru)),
-  (CPY_ZPmV_H $passthru, $pg, $splat)>;
-  }
-
   // Duplicate FP scalar into all vector elements
   def : Pat<(nxv8f16 (AArch64dup (f16 FPR16:$src))),
 (DUP_ZZI_H (INSERT_SUBREG (IMPLICIT_DEF), FPR16:$src, hsub), 0)>;
@@ -532,10 +523,8 @@ let Predicates = [HasSVE] in {
 (DUP_ZZI_S (INSERT_SUBREG (IMPLICIT_DEF), FPR32:$src, ssub), 0)>;
   def : Pat<(nxv2f64 (AArch64dup (f64 FPR64:$src))),
 (DUP_ZZI_D (INSERT_SUBREG (IMPLICIT_DEF), FPR64:$src, dsub), 0)>;
-  let Predicates = [HasSVE, HasBF16] in {
-def : Pat<(nxv8bf16 (AArch64dup (bf16 FPR16:$src))),
-  (DUP_ZZI_H (INSERT_SUBREG (IMPLICIT_DEF), FPR16:$src, hsub), 0)>;
-  }
+  def : Pat<(nxv8bf16 (AArch64dup (bf16 FPR16:$src))),
+(DUP_ZZI_H (INSERT_SUBREG (IMPLICIT_DEF), FPR16:$src, hsub), 0)>;
 
   // Duplicate +0.0 into all vector elements
   def : Pat<(nxv8f16 (AArch64dup (f16 fpimm0))), (DUP_ZI_H 0, 0)>;
@@ -544,9 +533,7 @@ let Predicates = [HasSVE] in {
   def : Pat<(nxv4f32 (AArch64dup (f32 fpimm0))), (DUP_ZI_S 0, 0)>;
   def : Pat<(nxv2f32 (AArch64dup (f32 fpimm0))), (DUP_ZI_S 0, 0)>;
   def : Pat<(nxv2f64 (AArch64dup (f64 fpimm0))), (DUP_ZI_D 0, 0)>;
-  let Predicates = [HasSVE, HasBF16] in {
-def : Pat<(nxv8bf16 (AArch64dup (bf16 fpimm0))), (DUP_ZI_H 0, 0)>;
-  }
+  def : Pat<(nxv8bf16 (AArch64dup (bf16 fpimm0))), (DUP_ZI_H 0, 0)>;
 
   // Duplicate Int immediate into all vector elements
   def : Pat<(nxv16i8 (AArch64dup (i32 (SVE8BitLslImm i32:$a, i32:$b,
@@ -579,20 +566,11 @@ let Predicates = [HasSVE] in {
 
   defm SPLICE_ZPZ : sve_int_perm_splice<"splice", int_aarch64_sve_splice>;
 
-  let Predicates = [HasSVE, HasBF16] in {
-def : SVE_3_Op_Pat;
-def : SVE_3_Op_Pat;
-  }
-
   defm COMPACT_ZPZ : sve_int_perm_compact<"compact", int_aarch64_sve_compact>;
   defm INSR_ZR : sve_int_perm_insrs<"insr", AArch64insr>;
   defm INSR_ZV : sve_int_perm_insrv<"insr", AArch64insr>;
   defm EXT_ZZI : sve_int_perm_extract_i<"ext", AArch64ext>;
 
-  let Predicates = [HasSVE, HasBF16] in {
-def : SVE_2_Op_Pat;
-  }
-
   defm RBIT_ZPmZ : sve_int_perm_rev_rbit<"rbit", int_aarch64_sve_rbit>;
   defm REVB_ZPmZ : sve_int_perm_rev_revb<"revb", int_aarch64_sve_revb, bswap>;
   defm REVH_ZPmZ : sve_int_perm_rev_revh<"revh", int_aarch64_sve_revh>;
@@ -601,10 +579,6 @@ let Predicates = [HasSVE] in {
   defm REV_PP : sve_int_perm_reverse_p<"rev", AArch64rev>;
   defm REV_ZZ : sve_int_perm_reverse_z<"rev", AArch64rev>;
 
-  let Predicates = [HasSVE, HasBF16] in {
-def : SVE_1_Op_Pat;
-  }
-
   defm SUNPKLO_ZZ : sve_int_perm_unpk<0b00, "sunpklo", AArch64sunpklo>;
   defm SUNPKHI_ZZ : sve_int_perm_unpk<0b01, "sunpkhi", AArch64sunpkhi>;
   defm UUNPKLO_ZZ : sve_int_perm_unpk<0b10, "uunpklo", AArch64uunpklo>;
@@ -661,23 +635,11 @@ let Predicates = [HasSVE] in {
   defm CLASTA_ZPZ : sve_int_perm_clast_zz<0, "clasta", int_aarch64_sve_clasta>;
   defm CLASTB_ZPZ : sve_int_perm_clast_zz<1, "clastb", int_aarch64_sve_clastb>;
 
-  let Predicates = [HasSVE, HasBF16] in

[llvm-branch-commits] [llvm] fc712eb - [AArch64] Fix Copy Elemination for negative values

2020-12-18 Thread Paul Walker via llvm-branch-commits

Author: Tomas Matheson
Date: 2020-12-18T13:30:46Z
New Revision: fc712eb7aa00aabcdafda54776038efdc486d570

URL: 
https://github.com/llvm/llvm-project/commit/fc712eb7aa00aabcdafda54776038efdc486d570
DIFF: 
https://github.com/llvm/llvm-project/commit/fc712eb7aa00aabcdafda54776038efdc486d570.diff

LOG: [AArch64] Fix Copy Elemination for negative values

Redundant Copy Elimination was eliminating a MOVi32imm -1 when it
determined that the value of the destination register is already -1.
However, it didn't take into account that the MOVi32imm zeroes the upper
32 bits (which are nonzero) and therefore cannot be eliminated.
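
For illustration: if x0 is known to hold -1 (0xFFFFFFFFFFFFFFFF), a MOVi32imm
of -1 only writes 0xFFFFFFFF to the 32-bit register and zeroes the upper half,
leaving the 64-bit value 0x00000000FFFFFFFF, which is a different value, so
the move must be kept.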

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D93100

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
llvm/test/CodeGen/AArch64/machine-copy-remove.mir

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64RedundantCopyElimination.cpp 
b/llvm/lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
index 0d75ab7ac8a9..019220e3a527 100644
--- a/llvm/lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
@@ -408,6 +408,11 @@ bool 
AArch64RedundantCopyElimination::optimizeBlock(MachineBasicBlock *MBB) {
  O.getReg() != CmpReg;
 }))
   continue;
+
+// Don't remove a move immediate that implicitly defines the upper
+// bits as different.
+if (TRI->isSuperRegister(DefReg, KnownReg.Reg) && KnownReg.Imm < 0)
+  continue;
   }
 
   if (IsCopy)

diff  --git a/llvm/test/CodeGen/AArch64/machine-copy-remove.mir 
b/llvm/test/CodeGen/AArch64/machine-copy-remove.mir
index 4e3cb3c12806..b2fc40a4d255 100644
--- a/llvm/test/CodeGen/AArch64/machine-copy-remove.mir
+++ b/llvm/test/CodeGen/AArch64/machine-copy-remove.mir
@@ -536,13 +536,13 @@ body: |
   bb.2:
 RET_ReallyLR
 ...
-# Eliminate redundant MOVi32imm -1 in bb.1
+# Don't eliminate redundant MOVi32imm -1 in bb.1: the upper bits are nonzero.
 # Note: 64-bit compare/32-bit move imm
 # Kill marker should be removed from compare.
 # CHECK-LABEL: name: test21
-# CHECK: ADDSXri $x0, 1, 0, implicit-def $nzcv
+# CHECK: ADDSXri killed $x0, 1, 0, implicit-def $nzcv
 # CHECK: bb.1:
-# CHECK-NOT: MOVi32imm
+# CHECK: MOVi32imm
 name:test21
 tracksRegLiveness: true
 body: |



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 8eec729 - [SVE] Lower vector BITREVERSE and BSWAP operations.

2020-12-22 Thread Paul Walker via llvm-branch-commits

Author: Paul Walker
Date: 2020-12-22T16:49:50Z
New Revision: 8eec7294fea87273215592a2dc5bee6afd47d456

URL: 
https://github.com/llvm/llvm-project/commit/8eec7294fea87273215592a2dc5bee6afd47d456
DIFF: 
https://github.com/llvm/llvm-project/commit/8eec7294fea87273215592a2dc5bee6afd47d456.diff

LOG: [SVE] Lower vector BITREVERSE and BSWAP operations.

These operations are lowered to RBIT and REVB instructions
respectively.  In the case of fixed-length support using SVE we
also lower BITREVERSE operating on NEON-sized vectors as this
results in fewer instructions.

Differential Revision: https://reviews.llvm.org/D93606

Added: 
llvm/test/CodeGen/AArch64/sve-fixed-length-rev.ll
llvm/test/CodeGen/AArch64/sve-rev.ll

Modified: 
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/lib/Target/AArch64/AArch64ISelLowering.h
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
llvm/lib/Target/AArch64/SVEInstrFormats.td
llvm/test/CodeGen/AArch64/sve-intrinsics-reversal.ll

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index e74bc739ddaf..48fbea840bad 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -182,6 +182,8 @@ static bool isMergePassthruOpcode(unsigned Opc) {
   switch (Opc) {
   default:
 return false;
+  case AArch64ISD::BITREVERSE_MERGE_PASSTHRU:
+  case AArch64ISD::BSWAP_MERGE_PASSTHRU:
   case AArch64ISD::DUP_MERGE_PASSTHRU:
   case AArch64ISD::FNEG_MERGE_PASSTHRU:
   case AArch64ISD::SIGN_EXTEND_INREG_MERGE_PASSTHRU:
@@ -1066,6 +1068,8 @@ AArch64TargetLowering::AArch64TargetLowering(const 
TargetMachine &TM,
 // splat of 0 or undef) once vector selects supported in SVE codegen. See
 // D68877 for more details.
 for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {
+  setOperationAction(ISD::BITREVERSE, VT, Custom);
+  setOperationAction(ISD::BSWAP, VT, Custom);
   setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
   setOperationAction(ISD::UINT_TO_FP, VT, Custom);
   setOperationAction(ISD::SINT_TO_FP, VT, Custom);
@@ -1183,6 +1187,7 @@ AArch64TargetLowering::AArch64TargetLowering(const 
TargetMachine &TM,
 setOperationAction(ISD::FP_ROUND, VT, Expand);
 
   // These operations are not supported on NEON but SVE can do them.
+  setOperationAction(ISD::BITREVERSE, MVT::v1i64, Custom);
   setOperationAction(ISD::MUL, MVT::v1i64, Custom);
   setOperationAction(ISD::MUL, MVT::v2i64, Custom);
   setOperationAction(ISD::SDIV, MVT::v8i8, Custom);
@@ -1217,6 +1222,7 @@ AArch64TargetLowering::AArch64TargetLowering(const 
TargetMachine &TM,
   // Int operations with no NEON support.
   for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
   MVT::v2i32, MVT::v4i32, MVT::v2i64}) {
+setOperationAction(ISD::BITREVERSE, VT, Custom);
 setOperationAction(ISD::VECREDUCE_AND, VT, Custom);
 setOperationAction(ISD::VECREDUCE_OR, VT, Custom);
 setOperationAction(ISD::VECREDUCE_XOR, VT, Custom);
@@ -1330,6 +1336,8 @@ void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT 
VT) {
   setOperationAction(ISD::ADD, VT, Custom);
   setOperationAction(ISD::AND, VT, Custom);
   setOperationAction(ISD::ANY_EXTEND, VT, Custom);
+  setOperationAction(ISD::BITREVERSE, VT, Custom);
+  setOperationAction(ISD::BSWAP, VT, Custom);
   setOperationAction(ISD::FADD, VT, Custom);
   setOperationAction(ISD::FCEIL, VT, Custom);
   setOperationAction(ISD::FDIV, VT, Custom);
@@ -1934,6 +1942,8 @@ const char 
*AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
 MAKE_CASE(AArch64ISD::LDP)
 MAKE_CASE(AArch64ISD::STP)
 MAKE_CASE(AArch64ISD::STNP)
+MAKE_CASE(AArch64ISD::BITREVERSE_MERGE_PASSTHRU)
+MAKE_CASE(AArch64ISD::BSWAP_MERGE_PASSTHRU)
 MAKE_CASE(AArch64ISD::DUP_MERGE_PASSTHRU)
 MAKE_CASE(AArch64ISD::INDEX_VECTOR)
 MAKE_CASE(AArch64ISD::UABD)
@@ -3646,7 +3656,13 @@ SDValue 
AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
 return DAG.getNode(AArch64ISD::INSR, dl, Op.getValueType(),
Op.getOperand(1), Scalar);
   }
-
+  case Intrinsic::aarch64_sve_rbit:
+return DAG.getNode(AArch64ISD::BITREVERSE_MERGE_PASSTHRU, dl,
+   Op.getValueType(), Op.getOperand(2), Op.getOperand(3),
+   Op.getOperand(1));
+  case Intrinsic::aarch64_sve_revb:
+return DAG.getNode(AArch64ISD::BSWAP_MERGE_PASSTHRU, dl, Op.getValueType(),
+   Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));
   case Intrinsic::aarch64_sve_sxtb:
 return DAG.getNode(
 AArch64ISD::SIGN_EXTEND_INREG_MERGE_PASSTHRU, dl, Op.getValueType(),
@@ -4357,6 +4373,11 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
 return Lowe

[llvm-branch-commits] [llvm] be85b3e - Fix some misnamed variables in sve-fixed-length-int-minmax.ll.

2020-12-22 Thread Paul Walker via llvm-branch-commits

Author: Paul Walker
Date: 2020-12-22T17:11:23Z
New Revision: be85b3e4324b5a03abd929815b7fc1c2184db97a

URL: 
https://github.com/llvm/llvm-project/commit/be85b3e4324b5a03abd929815b7fc1c2184db97a
DIFF: 
https://github.com/llvm/llvm-project/commit/be85b3e4324b5a03abd929815b7fc1c2184db97a.diff

LOG: Fix some misnamed variables in sve-fixed-length-int-minmax.ll.

Added: 


Modified: 
llvm/test/CodeGen/AArch64/sve-fixed-length-int-minmax.ll

Removed: 




diff  --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-int-minmax.ll 
b/llvm/test/CodeGen/AArch64/sve-fixed-length-int-minmax.ll
index cc9e172de5f8..e94abe815f3c 100644
--- a/llvm/test/CodeGen/AArch64/sve-fixed-length-int-minmax.ll
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-int-minmax.ll
@@ -69,14 +69,14 @@ define void @smax_v64i8(<64 x i8>* %a, <64 x i8>* %b) #0 {
 ; Ensure sensible type legalisation.
 ; VBITS_EQ_256-DAG: ptrue [[PG:p[0-9]+]].b, vl32
 ; VBITS_EQ_256-DAG: mov w[[A:[0-9]+]], #32
-; VBITS_EQ_256-DAG: ld1b { [[OP1_LO:z[0-9]+]].b }, [[PG]]/z, [x0, x[[A]]]
-; VBITS_EQ_256-DAG: ld1b { [[OP1_HI:z[0-9]+]].b }, [[PG]]/z, [x0]
-; VBITS_EQ_256-DAG: ld1b { [[OP2_LO:z[0-9]+]].b }, [[PG]]/z, [x1, x[[A]]]
-; VBITS_EQ_256-DAG: ld1b { [[OP2_HI:z[0-9]+]].b }, [[PG]]/z, [x1]
+; VBITS_EQ_256-DAG: ld1b { [[OP1_LO:z[0-9]+]].b }, [[PG]]/z, [x0]
+; VBITS_EQ_256-DAG: ld1b { [[OP1_HI:z[0-9]+]].b }, [[PG]]/z, [x0, x[[A]]]
+; VBITS_EQ_256-DAG: ld1b { [[OP2_LO:z[0-9]+]].b }, [[PG]]/z, [x1]
+; VBITS_EQ_256-DAG: ld1b { [[OP2_HI:z[0-9]+]].b }, [[PG]]/z, [x1, x[[A]]]
 ; VBITS_EQ_256-DAG: smax [[RES_LO:z[0-9]+]].b, [[PG]]/m, [[OP1_LO]].b, 
[[OP2_LO]].b
 ; VBITS_EQ_256-DAG: smax [[RES_HI:z[0-9]+]].b, [[PG]]/m, [[OP1_HI]].b, 
[[OP2_HI]].b
-; VBITS_EQ_256-DAG: st1b { [[RES_LO]].b }, [[PG]], [x0, x[[A]]]
-; VBITS_EQ_256-DAG: st1b { [[RES_HI]].b }, [[PG]], [x0]
+; VBITS_EQ_256-DAG: st1b { [[RES_LO]].b }, [[PG]], [x0]
+; VBITS_EQ_256-DAG: st1b { [[RES_HI]].b }, [[PG]], [x0, x[[A]]]
 ; VBITS_EQ_256-NEXT: ret
   %op1 = load <64 x i8>, <64 x i8>* %a
   %op2 = load <64 x i8>, <64 x i8>* %b
@@ -442,14 +442,14 @@ define void @smin_v64i8(<64 x i8>* %a, <64 x i8>* %b) #0 {
 ; Ensure sensible type legalisation.
 ; VBITS_EQ_256-DAG: ptrue [[PG:p[0-9]+]].b, vl32
 ; VBITS_EQ_256-DAG: mov w[[A:[0-9]+]], #32
-; VBITS_EQ_256-DAG: ld1b { [[OP1_LO:z[0-9]+]].b }, [[PG]]/z, [x0, x[[A]]]
-; VBITS_EQ_256-DAG: ld1b { [[OP1_HI:z[0-9]+]].b }, [[PG]]/z, [x0]
-; VBITS_EQ_256-DAG: ld1b { [[OP2_LO:z[0-9]+]].b }, [[PG]]/z, [x1, x[[A]]]
-; VBITS_EQ_256-DAG: ld1b { [[OP2_HI:z[0-9]+]].b }, [[PG]]/z, [x1]
+; VBITS_EQ_256-DAG: ld1b { [[OP1_LO:z[0-9]+]].b }, [[PG]]/z, [x0]
+; VBITS_EQ_256-DAG: ld1b { [[OP1_HI:z[0-9]+]].b }, [[PG]]/z, [x0, x[[A]]]
+; VBITS_EQ_256-DAG: ld1b { [[OP2_LO:z[0-9]+]].b }, [[PG]]/z, [x1]
+; VBITS_EQ_256-DAG: ld1b { [[OP2_HI:z[0-9]+]].b }, [[PG]]/z, [x1, x[[A]]]
 ; VBITS_EQ_256-DAG: smin [[RES_LO:z[0-9]+]].b, [[PG]]/m, [[OP1_LO]].b, 
[[OP2_LO]].b
 ; VBITS_EQ_256-DAG: smin [[RES_HI:z[0-9]+]].b, [[PG]]/m, [[OP1_HI]].b, 
[[OP2_HI]].b
-; VBITS_EQ_256-DAG: st1b { [[RES_LO]].b }, [[PG]], [x0, x[[A]]]
-; VBITS_EQ_256-DAG: st1b { [[RES_HI]].b }, [[PG]], [x0]
+; VBITS_EQ_256-DAG: st1b { [[RES_LO]].b }, [[PG]], [x0]
+; VBITS_EQ_256-DAG: st1b { [[RES_HI]].b }, [[PG]], [x0, x[[A]]]
   %op1 = load <64 x i8>, <64 x i8>* %a
   %op2 = load <64 x i8>, <64 x i8>* %b
   %res = call <64 x i8> @llvm.smin.v64i8(<64 x i8> %op1, <64 x i8> %op2)
@@ -814,14 +814,14 @@ define void @umax_v64i8(<64 x i8>* %a, <64 x i8>* %b) #0 {
 ; Ensure sensible type legalisation.
 ; VBITS_EQ_256-DAG: ptrue [[PG:p[0-9]+]].b, vl32
 ; VBITS_EQ_256-DAG: mov w[[A:[0-9]+]], #32
-; VBITS_EQ_256-DAG: ld1b { [[OP1_LO:z[0-9]+]].b }, [[PG]]/z, [x0, x[[A]]]
-; VBITS_EQ_256-DAG: ld1b { [[OP1_HI:z[0-9]+]].b }, [[PG]]/z, [x0]
-; VBITS_EQ_256-DAG: ld1b { [[OP2_LO:z[0-9]+]].b }, [[PG]]/z, [x1, x[[A]]]
-; VBITS_EQ_256-DAG: ld1b { [[OP2_HI:z[0-9]+]].b }, [[PG]]/z, [x1]
+; VBITS_EQ_256-DAG: ld1b { [[OP1_LO:z[0-9]+]].b }, [[PG]]/z, [x0]
+; VBITS_EQ_256-DAG: ld1b { [[OP1_HI:z[0-9]+]].b }, [[PG]]/z, [x0, x[[A]]]
+; VBITS_EQ_256-DAG: ld1b { [[OP2_LO:z[0-9]+]].b }, [[PG]]/z, [x1]
+; VBITS_EQ_256-DAG: ld1b { [[OP2_HI:z[0-9]+]].b }, [[PG]]/z, [x1, x[[A]]]
 ; VBITS_EQ_256-DAG: umax [[RES_LO:z[0-9]+]].b, [[PG]]/m, [[OP1_LO]].b, 
[[OP2_LO]].b
 ; VBITS_EQ_256-DAG: umax [[RES_HI:z[0-9]+]].b, [[PG]]/m, [[OP1_HI]].b, 
[[OP2_HI]].b
-; VBITS_EQ_256-DAG: st1b { [[RES_LO]].b }, [[PG]], [x0, x[[A]]]
-; VBITS_EQ_256-DAG: st1b { [[RES_HI]].b }, [[PG]], [x0]
+; VBITS_EQ_256-DAG: st1b { [[RES_LO]].b }, [[PG]], [x0]
+; VBITS_EQ_256-DAG: st1b { [[RES_HI]].b }, [[PG]], [x0, x[[A]]]
 ; VBITS_EQ_256-NEXT: ret
   %op1 = load <64 x i8>, <64 x i8>* %a
   %op2 = load <64 x i8>, <64 x i8>* %b
@@ -1187,14 +1187,14 @@ define void @umin_v64i8(<64 x i8>* %a, <64 x i8>* %b) 
#0 {
 ; Ensure sensible type legalisation.
 ; VBITS_EQ_256-DAG: ptrue [[PG:p[0-9]+]].b, vl32
 ; VBITS_EQ_256-DAG: mov w[[

[llvm-branch-commits] [llvm] [AArch64] Update feature dep. for Armv9.6 extensions (#125874) (PR #126210)

2025-02-07 Thread Paul Walker via llvm-branch-commits

https://github.com/paulwalker-arm approved this pull request.


https://github.com/llvm/llvm-project/pull/126210
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits