[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)

2025-01-21 Thread Joe Nash via llvm-branch-commits


@@ -489,6 +489,90 @@ void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) {
   CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs);
 }
 
+void AMDGPUDAGToDAGISel::SelectVectorShuffle(SDNode *N) {
+  EVT VT = N->getValueType(0);
+  EVT EltVT = VT.getVectorElementType();
+
+  // TODO: Handle 16-bit element vectors with even aligned masks.
+  if (!Subtarget->hasPkMovB32() || !EltVT.bitsEq(MVT::i32) ||
+      VT.getVectorNumElements() != 2) {
+    SelectCode(N);
+    return;
+  }
+
+  auto *SVN = cast<ShuffleVectorSDNode>(N);
+
+  SDValue Src0 = SVN->getOperand(0);
+  SDValue Src1 = SVN->getOperand(1);
+  ArrayRef<int> Mask = SVN->getMask();
+  SDLoc DL(N);
+
+  assert(Src0.getValueType().getVectorNumElements() == 2 && Mask.size() == 2 &&
+         Mask[0] < 4 && Mask[1] < 4);
+
+  SDValue VSrc0 = Mask[0] < 2 ? Src0 : Src1;
+  SDValue VSrc1 = Mask[1] < 2 ? Src0 : Src1;
+  unsigned Src0SubReg = Mask[0] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+  unsigned Src1SubReg = Mask[1] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+
+  if (Mask[0] < 0) {
+    Src0SubReg = Src1SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc0 = SDValue(ImpDef, 0);
+  }
+
+  if (Mask[1] < 0) {
+    Src1SubReg = Src0SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc1 = SDValue(ImpDef, 0);
+  }
+
+  // SGPR case needs to lower to copies.
+  //
+  // Also use subregister extract when we can directly blend the registers with
+  // a simple subregister copy.
+  //
+  // TODO: Maybe we should fold this out earlier
+  if (N->isDivergent() && Src0SubReg == AMDGPU::sub1 &&
+      Src1SubReg == AMDGPU::sub0) {
+    // The low element of the result always comes from src0.
+    // The high element of the result always comes from src1.
+    // op_sel selects the high half of src0.
+    // op_sel_hi selects the high half of src1.
+
+    unsigned Src0OpSel =
+        Src0SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;
+    unsigned Src1OpSel =
+        Src1SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;

Sisyph wrote:

It is written in a very confusing way in the docs, but I think you have it 
correct in the code. Out of the 6 bits (op_sel[0-2] and op_sel_hi[0-2]) only 
op_sel[0] and op_sel[1] do anything iiuc. 
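
To make the reading above concrete, here is a minimal stand-alone sketch of the lane selection as described in this thread and in the patch comments. It is a toy model, not LLVM or hardware code: `Pair` and `pkMovB32` are hypothetical names, and the semantics (low result from src0, high result from src1, one select bit per source) are an assumption taken from the discussion, not from the ISA manual.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model: a 64-bit register pair as {sub0, sub1}.
using Pair = std::array<uint32_t, 2>;

// Assumed semantics, following the thread: the low result element is read
// from src0 and the high result element from src1. opSel0 / opSel1 pick the
// high half (sub1) of the respective source when set, the low half otherwise.
Pair pkMovB32(const Pair &src0, const Pair &src1, bool opSel0, bool opSel1) {
  return {src0[opSel0 ? 1 : 0], src1[opSel1 ? 1 : 0]};
}
```

Under that assumption, `pkMovB32({10, 11}, {20, 21}, true, false)` yields `{11, 20}`, i.e. the shuffle that takes sub1 of the first operand and sub0 of the second.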

https://github.com/llvm/llvm-project/pull/123684
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)

2025-01-21 Thread Matt Arsenault via llvm-branch-commits


@@ -489,6 +489,90 @@ void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) {
   CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs);
 }
 
+void AMDGPUDAGToDAGISel::SelectVectorShuffle(SDNode *N) {
+  EVT VT = N->getValueType(0);
+  EVT EltVT = VT.getVectorElementType();
+
+  // TODO: Handle 16-bit element vectors with even aligned masks.
+  if (!Subtarget->hasPkMovB32() || !EltVT.bitsEq(MVT::i32) ||
+      VT.getVectorNumElements() != 2) {
+    SelectCode(N);
+    return;
+  }
+
+  auto *SVN = cast<ShuffleVectorSDNode>(N);
+
+  SDValue Src0 = SVN->getOperand(0);
+  SDValue Src1 = SVN->getOperand(1);
+  ArrayRef<int> Mask = SVN->getMask();
+  SDLoc DL(N);
+
+  assert(Src0.getValueType().getVectorNumElements() == 2 && Mask.size() == 2 &&
+         Mask[0] < 4 && Mask[1] < 4);
+
+  SDValue VSrc0 = Mask[0] < 2 ? Src0 : Src1;
+  SDValue VSrc1 = Mask[1] < 2 ? Src0 : Src1;
+  unsigned Src0SubReg = Mask[0] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+  unsigned Src1SubReg = Mask[1] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+
+  if (Mask[0] < 0) {
+    Src0SubReg = Src1SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc0 = SDValue(ImpDef, 0);
+  }
+
+  if (Mask[1] < 0) {
+    Src1SubReg = Src0SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc1 = SDValue(ImpDef, 0);
+  }
+
+  // SGPR case needs to lower to copies.
+  //
+  // Also use subregister extract when we can directly blend the registers with
+  // a simple subregister copy.
+  //
+  // TODO: Maybe we should fold this out earlier
+  if (N->isDivergent() && Src0SubReg == AMDGPU::sub1 &&
+      Src1SubReg == AMDGPU::sub0) {
+    // The low element of the result always comes from src0.
+    // The high element of the result always comes from src1.
+    // op_sel selects the high half of src0.
+    // op_sel_hi selects the high half of src1.
+
+    unsigned Src0OpSel =
+        Src0SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;
+    unsigned Src1OpSel =
+        Src1SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;

arsenm wrote:

I'm not sure this is correctly encoded. I'm confused by how op_sel and 
op_sel_hi are supposed to be represented. We set fields in the source 
modifiers. I guess this should probably be OP_SEL_1?
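
One way to picture the representation in question is sketched below, under the assumption that each source operand carries its own modifier word and that the assembler-level `op_sel:[...]` list is formed by collecting one flag bit from each source's word. The constants `kOpSel0` and `kOpSel1` and the helper `gatherBits` are hypothetical stand-ins for illustration; the real `SISrcMods` values and the printing logic should be checked in the AMDGPU backend sources.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical flag values, chosen for illustration only.
constexpr uint32_t kOpSel0 = 1u << 2; // assumed: this source's op_sel bit
constexpr uint32_t kOpSel1 = 1u << 3; // assumed: this source's op_sel_hi bit

// Collect one flag from every per-source modifier word, so that element i of
// the result is the bit for source i (e.g. op_sel[i] when flag == kOpSel0).
template <std::size_t N>
std::array<bool, N> gatherBits(const std::array<uint32_t, N> &srcMods,
                               uint32_t flag) {
  std::array<bool, N> bits{};
  for (std::size_t i = 0; i < N; ++i)
    bits[i] = (srcMods[i] & flag) != 0;
  return bits;
}
```

In this model, setting `kOpSel0` in the second source's modifier word is what produces `op_sel[1]`, which would match using the same flag for both sources as the patch does.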

https://github.com/llvm/llvm-project/pull/123684


[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)

2025-01-20 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: undef deprecator found issues in your code. :warning:



You can test this locally with the following command:


```bash
git diff -U0 --pickaxe-regex -S \
  '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' \
  2d8035aff3d44bd59f4ff3af60f87c7d6e6219ea \
  c5caf560857f3c4f71416940a528df5ce75212bc \
  llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp \
  llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h \
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v2f32.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v3f32.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v4f32.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v8f32.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v2i32.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v3i32.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v4i32.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v8i32.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v2p3.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v3p3.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v4p3.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v8p3.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll \
  llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v3p0.ll \
  llvm/test/CodeGen/AMDGPU/vector_shuffle.packed.ll \
  llvm/test/Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll
```




The following files introduce new uses of undef:
 - llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

[Undef](https://llvm.org/docs/LangRef.html#undefined-values) is now deprecated 
and should only be used in the rare cases where no replacement is possible. For 
example, a load of uninitialized memory yields `undef`. You should use `poison` 
values for placeholders instead.

In tests, avoid using `undef` and having tests that trigger undefined behavior. 
If you need an operand with some unimportant value, you can add a new argument 
to the function and use that instead.

For example, this is considered a bad practice:
```llvm
define void @fn() {
  ...
  br i1 undef, ...
}
```

Please use the following instead:
```llvm
define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}
```

Please refer to the [Undefined Behavior 
Manual](https://llvm.org/docs/UndefinedBehavior.html) for more information.



https://github.com/llvm/llvm-project/pull/123684


[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)

2025-01-20 Thread Matt Arsenault via llvm-branch-commits

arsenm (https://github.com/arsenm) marked this pull request as ready for review.
https://github.com/llvm/llvm-project/pull/123684


[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)

2025-01-20 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

For VALU shuffles, this saves an instruction in some cases.
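
As a rough illustration of which shuffles take the new path, the decision made in `SelectVectorShuffle` can be modelled outside LLVM. This is a simplified sketch, not the patch itself: `Lane`, `Plan`, `decodeLane` and `planShuffle` are hypothetical names, undefined mask entries are assumed to be encoded as -1, and the condition mirrors the one in this revision of the diff (divergent node, low lane from sub1, high lane from sub0).

```cpp
#include <array>

// Hypothetical stand-alone model of the selection decision.
struct Lane {
  int source; // 0 = first operand, 1 = second operand, -1 = undefined
  int subReg; // 0 = sub0 (low element), 1 = sub1 (high element)
};

struct Plan {
  Lane lo;
  Lane hi;
  bool usesPkMov; // true: single v_pk_mov_b32; false: subregister copies
};

// Decode one entry of a mask that shuffles two 2-element vectors.
// Entries 0-1 index the first operand, 2-3 the second, negative means undef.
Lane decodeLane(int maskElt) {
  int source = maskElt < 0 ? -1 : (maskElt < 2 ? 0 : 1);
  return {source, maskElt & 1};
}

Plan planShuffle(std::array<int, 2> mask, bool divergent) {
  Lane lo = decodeLane(mask[0]);
  Lane hi = decodeLane(mask[1]);
  // An undefined lane borrows the other lane's subregister index.
  if (lo.source < 0)
    lo.subReg = hi.subReg;
  if (hi.source < 0)
    hi.subReg = lo.subReg;
  // Condition as written in this revision of the patch.
  bool usesPkMov = divergent && lo.subReg == 1 && hi.subReg == 0;
  return {lo, hi, usesPkMov};
}
```

In this model a divergent shuffle with mask `{3, 0}` is planned as one packed move, while the identity mask `{0, 1}` and any uniform (SGPR) shuffle fall back to subregister copies.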

---

Patch is 285.82 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/123684.diff


19 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (+114) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h (+1) 
- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+7) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v2f32.ll (+21-28) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v3f32.ll (+17-23) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v4f32.ll (+34-50) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v8f32.ll (+112-160) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v2i32.ll (+21-28) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v3i32.ll (+17-23) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v4i32.ll (+34-50) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v8i32.ll (+112-160) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v2p3.ll (+21-28) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v3p3.ll (+17-23) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v4p3.ll (+34-50) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v8p3.ll (+112-160) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll (+500-287) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v3p0.ll (+500-287) 
- (modified) llvm/test/CodeGen/AMDGPU/vector_shuffle.packed.ll (+48-48) 
- (modified) llvm/test/Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll (+1-2) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index 6d5c3b5e0742b3..8d03fde8911242 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -489,6 +489,90 @@ void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) {
   CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs);
 }
 
+void AMDGPUDAGToDAGISel::SelectVectorShuffle(SDNode *N) {
+  EVT VT = N->getValueType(0);
+  EVT EltVT = VT.getVectorElementType();
+
+  // TODO: Handle 16-bit element vectors with even aligned masks.
+  if (!Subtarget->hasPkMovB32() || !EltVT.bitsEq(MVT::i32) ||
+      VT.getVectorNumElements() != 2) {
+    SelectCode(N);
+    return;
+  }
+
+  auto *SVN = cast<ShuffleVectorSDNode>(N);
+
+  SDValue Src0 = SVN->getOperand(0);
+  SDValue Src1 = SVN->getOperand(1);
+  ArrayRef<int> Mask = SVN->getMask();
+  SDLoc DL(N);
+
+  assert(Src0.getValueType().getVectorNumElements() == 2 && Mask.size() == 2 &&
+         Mask[0] < 4 && Mask[1] < 4);
+
+  SDValue VSrc0 = Mask[0] < 2 ? Src0 : Src1;
+  SDValue VSrc1 = Mask[1] < 2 ? Src0 : Src1;
+  unsigned Src0SubReg = Mask[0] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+  unsigned Src1SubReg = Mask[1] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+
+  if (Mask[0] < 0) {
+    Src0SubReg = Src1SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc0 = SDValue(ImpDef, 0);
+  }
+
+  if (Mask[1] < 0) {
+    Src1SubReg = Src0SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc1 = SDValue(ImpDef, 0);
+  }
+
+  // SGPR case needs to lower to copies.
+  //
+  // Also use subregister extract when we can directly blend the registers with
+  // a simple subregister copy.
+  //
+  // TODO: Maybe we should fold this out earlier
+  if (N->isDivergent() && Src0SubReg == AMDGPU::sub1 &&
+      Src1SubReg == AMDGPU::sub0) {
+    // The low element of the result always comes from src0.
+    // The high element of the result always comes from src1.
+    // op_sel selects the high half of src0.
+    // op_sel_hi selects the high half of src1.
+
+    unsigned Src0OpSel =
+        Src0SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;
+    unsigned Src1OpSel =
+        Src1SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;
+
+    SDValue Src0OpSelVal = CurDAG->getTargetConstant(Src0OpSel, DL, MVT::i32);
+    SDValue Src1OpSelVal = CurDAG->getTargetConstant(Src1OpSel, DL, MVT::i32);
+    SDValue ZeroMods = CurDAG->getTargetConstant(0, DL, MVT::i32);
+
+    CurDAG->SelectNodeTo(N, AMDGPU::V_PK_MOV_B32, N->getVTList(),
+                         {Src0OpSelVal, VSrc0, Src1OpSelVal, VSrc1,
+                          ZeroMods,   // clamp
+                          ZeroMods,   // op_sel
+                          ZeroMods,   // op_sel_hi
+                          ZeroMods,   // neg_lo
+                          ZeroMods}); // neg_hi
+    return;
+  }
+
+  SDValue ResultElt0 =
+      CurDAG->getTargetExtractSubreg(Src0SubReg, DL, EltVT, VSrc0);
+  SDValue ResultElt1 =
+      CurDAG->getTargetExtractSubreg(Src1SubReg, DL, EltVT, VSrc1);
+
+  const SDValu
```

[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)

2025-01-20 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/123684).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#123684** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/123684)
* **#123596** (https://app.graphite.dev/github/pr/llvm/llvm-project/123596)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/123684