[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)
@@ -489,6 +489,90 @@ void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) {
   CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs);
 }
 
+void AMDGPUDAGToDAGISel::SelectVectorShuffle(SDNode *N) {
+  EVT VT = N->getValueType(0);
+  EVT EltVT = VT.getVectorElementType();
+
+  // TODO: Handle 16-bit element vectors with even aligned masks.
+  if (!Subtarget->hasPkMovB32() || !EltVT.bitsEq(MVT::i32) ||
+      VT.getVectorNumElements() != 2) {
+    SelectCode(N);
+    return;
+  }
+
+  auto *SVN = cast<ShuffleVectorSDNode>(N);
+
+  SDValue Src0 = SVN->getOperand(0);
+  SDValue Src1 = SVN->getOperand(1);
+  ArrayRef<int> Mask = SVN->getMask();
+  SDLoc DL(N);
+
+  assert(Src0.getValueType().getVectorNumElements() == 2 && Mask.size() == 2 &&
+         Mask[0] < 4 && Mask[1] < 4);
+
+  SDValue VSrc0 = Mask[0] < 2 ? Src0 : Src1;
+  SDValue VSrc1 = Mask[1] < 2 ? Src0 : Src1;
+  unsigned Src0SubReg = Mask[0] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+  unsigned Src1SubReg = Mask[1] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+
+  if (Mask[0] < 0) {
+    Src0SubReg = Src1SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc0 = SDValue(ImpDef, 0);
+  }
+
+  if (Mask[1] < 0) {
+    Src1SubReg = Src0SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc1 = SDValue(ImpDef, 0);
+  }
+
+  // SGPR case needs to lower to copies.
+  //
+  // Also use subregister extract when we can directly blend the registers with
+  // a simple subregister copy.
+  //
+  // TODO: Maybe we should fold this out earlier
+  if (N->isDivergent() && Src0SubReg == AMDGPU::sub1 &&
+      Src1SubReg == AMDGPU::sub0) {
+    // The low element of the result always comes from src0.
+    // The high element of the result always comes from src1.
+    // op_sel selects the high half of src0.
+    // op_sel_hi selects the high half of src1.
+
+    unsigned Src0OpSel =
+        Src0SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;
+    unsigned Src1OpSel =
+        Src1SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;

Sisyph wrote:

It is written in a very confusing way in the docs, but I think you have it correct in the code. Out of the 6 bits (op_sel[0-2] and op_sel_hi[0-2]) only op_sel[0] and op_sel[1] do anything iiuc.

https://github.com/llvm/llvm-project/pull/123684
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
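For readers following the mask arithmetic in the hunk above, here is a minimal standalone sketch of how a two-element shuffle mask splits into a source choice and a half choice per lane. It also models the packed move the way this thread reads it (low result from src0 steered by op_sel[0], high result from src1 steered by op_sel[1]); that reading is an assumption taken from the review comment, not from the ISA manual. The names `Src`, `Half`, `Lane`, `V2`, `decodeLane`, `pkMov`, and `shuffle` are hypothetical and are not LLVM types or functions.

```cpp
#include <cassert>

// Hypothetical stand-ins for this sketch; none of these are LLVM types.
enum class Src { Src0, Src1, Undef };
enum class Half { Lo, Hi }; // Lo plays the role of sub0, Hi of sub1

struct Lane {
  Src source;
  Half half;
};

// Decode one mask entry of a two-input v2i32 shuffle:
// 0-1 index the first input, 2-3 the second, negative means undefined.
constexpr Lane decodeLane(int m) {
  if (m < 0)
    return {Src::Undef, Half::Lo};
  return {m < 2 ? Src::Src0 : Src::Src1, (m & 1) ? Half::Hi : Half::Lo};
}

struct V2 {
  unsigned lo, hi;
};

// Assumed model of the packed move: result.lo is read from src0 and
// result.hi from src1; each flag picks the high half of its own source.
constexpr V2 pkMov(V2 src0, bool opSel0, V2 src1, bool opSel1) {
  return {opSel0 ? src0.hi : src0.lo, opSel1 ? src1.hi : src1.lo};
}

// Evaluate a shuffle of inputs a and b through the packed-move model.
// Simplification: an undefined lane just reads the low half of `a`,
// since any value is acceptable there (the patch uses IMPLICIT_DEF).
constexpr V2 shuffle(V2 a, V2 b, int m0, int m1) {
  Lane l0 = decodeLane(m0);
  Lane l1 = decodeLane(m1);
  V2 s0 = l0.source == Src::Src1 ? b : a;
  V2 s1 = l1.source == Src::Src1 ? b : a;
  return pkMov(s0, l0.half == Half::Hi, s1, l1.half == Half::Hi);
}

// Mask <1, 2>: lane 0 takes a.hi, lane 1 takes b.lo.
static_assert(shuffle({10, 11}, {20, 21}, 1, 2).lo == 11, "lane 0");
static_assert(shuffle({10, 11}, {20, 21}, 1, 2).hi == 20, "lane 1");
```

Everything is `constexpr`, so the two `static_assert` lines check the model at compile time; the sketch deliberately ignores the divergence check and the SGPR copy path.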
[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)
@@ -489,6 +489,90 @@ void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) {
   CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs);
 }
 
+void AMDGPUDAGToDAGISel::SelectVectorShuffle(SDNode *N) {
+  EVT VT = N->getValueType(0);
+  EVT EltVT = VT.getVectorElementType();
+
+  // TODO: Handle 16-bit element vectors with even aligned masks.
+  if (!Subtarget->hasPkMovB32() || !EltVT.bitsEq(MVT::i32) ||
+      VT.getVectorNumElements() != 2) {
+    SelectCode(N);
+    return;
+  }
+
+  auto *SVN = cast<ShuffleVectorSDNode>(N);
+
+  SDValue Src0 = SVN->getOperand(0);
+  SDValue Src1 = SVN->getOperand(1);
+  ArrayRef<int> Mask = SVN->getMask();
+  SDLoc DL(N);
+
+  assert(Src0.getValueType().getVectorNumElements() == 2 && Mask.size() == 2 &&
+         Mask[0] < 4 && Mask[1] < 4);
+
+  SDValue VSrc0 = Mask[0] < 2 ? Src0 : Src1;
+  SDValue VSrc1 = Mask[1] < 2 ? Src0 : Src1;
+  unsigned Src0SubReg = Mask[0] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+  unsigned Src1SubReg = Mask[1] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+
+  if (Mask[0] < 0) {
+    Src0SubReg = Src1SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc0 = SDValue(ImpDef, 0);
+  }
+
+  if (Mask[1] < 0) {
+    Src1SubReg = Src0SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc1 = SDValue(ImpDef, 0);
+  }
+
+  // SGPR case needs to lower to copies.
+  //
+  // Also use subregister extract when we can directly blend the registers with
+  // a simple subregister copy.
+  //
+  // TODO: Maybe we should fold this out earlier
+  if (N->isDivergent() && Src0SubReg == AMDGPU::sub1 &&
+      Src1SubReg == AMDGPU::sub0) {
+    // The low element of the result always comes from src0.
+    // The high element of the result always comes from src1.
+    // op_sel selects the high half of src0.
+    // op_sel_hi selects the high half of src1.
+
+    unsigned Src0OpSel =
+        Src0SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;
+    unsigned Src1OpSel =
+        Src1SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;

arsenm wrote:

I'm not sure this is correctly encoded. I'm confused by how op_sel and op_sel_hi are supposed to be represented. We set fields in the source modifiers. I guess this should probably be OP_SEL_1?

https://github.com/llvm/llvm-project/pull/123684
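The question above is how the assembler-level `op_sel` and `op_sel_hi` bit vectors relate to the per-source modifier fields. A rough sketch of one possible mapping follows, under the assumption (not confirmed anywhere in this thread) that bit i of `op_sel` lands in an `OP_SEL_0` flag on source i and bit i of `op_sel_hi` lands in an `OP_SEL_1` flag on source i. The namespace `sketch`, the flag values, and the helpers `modsFor` and `packMods` are all hypothetical; the real constants live in the AMDGPU backend and may differ.

```cpp
#include <cassert>

namespace sketch {
// Illustrative flag values only; not copied from the backend headers.
constexpr unsigned NONE = 0;
constexpr unsigned OP_SEL_0 = 1u << 2;
constexpr unsigned OP_SEL_1 = 1u << 3;

// Per-source modifier words for a two-source packed instruction.
struct SrcMods {
  unsigned src0;
  unsigned src1;
};

// Assumed convention: bit `srcIdx` of op_sel sets OP_SEL_0 on that source,
// and bit `srcIdx` of op_sel_hi sets OP_SEL_1 on that source.
constexpr unsigned modsFor(unsigned opSel, unsigned opSelHi, unsigned srcIdx) {
  return ((((opSel >> srcIdx) & 1u) != 0) ? OP_SEL_0 : NONE) |
         ((((opSelHi >> srcIdx) & 1u) != 0) ? OP_SEL_1 : NONE);
}

constexpr SrcMods packMods(unsigned opSel, unsigned opSelHi) {
  return {modsFor(opSel, opSelHi, 0), modsFor(opSel, opSelHi, 1)};
}

// op_sel = [1, 0], op_sel_hi = [0, 0]: only source 0 gets OP_SEL_0.
static_assert(packMods(1u, 0u).src0 == OP_SEL_0, "src0 mods");
static_assert(packMods(1u, 0u).src1 == NONE, "src1 mods");
} // namespace sketch
```

Under this assumed mapping, setting `OP_SEL_0` on both sources (as the hunk does) corresponds to `op_sel[0]` and `op_sel[1]`, while `OP_SEL_1` would correspond to the `op_sel_hi` bits instead.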
[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)
github-actions[bot] wrote:

:warning: undef deprecator found issues in your code. :warning:

You can test this locally with the following command:

```bash
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 2d8035aff3d44bd59f4ff3af60f87c7d6e6219ea c5caf560857f3c4f71416940a528df5ce75212bc llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v2f32.ll llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v3f32.ll llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v4f32.ll llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v8f32.ll llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v2i32.ll llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v3i32.ll llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v4i32.ll llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v8i32.ll llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v2p3.ll llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v3p3.ll llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v4p3.ll llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v8p3.ll llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v3p0.ll llvm/test/CodeGen/AMDGPU/vector_shuffle.packed.ll llvm/test/Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll
```

The following files introduce new uses of undef:
- llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

[Undef](https://llvm.org/docs/LangRef.html#undefined-values) is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields `undef`. You should use `poison` values for placeholders instead.

In tests, avoid using `undef` and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

```llvm
define void @fn() {
  ...
  br i1 undef, ...
}
```

Please use the following instead:

```llvm
define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}
```

Please refer to the [Undefined Behavior Manual](https://llvm.org/docs/UndefinedBehavior.html) for more information.

https://github.com/llvm/llvm-project/pull/123684
[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)
arsenm (https://github.com/arsenm) marked this pull request as ready for review.

https://github.com/llvm/llvm-project/pull/123684
[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)
llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

For VALU shuffles, this saves an instruction in some cases.

---

Patch is 285.82 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123684.diff

19 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (+114)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h (+1)
- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+7)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v2f32.ll (+21-28)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v3f32.ll (+17-23)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v4f32.ll (+34-50)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v8f32.ll (+112-160)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v2i32.ll (+21-28)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v3i32.ll (+17-23)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v4i32.ll (+34-50)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v8i32.ll (+112-160)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v2p3.ll (+21-28)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v3p3.ll (+17-23)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v4p3.ll (+34-50)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v8p3.ll (+112-160)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll (+500-287)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v3p0.ll (+500-287)
- (modified) llvm/test/CodeGen/AMDGPU/vector_shuffle.packed.ll (+48-48)
- (modified) llvm/test/Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll (+1-2)

```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index 6d5c3b5e0742b3..8d03fde8911242 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -489,6 +489,90 @@ void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) {
   CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs);
 }
 
+void AMDGPUDAGToDAGISel::SelectVectorShuffle(SDNode *N) {
+  EVT VT = N->getValueType(0);
+  EVT EltVT = VT.getVectorElementType();
+
+  // TODO: Handle 16-bit element vectors with even aligned masks.
+  if (!Subtarget->hasPkMovB32() || !EltVT.bitsEq(MVT::i32) ||
+      VT.getVectorNumElements() != 2) {
+    SelectCode(N);
+    return;
+  }
+
+  auto *SVN = cast<ShuffleVectorSDNode>(N);
+
+  SDValue Src0 = SVN->getOperand(0);
+  SDValue Src1 = SVN->getOperand(1);
+  ArrayRef<int> Mask = SVN->getMask();
+  SDLoc DL(N);
+
+  assert(Src0.getValueType().getVectorNumElements() == 2 && Mask.size() == 2 &&
+         Mask[0] < 4 && Mask[1] < 4);
+
+  SDValue VSrc0 = Mask[0] < 2 ? Src0 : Src1;
+  SDValue VSrc1 = Mask[1] < 2 ? Src0 : Src1;
+  unsigned Src0SubReg = Mask[0] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+  unsigned Src1SubReg = Mask[1] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+
+  if (Mask[0] < 0) {
+    Src0SubReg = Src1SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc0 = SDValue(ImpDef, 0);
+  }
+
+  if (Mask[1] < 0) {
+    Src1SubReg = Src0SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc1 = SDValue(ImpDef, 0);
+  }
+
+  // SGPR case needs to lower to copies.
+  //
+  // Also use subregister extract when we can directly blend the registers with
+  // a simple subregister copy.
+  //
+  // TODO: Maybe we should fold this out earlier
+  if (N->isDivergent() && Src0SubReg == AMDGPU::sub1 &&
+      Src1SubReg == AMDGPU::sub0) {
+    // The low element of the result always comes from src0.
+    // The high element of the result always comes from src1.
+    // op_sel selects the high half of src0.
+    // op_sel_hi selects the high half of src1.
+
+    unsigned Src0OpSel =
+        Src0SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;
+    unsigned Src1OpSel =
+        Src1SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;
+
+    SDValue Src0OpSelVal = CurDAG->getTargetConstant(Src0OpSel, DL, MVT::i32);
+    SDValue Src1OpSelVal = CurDAG->getTargetConstant(Src1OpSel, DL, MVT::i32);
+    SDValue ZeroMods = CurDAG->getTargetConstant(0, DL, MVT::i32);
+
+    CurDAG->SelectNodeTo(N, AMDGPU::V_PK_MOV_B32, N->getVTList(),
+                         {Src0OpSelVal, VSrc0, Src1OpSelVal, VSrc1,
+                          ZeroMods,   // clamp
+                          ZeroMods,   // op_sel
+                          ZeroMods,   // op_sel_hi
+                          ZeroMods,   // neg_lo
+                          ZeroMods}); // neg_hi
+    return;
+  }
+
+  SDValue ResultElt0 =
+      CurDAG->getTargetExtractSubreg(Src0SubReg, DL, EltVT, VSrc0);
+  SDValue ResultElt1 =
+      CurDAG->getTargetExtractSubreg(Src1SubReg, DL, EltVT, VSrc1);
+
+  const SDValu
```
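To make the selection condition in the truncated hunk easier to reason about, the sketch below mirrors only the visible `if` test: the packed move is chosen when the node is divergent, lane 0 reads a high half (`sub1`), and lane 1 reads a low half (`sub0`). Everything else falls through to the subregister-extract path, whose tail is cut off in this message and is not reconstructed here. The helper names `usesPackedMove` and `countPackedMoveMasks` are hypothetical, and undefined (negative) mask entries are left out for brevity.

```cpp
#include <cassert>

// Mirrors the visible condition for defined lanes only:
// divergent, lane 0 from a high half, lane 1 from a low half.
constexpr bool usesPackedMove(bool divergent, int m0, int m1) {
  bool lane0High = (m0 & 1) != 0; // odd mask index selects sub1
  bool lane1High = (m1 & 1) != 0;
  return divergent && lane0High && !lane1High;
}

// Count how many of the 16 fully defined two-lane masks hit that path.
constexpr int countPackedMoveMasks() {
  int n = 0;
  for (int m0 = 0; m0 < 4; ++m0)
    for (int m1 = 0; m1 < 4; ++m1)
      if (usesPackedMove(true, m0, m1))
        ++n;
  return n;
}

static_assert(countPackedMoveMasks() == 4, "<1,0>, <1,2>, <3,0>, <3,2>");
static_assert(!usesPackedMove(false, 1, 0), "uniform shuffles use copies");
```

Read this way, the condition as posted covers the four masks where the low result needs a high input half and the high result needs a low input half, which are the cases where neither element can stay in its original subregister position.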
[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/123684?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#123684** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/123684?utm_source=stack-comment-view-in-graphite)
* **#123596**
* `main`

This stack of pull requests is managed by [Graphite](https://graphite.dev?utm-source=stack-comment). Learn more about [stacking](https://stacking.dev/?utm_source=stack-comment).

https://github.com/llvm/llvm-project/pull/123684