[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-23 Thread Petar Avramovic via llvm-branch-commits


@@ -838,20 +841,24 @@ class ComponentLayout {
   const ComponentKind Kind;
   const ComponentProps PrevComp;
   const unsigned VOPD3ModsNum;
-  const int BitOp3Idx; // Index of bitop3 operand or -1
+  const int BitOp3Idx;    // Index of bitop3 operand or -1
+  const bool IsVOP3PDot2; // True for V_DOT2_F32_F16 / V_DOT2_F32_BF16
 
 public:
   // Create layout for COMPONENT_X or SINGLE component.
-  ComponentLayout(ComponentKind Kind, unsigned VOPD3ModsNum, int BitOp3Idx)
-      : Kind(Kind), VOPD3ModsNum(VOPD3ModsNum), BitOp3Idx(BitOp3Idx) {
+  ComponentLayout(ComponentKind Kind, unsigned VOPD3ModsNum, int BitOp3Idx,
+                  bool IsVOP3PDot2 = false)
+      : Kind(Kind), VOPD3ModsNum(VOPD3ModsNum), BitOp3Idx(BitOp3Idx),
+        IsVOP3PDot2(IsVOP3PDot2) {

petar-avramovic wrote:

Not sure. I wanted to treat VOP3 dot instructions, with some extra checks, as
VOP2. I will not merge this one until we resolve what should be done.

https://github.com/llvm/llvm-project/pull/179226
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-23 Thread Petar Avramovic via llvm-branch-commits


@@ -2,8 +2,8 @@
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx906 < %s | FileCheck %s --check-prefixes=GCN,GFX906
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx950 < %s | FileCheck %s --check-prefixes=GCN,GFX950
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 < %s | FileCheck %s --check-prefixes=GCN,GFX10
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX11
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1170 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX1170
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX11PLUS-TRUE16,GFX11
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1170 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX11PLUS-TRUE16,GFX1170
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX12

petar-avramovic wrote:

I would just drop GFX11PLUS-TRUE16 for now, then, and update the test when
https://github.com/llvm/llvm-project/pull/187514 gets merged.

https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-23 Thread Joe Nash via llvm-branch-commits


@@ -2,8 +2,8 @@
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx906 < %s | FileCheck %s --check-prefixes=GCN,GFX906
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx950 < %s | FileCheck %s --check-prefixes=GCN,GFX950
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 < %s | FileCheck %s --check-prefixes=GCN,GFX10
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX11
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1170 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX1170
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX11PLUS-TRUE16,GFX11
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1170 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX11PLUS-TRUE16,GFX1170
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX12

Sisyph wrote:

As is, the naming of GFX11PLUS-TRUE16 is confusing, since it does not include 
GFX12. Please add a true16 runline for GFX12. 

https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-23 Thread Joe Nash via llvm-branch-commits


@@ -838,20 +841,24 @@ class ComponentLayout {
   const ComponentKind Kind;
   const ComponentProps PrevComp;
   const unsigned VOPD3ModsNum;
-  const int BitOp3Idx; // Index of bitop3 operand or -1
+  const int BitOp3Idx;    // Index of bitop3 operand or -1
+  const bool IsVOP3PDot2; // True for V_DOT2_F32_F16 / V_DOT2_F32_BF16
 
 public:
   // Create layout for COMPONENT_X or SINGLE component.
-  ComponentLayout(ComponentKind Kind, unsigned VOPD3ModsNum, int BitOp3Idx)
-      : Kind(Kind), VOPD3ModsNum(VOPD3ModsNum), BitOp3Idx(BitOp3Idx) {
+  ComponentLayout(ComponentKind Kind, unsigned VOPD3ModsNum, int BitOp3Idx,
+                  bool IsVOP3PDot2 = false)
+      : Kind(Kind), VOPD3ModsNum(VOPD3ModsNum), BitOp3Idx(BitOp3Idx),
+        IsVOP3PDot2(IsVOP3PDot2) {

Sisyph wrote:

It is less maintainable to keep adding class members for specific
instructions. Does it make sense to rename VOPD3ModsNum to VOP3ModsNum, and
handle dual_dot2c generically as a VOP3? @rampitec

https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-23 Thread Joe Nash via llvm-branch-commits


@@ -2,8 +2,8 @@
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx906 < %s | FileCheck %s --check-prefixes=GCN,GFX906
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx950 < %s | FileCheck %s --check-prefixes=GCN,GFX950
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 < %s | FileCheck %s --check-prefixes=GCN,GFX10
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX11
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1170 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX1170
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX11PLUS-TRUE16,GFX11
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1170 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX11PLUS-TRUE16,GFX1170
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 < %s | FileCheck %s --check-prefixes=GCN,GFX11PLUS,GFX12

Sisyph wrote:

FYI True16 will soon become the default for gfx12 
https://github.com/llvm/llvm-project/pull/187514/changes

https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-23 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/179226

>From 65d9780bbba6439d9a0fb65ade4d1e5568138679 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 19 Mar 2026 12:57:25 +0100
Subject: [PATCH] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version); post-RA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such VOP3 instructions and generate v_dual_dot2acc.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  |   3 +
 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp   |  20 +-
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  27 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|  10 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|   4 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  29 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  35 +-
 llvm/lib/Target/AMDGPU/VOPInstructions.td |   4 +-
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll  | 191 +++
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 482 ++
 10 files changed, 383 insertions(+), 422 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 20e811503256e..a94e242586293 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -2734,6 +2734,9 @@ def isWave32Strict : Predicate<"Subtarget->isWave32()">,
 def isWave64Strict : Predicate<"Subtarget->isWave64()">,
   AssemblerPredicate <(all_of FeatureWavefrontSize64)>;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+
 //===----------------------------------------------------------------------===//
 // HwModes
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index b17cabf37d53f..601acaf4c79bf 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -34,6 +34,23 @@ using namespace llvm;
 
 #define DEBUG_TYPE "gcn-vopd-utils"
 
+// Check if MI is a VOP3 instruction with operands that satisfy the constraints
+// for mapping it to a VOP2/VOPD opcode: no modifiers, no clamp, src1 and src2
+// are registers (src0 can be register or literal), and src2 is the same as dst.
+static bool canMapVOP3ToVOPD(const MachineInstr &MI) {
+  unsigned Opc = MI.getOpcode();
+  if (Opc != AMDGPU::V_DOT2_F32_F16 && Opc != AMDGPU::V_DOT2_F32_BF16)
+    return false;
+  // src0 can be register or literal
+  return MI.getOperand(1).getImm() == SISrcMods::OP_SEL_1 && // default src0mods
+         MI.getOperand(3).getImm() == SISrcMods::OP_SEL_1 && // default src1mods
+         MI.getOperand(4).isReg() &&                         // register src1
+         MI.getOperand(5).getImm() == SISrcMods::OP_SEL_1 && // default src2mods
+         MI.getOperand(6).isReg() &&                         // register src2
+         MI.getOperand(7).getImm() == 0 &&                   // no clamp
+         MI.getOperand(0).getReg() == MI.getOperand(6).getReg(); // dst == src2
+}
+
 bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
                                    const MachineInstr &MIX,
                                    const MachineInstr &MIY, bool IsVOPD3) {
@@ -44,7 +61,8 @@ bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
 
   if (IsVOPD3 && !ST.hasVOPD3())
     return false;
-  if (!IsVOPD3 && (TII.isVOP3(MIX) || TII.isVOP3(MIY)))
+  if (!IsVOPD3 && ((TII.isVOP3(MIX) && !canMapVOP3ToVOPD(MIX)) ||
+                   (TII.isVOP3(MIY) && !canMapVOP3ToVOPD(MIY))))
     return false;
   if (TII.isDPP(MIX) || TII.isDPP(MIY))
     return false;
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index a2fe31bd849c3..238939288089c 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -827,6 +827,19 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
     return false;
   };
 
+  // For V_DOT2ACC pseudos, prefer folding literals into src0 (operand 2) for
+  // VOPD compatibility. If the literal would go into src1 (operand 4), commute
+  // src0 and src1 (including their modifiers) so the literal ends up in src0.
+  if (OpToFold.isImm() && OpNo == 4 &&
+      (Opc == AMDGPU::V_DOT2ACC_F32_F16_PSEUDO ||
+       Opc == AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO)) {
+    int64_t SavedMod = MI->getOperand(1).getImm();
+    MI->getOperand(1).setImm(MI->getOperand(3).getImm());
+MI->get

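To make the operand indices in canMapVOP3ToVOPD easier to follow, here is a minimal standalone C++ sketch of the same checks. It assumes the VOP3P operand order the patch indexes into (dst, src0_modifiers, src0, src1_modifiers, src1, src2_modifiers, src2, clamp) and uses hypothetical Instr/Operand types plus an assumed value for SISrcMods::OP_SEL_1 in place of the real LLVM classes; it illustrates the logic and is not the patch's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for MachineInstr/MachineOperand.
// They model only what canMapVOP3ToVOPD inspects; they are not LLVM APIs.
enum class Opcode { V_DOT2_F32_F16, V_DOT2_F32_BF16, V_ADD_F32 };

struct Operand {
  bool IsReg;    // true: Value is a register number; false: an immediate
  int64_t Value;
};

struct Instr {
  Opcode Opc;
  std::vector<Operand> Ops;
};

// Assumed VOP3P operand order, matching the indices used in the patch.
enum OpIdx { Dst, Src0Mods, Src0, Src1Mods, Src1, Src2Mods, Src2, Clamp };

// Assumed value of SISrcMods::OP_SEL_1 (the default VOP3P source modifier,
// op_sel_hi set); only equality with it matters in this sketch.
constexpr int64_t OP_SEL_1 = 1 << 3;

Operand reg(int64_t R) { return {true, R}; }
Operand imm(int64_t V) { return {false, V}; }

// Builds DstReg = dot2(A, B, Acc) with default source modifiers.
Instr makeDot2(Opcode Opc, int64_t DstReg, Operand A, Operand B, Operand Acc,
               int64_t ClampBit = 0) {
  return {Opc, {reg(DstReg), imm(OP_SEL_1), A, imm(OP_SEL_1), B,
                imm(OP_SEL_1), Acc, imm(ClampBit)}};
}

// Mirrors the checks in canMapVOP3ToVOPD: default modifiers, no clamp,
// src1/src2 in registers, and dst == src2 (the accumulator is tied).
bool canMapVOP3ToVOPD(const Instr &MI) {
  if (MI.Opc != Opcode::V_DOT2_F32_F16 && MI.Opc != Opcode::V_DOT2_F32_BF16)
    return false;
  const std::vector<Operand> &O = MI.Ops;
  // src0 may be a register or a literal, so it is not checked.
  return O[Src0Mods].Value == OP_SEL_1 && O[Src1Mods].Value == OP_SEL_1 &&
         O[Src1].IsReg && O[Src2Mods].Value == OP_SEL_1 && O[Src2].IsReg &&
         O[Clamp].Value == 0 && O[Dst].Value == O[Src2].Value;
}
```

For example, `makeDot2(Opcode::V_DOT2_F32_F16, 0, imm(42), reg(1), reg(0))` is accepted (a literal in src0 is fine), while the same instruction with a different dst register, a literal in src1, or the clamp bit set is rejected. The dst == src2 requirement is why the DOT2ACC pseudo ties src2 to dst before register allocation.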
[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-23 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/179226

>From 178dc56e767ad32aa9f9bcadc28b916ac379e7c9 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 19 Mar 2026 12:57:25 +0100
Subject: [PATCH] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version); post-RA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such VOP3 instructions and generate v_dual_dot2acc.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  |   3 +
 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp   |  20 +-
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  27 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|  10 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|   4 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  29 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  35 +-
 llvm/lib/Target/AMDGPU/VOPInstructions.td |   4 +-
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll  | 191 +++
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 482 ++
 10 files changed, 383 insertions(+), 422 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 20e811503256e..a94e242586293 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -2734,6 +2734,9 @@ def isWave32Strict : Predicate<"Subtarget->isWave32()">,
 def isWave64Strict : Predicate<"Subtarget->isWave64()">,
   AssemblerPredicate <(all_of FeatureWavefrontSize64)>;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+
 //===----------------------------------------------------------------------===//
 // HwModes
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index b17cabf37d53f..601acaf4c79bf 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -34,6 +34,23 @@ using namespace llvm;
 
 #define DEBUG_TYPE "gcn-vopd-utils"
 
+// Check if MI is a VOP3 instruction with operands that satisfy the constraints
+// for mapping it to a VOP2/VOPD opcode: no modifiers, no clamp, src1 and src2
+// are registers (src0 can be register or literal), and src2 is the same as dst.
+static bool canMapVOP3ToVOPD(const MachineInstr &MI) {
+  unsigned Opc = MI.getOpcode();
+  if (Opc != AMDGPU::V_DOT2_F32_F16 && Opc != AMDGPU::V_DOT2_F32_BF16)
+    return false;
+  // src0 can be register or literal
+  return MI.getOperand(1).getImm() == SISrcMods::OP_SEL_1 && // default src0mods
+         MI.getOperand(3).getImm() == SISrcMods::OP_SEL_1 && // default src1mods
+         MI.getOperand(4).isReg() &&                         // register src1
+         MI.getOperand(5).getImm() == SISrcMods::OP_SEL_1 && // default src2mods
+         MI.getOperand(6).isReg() &&                         // register src2
+         MI.getOperand(7).getImm() == 0 &&                   // no clamp
+         MI.getOperand(0).getReg() == MI.getOperand(6).getReg(); // dst == src2
+}
+
 bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
                                    const MachineInstr &MIX,
                                    const MachineInstr &MIY, bool IsVOPD3) {
@@ -44,7 +61,8 @@ bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
 
   if (IsVOPD3 && !ST.hasVOPD3())
     return false;
-  if (!IsVOPD3 && (TII.isVOP3(MIX) || TII.isVOP3(MIY)))
+  if (!IsVOPD3 && ((TII.isVOP3(MIX) && !canMapVOP3ToVOPD(MIX)) ||
+                   (TII.isVOP3(MIY) && !canMapVOP3ToVOPD(MIY))))
     return false;
   if (TII.isDPP(MIX) || TII.isDPP(MIY))
     return false;
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index a2fe31bd849c3..238939288089c 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -827,6 +827,19 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
     return false;
   };
 
+  // For V_DOT2ACC pseudos, prefer folding literals into src0 (operand 2) for
+  // VOPD compatibility. If the literal would go into src1 (operand 4), commute
+  // src0 and src1 (including their modifiers) so the literal ends up in src0.
+  if (OpToFold.isImm() && OpNo == 4 &&
+      (Opc == AMDGPU::V_DOT2ACC_F32_F16_PSEUDO ||
+       Opc == AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO)) {
+    int64_t SavedMod = MI->getOperand(1).getImm();
+    MI->getOperand(1).setImm(MI->getOperand(3).getImm());
+MI->get

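The SIFoldOperands hunk is cut off in the archive, so the following is only a sketch of the commute its comment describes, under stated assumptions: a hypothetical operand vector stands in for MachineInstr, and the swap-then-retarget behaviour is inferred from the hunk's comment rather than copied from the patch. The swap should be value-preserving because dot2 multiplies its two sources pairwise and floating-point multiplication is commutative, as long as each modifier travels with its source.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of a V_DOT2ACC_*_PSEUDO operand list; not an LLVM API.
// Assumed order (from the patch): src0 is operand 2, src1 is operand 4, and
// their modifiers are operands 1 and 3.
enum OpIdx { Dst, Src0Mods, Src0, Src1Mods, Src1, Src2Mods, Src2, Clamp };

struct Operand {
  bool IsReg;
  int64_t Value;
};

struct FoldPlan {
  std::vector<Operand> Ops; // operand list after the optional commute
  unsigned FoldIdx;         // operand that will receive the literal
};

// When a literal is about to be folded into src1 of a DOT2ACC pseudo, swap
// src0/src1 together with their modifiers so the literal lands in src0.
FoldPlan planLiteralFold(std::vector<Operand> Ops, unsigned OpNo,
                         bool IsDot2AccPseudo) {
  if (!IsDot2AccPseudo || OpNo != Src1)
    return {std::move(Ops), OpNo};
  std::swap(Ops[Src0Mods], Ops[Src1Mods]);
  std::swap(Ops[Src0], Ops[Src1]);
  return {std::move(Ops), Src0};
}

// Example operands: src0 = v10 with mods 8, src1 = v11 with mods 9.
std::vector<Operand> exampleOps() {
  return {{true, 0},  {false, 8}, {true, 10}, {false, 9},
          {true, 11}, {false, 8}, {true, 0},  {false, 0}};
}
```

With `planLiteralFold(exampleOps(), Src1, true)`, the register v11 that the literal would replace moves into src0 together with its modifier value 9, and the fold is retargeted to operand 2; for any other opcode or operand, the list is returned unchanged.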
[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-23 Thread Mirko Brkušanin via llvm-branch-commits

https://github.com/mbrkusanin approved this pull request.

LGTM. Thanks

https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-23 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/179226

>From eaf5d355a4b42583c33a94cc9a1b31ea2c2c3090 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 19 Mar 2026 12:57:25 +0100
Subject: [PATCH] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version); post-RA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such VOP3 instructions and generate v_dual_dot2acc.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  |   3 +
 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp   |  20 +-
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  28 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|  10 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|   4 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  29 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  35 +-
 llvm/lib/Target/AMDGPU/VOPInstructions.td |   4 +-
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll  | 191 +++
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 482 ++
 10 files changed, 384 insertions(+), 422 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 20e811503256e..a94e242586293 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -2734,6 +2734,9 @@ def isWave32Strict : Predicate<"Subtarget->isWave32()">,
 def isWave64Strict : Predicate<"Subtarget->isWave64()">,
   AssemblerPredicate <(all_of FeatureWavefrontSize64)>;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+
 //===----------------------------------------------------------------------===//
 // HwModes
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index b17cabf37d53f..601acaf4c79bf 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -34,6 +34,23 @@ using namespace llvm;
 
 #define DEBUG_TYPE "gcn-vopd-utils"
 
+// Check if MI is a VOP3 instruction with operands that satisfy the constraints
+// for mapping it to a VOP2/VOPD opcode: no modifiers, no clamp, src1 and src2
+// are registers (src0 can be register or literal), and src2 is the same as dst.
+static bool canMapVOP3ToVOPD(const MachineInstr &MI) {
+  unsigned Opc = MI.getOpcode();
+  if (Opc != AMDGPU::V_DOT2_F32_F16 && Opc != AMDGPU::V_DOT2_F32_BF16)
+    return false;
+  // src0 can be register or literal
+  return MI.getOperand(1).getImm() == SISrcMods::OP_SEL_1 && // default src0mods
+         MI.getOperand(3).getImm() == SISrcMods::OP_SEL_1 && // default src1mods
+         MI.getOperand(4).isReg() &&                         // register src1
+         MI.getOperand(5).getImm() == SISrcMods::OP_SEL_1 && // default src2mods
+         MI.getOperand(6).isReg() &&                         // register src2
+         MI.getOperand(7).getImm() == 0 &&                   // no clamp
+         MI.getOperand(0).getReg() == MI.getOperand(6).getReg(); // dst == src2
+}
+
 bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
                                    const MachineInstr &MIX,
                                    const MachineInstr &MIY, bool IsVOPD3) {
@@ -44,7 +61,8 @@ bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
 
   if (IsVOPD3 && !ST.hasVOPD3())
     return false;
-  if (!IsVOPD3 && (TII.isVOP3(MIX) || TII.isVOP3(MIY)))
+  if (!IsVOPD3 && ((TII.isVOP3(MIX) && !canMapVOP3ToVOPD(MIX)) ||
+                   (TII.isVOP3(MIY) && !canMapVOP3ToVOPD(MIY))))
     return false;
   if (TII.isDPP(MIX) || TII.isDPP(MIY))
     return false;
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index a2fe31bd849c3..1477f03fec4a1 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -827,6 +827,19 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
     return false;
   };
 
+  // For V_DOT2ACC pseudos, prefer folding literals into src0 (operand 2) for
+  // VOPD compatibility. If the literal would go into src1 (operand 4), commute
+  // src0 and src1 (including their modifiers) so the literal ends up in src0.
+  if (OpToFold.isImm() && OpNo == 4 &&
+      (Opc == AMDGPU::V_DOT2ACC_F32_F16_PSEUDO ||
+       Opc == AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO)) {
+    int64_t SavedMod = MI->getOperand(1).getImm();
+    MI->getOperand(1).setImm(MI->getOperand(3).getImm());
+MI->get

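The patched gate in checkVOPDRegConstraints rejects a non-VOPD3 pair only when one of its components is a VOP3 instruction that canMapVOP3ToVOPD does not accept. A minimal boolean model makes the grouping explicit; the VOPDCandidate struct is hypothetical and stands in for the TII.isVOP3 and canMapVOP3ToVOPD queries.

```cpp
#include <cassert>

// Hypothetical per-instruction summary: in the patch these facts come from
// TII.isVOP3(MI) and canMapVOP3ToVOPD(MI).
struct VOPDCandidate {
  bool IsVOP3;
  bool CanMapVOP3ToVOPD;
};

// A VOP3 component blocks a plain (non-VOPD3) VOPD pair unless it is one of
// the mappable dot2 forms.
bool blocksPlainVOPD(const VOPDCandidate &MI) {
  return MI.IsVOP3 && !MI.CanMapVOP3ToVOPD;
}

// Same shape as the patched condition:
//   !IsVOPD3 && ((isVOP3(X) && !canMap(X)) || (isVOP3(Y) && !canMap(Y)))
// Returns true when the pair must be rejected.
bool rejectPair(const VOPDCandidate &X, const VOPDCandidate &Y, bool IsVOPD3) {
  return !IsVOPD3 && (blocksPlainVOPD(X) || blocksPlainVOPD(Y));
}
```

With X = {true, true} (a mappable dot2) and a non-VOP3 Y, rejectPair returns false, whereas the pre-patch check `TII.isVOP3(MIX) || TII.isVOP3(MIY)` would have rejected the pair outright.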
[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-23 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/179226

>From dd68fc3072cb0ba7dacd4eb1d19d81c8947ff198 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 19 Mar 2026 12:57:25 +0100
Subject: [PATCH] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version); post-RA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such VOP3 instructions and generate v_dual_dot2acc.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  |   3 +
 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp   |  20 +-
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  27 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|  10 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|   4 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  29 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  35 +-
 llvm/lib/Target/AMDGPU/VOPInstructions.td |   4 +-
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll  | 191 +++
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 482 ++
 10 files changed, 383 insertions(+), 422 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 20e811503256e..a94e242586293 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -2734,6 +2734,9 @@ def isWave32Strict : Predicate<"Subtarget->isWave32()">,
 def isWave64Strict : Predicate<"Subtarget->isWave64()">,
   AssemblerPredicate <(all_of FeatureWavefrontSize64)>;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+
 
//===--===//
 // HwModes
 
//===--===//
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index b17cabf37d53f..601acaf4c79bf 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -34,6 +34,23 @@ using namespace llvm;
 
 #define DEBUG_TYPE "gcn-vopd-utils"
 
+// Check if MI is a VOP3 instruction with operands that satisfy the constraints
+// for mapping it to a VOP2/VOPD opcode: no modifiers, no clamp, src1 and src2
+// are registers (src0 can be register or literal), and src2 is same as dst.
+static bool canMapVOP3ToVOPD(const MachineInstr &MI) {
+  unsigned Opc = MI.getOpcode();
+  if (Opc != AMDGPU::V_DOT2_F32_F16 && Opc != AMDGPU::V_DOT2_F32_BF16)
+    return false;
+  // src0 can be register or literal
+  return MI.getOperand(1).getImm() == SISrcMods::OP_SEL_1 && // default src0mods
+         MI.getOperand(3).getImm() == SISrcMods::OP_SEL_1 && // default src1mods
+         MI.getOperand(4).isReg() &&                         // register src1
+         MI.getOperand(5).getImm() == SISrcMods::OP_SEL_1 && // default src2mods
+         MI.getOperand(6).isReg() &&                         // register src2
+         MI.getOperand(7).getImm() == 0 &&                   // no clamp
+         MI.getOperand(0).getReg() == MI.getOperand(6).getReg(); // dst == src2
+}
+
 bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
const MachineInstr &MIX,
const MachineInstr &MIY, bool IsVOPD3) {
@@ -44,7 +61,8 @@ bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
 
   if (IsVOPD3 && !ST.hasVOPD3())
     return false;
-  if (!IsVOPD3 && (TII.isVOP3(MIX) || TII.isVOP3(MIY)))
+  if (!IsVOPD3 && ((TII.isVOP3(MIX) && !canMapVOP3ToVOPD(MIX)) ||
+                   (TII.isVOP3(MIY) && !canMapVOP3ToVOPD(MIY))))
     return false;
   if (TII.isDPP(MIX) || TII.isDPP(MIY))
     return false;
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index a2fe31bd849c3..238939288089c 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -827,6 +827,19 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
 return false;
   };
 
+  // For V_DOT2ACC pseudos, prefer folding literals into src0 (operand 2) for
+  // VOPD compatibility. If the literal would go into src1 (operand 4), commute
+  // src0 and src1 (including their modifiers) so the literal ends up in src0.
+  if (OpToFold.isImm() && OpNo == 4 &&
+      (Opc == AMDGPU::V_DOT2ACC_F32_F16_PSEUDO ||
+       Opc == AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO)) {
+    int64_t SavedMod = MI->getOperand(1).getImm();
+    MI->getOperand(1).setImm(MI->getOperand(3).getImm());
+    MI->get

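The `canMapVOP3ToVOPD` check in the GCNVOPDUtils hunk can be modeled outside LLVM to make the operand layout explicit. This is a minimal sketch, not LLVM code: `Opcode`, `Operand`, `Instr` and the `reg`/`imm` helpers are hypothetical stand-ins for the opcode enum, `MachineInstr` and `MachineOperand`, and `kOpSel1` is assumed to mirror `SISrcMods::OP_SEL_1` (taken here as `1 << 3`). The operand indices (0 dst, 1 src0mods, 2 src0, 3 src1mods, 4 src1, 5 src2mods, 6 src2, 7 clamp) follow the indices the patch uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the LLVM opcode enum and MachineOperand; they
// model only what the check needs.
enum class Opcode { V_DOT2_F32_F16, V_DOT2_F32_BF16, V_ADD_F32 };

// Assumed to mirror SISrcMods::OP_SEL_1: the modifier value a packed source
// carries when no neg/abs/op_sel bits are applied.
constexpr int64_t kOpSel1 = 1 << 3;

struct Operand {
  bool IsReg;
  unsigned Reg; // meaningful when IsReg
  int64_t Imm;  // meaningful when !IsReg (modifiers, clamp or a literal)
};

struct Instr {
  Opcode Opc;
  // 0 dst, 1 src0mods, 2 src0, 3 src1mods, 4 src1, 5 src2mods, 6 src2, 7 clamp
  std::vector<Operand> Ops;
};

Operand reg(unsigned R) { return {true, R, 0}; }
Operand imm(int64_t V) { return {false, 0, V}; }

// True when a VOP3 dot2 already has the shape of the VOP2-style accumulate
// form: default modifiers, no clamp, register src1/src2 and dst == src2.
// src0 (operand 2) is not checked, so it may be a register or a literal.
bool canMapVOP3ToVOPD(const Instr &MI) {
  if (MI.Opc != Opcode::V_DOT2_F32_F16 && MI.Opc != Opcode::V_DOT2_F32_BF16)
    return false;
  assert(MI.Ops.size() >= 8 && "unexpected operand count");
  const std::vector<Operand> &Ops = MI.Ops;
  return Ops[1].Imm == kOpSel1 &&  // default src0mods
         Ops[3].Imm == kOpSel1 &&  // default src1mods
         Ops[4].IsReg &&           // register src1
         Ops[5].Imm == kOpSel1 &&  // default src2mods
         Ops[6].IsReg &&           // register src2
         Ops[7].Imm == 0 &&        // no clamp
         Ops[0].Reg == Ops[6].Reg; // dst == src2
}
```

Under this model, replacing src0 with a literal still passes, while setting clamp, changing a modifier, or giving dst a different register than src2 fails, matching the conditions listed in the patch comment.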

[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-19 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/179226

>From 66ed7e4cb7c33b4868ab1fc0865aa7a85e3c2146 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 19 Mar 2026 12:57:25 +0100
Subject: [PATCH] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 on targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. Like the VOP2 version,
this DOT2ACC pseudo has src2 tied to dst; post-RA pseudo expansion
restores it to the VOP3 version of the instruction.
CreateVOPD will recognize such a VOP3 pseudo and generate v_dual_dot2acc.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  |   3 +
 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp   |  17 +-
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  27 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|  10 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|   4 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  29 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  35 +-
 llvm/lib/Target/AMDGPU/VOPInstructions.td |   4 +-
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll  | 191 +++
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 482 ++
 10 files changed, 380 insertions(+), 422 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index d87e612cedd54..5d160b8dbfd25 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -2718,6 +2718,9 @@ def isWave32Strict : Predicate<"Subtarget->isWave32()">,
 def isWave64Strict : Predicate<"Subtarget->isWave64()">,
   AssemblerPredicate <(all_of FeatureWavefrontSize64)>;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+
 
//===--===//
 // HwModes
 
//===--===//
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index b17cabf37d53f..e2b7c23803bcb 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -34,6 +34,20 @@ using namespace llvm;
 
 #define DEBUG_TYPE "gcn-vopd-utils"
 
+static bool isVOPDDot(const MachineInstr &MI) {
+  unsigned Opc = MI.getOpcode();
+  if (Opc != AMDGPU::V_DOT2_F32_F16 && Opc != AMDGPU::V_DOT2_F32_BF16)
+    return false;
+  // src0 can be register or literal
+  return MI.getOperand(1).getImm() == SISrcMods::OP_SEL_1 && // default src0mods
+         MI.getOperand(3).getImm() == SISrcMods::OP_SEL_1 && // default src1mods
+         MI.getOperand(4).isReg() &&                         // register src1
+         MI.getOperand(5).getImm() == SISrcMods::OP_SEL_1 && // default src2mods
+         MI.getOperand(6).isReg() &&                         // register src2
+         MI.getOperand(7).getImm() == 0 &&                   // no clamp
+         MI.getOperand(0).getReg() == MI.getOperand(6).getReg(); // dst == src2
+}
+
 bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
const MachineInstr &MIX,
const MachineInstr &MIY, bool IsVOPD3) {
@@ -44,7 +58,8 @@ bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
 
   if (IsVOPD3 && !ST.hasVOPD3())
     return false;
-  if (!IsVOPD3 && (TII.isVOP3(MIX) || TII.isVOP3(MIY)))
+  if (!IsVOPD3 && ((TII.isVOP3(MIX) && !isVOPDDot(MIX)) ||
+                   (TII.isVOP3(MIY) && !isVOPDDot(MIY))))
     return false;
   if (TII.isDPP(MIX) || TII.isDPP(MIY))
     return false;
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index a2fe31bd849c3..0af4921040aad 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -827,6 +827,19 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
 return false;
   };
 
+  // For V_DOT2ACC pseudos, prefer folding literals into src0 (operand 2) for
+  // VOPD compatibility. If the literal would go into src1 (operand 4), commute
+  // src0 and src1 (including their modifiers) so the literal ends up in src0.
+  if (OpToFold.isImm() && OpNo == 4 &&
+      (Opc == AMDGPU::V_DOT2ACC_F32_F16_PSEUDO ||
+       Opc == AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO)) {
+    int64_t SavedMod = MI->getOperand(1).getImm();
+    MI->getOperand(1).setImm(MI->getOperand(3).getImm());
+    MI->getOperand(3).setImm(SavedMod);
+    MI->getOperand(4).setReg(MI->getOperand(2).getReg());
+    OpNo = 2;
+  }
+
   bool IsLegal = OpToFold.isOperandLegal(*TII, *MI, OpNo);
   if (!IsLegal && OpToFold.isImm()) {
 if (std::optional ImmVal = OpToFold.getEffecti

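The SIFoldOperands change relies on dot2 being symmetric in its two vector sources (a.x*b.x + a.y*b.y is unchanged when a and b swap), so the literal can move from src1 to src0 as long as each modifier travels with its operand. A minimal sketch of that operand shuffle, under assumptions: `Operand`, `redirectLiteralToSrc0` and `foldLiteral` are hypothetical, the operand layout is the one used in the patch, and the caller is assumed to have already established that it is folding an immediate into a DOT2ACC pseudo.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical operand model: a register or an immediate (modifier bits,
// clamp, or a literal). Layout as in the patch: 0 dst, 1 src0mods, 2 src0,
// 3 src1mods, 4 src1, 5 src2mods, 6 src2, 7 clamp.
struct Operand {
  bool IsReg;
  unsigned Reg;
  int64_t Imm;
};

// Called when an immediate is about to be folded into operand OpNo of a
// DOT2ACC pseudo. If the target is src1 (operand 4), swap the src0/src1
// modifiers and move src0's register into the src1 slot, then report
// src0 (operand 2) as the new fold target. Other operands are unchanged.
unsigned redirectLiteralToSrc0(std::vector<Operand> &Ops, unsigned OpNo) {
  if (OpNo != 4)
    return OpNo;
  assert(Ops.size() >= 8 && Ops[2].IsReg && "src0 must be a register");
  std::swap(Ops[1].Imm, Ops[3].Imm); // src0mods <-> src1mods
  Ops[4] = Ops[2];                   // src1 now reads src0's register
  return 2;                          // fold the literal into src0 instead
}

// Replace the chosen operand with the immediate literal.
void foldLiteral(std::vector<Operand> &Ops, unsigned OpNo, int64_t Literal) {
  Ops[OpNo] = Operand{false, 0, Literal};
}
```

For example, with src0 = v5 and src1 defined by a move of `2.0`, a fold aimed at operand 4 ends up with the literal in src0 and v5 in src1, which is the placement the patch comment prefers for VOPD compatibility.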

[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-18 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/179226

>From 81ecd00f7cdf7539ddea8748a3e63259e52294d3 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Wed, 18 Mar 2026 12:49:49 +0100
Subject: [PATCH] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 on targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. Like the VOP2 version,
this DOT2ACC pseudo has src2 tied to dst; post-RA pseudo expansion
restores it to the VOP3 version of the instruction.
CreateVOPD will recognize such a VOP3 pseudo and generate v_dual_dot2acc.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  |   3 +
 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp   |  17 +-
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp |  27 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|  10 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|   4 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  29 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  35 +-
 llvm/lib/Target/AMDGPU/VOPInstructions.td |   4 +-
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll  | 191 +++
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 482 ++
 10 files changed, 380 insertions(+), 422 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 616effeb5b9f2..0b0a72cae901d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -2706,6 +2706,9 @@ def isWave32Strict : Predicate<"Subtarget->isWave32()">,
 def isWave64Strict : Predicate<"Subtarget->isWave64()">,
   AssemblerPredicate <(all_of FeatureWavefrontSize64)>;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+
 
//===--===//
 // HwModes
 
//===--===//
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index b17cabf37d53f..e2b7c23803bcb 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -34,6 +34,20 @@ using namespace llvm;
 
 #define DEBUG_TYPE "gcn-vopd-utils"
 
+static bool isVOPDDot(const MachineInstr &MI) {
+  unsigned Opc = MI.getOpcode();
+  if (Opc != AMDGPU::V_DOT2_F32_F16 && Opc != AMDGPU::V_DOT2_F32_BF16)
+    return false;
+  // src0 can be register or literal
+  return MI.getOperand(1).getImm() == SISrcMods::OP_SEL_1 && // default src0mods
+         MI.getOperand(3).getImm() == SISrcMods::OP_SEL_1 && // default src1mods
+         MI.getOperand(4).isReg() &&                         // register src1
+         MI.getOperand(5).getImm() == SISrcMods::OP_SEL_1 && // default src2mods
+         MI.getOperand(6).isReg() &&                         // register src2
+         MI.getOperand(7).getImm() == 0 &&                   // no clamp
+         MI.getOperand(0).getReg() == MI.getOperand(6).getReg(); // dst == src2
+}
+
 bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
const MachineInstr &MIX,
const MachineInstr &MIY, bool IsVOPD3) {
@@ -44,7 +58,8 @@ bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
 
   if (IsVOPD3 && !ST.hasVOPD3())
     return false;
-  if (!IsVOPD3 && (TII.isVOP3(MIX) || TII.isVOP3(MIY)))
+  if (!IsVOPD3 && ((TII.isVOP3(MIX) && !isVOPDDot(MIX)) ||
+                   (TII.isVOP3(MIY) && !isVOPDDot(MIY))))
     return false;
   if (TII.isDPP(MIX) || TII.isDPP(MIY))
     return false;
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index a2fe31bd849c3..0af4921040aad 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -827,6 +827,19 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
 return false;
   };
 
+  // For V_DOT2ACC pseudos, prefer folding literals into src0 (operand 2) for
+  // VOPD compatibility. If the literal would go into src1 (operand 4), commute
+  // src0 and src1 (including their modifiers) so the literal ends up in src0.
+  if (OpToFold.isImm() && OpNo == 4 &&
+      (Opc == AMDGPU::V_DOT2ACC_F32_F16_PSEUDO ||
+       Opc == AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO)) {
+    int64_t SavedMod = MI->getOperand(1).getImm();
+    MI->getOperand(1).setImm(MI->getOperand(3).getImm());
+    MI->getOperand(3).setImm(SavedMod);
+    MI->getOperand(4).setReg(MI->getOperand(2).getReg());
+    OpNo = 2;
+  }
+
   bool IsLegal = OpToFold.isOperandLegal(*TII, *MI, OpNo);
   if (!IsLegal && OpToFold.isImm()) {
 if (std::optional ImmVal = OpToFold.getEffecti

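The two TableGen predicates enable the new pseudos only when VOPD exists, one dot feature is present, and another dot feature is absent. A minimal sketch of the same boolean logic, assuming a hypothetical `SubtargetFeatures` struct in place of the real subtarget queries. The patch does not say which instructions each `DotN` feature gates, so the comments only restate the predicate strings.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget queries used by the predicates
// (hasVOPDInsts, hasDot5Insts, hasDot10Insts, hasDot12Insts, hasDot13Insts).
struct SubtargetFeatures {
  bool VOPDInsts = false;
  bool Dot5Insts = false;
  bool Dot10Insts = false;
  bool Dot12Insts = false;
  bool Dot13Insts = false;
};

// Mirrors HasOnlyDualDot2AccF32F16:
//   hasVOPDInsts() && hasDot10Insts() && !hasDot5Insts()
constexpr bool hasOnlyDualDot2AccF32F16(const SubtargetFeatures &ST) {
  return ST.VOPDInsts && ST.Dot10Insts && !ST.Dot5Insts;
}

// Mirrors HasOnlyDualDot2AccF32BF16:
//   hasVOPDInsts() && hasDot12Insts() && !hasDot13Insts()
constexpr bool hasOnlyDualDot2AccF32BF16(const SubtargetFeatures &ST) {
  return ST.VOPDInsts && ST.Dot12Insts && !ST.Dot13Insts;
}
```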

[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-03-05 Thread Mirko Brkušanin via llvm-branch-commits


@@ -297,7 +316,10 @@ define float @v_fdot2_f32_bf16_inline_literal_c(<2 x bfloat> %a, <2 x bfloat> %b
 ;
 ; GFX11PLUS-LABEL: v_fdot2_f32_bf16_inline_literal_c:
 ; GFX11PLUS:  ; %bb.0:
-; GFX11PLUS:v_dot2_f32_bf16 v0, v0, v1, 2.0
+; GFX11PLUS:s_mov_b32 s0, 2.0
+; GFX11PLUS:v_mov_b32_e32 v2, s0
+; GFX11PLUS:v_dot2_f32_bf16 v2, v0, v1, v2
+; GFX11PLUS:v_mov_b32_e32 v0, v2

mbrkusanin wrote:

I guess this is a trade-off between keeping `v_dot2_f32_bf16` in a form that
can be a VOPD candidate vs. having optimal register allocation and being able
to inline constants.

Extra `v_mov`(s) can potentially be eliminated in tests with more instructions,
but the `s_mov` of the constant that is not folded will stay. Maybe we could
transform the `VOP2 pseudo` back to `VOP3` in SIFoldOperands when there is a
constant that can be folded. In that case we definitely eliminate one
instruction, versus maybe eliminating one by creating a v_dual later.

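A rough sketch of the decision this review suggests, under stated assumptions: `chooseDot2Form` is a hypothetical helper, not an existing SIFoldOperands hook. If the constant is an f32 inline constant, converting the DOT2ACC pseudo back to VOP3 and folding it certainly removes the materializing `s_mov`; otherwise the pseudo is kept so a `v_dual` might still be formed. The inline-constant test roughly approximates what LLVM's `AMDGPU::isInlinableLiteral32` checks, assuming the target supports the 1/(2*pi) constant.

```cpp
#include <cassert>
#include <cstdint>

// Approximates the f32 inline-constant check: integers -16..64 plus the
// bit patterns of +-0.5, +-1.0, +-2.0, +-4.0 and 1/(2*pi). Assumes the
// target supports the 1/(2*pi) inline constant.
bool isInlineConstantF32(uint32_t Bits) {
  int32_t AsInt = static_cast<int32_t>(Bits);
  if (AsInt >= -16 && AsInt <= 64)
    return true;
  switch (Bits) {
  case 0x3F000000: case 0xBF000000: // +-0.5
  case 0x3F800000: case 0xBF800000: // +-1.0
  case 0x40000000: case 0xC0000000: // +-2.0
  case 0x40800000: case 0xC0800000: // +-4.0
  case 0x3E22F983:                  // 1/(2*pi)
    return true;
  default:
    return false;
  }
}

enum class FoldDecision { KeepDot2AccPseudo, ConvertToVOP3AndFold };

// Hypothetical heuristic: an inline constant saves an instruction for sure,
// so prefer VOP3; otherwise keep the VOP2-style pseudo as a VOPD candidate.
FoldDecision chooseDot2Form(uint32_t ConstBits) {
  return isInlineConstantF32(ConstBits) ? FoldDecision::ConvertToVOP3AndFold
                                        : FoldDecision::KeepDot2AccPseudo;
}
```

For the `2.0` case in the hunk above (`0x40000000`), the sketch picks the VOP3 form, i.e. the removed single `v_dot2_f32_bf16 v0, v0, v1, 2.0` line; for `3.0` (`0x40400000`) it keeps the pseudo.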
https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-20 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/179226

>From 08fd28025a4510bf489c6d5d5f8a0df3222fa99d Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 12 Feb 2026 18:02:57 +0100
Subject: [PATCH] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 on targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. Like the VOP2 version,
this DOT2ACC pseudo has src2 tied to dst; post-RA pseudo expansion
restores it to the VOP3 version of the instruction.
CreateVOPD will recognize such a VOP3 pseudo and generate v_dual_dot2acc.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  |   3 +
 llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp  |   5 +-
 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp   |   2 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|  10 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.h  |  16 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   8 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  35 ++-
 llvm/lib/Target/AMDGPU/VOPInstructions.td |   4 +-
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll  | 186 +++--
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 256 ++
 10 files changed, 315 insertions(+), 210 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 07fb32173c2a3..1f4f1fbc15622 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -2645,6 +2645,9 @@ def isWave32Strict : Predicate<"Subtarget->isWave32()">,
 def isWave64Strict : Predicate<"Subtarget->isWave64()">,
   AssemblerPredicate <(all_of FeatureWavefrontSize64)>;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+
 
//===--===//
 // HwModes
 
//===--===//
diff --git a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
index 72805aa9165b6..0118c2436d7a4 100644
--- a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
@@ -94,14 +94,15 @@ class GCNCreateVOPD {
     for (auto CompIdx : VOPD::COMPONENTS) {
       auto CompSrcOprNum = InstInfo[CompIdx].getCompSrcOperandsNum();
       bool IsVOP3 = SII->isVOP3(*MI[CompIdx]);
+      bool IsVOP3Dot = IsVOP3 && SII->isDOT(*MI[CompIdx]);
       for (unsigned CompSrcIdx = 0; CompSrcIdx < CompSrcOprNum; ++CompSrcIdx) {
         if (AMDGPU::hasNamedOperand(VOPDOpc, Mods[CompIdx][CompSrcIdx])) {
           const MachineOperand *Mod =
               SII->getNamedOperand(*MI[CompIdx], SrcMods[CompSrcIdx]);
           VOPDInst.addImm(Mod ? Mod->getImm() : 0);
         }
-        auto MCOprIdx =
-            InstInfo[CompIdx].getIndexOfSrcInMCOperands(CompSrcIdx, IsVOP3);
+        auto MCOprIdx = InstInfo[CompIdx].getIndexOfSrcInMCOperands(
+            CompSrcIdx, IsVOP3, IsVOP3Dot);
         VOPDInst.add(MI[CompIdx]->getOperand(MCOprIdx));
       }
       if (MI[CompIdx]->getOpcode() == AMDGPU::V_CNDMASK_B32_e32 && CI.IsVOPD3)
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index 663f53889ac74..4300d5a3a8dd2 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -44,7 +44,7 @@ bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
 
   if (IsVOPD3 && !ST.hasVOPD3())
     return false;
-  if (!IsVOPD3 && (TII.isVOP3(MIX) || TII.isVOP3(MIY)))
+  if (!IsVOPD3 && (TII.isVOP3WithoutVOPD(MIX) || TII.isVOP3WithoutVOPD(MIY)))
     return false;
   if (TII.isDPP(MIX) || TII.isDPP(MIY))
     return false;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index b051f790118ef..9dabafe1a0a12 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2069,6 +2069,16 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
   const AMDGPU::LaneMaskConstants &LMC = AMDGPU::LaneMaskConstants::get(ST);
   switch (MI.getOpcode()) {
   default: return TargetInstrInfo::expandPostRAPseudo(MI);
+  case AMDGPU::V_DOT2ACC_F32_F16_PSEUDO:
+MI.setDesc(get(AMDGPU::V_DOT2_F32_F16));
+MI.untieRegOperand(6);
+break;
+
+  case AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO:
+MI.setDesc(get(AMDGPU::V_DOT2_F32_BF16));
+MI.untieRegOperand(6);
+break;
+
   case AMDGPU::S_MOV_B64_term:
 // This is only a terminator to get the correct spill code placement during
 // register allocation.
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index c945533f0

[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-20 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/179226

>From 08fd28025a4510bf489c6d5d5f8a0df3222fa99d Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 12 Feb 2026 18:02:57 +0100
Subject: [PATCH] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version); post-RA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such VOP3 instructions and generate v_dual_dot2acc.
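
To illustrate what "src2 tied to dst" means for the accumulate form, here is a minimal C++ model of the arithmetic, assuming the fdot2 semantics dst = src0.lo*src1.lo + src0.hi*src1.hi + src2. The names Half2, dot2F32 and dot2AccF32 are hypothetical, and the f16 lanes are modeled as float, so half-precision conversion and rounding are ignored.

```cpp
#include <array>
#include <cassert>

// Hypothetical model: f16 lanes are stored as float here, so half-precision
// conversion and rounding are deliberately ignored.
using Half2 = std::array<float, 2>;

// Three-address VOP3 shape: dst = src0.lo*src1.lo + src0.hi*src1.hi + src2.
float dot2F32(const Half2 &Src0, const Half2 &Src1, float Src2) {
  return Src0[0] * Src1[0] + Src0[1] * Src1[1] + Src2;
}

// Accumulator shape (src2 tied to dst): the destination is read as the
// addend and then overwritten with the result.
void dot2AccF32(float &Dst, const Half2 &Src0, const Half2 &Src1) {
  Dst = dot2F32(Src0, Src1, Dst);
}
```

In the accumulate shape the addend and the result share one register, which is why the pseudo carries a `$vdst = $src2` constraint until post-RA expansion.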
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  |   3 +
 llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp  |   5 +-
 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp   |   2 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|  10 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.h  |  16 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   8 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  35 ++-
 llvm/lib/Target/AMDGPU/VOPInstructions.td |   4 +-
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll  | 186 +++--
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 256 ++
 10 files changed, 315 insertions(+), 210 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 07fb32173c2a3..1f4f1fbc15622 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -2645,6 +2645,9 @@ def isWave32Strict : Predicate<"Subtarget->isWave32()">,
 def isWave64Strict : Predicate<"Subtarget->isWave64()">,
   AssemblerPredicate <(all_of FeatureWavefrontSize64)>;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+
 //===--===//
 // HwModes
 //===--===//
diff --git a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
index 72805aa9165b6..0118c2436d7a4 100644
--- a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
@@ -94,14 +94,15 @@ class GCNCreateVOPD {
 for (auto CompIdx : VOPD::COMPONENTS) {
   auto CompSrcOprNum = InstInfo[CompIdx].getCompSrcOperandsNum();
   bool IsVOP3 = SII->isVOP3(*MI[CompIdx]);
+  bool IsVOP3Dot = IsVOP3 && SII->isDOT(*MI[CompIdx]);
   for (unsigned CompSrcIdx = 0; CompSrcIdx < CompSrcOprNum; ++CompSrcIdx) {
 if (AMDGPU::hasNamedOperand(VOPDOpc, Mods[CompIdx][CompSrcIdx])) {
   const MachineOperand *Mod =
   SII->getNamedOperand(*MI[CompIdx], SrcMods[CompSrcIdx]);
   VOPDInst.addImm(Mod ? Mod->getImm() : 0);
 }
-auto MCOprIdx =
-InstInfo[CompIdx].getIndexOfSrcInMCOperands(CompSrcIdx, IsVOP3);
+auto MCOprIdx = InstInfo[CompIdx].getIndexOfSrcInMCOperands(
+CompSrcIdx, IsVOP3, IsVOP3Dot);
 VOPDInst.add(MI[CompIdx]->getOperand(MCOprIdx));
   }
   if (MI[CompIdx]->getOpcode() == AMDGPU::V_CNDMASK_B32_e32 && CI.IsVOPD3)
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index 663f53889ac74..4300d5a3a8dd2 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -44,7 +44,7 @@ bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
 
   if (IsVOPD3 && !ST.hasVOPD3())
 return false;
-  if (!IsVOPD3 && (TII.isVOP3(MIX) || TII.isVOP3(MIY)))
+  if (!IsVOPD3 && (TII.isVOP3WithoutVOPD(MIX) || TII.isVOP3WithoutVOPD(MIY)))
 return false;
   if (TII.isDPP(MIX) || TII.isDPP(MIY))
 return false;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index b051f790118ef..9dabafe1a0a12 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2069,6 +2069,16 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
   const AMDGPU::LaneMaskConstants &LMC = AMDGPU::LaneMaskConstants::get(ST);
   switch (MI.getOpcode()) {
   default: return TargetInstrInfo::expandPostRAPseudo(MI);
+  case AMDGPU::V_DOT2ACC_F32_F16_PSEUDO:
+MI.setDesc(get(AMDGPU::V_DOT2_F32_F16));
+MI.untieRegOperand(6);
+break;
+
+  case AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO:
+MI.setDesc(get(AMDGPU::V_DOT2_F32_BF16));
+MI.untieRegOperand(6);
+break;
+
   case AMDGPU::S_MOV_B64_term:
 // This is only a terminator to get the correct spill code placement during
 // register allocation.
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index c945533f0

[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-18 Thread Mirko Brkušanin via llvm-branch-commits


@@ -641,12 +643,37 @@ let SubtargetPredicate = HasDot12Insts  in {
 
 defm V_DOT2_F32_BF16 :
   VOP3PInstDotWithDual<"v_dot2_f32_bf16", DOT2_BF16_Profile,
-   int_amdgcn_fdot2_f32_bf16>;
+   int_amdgcn_fdot2_f32_bf16, 0xD, "v_dot2acc_f32_bf16">;
 
 } // End SubtargetPredicate = HasDot12Insts
 
 } // End let IsDOT = 1
 
+let IsDOT = 1, OtherPredicates = [HasOnlyDualDot2AccF32F16],
+Constraints = "$vdst = $src2" in
+def V_DOT2ACC_F32_F16_PSEUDO
+  : VOP3P_Pseudo<"", VOP3P_Profile>;
+
+class Dot2AccPseudo_Pat 
+  : GCNPat <
+(f32 (node (ty (VOP3PNoModsDOT ty:$src0)), (ty (VOP3PNoModsDOT ty:$src1)),
+   (f32 (VOP3PNoModsF32 f32:$src2)), (i1 DSTCLAMP.NONE))),
+(f32 (inst (i32 SRCMODS.OP_SEL_1), $src0, (i32 SRCMODS.OP_SEL_1), $src1,
+   (i32 SRCMODS.OP_SEL_1), $src2))
+>;
+
+let SubtargetPredicate = HasOnlyDualDot2AccF32F16 in
+def : Dot2AccPseudo_Pat;
+
+let IsDOT = 1, OtherPredicates = [HasOnlyDualDot2AccF32BF16],
+Constraints = "$vdst = $src2" in
+def V_DOT2ACC_F32_BF16_PSEUDO :
+  VOP3P_Pseudo<"", VOP3P_Profile>;
+
+let SubtargetPredicate = HasOnlyDualDot2AccF32BF16 in
+def : Dot2AccPseudo_Pat;

mbrkusanin wrote:

Can you please reorder this to:
- def pseudo
- def pseudo
- class pattern
- def pattern
- def pattern

https://github.com/llvm/llvm-project/pull/179226
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-12 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/179226

>From 98fc88ee933cbb82c11f3fc57be20e58d32c9f33 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 12 Feb 2026 18:02:57 +0100
Subject: [PATCH] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version); post-RA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such VOP3 instructions and generate v_dual_dot2acc.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  |   3 +
 llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp  |   5 +-
 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp   |   2 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|  10 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.h  |  16 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   8 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  35 +++-
 llvm/lib/Target/AMDGPU/VOPInstructions.td |   4 +-
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll  | 186 +-
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 131 ++--
 10 files changed, 239 insertions(+), 161 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 9ad2f2e11fbcc..bbda2a9a655ee 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -2620,6 +2620,9 @@ def isWave32Strict : Predicate<"Subtarget->isWave32()">,
 def isWave64Strict : Predicate<"Subtarget->isWave64()">,
   AssemblerPredicate <(all_of FeatureWavefrontSize64)>;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+
 //===--===//
 // HwModes
 //===--===//
diff --git a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
index 72805aa9165b6..0118c2436d7a4 100644
--- a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
@@ -94,14 +94,15 @@ class GCNCreateVOPD {
 for (auto CompIdx : VOPD::COMPONENTS) {
   auto CompSrcOprNum = InstInfo[CompIdx].getCompSrcOperandsNum();
   bool IsVOP3 = SII->isVOP3(*MI[CompIdx]);
+  bool IsVOP3Dot = IsVOP3 && SII->isDOT(*MI[CompIdx]);
   for (unsigned CompSrcIdx = 0; CompSrcIdx < CompSrcOprNum; ++CompSrcIdx) {
 if (AMDGPU::hasNamedOperand(VOPDOpc, Mods[CompIdx][CompSrcIdx])) {
   const MachineOperand *Mod =
   SII->getNamedOperand(*MI[CompIdx], SrcMods[CompSrcIdx]);
   VOPDInst.addImm(Mod ? Mod->getImm() : 0);
 }
-auto MCOprIdx =
-InstInfo[CompIdx].getIndexOfSrcInMCOperands(CompSrcIdx, IsVOP3);
+auto MCOprIdx = InstInfo[CompIdx].getIndexOfSrcInMCOperands(
+CompSrcIdx, IsVOP3, IsVOP3Dot);
 VOPDInst.add(MI[CompIdx]->getOperand(MCOprIdx));
   }
   if (MI[CompIdx]->getOpcode() == AMDGPU::V_CNDMASK_B32_e32 && CI.IsVOPD3)
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index 663f53889ac74..4300d5a3a8dd2 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -44,7 +44,7 @@ bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
 
   if (IsVOPD3 && !ST.hasVOPD3())
 return false;
-  if (!IsVOPD3 && (TII.isVOP3(MIX) || TII.isVOP3(MIY)))
+  if (!IsVOPD3 && (TII.isVOP3WithoutVOPD(MIX) || TII.isVOP3WithoutVOPD(MIY)))
 return false;
   if (TII.isDPP(MIX) || TII.isDPP(MIY))
 return false;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 24aa31a318df3..402219b7f4e4c 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2069,6 +2069,16 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
   const AMDGPU::LaneMaskConstants &LMC = AMDGPU::LaneMaskConstants::get(ST);
   switch (MI.getOpcode()) {
   default: return TargetInstrInfo::expandPostRAPseudo(MI);
+  case AMDGPU::V_DOT2ACC_F32_F16_PSEUDO:
+MI.setDesc(get(AMDGPU::V_DOT2_F32_F16));
+MI.untieRegOperand(6);
+break;
+
+  case AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO:
+MI.setDesc(get(AMDGPU::V_DOT2_F32_BF16));
+MI.untieRegOperand(6);
+break;
+
   case AMDGPU::S_MOV_B64_term:
 // This is only a terminator to get the correct spill code placement during
 // register allocation.
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index 0b54513bb

[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-11 Thread Mirko Brkušanin via llvm-branch-commits


@@ -608,12 +610,41 @@ def DOT2_BF16_Profile
 let SubtargetPredicate = HasDot12Insts  in {
 
 defm V_DOT2_F32_BF16 : VOP3PInstDotWithDual<"v_dot2_f32_bf16", DOT2_BF16_Profile,
-  int_amdgcn_fdot2_f32_bf16>;
+  int_amdgcn_fdot2_f32_bf16, 0xD, "v_dot2acc_f32_bf16">;
 
 } // End SubtargetPredicate = HasDot12Insts
 
 } // End let IsDOT = 1
 
+let IsDOT = 1, OtherPredicates = [HasOnlyDualDot2AccF32F16] in
+def V_DOT2ACC_F32_F16_PSEUDO : VOP3P_Pseudo<"", VOP3P_Profile> {
+  let Constraints = "$vdst = $src2";
+}
+
+let SubtargetPredicate = HasOnlyDualDot2AccF32F16 in
+def : GCNPat<
+  (f32 (AMDGPUfdot2 (v2f16 (VOP3PNoModsDOT v2f16:$src0)),
+(v2f16 (VOP3PNoModsDOT v2f16:$src1)),
+(f32 (VOP3PNoModsF32 f32:$src2)),
+(i1 DSTCLAMP.NONE))),
+  (f32 (V_DOT2ACC_F32_F16_PSEUDO (i32 8), $src0, (i32 8), $src1, (i32 8), $src2))
+>;
+
+let IsDOT = 1, OtherPredicates = [HasOnlyDualDot2AccF32BF16] in
+def V_DOT2ACC_F32_BF16_PSEUDO : VOP3P_Pseudo<"", DOT2_BF16_Profile> {
+  let Constraints = "$vdst = $src2";
+}
+
+let SubtargetPredicate = HasOnlyDualDot2AccF32BF16 in
+def : GCNPat<
+  (f32 (int_amdgcn_fdot2_f32_bf16 (v2bf16 (VOP3PNoModsDOT v2bf16:$src0)),
+  (v2bf16 (VOP3PNoModsDOT v2bf16:$src1)),
+  (f32 (VOP3PNoModsF32 f32:$src2)),
+  (i1 DSTCLAMP.NONE))),
+  (f32 (V_DOT2ACC_F32_BF16_PSEUDO (i32 8), $src0, (i32 8), $src1, (i32 8), $src2))

mbrkusanin wrote:

```suggestion
  (f32 (V_DOT2ACC_F32_BF16_PSEUDO (i32 SRCMODS.OP_SEL_1), $src0,
  (i32 SRCMODS.OP_SEL_1), $src1,
  (i32 SRCMODS.OP_SEL_1), $src2))
```

https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-11 Thread Mirko Brkušanin via llvm-branch-commits


@@ -608,12 +610,41 @@ def DOT2_BF16_Profile
 let SubtargetPredicate = HasDot12Insts  in {
 
 defm V_DOT2_F32_BF16 : VOP3PInstDotWithDual<"v_dot2_f32_bf16", DOT2_BF16_Profile,
-  int_amdgcn_fdot2_f32_bf16>;
+  int_amdgcn_fdot2_f32_bf16, 0xD, "v_dot2acc_f32_bf16">;
 
 } // End SubtargetPredicate = HasDot12Insts
 
 } // End let IsDOT = 1
 
+let IsDOT = 1, OtherPredicates = [HasOnlyDualDot2AccF32F16] in
+def V_DOT2ACC_F32_F16_PSEUDO : VOP3P_Pseudo<"", VOP3P_Profile> {
+  let Constraints = "$vdst = $src2";
+}
+
+let SubtargetPredicate = HasOnlyDualDot2AccF32F16 in
+def : GCNPat<
+  (f32 (AMDGPUfdot2 (v2f16 (VOP3PNoModsDOT v2f16:$src0)),
+(v2f16 (VOP3PNoModsDOT v2f16:$src1)),
+(f32 (VOP3PNoModsF32 f32:$src2)),
+(i1 DSTCLAMP.NONE))),
+  (f32 (V_DOT2ACC_F32_F16_PSEUDO (i32 8), $src0, (i32 8), $src1, (i32 8), $src2))
+>;
+
+let IsDOT = 1, OtherPredicates = [HasOnlyDualDot2AccF32BF16] in
+def V_DOT2ACC_F32_BF16_PSEUDO : VOP3P_Pseudo<"", DOT2_BF16_Profile> {
+  let Constraints = "$vdst = $src2";

mbrkusanin wrote:

This can go in the `let` above, together with the other `let`s.

https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-11 Thread Mirko Brkušanin via llvm-branch-commits


@@ -608,12 +610,41 @@ def DOT2_BF16_Profile
 let SubtargetPredicate = HasDot12Insts  in {
 
 defm V_DOT2_F32_BF16 : VOP3PInstDotWithDual<"v_dot2_f32_bf16", DOT2_BF16_Profile,
-  int_amdgcn_fdot2_f32_bf16>;
+  int_amdgcn_fdot2_f32_bf16, 0xD, "v_dot2acc_f32_bf16">;
 
 } // End SubtargetPredicate = HasDot12Insts
 
 } // End let IsDOT = 1
 
+let IsDOT = 1, OtherPredicates = [HasOnlyDualDot2AccF32F16] in
+def V_DOT2ACC_F32_F16_PSEUDO : VOP3P_Pseudo<"", VOP3P_Profile> {
+  let Constraints = "$vdst = $src2";
+}
+
+let SubtargetPredicate = HasOnlyDualDot2AccF32F16 in
+def : GCNPat<
+  (f32 (AMDGPUfdot2 (v2f16 (VOP3PNoModsDOT v2f16:$src0)),
+(v2f16 (VOP3PNoModsDOT v2f16:$src1)),
+(f32 (VOP3PNoModsF32 f32:$src2)),
+(i1 DSTCLAMP.NONE))),
+  (f32 (V_DOT2ACC_F32_F16_PSEUDO (i32 8), $src0, (i32 8), $src1, (i32 8), $src2))

mbrkusanin wrote:

```suggestion
  (f32 (V_DOT2ACC_F32_F16_PSEUDO (i32 SRCMODS.OP_SEL_1), $src0,
 (i32 SRCMODS.OP_SEL_1), $src1,
 (i32 SRCMODS.OP_SEL_1), $src2))
```

https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-11 Thread Mirko Brkušanin via llvm-branch-commits


@@ -608,12 +610,41 @@ def DOT2_BF16_Profile
 let SubtargetPredicate = HasDot12Insts  in {
 
 defm V_DOT2_F32_BF16 : VOP3PInstDotWithDual<"v_dot2_f32_bf16", DOT2_BF16_Profile,
-  int_amdgcn_fdot2_f32_bf16>;
+  int_amdgcn_fdot2_f32_bf16, 0xD, "v_dot2acc_f32_bf16">;
 
 } // End SubtargetPredicate = HasDot12Insts
 
 } // End let IsDOT = 1
 
+let IsDOT = 1, OtherPredicates = [HasOnlyDualDot2AccF32F16] in
+def V_DOT2ACC_F32_F16_PSEUDO : VOP3P_Pseudo<"", VOP3P_Profile> {
+  let Constraints = "$vdst = $src2";

mbrkusanin wrote:

This can go in the `let` above, together with the other `let`s.

https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-11 Thread Mirko Brkušanin via llvm-branch-commits


@@ -752,6 +752,9 @@ defm Dot13Insts : AMDGPUSubtargetFeature<"dot13-insts",
   "Has v_dot2c_f32_bf16 instructions"
 >;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+

mbrkusanin wrote:

Move this lower in the file, next to the other predicates.

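The two predicates reduce to simple boolean conditions on subtarget features. The sketch below mirrors those conditions in plain C++; SubtargetFeatures and its fields are hypothetical stand-ins for the GCNSubtarget queries named in the predicate strings. Per the feature description quoted above, Dot13Insts provides v_dot2c_f32_bf16, so the BF16 predicate holds only on targets that have VOPD and Dot12Insts but lack that form.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget queries used in the predicates.
struct SubtargetFeatures {
  bool VOPDInsts = false;
  bool Dot5Insts = false;
  bool Dot10Insts = false;
  bool Dot12Insts = false;
  bool Dot13Insts = false;
};

// Mirrors "hasVOPDInsts() && hasDot10Insts() && !hasDot5Insts()".
bool hasOnlyDualDot2AccF32F16(const SubtargetFeatures &ST) {
  return ST.VOPDInsts && ST.Dot10Insts && !ST.Dot5Insts;
}

// Mirrors "hasVOPDInsts() && hasDot12Insts() && !hasDot13Insts()".
bool hasOnlyDualDot2AccF32BF16(const SubtargetFeatures &ST) {
  return ST.VOPDInsts && ST.Dot12Insts && !ST.Dot13Insts;
}
```
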
https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-11 Thread Mirko Brkušanin via llvm-branch-commits


@@ -2065,6 +2065,14 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
   const AMDGPU::LaneMaskConstants &LMC = AMDGPU::LaneMaskConstants::get(ST);
   switch (MI.getOpcode()) {
   default: return TargetInstrInfo::expandPostRAPseudo(MI);
+  case AMDGPU::V_DOT2ACC_F32_F16_PSEUDO:
+MI.setDesc(get(AMDGPU::V_DOT2_F32_F16));
+break;
+
+  case AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO:
+MI.setDesc(get(AMDGPU::V_DOT2_F32_BF16));
+break;

mbrkusanin wrote:

I looked at the -print-after-all output: src2 should also be untied from dst
with `MI.untieRegOperand(Idx)`.

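Why the tie has to be dropped during expansion can be sketched with a toy operand model. ToyOperand, ToyInstr and expandDot2AccPseudo are hypothetical and are not LLVM classes. Operand index 6 is src2 in the V_DOT2 operand layout used elsewhere in this review, and the later revision of the patch calls MI.untieRegOperand(6).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model, not LLVM's MachineInstr: each operand records the index of the
// operand it is tied to, or -1 when it is not tied.
struct ToyOperand {
  unsigned Reg = 0;
  int TiedTo = -1;
};

struct ToyInstr {
  std::string Opcode;
  std::vector<ToyOperand> Ops;

  // Record a two-way tie between a def operand and a use operand.
  void tie(unsigned DefIdx, unsigned UseIdx) {
    Ops[DefIdx].TiedTo = static_cast<int>(UseIdx);
    Ops[UseIdx].TiedTo = static_cast<int>(DefIdx);
  }

  // Clear the tie on both sides; similar in spirit to
  // MachineInstr::untieRegOperand().
  void untie(unsigned Idx) {
    int Other = Ops[Idx].TiedTo;
    if (Other < 0)
      return;
    Ops[static_cast<unsigned>(Other)].TiedTo = -1;
    Ops[Idx].TiedTo = -1;
  }
};

// Sketch of the post-RA expansion: switch to the real VOP3 opcode and drop
// the dst/src2 tie that only the pseudo's "$vdst = $src2" constraint needed.
// The already-assigned registers are left untouched.
void expandDot2AccPseudo(ToyInstr &MI, const std::string &RealOpcode,
                         unsigned Src2Idx) {
  MI.Opcode = RealOpcode;
  MI.untie(Src2Idx);
}
```

Only the tie bookkeeping changes: after register allocation dst and src2 still hold the same register, which is what later lets the VOPD check recognize the instruction.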
https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-11 Thread Mirko Brkušanin via llvm-branch-commits


@@ -559,6 +559,19 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
 
   static bool isVOP3(const MachineInstr &MI) { return isVOP3(MI.getDesc()); }
 
+  static bool isVOP3WithoutVOPD(const MachineInstr &MI) {
+if (MI.getOpcode() == AMDGPU::V_DOT2_F32_F16 ||
+MI.getOpcode() == AMDGPU::V_DOT2_F32_BF16) {
+  // VOPD if no src_mods, no clamp, no inline const and src2 same as dst.
+  return MI.getOperand(1).getImm() != 8 || !MI.getOperand(2).isReg() ||
+ MI.getOperand(3).getImm() != 8 || !MI.getOperand(4).isReg() ||
+ MI.getOperand(5).getImm() != 8 || !MI.getOperand(6).isReg() ||
+ MI.getOperand(6).getReg() != MI.getOperand(0).getReg() ||
+ MI.getOperand(7).getImm() != 0;

mbrkusanin wrote:

Use SISrcMods::OP_SEL_1 for modifier values.

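The eligibility rule in the snippet above can be restated as a self-contained C++ sketch, assuming the same operand order (0 dst, 1/3/5 source modifiers, 2/4/6 sources, 7 clamp). Dot2Operands and isVOP3WithoutVOPDModel are hypothetical names; OpSel1 stands for SISrcMods::OP_SEL_1, which is 1 << 3, i.e. the literal 8 the patch originally used.

```cpp
#include <cassert>

// Stand-in for SISrcMods::OP_SEL_1 (bit 3 of the source-modifier immediate).
constexpr int OpSel1 = 1 << 3;

// Hypothetical flattened view of a V_DOT2_F32_F16/BF16 instruction.
struct Dot2Operands {
  unsigned DstReg;
  int SrcMods[3];   // operands 1, 3, 5
  bool SrcIsReg[3]; // operands 2, 4, 6
  unsigned Src2Reg; // register of operand 6 (only meaningful if it is a reg)
  int Clamp;        // operand 7
};

// True when the dot2 must stay a plain VOP3, i.e. it cannot become the
// dot2acc half of a v_dual_* pair.
bool isVOP3WithoutVOPDModel(const Dot2Operands &MI) {
  for (int I = 0; I < 3; ++I)
    if (MI.SrcMods[I] != OpSel1 || !MI.SrcIsReg[I])
      return true;
  return MI.Src2Reg != MI.DstReg || MI.Clamp != 0;
}
```
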

https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-10 Thread Petar Avramovic via llvm-branch-commits

petar-avramovic wrote:

ping

https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-03 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/179226

>From bb3f6f8f72f26cb36a9a93dfb4140427e06f0a8e Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Mon, 2 Feb 2026 13:25:21 +0100
Subject: [PATCH] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version); post-RA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such VOP3 instructions and generate v_dual_dot2acc.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  |   3 +
 llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp  |   5 +-
 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp   |   2 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|   8 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.h  |  13 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   8 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  39 +++-
 llvm/lib/Target/AMDGPU/VOPInstructions.td |   4 +-
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll  | 186 +-
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 131 ++--
 10 files changed, 238 insertions(+), 161 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 828f87ccfaf97..9340be24a4c21 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -752,6 +752,9 @@ defm Dot13Insts : AMDGPUSubtargetFeature<"dot13-insts",
   "Has v_dot2c_f32_bf16 instructions"
 >;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+
 defm MAIInsts : AMDGPUSubtargetFeature<"mai-insts",
   "Has mAI instructions"
 >;
diff --git a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
index 72805aa9165b6..0118c2436d7a4 100644
--- a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
@@ -94,14 +94,15 @@ class GCNCreateVOPD {
     for (auto CompIdx : VOPD::COMPONENTS) {
       auto CompSrcOprNum = InstInfo[CompIdx].getCompSrcOperandsNum();
       bool IsVOP3 = SII->isVOP3(*MI[CompIdx]);
+      bool IsVOP3Dot = IsVOP3 && SII->isDOT(*MI[CompIdx]);
       for (unsigned CompSrcIdx = 0; CompSrcIdx < CompSrcOprNum; ++CompSrcIdx) {
         if (AMDGPU::hasNamedOperand(VOPDOpc, Mods[CompIdx][CompSrcIdx])) {
           const MachineOperand *Mod =
               SII->getNamedOperand(*MI[CompIdx], SrcMods[CompSrcIdx]);
           VOPDInst.addImm(Mod ? Mod->getImm() : 0);
         }
-        auto MCOprIdx =
-            InstInfo[CompIdx].getIndexOfSrcInMCOperands(CompSrcIdx, IsVOP3);
+        auto MCOprIdx = InstInfo[CompIdx].getIndexOfSrcInMCOperands(
+            CompSrcIdx, IsVOP3, IsVOP3Dot);
         VOPDInst.add(MI[CompIdx]->getOperand(MCOprIdx));
       }
       if (MI[CompIdx]->getOpcode() == AMDGPU::V_CNDMASK_B32_e32 && CI.IsVOPD3)
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index 663f53889ac74..4300d5a3a8dd2 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -44,7 +44,7 @@ bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
 
   if (IsVOPD3 && !ST.hasVOPD3())
     return false;
-  if (!IsVOPD3 && (TII.isVOP3(MIX) || TII.isVOP3(MIY)))
+  if (!IsVOPD3 && (TII.isVOP3WithoutVOPD(MIX) || TII.isVOP3WithoutVOPD(MIY)))
     return false;
   if (TII.isDPP(MIX) || TII.isDPP(MIY))
     return false;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 09efba485f6f8..684a0368fb292 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2065,6 +2065,14 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
   const AMDGPU::LaneMaskConstants &LMC = AMDGPU::LaneMaskConstants::get(ST);
   switch (MI.getOpcode()) {
   default: return TargetInstrInfo::expandPostRAPseudo(MI);
+  case AMDGPU::V_DOT2ACC_F32_F16_PSEUDO:
+    MI.setDesc(get(AMDGPU::V_DOT2_F32_F16));
+    break;
+
+  case AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO:
+    MI.setDesc(get(AMDGPU::V_DOT2_F32_BF16));
+    break;
+
   case AMDGPU::S_MOV_B64_term:
     // This is only a terminator to get the correct spill code placement during
     // register allocation.
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index 05cf804d08ffc..da0678644d787 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -559,6 +559,19 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
 
   static bool isVOP3(const MachineIn

[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-02 Thread Petar Avramovic via llvm-branch-commits

petar-avramovic wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.com/github/pr/llvm/llvm-project/179226?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#179226** 👈 (View in Graphite: https://app.graphite.com/github/pr/llvm/llvm-project/179226?utm_source=stack-comment-view-in-graphite)
* **#179225**
* **#179224**
* **#179223**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment). Learn
more about stacking (https://stacking.dev/?utm_source=stack-comment).


https://github.com/llvm/llvm-project/pull/179226


[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)

2026-02-02 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic created 
https://github.com/llvm/llvm-project/pull/179226

Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version). Post-RA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such VOP3 instructions and generate v_dual_dot2acc.
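As a rough model of the flow described above, the following C++ sketch walks one toy instruction through selection, register allocation, post-RA expansion, and a VOPD pairing check. Only the opcode names come from the patch; the `Dot2` record and every helper are hypothetical, and the pairing rule in `canFormDualDot2Acc` (destination equal to `src2`, no source modifiers) is an assumption inferred from the description, not the actual logic in `GCNCreateVOPD.cpp` or `GCNVOPDUtils.cpp`.

```cpp
#include <cassert>

// Hypothetical, simplified model. Opcode names mirror the patch; the types and
// helpers are illustrative only and are not LLVM's MachineInstr/SIInstrInfo API.
enum class Opc {
  V_DOT2_F32_F16,
  V_DOT2_F32_BF16,
  V_DOT2ACC_F32_F16_PSEUDO,
  V_DOT2ACC_F32_BF16_PSEUDO
};

struct Dot2 {
  Opc Opcode;
  unsigned Dst, Src0, Src1, Src2;                    // register numbers
  unsigned Src0Mods = 0, Src1Mods = 0, Src2Mods = 0; // 0 = no source modifiers
};

bool hasNoSrcMods(const Dot2 &I) {
  return I.Src0Mods == 0 && I.Src1Mods == 0 && I.Src2Mods == 0;
}

bool isDot2AccPseudo(Opc O) {
  return O == Opc::V_DOT2ACC_F32_F16_PSEUDO ||
         O == Opc::V_DOT2ACC_F32_BF16_PSEUDO;
}

// Selection: the tied-accumulator pseudo is chosen only without source modifiers.
Opc selectOpcode(const Dot2 &I, bool IsBF16) {
  if (hasNoSrcMods(I))
    return IsBF16 ? Opc::V_DOT2ACC_F32_BF16_PSEUDO
                  : Opc::V_DOT2ACC_F32_F16_PSEUDO;
  return IsBF16 ? Opc::V_DOT2_F32_BF16 : Opc::V_DOT2_F32_F16;
}

// Register allocation (modeled): the tied constraint forces Dst == Src2.
void allocate(Dot2 &I) {
  if (isDot2AccPseudo(I.Opcode))
    I.Dst = I.Src2;
}

// Post-RA expansion: only the opcode changes; operands stay as allocated.
void expandPostRA(Dot2 &I) {
  if (!isDot2AccPseudo(I.Opcode))
    return;
  assert(I.Dst == I.Src2 && "tied operand constraint must hold after RA");
  I.Opcode = I.Opcode == Opc::V_DOT2ACC_F32_F16_PSEUDO ? Opc::V_DOT2_F32_F16
                                                       : Opc::V_DOT2_F32_BF16;
}

// Assumed pairing rule: a VOP3 dot2 can stand in for v_dual_dot2acc only if it
// accumulates in place and carries no source modifiers.
bool canFormDualDot2Acc(const Dot2 &I) {
  bool IsDot2 =
      I.Opcode == Opc::V_DOT2_F32_F16 || I.Opcode == Opc::V_DOT2_F32_BF16;
  return IsDot2 && I.Dst == I.Src2 && hasNoSrcMods(I);
}

// Runs the whole modeled pipeline on one instruction.
bool pipelineAllowsDualDot2Acc(Dot2 I, bool IsBF16) {
  I.Opcode = selectOpcode(I, IsBF16);
  allocate(I);
  expandPostRA(I);
  return canFormDualDot2Acc(I);
}
```

With no source modifiers, `pipelineAllowsDualDot2Acc(Dot2{Opc::V_DOT2_F32_F16, 10, 1, 2, 3}, false)` returns `true`, because the tied pseudo forces the destination onto `Src2` before expansion. Setting any source modifier keeps the plain VOP3 opcode, the destination stays separate, and the check returns `false`. Modeling the tie as an explicit allocation step shows why the pseudo exists in this sketch: without it, nothing guarantees in-place accumulation.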

From 0070a4ae320b5e90f2544d4a7dd5399a24e335a2 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Mon, 2 Feb 2026 13:25:21 +0100
Subject: [PATCH] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version). Post-RA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such VOP3 instructions and generate v_dual_dot2acc.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td              |   3 +
 llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp      |   5 +-
 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp       |   2 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp        |   8 ++
 llvm/lib/Target/AMDGPU/SIInstrInfo.h          |  13 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   8 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  39 +-
 llvm/lib/Target/AMDGPU/VOPInstructions.td     |   4 +-
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll      | 118 +-
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll |  95 +++---
 10 files changed, 182 insertions(+), 113 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 1a9bdb6634629..d006509b6aa6d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -740,6 +740,9 @@ defm Dot13Insts : AMDGPUSubtargetFeature<"dot13-insts",
   "Has v_dot2c_f32_bf16 instructions"
 >;
 
+def HasOnlyDualDot2AccF32F16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot10Insts() && !Subtarget->hasDot5Insts()">;
+def HasOnlyDualDot2AccF32BF16 : Predicate<"Subtarget->hasVOPDInsts() && Subtarget->hasDot12Insts() && !Subtarget->hasDot13Insts()">;
+
 defm MAIInsts : AMDGPUSubtargetFeature<"mai-insts",
   "Has mAI instructions"
 >;
diff --git a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
index 72805aa9165b6..0118c2436d7a4 100644
--- a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
@@ -94,14 +94,15 @@ class GCNCreateVOPD {
     for (auto CompIdx : VOPD::COMPONENTS) {
       auto CompSrcOprNum = InstInfo[CompIdx].getCompSrcOperandsNum();
       bool IsVOP3 = SII->isVOP3(*MI[CompIdx]);
+      bool IsVOP3Dot = IsVOP3 && SII->isDOT(*MI[CompIdx]);
       for (unsigned CompSrcIdx = 0; CompSrcIdx < CompSrcOprNum; ++CompSrcIdx) {
         if (AMDGPU::hasNamedOperand(VOPDOpc, Mods[CompIdx][CompSrcIdx])) {
           const MachineOperand *Mod =
               SII->getNamedOperand(*MI[CompIdx], SrcMods[CompSrcIdx]);
           VOPDInst.addImm(Mod ? Mod->getImm() : 0);
         }
-        auto MCOprIdx =
-            InstInfo[CompIdx].getIndexOfSrcInMCOperands(CompSrcIdx, IsVOP3);
+        auto MCOprIdx = InstInfo[CompIdx].getIndexOfSrcInMCOperands(
+            CompSrcIdx, IsVOP3, IsVOP3Dot);
         VOPDInst.add(MI[CompIdx]->getOperand(MCOprIdx));
       }
       if (MI[CompIdx]->getOpcode() == AMDGPU::V_CNDMASK_B32_e32 && CI.IsVOPD3)
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index 663f53889ac74..4300d5a3a8dd2 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -44,7 +44,7 @@ bool llvm::checkVOPDRegConstraints(const SIInstrInfo &TII,
 
   if (IsVOPD3 && !ST.hasVOPD3())
     return false;
-  if (!IsVOPD3 && (TII.isVOP3(MIX) || TII.isVOP3(MIY)))
+  if (!IsVOPD3 && (TII.isVOP3WithoutVOPD(MIX) || TII.isVOP3WithoutVOPD(MIY)))
     return false;
   if (TII.isDPP(MIX) || TII.isDPP(MIY))
     return false;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 09efba485f6f8..684a0368fb292 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2065,6 +2065,14 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
   const AMDGPU::LaneMaskConstants &LMC = AMDGPU::LaneMaskConstants::get(ST);
   switch (MI.getOpcode()) {
   default: return TargetInstrInfo::expandPostRAPseudo(MI);
+  case AMDGPU::V_DOT2ACC_F32_F16_PSEUDO:
+    MI.setDesc(get(AMDGPU::V_DOT2_F32_F16));
+    break;
+
+  case AMDGPU::V_DOT2ACC_F32_BF16_PSEUDO:
+    MI.setDesc(get(AMDGPU::V_DOT2_F32_BF16));
+    break;
+
   case AMDGPU::S_MOV_B64_t