[llvm-branch-commits] [llvm] [AMDGPU][DAGCombiner][GlobalISel] Extend allMulUsesCanBeContracted with FPEXT pattern (PR #188116)
https://github.com/adelejjeh updated
https://github.com/llvm/llvm-project/pull/188116
>From 1283298a74c6ae99472117a3e41a75f8783ddc0d Mon Sep 17 00:00:00 2001
From: Adel Ejjeh
Date: Thu, 12 Mar 2026 10:09:35 -0500
Subject: [PATCH] [AMDGPU][DAGCombiner][GlobalISel] Extend
allMulUsesCanBeContracted with FPEXT pattern
Extend the allMulUsesCanBeContracted analysis to recognize FPEXT patterns
where the multiply result flows through fpext before being used in
contractable operations (fadd, fsub). This covers:
- fmul --> fpext --> {fadd, fsub}: FPEXT folds if isFPExtFoldable
- fmul --> fpext --> fneg --> fsub: FPEXT then FNEG to FSUB
- fmul --> fneg --> fpext --> fsub: FNEG then FPEXT folds if foldable
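To make the chains listed above concrete, here is a small standalone C++ model of the use walk. It is only a sketch under stated assumptions: `Op`, `Node`, `ToyTarget`, `allUsersAre` and `allMulUsesContractable` are hypothetical names invented for illustration, not LLVM types or the patch's code. The single `FPExtFoldable` flag stands in for the per-use isFPExtFoldable query, and requiring it on every FPEXT path is a simplification of this toy.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for the generic opcodes named in the commit message.
enum class Op { FMul, FAdd, FSub, FNeg, FPExt, Other };

// Hypothetical minimal IR node: an opcode plus the nodes that use its result.
struct Node {
  Op Opcode;
  std::vector<const Node *> Users;
};

// Assumption: one flag replaces the real per-type, per-opcode target hook.
struct ToyTarget {
  bool FPExtFoldable;
};

// True if every user of N has the wanted opcode (vacuously true if no users).
bool allUsersAre(const Node &N, Op Wanted) {
  for (const Node *U : N.Users)
    if (U->Opcode != Wanted)
      return false;
  return true;
}

// Non-recursive walk over the uses of a multiply, accepting only:
//   fmul --> {fadd, fsub}
//   fmul --> fneg --> fsub
//   fmul --> fneg --> fpext --> fsub      (needs a foldable fpext)
//   fmul --> fpext --> {fadd, fsub}       (needs a foldable fpext)
//   fmul --> fpext --> fneg --> fsub      (toy also requires foldable here)
bool allMulUsesContractable(const Node &Mul, const ToyTarget &T) {
  assert(Mul.Opcode == Op::FMul && "expected a multiply");
  for (const Node *U : Mul.Users) {
    switch (U->Opcode) {
    case Op::FAdd:
    case Op::FSub:
      break; // direct contraction
    case Op::FNeg:
      for (const Node *NU : U->Users) {
        if (NU->Opcode == Op::FSub)
          continue;
        if (NU->Opcode == Op::FPExt && T.FPExtFoldable &&
            allUsersAre(*NU, Op::FSub))
          continue;
        return false;
      }
      break;
    case Op::FPExt:
      if (!T.FPExtFoldable)
        return false;
      for (const Node *EU : U->Users) {
        if (EU->Opcode == Op::FAdd || EU->Opcode == Op::FSub)
          continue;
        if (EU->Opcode == Op::FNeg && allUsersAre(*EU, Op::FSub))
          continue;
        return false;
      }
      break;
    default:
      return false; // any other use keeps the multiply alive
    }
  }
  return true;
}
```

A single non-contractable use anywhere in the walk makes the whole query fail, which is the property the fold sites rely on before duplicating the multiply.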
Also adds allMulUsesCanBeContracted guards to all FPEXT fold sites in
both SDAG (visitFADDForFMACombine, visitFSUBForFMACombine) and GISel
(matchCombineFAddFpExtFMulToFMadOrFMA, matchCombineFSubFpExtFMulToFMadOrFMA,
matchCombineFSubFpExtFNegFMulToFMadOrFMA).
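The reason for guarding each fold site can be shown with a rough operation count. The sketch below is an illustration only: `OpCount` and `countAfterFusingAll` are hypothetical names, and the model counts just the multiply and its fadd/fsub users, ignoring fpext, fneg, latency and register pressure.

```cpp
#include <cassert>

// Floating-point operation counts before and after fusing (toy model).
struct OpCount {
  unsigned Before;
  unsigned After;
};

// Assumption: a multiply feeds ContractableUses fadd/fsub users plus
// OtherUses users that cannot absorb it. Fusing turns each contractable
// user into one fused op; the multiply survives only if another use remains.
OpCount countAfterFusingAll(unsigned ContractableUses, unsigned OtherUses) {
  assert(ContractableUses > 0 && "nothing to fuse");
  OpCount C;
  C.Before = 1 + ContractableUses;                         // fmul + adds/subs
  C.After = ContractableUses + (OtherUses > 0 ? 1u : 0u);  // fmas (+ fmul)
  return C;
}
```

In this model, two contractable uses and no others go from 3 operations to 2, while adding one non-contractable use leaves the count at 3: the multiply is duplicated into each fused op without reducing the total.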
Fixes a missing isFPExtFoldable check in GISel's
matchCombineFSubFpExtFMulToFMadOrFMA which could fold without verifying
the extension is actually foldable.
Co-Authored-By: Claude Opus 4.6
Made-with: Cursor
---
.../llvm/CodeGen/GlobalISel/CombinerHelper.h | 3 +-
.../lib/CodeGen/GlobalISel/CombinerHelper.cpp | 102 ++-
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 91 ++-
.../AMDGPU/fma-multiple-uses-contraction.ll | 680 ++
4 files changed, 390 insertions(+), 486 deletions(-)
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 09c827f71a34d..8440fdcbbd08b 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -805,7 +805,8 @@ class CombinerHelper {
/// Check if all uses of a multiply can be contracted into fma/fmad
/// operations, so that duplicating the multiply is acceptable.
- bool allMulUsesCanBeContracted(const MachineInstr &MI) const;
+ bool allMulUsesCanBeContracted(const MachineInstr &MI,
+ unsigned PreferredFusedOpcode) const;
bool canCombineFMadOrFMA(MachineInstr &MI, bool &AllowFusionGlobally,
bool &HasFMAD, bool &Aggressive,
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index d2bf2568df276..0941e6da0f40f 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -6316,10 +6316,15 @@ static bool hasMoreUses(const MachineInstr &MI0, const MachineInstr &MI1,
/// would duplicate the multiply without reducing the total number of
/// operations.
///
-/// Currently checks for the following patterns:
+/// This uses a simple, non-recursive check for the following patterns:
/// - fmul --> fadd/fsub: Direct contraction
/// - fmul --> fneg --> fsub: Contraction through fneg
-bool CombinerHelper::allMulUsesCanBeContracted(const MachineInstr &MI) const {
+/// - fmul --> fneg --> fpext --> fsub: FNEG then FPEXT folds if foldable
+/// - fmul --> fpext --> {fadd, fsub}: FPEXT folds if foldable
+/// - fmul --> fpext --> fneg --> fsub: FPEXT then FNEG to FSUB
+bool CombinerHelper::allMulUsesCanBeContracted(
+const MachineInstr &MI, unsigned PreferredFusedOpcode) const {
+ const auto &TLI = getTargetLowering();
Register MulReg = MI.getOperand(0).getReg();
for (const MachineInstr &UseMI : MRI.use_nodbg_instructions(MulReg)) {
@@ -6329,13 +6334,66 @@ bool CombinerHelper::allMulUsesCanBeContracted(const MachineInstr &MI) const {
if (Opcode == TargetOpcode::G_FADD || Opcode == TargetOpcode::G_FSUB)
continue;
-// G_FNEG use - contractable if all users of the fneg are G_FSUB.
+// FNEG --> FSUB pattern
+// Also handles FNEG --> FPEXT --> FSUB
if (Opcode == TargetOpcode::G_FNEG) {
Register FNegReg = UseMI.getOperand(0).getReg();
- for (const MachineInstr &FNegUser : MRI.use_nodbg_instructions(FNegReg)) {
-unsigned FNegUserOp = FNegUser.getOpcode();
-if (FNegUserOp != TargetOpcode::G_FSUB)
+ // ALL users of the FNEG must be contractable FSUBs or FPEXTs leading to
+ // FSUBs
+ for (const MachineInstr &FNegUseMI :
+ MRI.use_nodbg_instructions(FNegReg)) {
+unsigned FNegUseOpcode = FNegUseMI.getOpcode();
+
+if (FNegUseOpcode == TargetOpcode::G_FSUB)
+ continue;
+if (FNegUseOpcode == TargetOpcode::G_FPEXT) {
+ // FNEG --> FPEXT --> FSUB
+ Register FNegFPExtReg = FNegUseMI.getOperand(0).getReg();
+ for (const MachineInstr &FNegFPExtUseMI :
+ MRI.use_nodbg_instructions(FNegFPExtReg)) {
+if (FNegFPExtUseMI.getOpcode() != TargetOpcode::G_FSUB)
+ return false;
+// FPEXT use is FSUB, check if can be folded in
+if (!TLI.isFPExtFoldable(
+FNegFPExtUseMI,
github-actions[bot] wrote:

# :window: Windows x64 Test Results

* 132607 tests passed
* 3027 tests skipped
* 3 tests failed

## Failed Tests
(click on a test name to see its output)

### LLVM

LLVM.CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll

```
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
c:\_work\llvm-project\llvm-project\build\bin\opt.exe -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=amdgpu-simplifylib,instcombine -amdgpu-prelink < C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll | c:\_work\llvm-project\llvm-project\build\bin\llc.exe -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-prelink | c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\opt.exe' -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=amdgpu-simplifylib,instcombine -amdgpu-prelink
# note: command had no output on stdout or stderr
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\llc.exe' -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-prelink
# note: command had no output on stdout or stderr
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe' 'C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll'
# .---command stderr
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:37:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: s_add_u32 s16, s16, _Z3powff@rel32@lo+4
# | ^
# | :41:22: note: scanning from here
# | s_getpc_b64 s[16:17]
# | ^
# | :42:2: note: possible intended match here
# | s_add_u32 s16, s16, _Z10__pow_fastff@gotpcrel32@lo+4
# | ^
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:70:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v0, v0, v1
# | ^
# | :114:26: note: scanning from here
# | v_and_b32_e32 v0, v2, v0
# | ^
# | :115:2: note: possible intended match here
# | v_or_b32_e32 v0, v1, v0
# | ^
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:179:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v1, v2, v1
# | ^
# | :259:73: note: scanning from here
# | buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
# | ^
# | :260:2: note: possible intended match here
# | v_or_b32_e32 v1, v1, v2
# | ^
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:356:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v0, v0, v2
# | ^
# | :503:22: note: scanning from here
# | v_exp_f16_e32 v2, v2
# | ^
# | :504:2: note: possible intended match here
# | v_or_b32_e32 v0, v2, v0
# | ^
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:461:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v1, v2, v1
# | ^
# | :646:73: note: scanning from here
# | buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
# | ^
# | :647:2: note: possible intended match here
# | v_or_b32_e32 v1, v1, v2
# | ^
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:684:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_writelane_b32 v43, s16, 14
# | ^
# | :978:26: note: scanning from here
# | s_mov_b64 exec, s[18:19]
# | ^
# | :979:2: note: possible intended match here
# | v_writelane_b32 v43, s16, 15
# | ^
# |
# | Input file:
# | Check file: C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll
# |
# | -dump-input=help explains the following input dump.
# |
# | Input was:
# | <<
# | .
# | .
# | .
# |36: .p2align 6
# |37: .type test_pow_fast_f32,@function
# |38: test_pow_fast_f32: ; @test_pow_fast_f32
# |39: ; %bb.0:
# |40: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
# |41: s_getpc_b64 s[16:17]
# | next:37'0 X error: no match found
# |42: s_add_u32 s16, s16, _Z10__pow_fastff@gotpcrel32@lo+4
# | next:37'0 ~~
# | next:37'1 ?
```
github-actions[bot] wrote:

# :penguin: Linux x64 Test Results

* 171661 tests passed
* 3068 tests skipped
* 3 tests failed

## Failed Tests
(click on a test name to see its output)

### LLVM

LLVM.CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll

```
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=amdgpu-simplifylib,instcombine -amdgpu-prelink < /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-prelink | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=amdgpu-simplifylib,instcombine -amdgpu-prelink
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-prelink
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
# .---command stderr
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:37:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: s_add_u32 s16, s16, _Z3powff@rel32@lo+4
# | ^
# | :41:22: note: scanning from here
# | s_getpc_b64 s[16:17]
# | ^
# | :42:2: note: possible intended match here
# | s_add_u32 s16, s16, _Z10__pow_fastff@gotpcrel32@lo+4
# | ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:70:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v0, v0, v1
# | ^
# | :114:26: note: scanning from here
# | v_and_b32_e32 v0, v2, v0
# | ^
# | :115:2: note: possible intended match here
# | v_or_b32_e32 v0, v1, v0
# | ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:179:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v1, v2, v1
# | ^
# | :259:73: note: scanning from here
# | buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
# | ^
# | :260:2: note: possible intended match here
# | v_or_b32_e32 v1, v1, v2
# | ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:356:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v0, v0, v2
# | ^
# | :503:22: note: scanning from here
# | v_exp_f16_e32 v2, v2
# | ^
# | :504:2: note: possible intended match here
# | v_or_b32_e32 v0, v2, v0
# | ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:461:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v1, v2, v1
# | ^
# | :646:73: note: scanning from here
# | buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
# | ^
# | :647:2: note: possible intended match here
# | v_or_b32_e32 v1, v1, v2
# | ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:684:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_writelane_b32 v43, s16, 14
# | ^
# | :978:26: note: scanning from here
# | s_mov_b64 exec, s[18:19]
# | ^
# | :979:2: note: possible intended match here
# | v_writelane_b32 v43, s16, 15
# | ^
# |
# | Input file:
# | Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
# |
# | -dump-input=help explains the following input dump.
# |
# | Input was:
# | <<
# | .
# | .
# | .
# |36: .p2align 6
# |37: .type test_pow_fast_f32,@function
# |38: test_pow_fast_f32: ; @test_pow_fast_f32
# |39: ; %bb.0:
# |40: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
#
```
