[llvm-branch-commits] [llvm] [AMDGPU][DAGCombiner][GlobalISel] Extend allMulUsesCanBeContracted with FPEXT pattern (PR #188116)

2026-03-30 Thread Adel Ejjeh via llvm-branch-commits

https://github.com/adelejjeh updated 
https://github.com/llvm/llvm-project/pull/188116

>From 1283298a74c6ae99472117a3e41a75f8783ddc0d Mon Sep 17 00:00:00 2001
From: Adel Ejjeh 
Date: Thu, 12 Mar 2026 10:09:35 -0500
Subject: [PATCH] [AMDGPU][DAGCombiner][GlobalISel] Extend
 allMulUsesCanBeContracted with FPEXT pattern

Extend the allMulUsesCanBeContracted analysis to recognize FPEXT patterns
where the multiply result flows through fpext before being used in
contractable operations (fadd, fsub). This covers:
  - fmul --> fpext --> {fadd, fsub}: FPEXT folds if isFPExtFoldable
  - fmul --> fpext --> fneg --> fsub: FPEXT then FNEG to FSUB
  - fmul --> fneg --> fpext --> fsub: FNEG then FPEXT folds if foldable
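
The pattern list above amounts to a walk over the users of the multiply. The C++ below is a minimal, self-contained sketch of that walk. It rests on a few assumptions. It uses a hypothetical toy node type (`Node`, `Op`) instead of LLVM's `MachineInstr`/`SDNode`. A hypothetical callback (`FPExtFoldableFn`) stands in for `TLI.isFPExtFoldable`. It also includes the two cases the analysis already handled (direct fadd/fsub and fneg --> fsub). Foldability is checked only where the bullets say "if foldable", so the real patch's exact conditions may differ.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical toy IR, NOT LLVM's MachineInstr/SDNode: each node records
// its opcode and the nodes that consume its result.
enum class Op { FMul, FAdd, FSub, FNeg, FPExt, Other };

struct Node {
  Op Opcode;
  std::vector<const Node *> Users;
};

// Hypothetical stand-in for TLI.isFPExtFoldable(): reports whether the
// fpext feeding this fadd/fsub can be folded into the fused operation.
using FPExtFoldableFn = std::function<bool(const Node &)>;

static bool allUsersAre(const Node &N, Op Want) {
  for (const Node *U : N.Users)
    if (U->Opcode != Want)
      return false;
  return true;
}

// Returns true only if every user of Mul matches one of the listed patterns,
// so duplicating the fmul into each fused op does not add work.
bool allMulUsesCanBeContracted(const Node &Mul,
                               const FPExtFoldableFn &IsFPExtFoldable) {
  assert(Mul.Opcode == Op::FMul && "expected an fmul");
  for (const Node *U : Mul.Users) {
    switch (U->Opcode) {
    case Op::FAdd:
    case Op::FSub: // fmul --> {fadd, fsub}
      continue;
    case Op::FNeg: // fmul --> fneg --> fsub, or fmul --> fneg --> fpext --> fsub
      for (const Node *NU : U->Users) {
        if (NU->Opcode == Op::FSub)
          continue;
        if (NU->Opcode != Op::FPExt)
          return false;
        for (const Node *EU : NU->Users)
          if (EU->Opcode != Op::FSub || !IsFPExtFoldable(*EU))
            return false;
      }
      continue;
    case Op::FPExt: // fmul --> fpext --> {fadd, fsub}, or --> fneg --> fsub
      for (const Node *EU : U->Users) {
        if (EU->Opcode == Op::FAdd || EU->Opcode == Op::FSub) {
          if (!IsFPExtFoldable(*EU))
            return false;
        } else if (EU->Opcode != Op::FNeg || !allUsersAre(*EU, Op::FSub)) {
          return false;
        }
      }
      continue;
    default: // Any other user keeps the fmul alive anyway.
      return false;
    }
  }
  return true;
}
```

Example: an fmul whose only user is an fpext feeding one fsub and one fadd is accepted only when the callback reports the extension as foldable. Adding any other kind of user (for example a store or an unrelated op) makes the walk reject the fmul, because the multiply would then survive alongside the fused ops.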

Also adds allMulUsesCanBeContracted guards to all FPEXT fold sites in
both SDAG (visitFADDForFMACombine, visitFSUBForFMACombine) and GISel
(matchCombineFAddFpExtFMulToFMadOrFMA, matchCombineFSubFpExtFMulToFMadOrFMA,
matchCombineFSubFpExtFNegFMulToFMadOrFMA).

Fixes a missing isFPExtFoldable check in GISel's
matchCombineFSubFpExtFMulToFMadOrFMA, which could fold without verifying
that the extension is actually foldable.

Co-Authored-By: Claude Opus 4.6 
Made-with: Cursor
---
 .../llvm/CodeGen/GlobalISel/CombinerHelper.h  |   3 +-
 .../lib/CodeGen/GlobalISel/CombinerHelper.cpp | 102 ++-
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp |  91 ++-
 .../AMDGPU/fma-multiple-uses-contraction.ll   | 680 ++
 4 files changed, 390 insertions(+), 486 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 09c827f71a34d..8440fdcbbd08b 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -805,7 +805,8 @@ class CombinerHelper {
 
   /// Check if all uses of a multiply can be contracted into fma/fmad
   /// operations, so that duplicating the multiply is acceptable.
-  bool allMulUsesCanBeContracted(const MachineInstr &MI) const;
+  bool allMulUsesCanBeContracted(const MachineInstr &MI,
+                                 unsigned PreferredFusedOpcode) const;
 
   bool canCombineFMadOrFMA(MachineInstr &MI, bool &AllowFusionGlobally,
                            bool &HasFMAD, bool &Aggressive,
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index d2bf2568df276..0941e6da0f40f 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -6316,10 +6316,15 @@ static bool hasMoreUses(const MachineInstr &MI0, const MachineInstr &MI1,
 /// would duplicate the multiply without reducing the total number of
 /// operations.
 ///
-/// Currently checks for the following patterns:
+/// This uses a simple, non-recursive check for the following patterns:
 ///   - fmul --> fadd/fsub: Direct contraction
 ///   - fmul --> fneg --> fsub: Contraction through fneg
-bool CombinerHelper::allMulUsesCanBeContracted(const MachineInstr &MI) const {
+///   - fmul --> fneg --> fpext --> fsub: FNEG then FPEXT folds if foldable
+///   - fmul --> fpext --> {fadd, fsub}: FPEXT folds if foldable
+///   - fmul --> fpext --> fneg --> fsub: FPEXT then FNEG to FSUB
+bool CombinerHelper::allMulUsesCanBeContracted(
+    const MachineInstr &MI, unsigned PreferredFusedOpcode) const {
+  const auto &TLI = getTargetLowering();
   Register MulReg = MI.getOperand(0).getReg();
 
   for (const MachineInstr &UseMI : MRI.use_nodbg_instructions(MulReg)) {
@@ -6329,13 +6334,66 @@ bool CombinerHelper::allMulUsesCanBeContracted(const MachineInstr &MI) const {
     if (Opcode == TargetOpcode::G_FADD || Opcode == TargetOpcode::G_FSUB)
       continue;
 
-    // G_FNEG use - contractable if all users of the fneg are G_FSUB.
+    // FNEG --> FSUB pattern
+    // Also handles FNEG --> FPEXT --> FSUB
     if (Opcode == TargetOpcode::G_FNEG) {
       Register FNegReg = UseMI.getOperand(0).getReg();
-      for (const MachineInstr &FNegUser : MRI.use_nodbg_instructions(FNegReg)) {
-        unsigned FNegUserOp = FNegUser.getOpcode();
-        if (FNegUserOp != TargetOpcode::G_FSUB)
+      // ALL users of the FNEG must be contractable FSUBs or FPEXTs leading to
+      // FSUBs
+      for (const MachineInstr &FNegUseMI :
+           MRI.use_nodbg_instructions(FNegReg)) {
+        unsigned FNegUseOpcode = FNegUseMI.getOpcode();
+
+        if (FNegUseOpcode == TargetOpcode::G_FSUB)
+          continue;
+        if (FNegUseOpcode == TargetOpcode::G_FPEXT) {
+          // FNEG --> FPEXT --> FSUB
+          Register FNegFPExtReg = FNegUseMI.getOperand(0).getReg();
+          for (const MachineInstr &FNegFPExtUseMI :
+               MRI.use_nodbg_instructions(FNegFPExtReg)) {
+            if (FNegFPExtUseMI.getOpcode() != TargetOpcode::G_FSUB)
+              return false;
+            // FPEXT use is FSUB, check if can be folded in
+            if (!TLI.isFPExtFoldable(
+                    FNegFPExtUseMI,


2026-03-23 Thread via llvm-branch-commits

github-actions[bot] wrote:


# :window: Windows x64 Test Results

* 132607 tests passed
* 3027 tests skipped
* 3 tests failed

## Failed Tests

### LLVM

LLVM.CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll

```
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
c:\_work\llvm-project\llvm-project\build\bin\opt.exe -mtriple=amdgcn-amd-amdhsa 
-mcpu=gfx900 -passes=amdgpu-simplifylib,instcombine -amdgpu-prelink < 
C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll
 | c:\_work\llvm-project\llvm-project\build\bin\llc.exe 
-mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-prelink | 
c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe 
C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\opt.exe' 
-mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=amdgpu-simplifylib,instcombine 
-amdgpu-prelink
# note: command had no output on stdout or stderr
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\llc.exe' 
-mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-prelink
# note: command had no output on stdout or stderr
# executed command: 
'c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe' 
'C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll'
# .---command stderr
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:37:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: s_add_u32 s16, s16, _Z3powff@rel32@lo+4
# |   ^
# | <stdin>:41:22: note: scanning from here
# |  s_getpc_b64 s[16:17]
# |  ^
# | <stdin>:42:2: note: possible intended match here
# |  s_add_u32 s16, s16, _Z10__pow_fastff@gotpcrel32@lo+4
# |  ^
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:70:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v0, v0, v1
# |   ^
# | <stdin>:114:26: note: scanning from here
# |  v_and_b32_e32 v0, v2, v0
# |  ^
# | <stdin>:115:2: note: possible intended match here
# |  v_or_b32_e32 v0, v1, v0
# |  ^
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:179:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v1, v2, v1
# |   ^
# | <stdin>:259:73: note: scanning from here
# |  buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
# | ^
# | <stdin>:260:2: note: possible intended match here
# |  v_or_b32_e32 v1, v1, v2
# |  ^
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:356:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v0, v0, v2
# |   ^
# | <stdin>:503:22: note: scanning from here
# |  v_exp_f16_e32 v2, v2
# |  ^
# | <stdin>:504:2: note: possible intended match here
# |  v_or_b32_e32 v0, v2, v0
# |  ^
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:461:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v1, v2, v1
# |   ^
# | <stdin>:646:73: note: scanning from here
# |  buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
# | ^
# | <stdin>:647:2: note: possible intended match here
# |  v_or_b32_e32 v1, v1, v2
# |  ^
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll:684:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_writelane_b32 v43, s16, 14
# |   ^
# | <stdin>:978:26: note: scanning from here
# |  s_mov_b64 exec, s[18:19]
# |  ^
# | <stdin>:979:2: note: possible intended match here
# |  v_writelane_b32 v43, s16, 15
# |  ^
# | 
# | Input file: <stdin>
# | Check file: C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AMDGPU\amdgpu-simplify-libcall-pow-codegen.ll
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<
# | .
# | .
# | .
# |36:  .p2align 6 
# |37:  .type test_pow_fast_f32,@function 
# |38: test_pow_fast_f32: ; @test_pow_fast_f32 
# |39: ; %bb.0: 
# |40:  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) 
# |41:  s_getpc_b64 s[16:17] 
# | next:37'0   X error: no match found
# |42:  s_add_u32 s16, s16, _Z10__pow_fastff@gotpcrel32@lo+4 
# | next:37'0  ~~
# | next:37'1   ?


2026-03-23 Thread via llvm-branch-commits

github-actions[bot] wrote:


# :penguin: Linux x64 Test Results

* 171661 tests passed
* 3068 tests skipped
* 3 tests failed

## Failed Tests

### LLVM

LLVM.CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll

```
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt 
-mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=amdgpu-simplifylib,instcombine 
-amdgpu-prelink < 
/home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
 | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/llc 
-mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-prelink | 
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck 
/home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
# executed command: 
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt 
-mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=amdgpu-simplifylib,instcombine 
-amdgpu-prelink
# note: command had no output on stdout or stderr
# executed command: 
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/llc 
-mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-prelink
# note: command had no output on stdout or stderr
# executed command: 
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck 
/home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
# .---command stderr
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:37:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: s_add_u32 s16, s16, _Z3powff@rel32@lo+4
# |   ^
# | <stdin>:41:22: note: scanning from here
# |  s_getpc_b64 s[16:17]
# |  ^
# | <stdin>:42:2: note: possible intended match here
# |  s_add_u32 s16, s16, _Z10__pow_fastff@gotpcrel32@lo+4
# |  ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:70:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v0, v0, v1
# |   ^
# | <stdin>:114:26: note: scanning from here
# |  v_and_b32_e32 v0, v2, v0
# |  ^
# | <stdin>:115:2: note: possible intended match here
# |  v_or_b32_e32 v0, v1, v0
# |  ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:179:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v1, v2, v1
# |   ^
# | <stdin>:259:73: note: scanning from here
# |  buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
# | ^
# | <stdin>:260:2: note: possible intended match here
# |  v_or_b32_e32 v1, v1, v2
# |  ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:356:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v0, v0, v2
# |   ^
# | <stdin>:503:22: note: scanning from here
# |  v_exp_f16_e32 v2, v2
# |  ^
# | <stdin>:504:2: note: possible intended match here
# |  v_or_b32_e32 v0, v2, v0
# |  ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:461:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_or_b32_e32 v1, v2, v1
# |   ^
# | <stdin>:646:73: note: scanning from here
# |  buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
# | ^
# | <stdin>:647:2: note: possible intended match here
# |  v_or_b32_e32 v1, v1, v2
# |  ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll:684:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: v_writelane_b32 v43, s16, 14
# |   ^
# | <stdin>:978:26: note: scanning from here
# |  s_mov_b64 exec, s[18:19]
# |  ^
# | <stdin>:979:2: note: possible intended match here
# |  v_writelane_b32 v43, s16, 15
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<
# | .
# | .
# | .
# |36:  .p2align 6 
# |37:  .type test_pow_fast_f32,@function 
# |38: test_pow_fast_f32: ; @test_pow_fast_f32 
# |39: ; %bb.0: 
# |40:  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) 
#