date:20250630

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC) (PR #146331)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray created 
https://github.com/llvm/llvm-project/pull/146331

This is a series of patches (4/4) to unify assembly/disassembly of recent 
AArch64 tests into a single file. The aim is to improve consistency, so that 
all instructions and system registers are thoroughly tested, and future test 
cases will be in a unified format.

This patch:
 * removes .txt tests whose .s tests have functions
 * makes the .s tests use a round-trip RUN line to test both encoding and assembly
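The round-trip RUN line relies on `sed 's/.*encoding: //g'` to turn each `-show-encoding` output line into a byte list such as `[0xe9,0x7f,0x5f,0xc9]`, which is then fed back into `llvm-mc -disassemble`. The standalone C++ sketch below imitates that step and decodes the little-endian bytes into an instruction word. The helper names are hypothetical. The field positions it reads (Rt in bits [4:0], Rn in bits [9:5]) are assumed from the usual A64 load/store layout, and they agree with the `ldtxr x9, [sp]` and `ldtxr x10, [x11]` CHECK lines in the lsui diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Imitates `sed 's/.*encoding: //g'`: the greedy `.*` removes everything up
// to the last "encoding: ", leaving e.g. "[0xe9,0x7f,0x5f,0xc9]".
std::string stripToEncoding(const std::string &line) {
  const std::string marker = "encoding: ";
  std::size_t pos = line.rfind(marker);
  if (pos == std::string::npos)
    return line; // sed leaves lines without a match unchanged
  return line.substr(pos + marker.size());
}

// Parses "[0xe9,0x7f,0x5f,0xc9]" into a list of bytes.
std::vector<uint8_t> parseEncodingBytes(std::string text) {
  if (!text.empty() && text.front() == '[')
    text.erase(0, 1);
  if (!text.empty() && text.back() == ']')
    text.pop_back();
  std::vector<uint8_t> bytes;
  std::stringstream ss(text);
  std::string tok;
  while (std::getline(ss, tok, ','))
    // With base 16, an optional "0x" prefix is accepted.
    bytes.push_back(static_cast<uint8_t>(std::stoul(tok, nullptr, 16)));
  return bytes;
}

// A64 instructions are 32-bit words; the bytes are listed little-endian,
// so the first byte is the least significant one.
uint32_t toInstructionWord(const std::vector<uint8_t> &b) {
  assert(b.size() == 4);
  return uint32_t(b[0]) | uint32_t(b[1]) << 8 | uint32_t(b[2]) << 16 |
         uint32_t(b[3]) << 24;
}

// Assumed operand fields: Rt in bits [4:0], Rn in bits [9:5] (31 means sp).
unsigned rtField(uint32_t word) { return word & 0x1f; }
unsigned rnField(uint32_t word) { return (word >> 5) & 0x1f; }
```

For example, the `ldtxr x9, [sp]` line strips to `[0xe9,0x7f,0x5f,0xc9]`, which forms the word 0xc95f7fe9 with Rt = 9 and Rn = 31.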

From 8c9eccdc95e465fdbfe833080afb1ad1099c224c Mon Sep 17 00:00:00 2001
From: Jonathan Thackray 
Date: Fri, 27 Jun 2025 20:16:06 +0100
Subject: [PATCH] [AArch64][llvm] Unify AArch64 tests into a single file (4/4)
 (NFC)

This is a series of patches (4/4) to unify assembly/disassembly of
recent AArch64 tests into a single file. The aim is to improve
consistency, so that all instructions and system registers are
thoroughly tested, and future test cases will be in a unified format.

This patch:
 * removes .txt tests whose .s tests have functions
 * makes the .s tests use a round-trip RUN line to test both encoding and assembly

Co-authored-by: Virginia Cangelosi 
---
 llvm/test/MC/AArch64/armv9.6a-lsui.s          | 1073 +++--
 llvm/test/MC/AArch64/armv9.6a-occmo.s         |   54 +-
 llvm/test/MC/AArch64/armv9.6a-pcdphint.s      |   37 +-
 llvm/test/MC/AArch64/armv9.6a-rme-gpc3.s      |   46 +-
 .../MC/Disassembler/AArch64/armv9.6a-lsui.txt |  323 -
 .../Disassembler/AArch64/armv9.6a-occmo.txt   |   11 -
 .../AArch64/armv9.6a-pcdphint.txt             |    8 -
 .../AArch64/armv9.6a-rme-gpc3.txt             |   18 -
 8 files changed, 805 insertions(+), 765 deletions(-)
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-lsui.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-occmo.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-pcdphint.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-rme-gpc3.txt

diff --git a/llvm/test/MC/AArch64/armv9.6a-lsui.s b/llvm/test/MC/AArch64/armv9.6a-lsui.s
index d4a5e1f980560..264a869b6d286 100644
--- a/llvm/test/MC/AArch64/armv9.6a-lsui.s
+++ b/llvm/test/MC/AArch64/armv9.6a-lsui.s
@@ -1,408 +1,751 @@
-// RUN: llvm-mc -triple aarch64 -mattr=+lsui -show-encoding %s  | FileCheck %s
-// RUN: not llvm-mc -triple aarch64 -show-encoding %s 2>&1  | FileCheck %s --check-prefix=ERROR
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \
+// RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:        | FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \
+// RUN:  | llvm-objdump -d --mattr=+lsui --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \
+// RUN:   | llvm-objdump -d --mattr=-lsui --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \
+// RUN:        | sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:        | llvm-mc -triple=aarch64 -mattr=+lsui -disassemble -show-encoding \
+// RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
 
-_func:
-// CHECK: _func:
 
//--
 // Unprivileged load/store operations
 
//--
-  ldtxr   x9, [sp]
-// CHECK: ldtxr    x9, [sp]     // encoding: [0xe9,0x7f,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr   x9, [sp, #0]
-// CHECK: ldtxr    x9, [sp]     // encoding: [0xe9,0x7f,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr   x10, [x11]
-// CHECK: ldtxr    x10, [x11]   // encoding: [0x6a,0x7d,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr   x10, [x11, #0]
-// CHECK: ldtxr    x10, [x11]   // encoding: [0x6a,0x7d,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-
-  ldatxr  x9, [sp]
-// CHECK: ldatxr   x9, [sp]     // encoding: [0xe9,0xff,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldatxr  x10, [x11]
-// CHECK: ldatxr   x10, [x11]   // encoding: [0x6a,0xfd,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-
-  sttxr   wzr, w4, [sp]
-// CHECK: sttxr    wzr, w4, [sp]   // encoding: [0xe4,0x7f,0x1f,0x89]
-// ERROR: error: instruction requires: lsui
-  sttxr   wzr, w4, [sp, #0]
-// CHECK: sttxr    wzr, w4, [sp]   // encoding: [0xe4,0x7f,0x1f,0x89]
-// ERROR: error: instruction requires: lsui
-  sttxr   w5, x6, [x7]
-// CHECK: stt

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray edited 
https://github.com/llvm/llvm-project/pull/146329
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [GISel] Combine compare of bitfield extracts or'd together. (PR #146055)

2025-06-30 Thread Jay Foad via llvm-branch-commits



@@ -140,3 +140,92 @@ bool CombinerHelper::matchCanonicalizeFCmp(const MachineInstr &MI,
 
   return false;
 }
+
+bool CombinerHelper::combineMergedBFXCompare(MachineInstr &MI) const {
+  const GICmp *Cmp = cast(&MI);
+
+  ICmpInst::Predicate CC = Cmp->getCond();
+  if (CC != CmpInst::ICMP_EQ && CC != CmpInst::ICMP_NE)

jayfoad wrote:

Use `CmpInst::isEquality`

https://github.com/llvm/llvm-project/pull/146055

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Jay Foad via llvm-branch-commits



@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
 
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //  (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();

jayfoad wrote:

> Should I bother supporting vector types here?

I don't have a strong opinion on that. You could leave a TODO?

https://github.com/llvm/llvm-project/pull/146054
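The comment at the top of matchMergedBFX rests on two facts: `(X >> C) & Mask` is zero exactly when `X & (Mask << C)` is zero (bits shifted past the top are simply dropped), and an OR is zero exactly when every operand is. The standalone C++ sketch below checks this on 32-bit scalars. The names are hypothetical and not part of the patch, and the check samples values rather than proving the identity.

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: the mask the fold would AND X with, built as
// RootMask | (RootMask << C0) | ...; an unshifted X operand counts as C = 0.
uint32_t partsMask(uint32_t rootMask, std::initializer_list<unsigned> shifts) {
  uint32_t m = 0;
  for (unsigned c : shifts)
    m |= (c >= 32) ? 0u : (rootMask << c); // avoid undefined shift by >= 32
  return m;
}

// Unfolded form: ((X >> C0) | (X >> C1) | ...) & RootMask, compared with 0.
bool unfoldedIsZero(uint32_t x, uint32_t rootMask,
                    std::initializer_list<unsigned> shifts) {
  uint32_t merged = 0;
  for (unsigned c : shifts)
    merged |= (c >= 32) ? 0u : (x >> c);
  return (merged & rootMask) == 0;
}

// Folded form: X & PartsMask, compared with 0.
bool foldedIsZero(uint32_t x, uint32_t rootMask,
                  std::initializer_list<unsigned> shifts) {
  return (x & partsMask(rootMask, shifts)) == 0;
}

// Compares both forms on every single-bit value and on sparse pseudo-random
// values (xorshift32), so the "== 0" outcome occurs reasonably often.
bool formsAgree(uint32_t rootMask, std::initializer_list<unsigned> shifts) {
  for (unsigned b = 0; b < 32; ++b)
    if (unfoldedIsZero(1u << b, rootMask, shifts) !=
        foldedIsZero(1u << b, rootMask, shifts))
      return false;
  uint32_t s = 0x12345678u;
  for (int i = 0; i < 100000; ++i) {
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    uint32_t x = s & (s >> 7) & (s >> 14); // roughly 1 bit in 8 set
    if (unfoldedIsZero(x, rootMask, shifts) !=
        foldedIsZero(x, rootMask, shifts))
      return false;
  }
  return true;
}
```

With RootMask = 0x3ff and shifts 0, 10 and 20 (the workitem case), the combined mask works out to 0x3fffffff.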

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Jay Foad via llvm-branch-commits



@@ -28909,13 +28909,99 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
 
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //  (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();
+
+  if (!VT.isScalarInteger() || Root.getOpcode() != ISD::AND)
+    return SDValue();
+
+  SDValue N0 = Root.getOperand(0);
+  SDValue N1 = Root.getOperand(1);
+
+  if (N0.getOpcode() != ISD::OR || !isa(N1))
+    return SDValue();
+
+  APInt RootMask = cast(N1)->getAsAPIntVal();
+
+  SDValue Src;
+  const auto IsSrc = [&](SDValue V) {
+    if (!Src) {
+      Src = V;
+      return true;
+    }
+
+    return Src == V;
+  };
+
+  SmallVector Worklist = {N0};
+  APInt PartsMask(VT.getSizeInBits(), 0);
+  while (!Worklist.empty()) {
+    SDValue V = Worklist.pop_back_val();
+    if (!V.hasOneUse() && (Src && Src != V))
+      return SDValue();
+
+    if (V.getOpcode() == ISD::OR) {
+      Worklist.push_back(V.getOperand(0));
+      Worklist.push_back(V.getOperand(1));
+      continue;
+    }
+
+    if (V.getOpcode() == ISD::SRL) {
+      SDValue ShiftSrc = V.getOperand(0);
+      SDValue ShiftAmt = V.getOperand(1);
+
+      if (!IsSrc(ShiftSrc) || !isa(ShiftAmt))
+        return SDValue();
+
+      auto ShiftAmtVal = cast(ShiftAmt)->getAsZExtVal();
+      if (ShiftAmtVal > RootMask.getBitWidth())
+        return SDValue();
+
+      PartsMask |= (RootMask << ShiftAmtVal);
+      continue;
+    }
+
+    if (IsSrc(V)) {
+      PartsMask |= RootMask;
+      continue;
+    }
+
+    return SDValue();
+  }
+
+  if (!Src)
+    return SDValue();
+
+  SDLoc DL(Root);
+  return DAG.getNode(ISD::AND, DL, VT,
+                     {Src, DAG.getConstant(PartsMask, DL, VT)});
+}
+
 /// This is a stub for TargetLowering::SimplifySetCC.
 SDValue DAGCombiner::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
                                    ISD::CondCode Cond, const SDLoc &DL,
                                    bool foldBooleans) {
   TargetLowering::DAGCombinerInfo
       DagCombineInfo(DAG, Level, false, this);
-  return TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL);
+  if (SDValue C =
+          TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL))
+    return C;
+
+  if ((Cond == ISD::SETNE || Cond == ISD::SETEQ) &&

jayfoad wrote:

Use `isIntEqualitySetCC`

https://github.com/llvm/llvm-project/pull/146054

[llvm-branch-commits] [llvm] [AMDGPU] Add tests for workgroup/workitem intrinsic optimizations (PR #146053)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/146053

From f137136b2f527aaf1b2f2847e821085aabfc299e Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 26 Jun 2025 13:08:31 +0200
Subject: [PATCH 1/2] [AMDGPU] Add tests for workgroup/workitem intrinsic
 optimizations

---
 .../AMDGPU/workitems-intrinsics-opts.ll   | 553 ++
 1 file changed, 553 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll

diff --git a/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll b/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll
new file mode 100644
index 0..14120680216fc
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll
@@ -0,0 +1,553 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -O3 -mtriple=amdgcn -mcpu=fiji %s -o - | FileCheck %s --check-prefixes=GFX8,DAGISEL-GFX9
+; RUN: llc -O3 -mtriple=amdgcn -mcpu=gfx942 %s -o - | FileCheck %s --check-prefixes=GFX942,DAGISEL-GFX942
+; RUN: llc -O3 -mtriple=amdgcn -mcpu=gfx1200 %s -o - | FileCheck %s --check-prefixes=GFX12,DAGISEL-GFX12
+
+; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=fiji %s -o - | FileCheck %s --check-prefixes=GFX8,GISEL-GFX8
+; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=gfx942 %s -o - | FileCheck %s --check-prefixes=GFX942,GISEL-GFX942
+; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=gfx1200 %s -o - | FileCheck %s --check-prefixes=GFX12,GISEL-GFX12
+
+; (workitem_id_x | workitem_id_y | workitem_id_z) == 0
+define i1 @workitem_zero() {
+; DAGISEL-GFX9-LABEL: workitem_zero:
+; DAGISEL-GFX9:   ; %bb.0: ; %entry
+; DAGISEL-GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; DAGISEL-GFX9-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
+; DAGISEL-GFX9-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
+; DAGISEL-GFX9-NEXT:    v_or_b32_e32 v1, v31, v1
+; DAGISEL-GFX9-NEXT:    v_or_b32_e32 v0, v1, v0
+; DAGISEL-GFX9-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; DAGISEL-GFX9-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
+; DAGISEL-GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; DAGISEL-GFX942-LABEL: workitem_zero:
+; DAGISEL-GFX942:   ; %bb.0: ; %entry
+; DAGISEL-GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; DAGISEL-GFX942-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
+; DAGISEL-GFX942-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
+; DAGISEL-GFX942-NEXT:    v_or3_b32 v0, v31, v1, v0
+; DAGISEL-GFX942-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX942-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; DAGISEL-GFX942-NEXT:    s_nop 1
+; DAGISEL-GFX942-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
+; DAGISEL-GFX942-NEXT:    s_setpc_b64 s[30:31]
+;
+; DAGISEL-GFX12-LABEL: workitem_zero:
+; DAGISEL-GFX12:   ; %bb.0: ; %entry
+; DAGISEL-GFX12-NEXT:    s_wait_loadcnt_dscnt 0x0
+; DAGISEL-GFX12-NEXT:    s_wait_expcnt 0x0
+; DAGISEL-GFX12-NEXT:    s_wait_samplecnt 0x0
+; DAGISEL-GFX12-NEXT:    s_wait_bvhcnt 0x0
+; DAGISEL-GFX12-NEXT:    s_wait_kmcnt 0x0
+; DAGISEL-GFX12-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
+; DAGISEL-GFX12-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
+; DAGISEL-GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; DAGISEL-GFX12-NEXT:    v_or3_b32 v0, v31, v1, v0
+; DAGISEL-GFX12-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; DAGISEL-GFX12-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; DAGISEL-GFX12-NEXT:    s_wait_alu 0xfffd
+; DAGISEL-GFX12-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc_lo
+; DAGISEL-GFX12-NEXT:    s_setpc_b64 s[30:31]
+;
+; GISEL-GFX8-LABEL: workitem_zero:
+; GISEL-GFX8:   ; %bb.0: ; %entry
+; GISEL-GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GISEL-GFX8-NEXT:    v_and_b32_e32 v0, 0x3ff, v31
+; GISEL-GFX8-NEXT:    v_bfe_u32 v1, v31, 10, 10
+; GISEL-GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
+; GISEL-GFX8-NEXT:    v_bfe_u32 v1, v31, 20, 10
+; GISEL-GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
+; GISEL-GFX8-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GISEL-GFX8-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
+; GISEL-GFX8-NEXT:    s_setpc_b64 s[30:31]
+;
+; GISEL-GFX942-LABEL: workitem_zero:
+; GISEL-GFX942:   ; %bb.0: ; %entry
+; GISEL-GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GISEL-GFX942-NEXT:    v_and_b32_e32 v0, 0x3ff, v31
+; GISEL-GFX942-NEXT:    v_bfe_u32 v1, v31, 10, 10
+; GISEL-GFX942-NEXT:    v_bfe_u32 v2, v31, 20, 10
+; GISEL-GFX942-NEXT:    v_or3_b32 v0, v0, v1, v2
+; GISEL-GFX942-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GISEL-GFX942-NEXT:    s_nop 1
+; GISEL-GFX942-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
+; GISEL-GFX942-NEXT:    s_setpc_b64 s[30:31]
+;
+; GISEL-GFX12-LABEL: workitem_zero:
+; GISEL-GFX12:   ; %bb.0: ; %entry
+; GISEL-GFX12-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GISEL-GFX12-NEXT:    s_wait_expcnt 0x0
+; GISEL-GFX12-NEXT:    s_wait_samplecnt 0x0
+; GISEL-GFX12-NEXT:    s_wait_bvhcnt 0x0
+; GISEL-GFX1

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Jay Foad via llvm-branch-commits


jayfoad wrote:

Does this also handle the case where _all_ of the values ORed together are 
shifted, like `(setcc ((x >> c0 | x >> c1 | ...) & mask))` ?
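On the question above: in the posted matchMergedBFX the SRL branch also records `Src` through `IsSrc(ShiftSrc)`, and an unshifted X leaf is never required, so a tree where every operand is shifted appears to be accepted, with PartsMask covering only the shifted fields. The toy model below reproduces that worklist so the resulting mask can be inspected. `Expr` and `matchParts` are hypothetical stand-ins for the DAG nodes and the function, and the one-use checks are left out.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Toy stand-in for the SelectionDAG nodes the fold looks at (hypothetical).
struct Expr {
  enum Kind { Var, Or, Srl } kind;
  int var = 0;               // Var: which variable this leaf is
  unsigned shift = 0;        // Srl: constant shift amount
  const Expr *lhs = nullptr; // Or: first operand; Srl: value being shifted
  const Expr *rhs = nullptr; // Or: second operand
};

// Mirrors the worklist in matchMergedBFX (one-use checks omitted): walk the
// OR tree, accept leaves that are Src or Src >> C, and accumulate the mask.
std::optional<uint32_t> matchParts(const Expr *root, uint32_t rootMask) {
  if (root->kind != Expr::Or)
    return std::nullopt;
  std::optional<int> src;
  auto isSrc = [&](const Expr *e) {
    if (e->kind != Expr::Var)
      return false;
    if (!src)
      src = e->var; // the first leaf seen fixes Src
    return *src == e->var;
  };
  uint32_t parts = 0;
  std::vector<const Expr *> worklist = {root};
  while (!worklist.empty()) {
    const Expr *e = worklist.back();
    worklist.pop_back();
    if (e->kind == Expr::Or) {
      worklist.push_back(e->lhs);
      worklist.push_back(e->rhs);
      continue;
    }
    if (e->kind == Expr::Srl) {
      // Shifts >= 32 are rejected here to avoid undefined behaviour in C++.
      if (!isSrc(e->lhs) || e->shift >= 32)
        return std::nullopt;
      parts |= rootMask << e->shift;
      continue;
    }
    if (isSrc(e)) {
      parts |= rootMask;
      continue;
    }
    return std::nullopt;
  }
  if (!src)
    return std::nullopt;
  return parts;
}
```

With RootMask = 0x3ff and only the shifts by 10 and 20, the toy returns 0x3ffffc00; adding the unshifted X leaf extends the mask to 0x3fffffff.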

https://github.com/llvm/llvm-project/pull/146054

[llvm-branch-commits] [llvm] [GISel] Combine compare of bitfield extracts or'd together. (PR #146055)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/146055

From d97992ef24abae69878fd1e49270bf0f7372ca39 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 27 Jun 2025 12:04:53 +0200
Subject: [PATCH] [GISel] Combine compare of bitfield extracts or'd together.

Equivalent of the previous DAG patch for GISel.
The shifts are BFXs in GISel, so the canonical form of the entire expression
differs from the DAG's: the mask is not at the root of the expression but
remains on the leaves instead.

See #136727
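In this form each shifted field is a G_UBFX and the unshifted leaf keeps its own mask, so the combine can take the union of the bits each leaf reads. The standalone C++ sketch below shows that per-leaf mask for the workitem layout. The helpers are hypothetical, and their argument order (offset, then width) is assumed to follow G_UBFX's (src, offset, width) operand order rather than taken from this patch.

```cpp
#include <cstdint>

// Low `width` bits set; assumes 0 < width <= 32.
uint32_t lowBits(unsigned width) {
  return width >= 32 ? ~0u : ((1u << width) - 1);
}

// Unsigned bitfield extract; requires offset + width <= 32.
uint32_t ubfx(uint32_t x, unsigned offset, unsigned width) {
  return (x >> offset) & lowBits(width);
}

// The bits of X that a given leaf reads: the mask each leaf contributes
// when the compare is rewritten as (X & PartsMask) == 0.
uint32_t ubfxPartMask(unsigned offset, unsigned width) {
  return lowBits(width) << offset;
}

// GISel-style workitem check: the unshifted leaf keeps its own AND mask.
bool bfxFormIsZero(uint32_t x) {
  return ((x & 0x3ffu) | ubfx(x, 10, 10) | ubfx(x, 20, 10)) == 0;
}

// Folded form: a single AND with the union of the leaf masks.
bool maskedFormIsZero(uint32_t x) {
  uint32_t parts = 0x3ffu | ubfxPartMask(10, 10) | ubfxPartMask(20, 10);
  return (x & parts) == 0;
}

// Both forms are zero exactly when X has no bit set inside some fixed set of
// positions, so agreeing on 0 and on every single-bit value is enough.
bool formsAgreeOnSingleBits() {
  if (bfxFormIsZero(0) != maskedFormIsZero(0))
    return false;
  for (unsigned b = 0; b < 32; ++b)
    if (bfxFormIsZero(1u << b) != maskedFormIsZero(1u << b))
      return false;
  return true;
}
```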
---
 .../llvm/CodeGen/GlobalISel/CombinerHelper.h  |   2 +
 .../include/llvm/Target/GlobalISel/Combine.td |  11 +-
 .../GlobalISel/CombinerHelperCompares.cpp |  89 +
 .../GlobalISel/combine-cmp-merged-bfx.mir | 326 ++
 .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 194 +++
 5 files changed, 483 insertions(+), 139 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-cmp-merged-bfx.mir

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index c15263e0b06f8..5ec82c30f268f 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -641,6 +641,8 @@ class CombinerHelper {
   /// KnownBits information.
   bool matchICmpToLHSKnownBits(MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
+  bool combineMergedBFXCompare(MachineInstr &MI) const;
+
   /// \returns true if (and (or x, c1), c2) can be replaced with (and x, c2)
   bool matchAndOrDisjointMask(MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 4a92dc16c1bf4..cba46a5edf9ec 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1085,6 +1085,14 @@ def double_icmp_zero_or_combine: GICombineRule<
  (G_ICMP $root, $p, $ordst, 0))
 >;
 
+// Transform ((X | (G_UBFX X, ...) | ...) == 0) (or != 0)
+// into a compare of a extract/mask of X
+def icmp_merged_bfx_combine: GICombineRule<
+  (defs root:$root),
+  (combine (G_ICMP $dst, $p, $src, 0):$root,
+   [{ return Helper.combineMergedBFXCompare(*${root}); }])
+>;
+
 def and_or_disjoint_mask : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$info),
   (match (wip_match_opcode G_AND):$root,
@@ -2052,7 +2060,8 @@ def all_combines : GICombineGroup<[integer_reassoc_combines, trivial_combines,
 fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors,
 simplify_neg_minmax, combine_concat_vector,
 sext_trunc, zext_trunc, prefer_sign_combines, shuffle_combines,
-combine_use_vector_truncate, merge_combines, overflow_combines]>;
+combine_use_vector_truncate, merge_combines, overflow_combines,
+icmp_merged_bfx_combine]>;
 
 // A combine group used to for prelegalizer combiners at -O0. The combines in
 // this group have been selected based on experiments to balance code size and
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
index fc40533cf3dc9..e1d43f37bac13 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
@@ -140,3 +140,92 @@ bool CombinerHelper::matchCanonicalizeFCmp(const MachineInstr &MI,
 
   return false;
 }
+
+bool CombinerHelper::combineMergedBFXCompare(MachineInstr &MI) const {
+  const GICmp *Cmp = cast(&MI);
+
+  ICmpInst::Predicate CC = Cmp->getCond();
+  if (CC != CmpInst::ICMP_EQ && CC != CmpInst::ICMP_NE)
+    return false;
+
+  Register CmpLHS = Cmp->getLHSReg();
+  Register CmpRHS = Cmp->getRHSReg();
+
+  LLT OpTy = MRI.getType(CmpLHS);
+  if (!OpTy.isScalar() || OpTy.isPointer())
+    return false;
+
+  assert(isZeroOrZeroSplat(CmpRHS, /*AllowUndefs=*/false));
+
+  Register Src;
+  const auto IsSrc = [&](Register R) {
+    if (!Src) {
+      Src = R;
+      return true;
+    }
+
+    return Src == R;
+  };
+
+  MachineInstr *CmpLHSDef = MRI.getVRegDef(CmpLHS);
+  if (CmpLHSDef->getOpcode() != TargetOpcode::G_OR)
+    return false;
+
+  APInt PartsMask(OpTy.getSizeInBits(), 0);
+  SmallVector Worklist = {CmpLHSDef};
+  while (!Worklist.empty()) {
+    MachineInstr *Cur = Worklist.pop_back_val();
+
+    Register Dst = Cur->getOperand(0).getReg();
+    if (!MRI.hasOneUse(Dst) && Dst != Src)
+      return false;
+
+    if (Cur->getOpcode() == TargetOpcode::G_OR) {
+      Worklist.push_back(MRI.getVRegDef(Cur->getOperand(1).getReg()));
+      Worklist.push_back(MRI.getVRegDef(Cur->getOperand(2).getReg()));
+      continue;
+    }
+
+    if (Cur->getOpcode() == TargetOpcode::G_UBFX) {
+      Register Op = Cur->getOperand(1).getReg();
+      Register Width = Cur->getOperand(2).getReg();
+      Register Off = Cur->getOperand(3).getReg();
+
+      auto WidthCst = getIConstantVRegVal(Width, MRI);
+      auto

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/146054

From 26615132899d40b8d245fd98d093ef8c26cdc3e1 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 26 Jun 2025 13:31:37 +0200
Subject: [PATCH 1/2] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences

Fold sequences where we extract a bunch of contiguous bits from a value,
merge them into the low bits and then check whether the low bits are zero.

It seems like a strange sequence at first, but it's an idiom used in device
libs to check workitem IDs for AMDGPU.

I put this in DAGCombiner instead of the target combiner because this is a
generic, valid transform that's also fairly niche, so I don't think there is
much risk of a combine loop.

See #136727
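To make the idiom concrete: the AMDGPU check lines for `workitem_zero` shift the packed ID register right by 10 and 20 and mask with 0x3ff, which suggests X, Y and Z are packed as three 10-bit fields. Under that assumption (the helper names below are hypothetical), the fold replaces the shift/OR/AND chain with a single AND against `0x3ff | 0x3ff << 10 | 0x3ff << 20`, which is 0x3fffffff.

```cpp
#include <cstdint>
#include <initializer_list>

// Assumed layout, inferred from the shifts by 10 and 20 and the 0x3ff mask:
// X in bits [9:0], Y in bits [19:10], Z in bits [29:20].
uint32_t packIds(uint32_t x, uint32_t y, uint32_t z) {
  return (x & 0x3ffu) | (y & 0x3ffu) << 10 | (z & 0x3ffu) << 20;
}

// The idiom before the fold: two shifts, two ORs and an AND.
bool idsZeroUnfolded(uint32_t packed) {
  return ((packed | packed >> 10 | packed >> 20) & 0x3ffu) == 0;
}

// After the fold: a single AND with the union of the three field masks.
bool idsZeroFolded(uint32_t packed) {
  return (packed & 0x3fffffffu) == 0;
}

// Compares both forms over a small grid of IDs, with and without stray bits
// in the unused top two bits, against the expected "all IDs are zero".
bool idFormsAgree() {
  for (uint32_t x : {0u, 1u, 7u, 1023u})
    for (uint32_t y : {0u, 1u, 512u})
      for (uint32_t z : {0u, 3u, 1023u})
        for (uint32_t top : {0u, 0x40000000u, 0xc0000000u}) {
          uint32_t p = packIds(x, y, z) | top;
          bool expected = (x == 0 && y == 0 && z == 0);
          if (idsZeroUnfolded(p) != expected || idsZeroFolded(p) != expected)
            return false;
        }
  return true;
}
```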
---
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 86 ++-
 .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 34 ++--
 2 files changed, 91 insertions(+), 29 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 6ca243990c468..a6eb214762fcb 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -28912,13 +28912,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
 
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //  (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();
+
+  if (Root.getOpcode() != ISD::AND)
+    return SDValue();
+
+  SDValue N0 = Root.getOperand(0);
+  SDValue N1 = Root.getOperand(1);
+
+  if (N0.getOpcode() != ISD::OR || !isa(N1))
+    return SDValue();
+
+  APInt RootMask = cast(N1)->getAsAPIntVal();
+  if (!RootMask.isMask())
+    return SDValue();
+
+  SDValue Src;
+  const auto IsSrc = [&](SDValue V) {
+    if (!Src) {
+      Src = V;
+      return true;
+    }
+
+    return Src == V;
+  };
+
+  SmallVector Worklist = {N0};
+  APInt PartsMask(VT.getSizeInBits(), 0);
+  while (!Worklist.empty()) {
+    SDValue V = Worklist.pop_back_val();
+    if (!V.hasOneUse() && Src != V)
+      return SDValue();
+
+    if (V.getOpcode() == ISD::OR) {
+      Worklist.push_back(V.getOperand(0));
+      Worklist.push_back(V.getOperand(1));
+      continue;
+    }
+
+    if (V.getOpcode() == ISD::SRL) {
+      SDValue ShiftSrc = V.getOperand(0);
+      SDValue ShiftAmt = V.getOperand(1);
+
+      if (!IsSrc(ShiftSrc) || !isa(ShiftAmt))
+        return SDValue();
+
+      PartsMask |= (RootMask << cast(ShiftAmt)->getAsZExtVal());
+      continue;
+    }
+
+    if (IsSrc(V)) {
+      PartsMask |= RootMask;
+      continue;
+    }
+
+    return SDValue();
+  }
+
+  if (!RootMask.isMask() || !Src)
+    return SDValue();
+
+  SDLoc DL(Root);
+  return DAG.getNode(ISD::AND, DL, VT,
+                     {Src, DAG.getConstant(PartsMask, DL, VT)});
+}
+
 /// This is a stub for TargetLowering::SimplifySetCC.
 SDValue DAGCombiner::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
                                    ISD::CondCode Cond, const SDLoc &DL,
                                    bool foldBooleans) {
   TargetLowering::DAGCombinerInfo
       DagCombineInfo(DAG, Level, false, this);
-  return TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL);
+  if (SDValue C =
+          TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL))
+    return C;
+
+  if ((Cond == ISD::SETNE || Cond == ISD::SETEQ) &&
+      N0.getOpcode() == ISD::AND && isNullConstant(N1)) {
+
+    if (SDValue Res = matchMergedBFX(N0, DAG, TLI))
+      return DAG.getSetCC(DL, VT, Res, N1, Cond);
+  }
+
+  return SDValue();
 }
 
 /// Given an ISD::SDIV node expressing a divide by constant, return
diff --git a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
index 07c4aeb1ac7df..64d055bc40e98 100644
--- a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
+++ b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
@@ -12,11 +12,7 @@ define i1 @workitem_zero() {
 ; DAGISEL-GFX8-LABEL: workitem_zero:
 ; DAGISEL-GFX8:   ; %bb.0: ; %entry
 ; DAGISEL-GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; DAGISEL-GFX8-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
-; DAGISEL-GFX8-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
-; DAGISEL-GFX8-NEXT:    v_or_b32_e32 v1, v31, v1
-; DAGISEL-GFX8-NEXT:    v_or_b32_e32 v0, v1, v0
-; DAGISEL-GFX8-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX8-NEXT:    v_and_b32_e32 v0, 0x3fff, v31
 ; DAGISEL-GFX8-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
 ; DAGISEL-GFX8-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
 ; DAGISEL-GFX8-NEXT:    s_setpc_b64 s[30:31]
@@ -

[llvm-branch-commits] [llvm] [GISel] Combine compare of bitfield extracts or'd together. (PR #146055)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/146055

>From d97992ef24abae69878fd1e49270bf0f7372ca39 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 27 Jun 2025 12:04:53 +0200
Subject: [PATCH] [GISel] Combine compare of bitfield extracts or'd together.

Equivalent of the previous DAG patch for GISel.
The shifts are BFXs in GISel, so the canonical form of the entire expression
is different than in the DAG. The mask is not at the root of the expression, it
remains on the leaves instead.

See #136727
---
 .../llvm/CodeGen/GlobalISel/CombinerHelper.h  |   2 +
 .../include/llvm/Target/GlobalISel/Combine.td |  11 +-
 .../GlobalISel/CombinerHelperCompares.cpp |  89 +
 .../GlobalISel/combine-cmp-merged-bfx.mir | 326 ++
 .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 194 +++
 5 files changed, 483 insertions(+), 139 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-cmp-merged-bfx.mir

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h 
b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index c15263e0b06f8..5ec82c30f268f 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -641,6 +641,8 @@ class CombinerHelper {
   /// KnownBits information.
   bool matchICmpToLHSKnownBits(MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
+  bool combineMergedBFXCompare(MachineInstr &MI) const;
+
   /// \returns true if (and (or x, c1), c2) can be replaced with (and x, c2)
   bool matchAndOrDisjointMask(MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 4a92dc16c1bf4..cba46a5edf9ec 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1085,6 +1085,14 @@ def double_icmp_zero_or_combine: GICombineRule<
  (G_ICMP $root, $p, $ordst, 0))
 >;
 
+// Transform ((X | (G_UBFX X, ...) | ...) == 0) (or != 0)
+// into a compare of a extract/mask of X
+def icmp_merged_bfx_combine: GICombineRule<
+  (defs root:$root),
+  (combine (G_ICMP $dst, $p, $src, 0):$root,
+   [{ return Helper.combineMergedBFXCompare(*${root}); }])
+>;
+
 def and_or_disjoint_mask : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$info),
   (match (wip_match_opcode G_AND):$root,
@@ -2052,7 +2060,8 @@ def all_combines : GICombineGroup<[integer_reassoc_combines, trivial_combines,
 fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors,
 simplify_neg_minmax, combine_concat_vector,
 sext_trunc, zext_trunc, prefer_sign_combines, shuffle_combines,
-combine_use_vector_truncate, merge_combines, overflow_combines]>;
+combine_use_vector_truncate, merge_combines, overflow_combines,
+icmp_merged_bfx_combine]>;
 
 // A combine group used to for prelegalizer combiners at -O0. The combines in
 // this group have been selected based on experiments to balance code size and
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
index fc40533cf3dc9..e1d43f37bac13 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
@@ -140,3 +140,92 @@ bool CombinerHelper::matchCanonicalizeFCmp(const MachineInstr &MI,
 
   return false;
 }
+
+bool CombinerHelper::combineMergedBFXCompare(MachineInstr &MI) const {
+  const GICmp *Cmp = cast(&MI);
+
+  ICmpInst::Predicate CC = Cmp->getCond();
+  if (CC != CmpInst::ICMP_EQ && CC != CmpInst::ICMP_NE)
+    return false;
+
+  Register CmpLHS = Cmp->getLHSReg();
+  Register CmpRHS = Cmp->getRHSReg();
+
+  LLT OpTy = MRI.getType(CmpLHS);
+  if (!OpTy.isScalar() || OpTy.isPointer())
+    return false;
+
+  assert(isZeroOrZeroSplat(CmpRHS, /*AllowUndefs=*/false));
+
+  Register Src;
+  const auto IsSrc = [&](Register R) {
+    if (!Src) {
+      Src = R;
+      return true;
+    }
+
+    return Src == R;
+  };
+
+  MachineInstr *CmpLHSDef = MRI.getVRegDef(CmpLHS);
+  if (CmpLHSDef->getOpcode() != TargetOpcode::G_OR)
+    return false;
+
+  APInt PartsMask(OpTy.getSizeInBits(), 0);
+  SmallVector Worklist = {CmpLHSDef};
+  while (!Worklist.empty()) {
+    MachineInstr *Cur = Worklist.pop_back_val();
+
+    Register Dst = Cur->getOperand(0).getReg();
+    if (!MRI.hasOneUse(Dst) && Dst != Src)
+      return false;
+
+    if (Cur->getOpcode() == TargetOpcode::G_OR) {
+      Worklist.push_back(MRI.getVRegDef(Cur->getOperand(1).getReg()));
+      Worklist.push_back(MRI.getVRegDef(Cur->getOperand(2).getReg()));
+      continue;
+    }
+
+    if (Cur->getOpcode() == TargetOpcode::G_UBFX) {
+      Register Op = Cur->getOperand(1).getReg();
+      Register Width = Cur->getOperand(2).getReg();
+      Register Off = Cur->getOperand(3).getReg();
+
+      auto WidthCst = getIConstantVRegVal(Width, MRI);
+  auto

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/146054

>From 26615132899d40b8d245fd98d093ef8c26cdc3e1 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 26 Jun 2025 13:31:37 +0200
Subject: [PATCH 1/2] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences

Fold sequences where we extract a bunch of contiguous bits from a value,
merge them into the low bits, and then check whether the low bits are zero.

It seems like a strange sequence at first, but it's an idiom used by the
device libs to check workitem IDs for AMDGPU.

I put this in DAGCombiner instead of the target combiner because it is a
generic, valid transform that's also fairly niche, so I don't think there is
much risk of a combine loop.

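The fold rests on a simple bit identity: ((X >> C) & Mask) is nonzero exactly when X has a set bit in (Mask << C), and an OR of terms is zero only when every term is. So ((X | (X >> C0) | (X >> C1) | ...) & Mask) == 0 can be tested as (X & PartsMask) == 0. The sketch below checks this on 32-bit values with plain integers in place of APInt; it assumes the unshifted X is one of the OR operands, as in the pattern comment in the patch, and the helper names are hypothetical. Only the comparison with zero is preserved; the two AND results themselves generally differ.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper modeled on how the patch accumulates PartsMask:
// the unshifted X contributes RootMask, and each (X >> C) contributes
// RootMask << C.
static uint32_t partsMask(uint32_t RootMask,
                          std::initializer_list<unsigned> Shifts) {
  uint32_t Mask = RootMask; // the plain X operand of the OR
  for (unsigned C : Shifts) {
    assert(C < 32 && "shift amount must be less than the bit width");
    Mask |= RootMask << C;
  }
  return Mask;
}

// Original form: (X | (X >> C0) | (X >> C1) | ...) & RootMask.
static uint32_t originalForm(uint32_t X, uint32_t RootMask,
                             std::initializer_list<unsigned> Shifts) {
  uint32_t Acc = X;
  for (unsigned C : Shifts)
    Acc |= X >> C;
  return Acc & RootMask;
}
```

With the shifts by 10 and 20 and the 0x3ff mask from the removed DAGISEL check lines, `partsMask(0x3ff, {10, 20})` is 0x3fffffff.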
See #136727
---
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 86 ++-
 .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 34 ++--
 2 files changed, 91 insertions(+), 29 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 6ca243990c468..a6eb214762fcb 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -28912,13 +28912,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
 
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //  (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();
+
+  if (Root.getOpcode() != ISD::AND)
+    return SDValue();
+
+  SDValue N0 = Root.getOperand(0);
+  SDValue N1 = Root.getOperand(1);
+
+  if (N0.getOpcode() != ISD::OR || !isa(N1))
+    return SDValue();
+
+  APInt RootMask = cast(N1)->getAsAPIntVal();
+  if (!RootMask.isMask())
+    return SDValue();
+
+  SDValue Src;
+  const auto IsSrc = [&](SDValue V) {
+    if (!Src) {
+      Src = V;
+      return true;
+    }
+
+    return Src == V;
+  };
+
+  SmallVector Worklist = {N0};
+  APInt PartsMask(VT.getSizeInBits(), 0);
+  while (!Worklist.empty()) {
+    SDValue V = Worklist.pop_back_val();
+    if (!V.hasOneUse() && Src != V)
+      return SDValue();
+
+    if (V.getOpcode() == ISD::OR) {
+      Worklist.push_back(V.getOperand(0));
+      Worklist.push_back(V.getOperand(1));
+      continue;
+    }
+
+    if (V.getOpcode() == ISD::SRL) {
+      SDValue ShiftSrc = V.getOperand(0);
+      SDValue ShiftAmt = V.getOperand(1);
+
+      if (!IsSrc(ShiftSrc) || !isa(ShiftAmt))
+        return SDValue();
+
+      PartsMask |= (RootMask << cast(ShiftAmt)->getAsZExtVal());
+      continue;
+    }
+
+    if (IsSrc(V)) {
+      PartsMask |= RootMask;
+      continue;
+    }
+
+    return SDValue();
+  }
+
+  if (!RootMask.isMask() || !Src)
+    return SDValue();
+
+  SDLoc DL(Root);
+  return DAG.getNode(ISD::AND, DL, VT,
+                     {Src, DAG.getConstant(PartsMask, DL, VT)});
+}
+
 /// This is a stub for TargetLowering::SimplifySetCC.
 SDValue DAGCombiner::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
                                    ISD::CondCode Cond, const SDLoc &DL,
                                    bool foldBooleans) {
   TargetLowering::DAGCombinerInfo
       DagCombineInfo(DAG, Level, false, this);
-  return TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL);
+  if (SDValue C =
+          TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL))
+    return C;
+
+  if ((Cond == ISD::SETNE || Cond == ISD::SETEQ) &&
+      N0.getOpcode() == ISD::AND && isNullConstant(N1)) {
+
+    if (SDValue Res = matchMergedBFX(N0, DAG, TLI))
+      return DAG.getSetCC(DL, VT, Res, N1, Cond);
+  }
+
+  return SDValue();
 }
 
 /// Given an ISD::SDIV node expressing a divide by constant, return
diff --git a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
index 07c4aeb1ac7df..64d055bc40e98 100644
--- a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
+++ b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
@@ -12,11 +12,7 @@ define i1 @workitem_zero() {
 ; DAGISEL-GFX8-LABEL: workitem_zero:
 ; DAGISEL-GFX8:   ; %bb.0: ; %entry
 ; DAGISEL-GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v1, 10, v31
-; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v0, 20, v31
-; DAGISEL-GFX8-NEXT:v_or_b32_e32 v1, v31, v1
-; DAGISEL-GFX8-NEXT:v_or_b32_e32 v0, v1, v0
-; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3fff, v31
 ; DAGISEL-GFX8-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0
 ; DAGISEL-GFX8-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc
 ; DAGISEL-GFX8-NEXT:s_setpc_b64 s[30:31]
@@ -

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #145911)

2025-06-30 Thread Petar Avramovic via llvm-branch-commits


https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/145911

>From 046418f7ccd46a2b0c2ea3c9ab15e659de709b27 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 5 Jun 2025 12:17:13 +0200
Subject: [PATCH] AMDGPU/GlobalISel: Improve readanylane combines in
 regbanklegalize

---
 .../Target/AMDGPU/AMDGPURegBankLegalize.cpp   | 317 --
 .../AMDGPU/GlobalISel/readanylane-combines.ll |  25 +-
 .../GlobalISel/readanylane-combines.mir   |  78 ++---
 .../GlobalISel/regbankselect-and-s1.mir   |   6 +
 .../GlobalISel/regbankselect-anyext.mir   |   4 +
 .../AMDGPU/GlobalISel/regbankselect-trunc.mir |   2 +
 6 files changed, 246 insertions(+), 186 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
index ba661348ca5b5..e1879598f098a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
@@ -23,6 +23,8 @@
 #include "GCNSubtarget.h"
 #include "llvm/CodeGen/GlobalISel/CSEInfo.h"
 #include "llvm/CodeGen/GlobalISel/CSEMIRBuilder.h"
+#include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
+#include "llvm/CodeGen/GlobalISel/Utils.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineUniformityAnalysis.h"
 #include "llvm/CodeGen/TargetPassConfig.h"
@@ -115,126 +117,233 @@ class AMDGPURegBankLegalizeCombiner {
 VgprRB(&RBI.getRegBank(AMDGPU::VGPRRegBankID)),
 VccRB(&RBI.getRegBank(AMDGPU::VCCRegBankID)) {};
 
-  bool isLaneMask(Register Reg) {
-    const RegisterBank *RB = MRI.getRegBankOrNull(Reg);
-    if (RB && RB->getID() == AMDGPU::VCCRegBankID)
-      return true;
+  bool isLaneMask(Register Reg);
+  std::pair tryMatch(Register Src, unsigned Opcode);
+  std::pair tryMatchRALFromUnmerge(Register Src);
+  Register getReadAnyLaneSrc(Register Src);
+  void replaceRegWithOrBuildCopy(Register Dst, Register Src);
 
-    const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg);
-    return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1);
-  }
+  bool tryEliminateReadAnyLane(MachineInstr &Copy);
+  void tryCombineCopy(MachineInstr &MI);
+  void tryCombineS1AnyExt(MachineInstr &MI);
+};
 
-  void cleanUpAfterCombine(MachineInstr &MI, MachineInstr *Optional0) {
-    MI.eraseFromParent();
-    if (Optional0 && isTriviallyDead(*Optional0, MRI))
-      Optional0->eraseFromParent();
-  }
+bool AMDGPURegBankLegalizeCombiner::isLaneMask(Register Reg) {
+  const RegisterBank *RB = MRI.getRegBankOrNull(Reg);
+  if (RB && RB->getID() == AMDGPU::VCCRegBankID)
+    return true;
 
-  std::pair tryMatch(Register Src, unsigned Opcode) {
-    MachineInstr *MatchMI = MRI.getVRegDef(Src);
-    if (MatchMI->getOpcode() != Opcode)
-      return {nullptr, Register()};
-    return {MatchMI, MatchMI->getOperand(1).getReg()};
-  }
+  const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg);
+  return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1);
+}
 
-  void tryCombineCopy(MachineInstr &MI) {
-    Register Dst = MI.getOperand(0).getReg();
-    Register Src = MI.getOperand(1).getReg();
-    // Skip copies of physical registers.
-    if (!Dst.isVirtual() || !Src.isVirtual())
-      return;
-
-    // This is a cross bank copy, sgpr S1 to lane mask.
-    //
-    // %Src:sgpr(s1) = G_TRUNC %TruncS32Src:sgpr(s32)
-    // %Dst:lane-mask(s1) = COPY %Src:sgpr(s1)
-    // ->
-    // %Dst:lane-mask(s1) = G_AMDGPU_COPY_VCC_SCC %TruncS32Src:sgpr(s32)
-    if (isLaneMask(Dst) && MRI.getRegBankOrNull(Src) == SgprRB) {
-      auto [Trunc, TruncS32Src] = tryMatch(Src, AMDGPU::G_TRUNC);
-      assert(Trunc && MRI.getType(TruncS32Src) == S32 &&
-             "sgpr S1 must be result of G_TRUNC of sgpr S32");
-
-      B.setInstr(MI);
-      // Ensure that truncated bits in BoolSrc are 0.
-      auto One = B.buildConstant({SgprRB, S32}, 1);
-      auto BoolSrc = B.buildAnd({SgprRB, S32}, TruncS32Src, One);
-      B.buildInstr(AMDGPU::G_AMDGPU_COPY_VCC_SCC, {Dst}, {BoolSrc});
-      cleanUpAfterCombine(MI, Trunc);
-      return;
-    }
+std::pair
+AMDGPURegBankLegalizeCombiner::tryMatch(Register Src, unsigned Opcode) {
+  MachineInstr *MatchMI = MRI.getVRegDef(Src);
+  if (MatchMI->getOpcode() != Opcode)
+    return {nullptr, Register()};
+  return {MatchMI, MatchMI->getOperand(1).getReg()};
+}
+
+std::pair
+AMDGPURegBankLegalizeCombiner::tryMatchRALFromUnmerge(Register Src) {
+  MachineInstr *ReadAnyLane = MRI.getVRegDef(Src);
+  if (ReadAnyLane->getOpcode() != AMDGPU::G_AMDGPU_READANYLANE)
+    return {nullptr, -1};
+
+  Register RALSrc = ReadAnyLane->getOperand(1).getReg();
+  if (auto *UnMerge = getOpcodeDef(RALSrc, MRI))
+    return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)};
 
-    // Src = G_AMDGPU_READANYLANE RALSrc
-    // Dst = COPY Src
-    // ->
-    // Dst = RALSrc
-    if (MRI.getRegBankOrNull(Dst) == VgprRB &&
-        MRI.getRegBankOrNull(Src) == SgprRB) {
-

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #145912)

2025-06-30 Thread Petar Avramovic via llvm-branch-commits


https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/145912

>From 7c5c7bf98afe91f015b36e42536a8a700b27b686 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 26 Jun 2025 16:03:56 +0200
Subject: [PATCH] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize

Add rules for G_AMDGPU_BUFFER_LOAD and implement waterfall lowering
for divergent operands that must be sgpr.
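A waterfall loop is the usual way to handle an operand that must be uniform (in an sgpr) but is divergent: the loop repeatedly takes the value of the first active lane, runs the operation once for every lane holding that same value, and retires those lanes until none remain. The sketch below is a conceptual simulation of that control flow in plain C++, not the MIR the patch builds; the function name `waterfall` and its callback signature are hypothetical, and a wave is modeled as at most 64 lanes with a 64-bit exec mask.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Conceptual waterfall loop (hypothetical helper). LaneVals holds one
// divergent value per lane; bit I of Exec is set when lane I is active.
// Each iteration reads the first active lane's value (like readfirstlane),
// selects every active lane with that same value, calls UniformOp once for
// that subset, and removes the subset from Exec. Returns the iteration count.
static unsigned waterfall(const std::vector<uint32_t> &LaneVals, uint64_t Exec,
                          const std::function<void(uint32_t, uint64_t)> &UniformOp) {
  // Ignore exec bits for lanes that have no value.
  if (LaneVals.size() < 64)
    Exec &= (uint64_t(1) << LaneVals.size()) - 1;

  unsigned Iterations = 0;
  while (Exec != 0) {
    unsigned First = 0;
    while (((Exec >> First) & 1) == 0)
      ++First;
    uint32_t Uniform = LaneVals[First];

    uint64_t Match = 0;
    for (unsigned Lane = 0; Lane < LaneVals.size() && Lane < 64; ++Lane)
      if (((Exec >> Lane) & 1) != 0 && LaneVals[Lane] == Uniform)
        Match |= uint64_t(1) << Lane;
    assert(Match != 0 && "the first active lane always matches itself");

    UniformOp(Uniform, Match); // the operand is uniform for these lanes
    Exec &= ~Match;            // retire the handled lanes
    ++Iterations;
  }
  return Iterations;
}
```

The loop runs once per distinct value among the active lanes, so it is cheap when the operand is nearly uniform and degrades gracefully otherwise.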
---
 .../Target/AMDGPU/AMDGPUGlobalISelUtils.cpp   |  61 +++--
 .../lib/Target/AMDGPU/AMDGPUGlobalISelUtils.h |   2 +
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 239 +-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |   1 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp |  22 +-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |   6 +-
 .../AMDGPU/GlobalISel/buffer-schedule.ll  |   2 +-
 .../llvm.amdgcn.make.buffer.rsrc.ll   |   2 +-
 .../regbankselect-amdgcn.raw.buffer.load.ll   |  59 ++---
 ...egbankselect-amdgcn.raw.ptr.buffer.load.ll |  59 ++---
 ...regbankselect-amdgcn.struct.buffer.load.ll |  59 ++---
 ...ankselect-amdgcn.struct.ptr.buffer.load.ll |  59 ++---
 .../llvm.amdgcn.buffer.load-last-use.ll   |   2 +-
 .../llvm.amdgcn.raw.atomic.buffer.load.ll |  42 +--
 .../llvm.amdgcn.raw.ptr.atomic.buffer.load.ll |  42 +--
 .../llvm.amdgcn.struct.atomic.buffer.load.ll  |  48 ++--
 ...vm.amdgcn.struct.ptr.atomic.buffer.load.ll |  48 ++--
 .../CodeGen/AMDGPU/swizzle.bit.extract.ll |   4 +-
 18 files changed, 514 insertions(+), 243 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
index 00979f44f9d34..f36935d8c0e8f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
@@ -117,45 +117,72 @@ static LLT getReadAnyLaneSplitTy(LLT Ty) {
   return LLT::scalar(32);
 }
 
-static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc,
-                                 const RegisterBankInfo &RBI);
-
-static void unmergeReadAnyLane(MachineIRBuilder &B,
-                               SmallVectorImpl &SgprDstParts,
-                               LLT UnmergeTy, Register VgprSrc,
-                               const RegisterBankInfo &RBI) {
+template 
+static Register buildReadLane(MachineIRBuilder &, Register,
+                              const RegisterBankInfo &, ReadLaneFnTy);
+
+template 
+static void
+unmergeReadAnyLane(MachineIRBuilder &B, SmallVectorImpl &SgprDstParts,
+                   LLT UnmergeTy, Register VgprSrc, const RegisterBankInfo &RBI,
+                   ReadLaneFnTy BuildRL) {
   const RegisterBank *VgprRB = &RBI.getRegBank(AMDGPU::VGPRRegBankID);
   auto Unmerge = B.buildUnmerge({VgprRB, UnmergeTy}, VgprSrc);
   for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) {
-    SgprDstParts.push_back(buildReadAnyLane(B, Unmerge.getReg(i), RBI));
+    SgprDstParts.push_back(buildReadLane(B, Unmerge.getReg(i), RBI, BuildRL));
   }
 }
 
-static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc,
-                                 const RegisterBankInfo &RBI) {
+template 
+static Register buildReadLane(MachineIRBuilder &B, Register VgprSrc,
+                              const RegisterBankInfo &RBI,
+                              ReadLaneFnTy BuildRL) {
   LLT Ty = B.getMRI()->getType(VgprSrc);
   const RegisterBank *SgprRB = &RBI.getRegBank(AMDGPU::SGPRRegBankID);
   if (Ty.getSizeInBits() == 32) {
-    return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {{SgprRB, Ty}}, {VgprSrc})
-        .getReg(0);
+    Register SgprDst = B.getMRI()->createVirtualRegister({SgprRB, Ty});
+    return BuildRL(B, SgprDst, VgprSrc).getReg(0);
   }
 
   SmallVector SgprDstParts;
-  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI);
+  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI,
+                     BuildRL);
 
   return B.buildMergeLikeInstr({SgprRB, Ty}, SgprDstParts).getReg(0);
 }
 
-void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst,
-                              Register VgprSrc, const RegisterBankInfo &RBI) {
+template 
+static void buildReadLane(MachineIRBuilder &B, Register SgprDst,
+                          Register VgprSrc, const RegisterBankInfo &RBI,
+                          ReadLaneFnTy BuildReadLane) {
   LLT Ty = B.getMRI()->getType(VgprSrc);
   if (Ty.getSizeInBits() == 32) {
-    B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {SgprDst}, {VgprSrc});
+    BuildReadLane(B, SgprDst, VgprSrc);
     return;
   }
 
   SmallVector SgprDstParts;
-  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI);
+  unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI,
+                     BuildReadLane);
 
   B.buildMergeLikeInstr(SgprDst, SgprDstParts).getReg(0);
 }
+
+void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst,
+  Register VgprSrc

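The refactoring in this patch threads a callback (ReadLaneFnTy) through the split/merge helpers, so the same code that splits a wide value into 32-bit parts, reads each part into an sgpr and merges the results can build G_AMDGPU_READANYLANE or, presumably, another read-lane instruction needed by the waterfall lowering. Below is a plain C++ analogue of that pattern on integers, under stated assumptions: the name `splitApplyMerge` is hypothetical, only 32- and 64-bit values are handled, and the per-part "read" is whatever function the caller passes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical analogue of the templated buildReadLane helpers: a 32-bit value
// is handed to PartFn directly; a 64-bit value is split into two 32-bit parts
// (like an unmerge), PartFn handles each part, and the results are merged back.
template <typename PartFnTy>
static uint64_t splitApplyMerge(uint64_t Value, unsigned SizeInBits,
                                PartFnTy PartFn) {
  assert((SizeInBits == 32 || SizeInBits == 64) &&
         "sketch handles 32- and 64-bit values only");
  if (SizeInBits == 32)
    return PartFn(static_cast<uint32_t>(Value)); // base case: a single part

  uint32_t Lo = PartFn(static_cast<uint32_t>(Value));       // part 0
  uint32_t Hi = PartFn(static_cast<uint32_t>(Value >> 32)); // part 1
  return (static_cast<uint64_t>(Hi) << 32) | Lo;             // merge
}
```

Taking the callback as a template parameter, as the patch does, lets each caller pass a lambda directly, without wrapping it in `std::function`.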
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #145911)

2025-06-30 Thread Petar Avramovic via llvm-branch-commits



@@ -115,126 +117,233 @@ class AMDGPURegBankLegalizeCombiner {
 VgprRB(&RBI.getRegBank(AMDGPU::VGPRRegBankID)),
 VccRB(&RBI.getRegBank(AMDGPU::VCCRegBankID)) {};
 
-  bool isLaneMask(Register Reg) {
-    const RegisterBank *RB = MRI.getRegBankOrNull(Reg);
-    if (RB && RB->getID() == AMDGPU::VCCRegBankID)
-      return true;
+  bool isLaneMask(Register Reg);
+  std::pair tryMatch(Register Src, unsigned Opcode);
+  std::pair tryMatchRALFromUnmerge(Register Src);
+  Register getReadAnyLaneSrc(Register Src);
+  void replaceRegWithOrBuildCopy(Register Dst, Register Src);
 
-    const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg);
-    return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1);
-  }
+  bool tryEliminateReadAnyLane(MachineInstr &Copy);
+  void tryCombineCopy(MachineInstr &MI);
+  void tryCombineS1AnyExt(MachineInstr &MI);
+};
 
-  void cleanUpAfterCombine(MachineInstr &MI, MachineInstr *Optional0) {
-    MI.eraseFromParent();
-    if (Optional0 && isTriviallyDead(*Optional0, MRI))
-      Optional0->eraseFromParent();
-  }
+bool AMDGPURegBankLegalizeCombiner::isLaneMask(Register Reg) {
+  const RegisterBank *RB = MRI.getRegBankOrNull(Reg);
+  if (RB && RB->getID() == AMDGPU::VCCRegBankID)
+    return true;
 
-  std::pair tryMatch(Register Src, unsigned Opcode) {
-    MachineInstr *MatchMI = MRI.getVRegDef(Src);
-    if (MatchMI->getOpcode() != Opcode)
-      return {nullptr, Register()};
-    return {MatchMI, MatchMI->getOperand(1).getReg()};
-  }
+  const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg);
+  return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1);
+}
 
-  void tryCombineCopy(MachineInstr &MI) {
-    Register Dst = MI.getOperand(0).getReg();
-    Register Src = MI.getOperand(1).getReg();
-    // Skip copies of physical registers.
-    if (!Dst.isVirtual() || !Src.isVirtual())
-      return;
-
-    // This is a cross bank copy, sgpr S1 to lane mask.
-    //
-    // %Src:sgpr(s1) = G_TRUNC %TruncS32Src:sgpr(s32)
-    // %Dst:lane-mask(s1) = COPY %Src:sgpr(s1)
-    // ->
-    // %Dst:lane-mask(s1) = G_AMDGPU_COPY_VCC_SCC %TruncS32Src:sgpr(s32)
-    if (isLaneMask(Dst) && MRI.getRegBankOrNull(Src) == SgprRB) {
-      auto [Trunc, TruncS32Src] = tryMatch(Src, AMDGPU::G_TRUNC);
-      assert(Trunc && MRI.getType(TruncS32Src) == S32 &&
-             "sgpr S1 must be result of G_TRUNC of sgpr S32");
-
-      B.setInstr(MI);
-      // Ensure that truncated bits in BoolSrc are 0.
-      auto One = B.buildConstant({SgprRB, S32}, 1);
-      auto BoolSrc = B.buildAnd({SgprRB, S32}, TruncS32Src, One);
-      B.buildInstr(AMDGPU::G_AMDGPU_COPY_VCC_SCC, {Dst}, {BoolSrc});
-      cleanUpAfterCombine(MI, Trunc);
-      return;
-    }
+std::pair
+AMDGPURegBankLegalizeCombiner::tryMatch(Register Src, unsigned Opcode) {
+  MachineInstr *MatchMI = MRI.getVRegDef(Src);
+  if (MatchMI->getOpcode() != Opcode)
+    return {nullptr, Register()};
+  return {MatchMI, MatchMI->getOperand(1).getReg()};
+}

petar-avramovic wrote:

We could use mi_match, but this is shorter because we use auto instead of
declaring what we want to capture. To me at least, this also has nicer formatting.
How about naming it matchInstAndGetSrc?

https://github.com/llvm/llvm-project/pull/145911
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)

2025-06-30 Thread Tomas Matheson via llvm-branch-commits



@@ -1,592 +1,697 @@
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding   -mattr=+the -mattr=+d128 < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+v8.9a -mattr=+the -mattr=+d128 < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+v9.4a -mattr=+the -mattr=+d128 < %s | FileCheck %s
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128 < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128,v8.9a < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128,v9.4a < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+the,+d128 < %s \
+// RUN:| llvm-objdump -d --mattr=+the,+d128 - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+the,+d128 < %s \
+// RUN:   | llvm-objdump -d --mattr=-the,-d128 - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128 < %s \
+// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:| llvm-mc -triple=aarch64 -mattr=+the,+d128 -disassemble -show-encoding \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
 
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu   < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.9a < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v9.4a < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s
 
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu   -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.9a -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v9.4a -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s
+mrs x3, RCWMASK_EL1
+// CHECK-INST: mrs x3, RCWMASK_EL1
+// CHECK-ENCODING: encoding: [0xc3,0xd0,0x38,0xd5]
+// CHECK-ERROR: error: expected readable system register
+// CHECK-UNKNOWN:  d538d0c3  mrs x3, S3_0_C13_C0_6
 
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+the -mattr=+d128 < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-ZXR %s
+msr RCWMASK_EL1, x1
+// CHECK-INST: msr RCWMASK_EL1, x1
+// CHECK-ENCODING: encoding: [0xc1,0xd0,0x18,0xd5]
+// CHECK-ERROR: error: expected writable system register or pstate
+// CHECK-UNKNOWN:  d518d0c1  msr S3_0_C13_C0_6, x1
 
-mrs x3, RCWMASK_EL1
-// CHECK:   mrs x3, RCWMASK_EL1   // encoding: [0xc3,0xd0,0x38,0xd5]
-// ERROR-NO-THE: [[@LINE-2]]:21: error: expected readable system register
-msr RCWMASK_EL1, x1
-// CHECK:   msr RCWMASK_EL1, x1   // encoding: [0xc1,0xd0,0x18,0xd5]
-// ERROR-NO-THE: [[@LINE-2]]:17: error: expected writable system register or pstate
-mrs x3, RCWSMASK_EL1
-// CHECK:   mrs x3, RCWSMASK_EL1  // encoding: [0x63,0xd0,0x38,0xd5]
-// ERROR-NO-THE: [[@LINE-2]]:21: error: expected readable system register
-msr RCWSMASK_EL1, x1
-// CHECK:   msr RCWSMASK_EL1, x1  // encoding: [0x61,0xd0,0x18,0xd5]
-// ERROR-NO-THE: [[@LINE-2]]:17: error: expected writable system register or pstate
+mrs x3, RCWSMASK_EL1
+// CHECK-INST: mrs x3, RCWSMASK_EL1
+// CHECK-ENCODING: encoding: [0x63,0xd0,0x38,0xd5]
+// CHECK-ERROR: error: expected readable system register
+// CHECK-UNKNOWN:  d538d063  mrs x3, S3_0_C13_C0_3
+msr RCWSMASK_EL1, x1
+// CHECK-INST: msr RCWSMASK_EL1, x1
+// CHECK-ENCODING: encoding: [0x61,0xd0,0x18,0xd5]
+// CHECK-ERROR: error: expected writable system register or pstate
+// CHECK-UNKNOWN:  d518d061  msr S3_0_C13_C0_3, x1
 
-rcwcas   x0, x1, [x4]
-// CHECK:   rcwcas   x0, x1, [x4] // encoding: [0x81,0x08,0x20,0x19]
-// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the
-rcwcasa  x0, x1, [x4]
-// CHECK:   rcwcasa  x0, x1, [x4] // encoding: [0x81,0x08,0xa0,0x19]
-// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the
-rcwcasal x0, x1, [x4]
-// CHECK:   rcwcasal x0, x1, [x4] // encoding: [0x81,0x08,0xe0,0x19]
-// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the
-rcwcasl  x0, x1, [x4]
-// CHECK:   rcwcasl  x0, x1, [x4] // encoding: [0x81,0x08,0x60,0x19]
-// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the

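The CHECK-ENCODING bytes and the CHECK-UNKNOWN lines in the new test describe the same instruction word: llvm-mc prints the encoding as little-endian bytes, while llvm-objdump prints the 32-bit word and, without the feature, a generic S<op0>_<op1>_C<n>_C<m>_<op2> register name. The sketch below decodes the bytes to show that correspondence. The field layout (bit 21 selects MRS over MSR; bits [20:19] op0, [18:16] op1, [15:12] CRn, [11:8] CRm, [7:5] op2, [4:0] Rt) follows the Arm A64 MRS/MSR (register) encoding; the helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Assemble the 32-bit instruction word from the little-endian byte list
// printed after "encoding:".
static uint32_t wordFromEncoding(const std::array<uint8_t, 4> &Bytes) {
  return uint32_t(Bytes[0]) | (uint32_t(Bytes[1]) << 8) |
         (uint32_t(Bytes[2]) << 16) | (uint32_t(Bytes[3]) << 24);
}

// Decoded fields of an A64 MRS/MSR (register) instruction.
struct SysRegAccess {
  bool IsRead; // bit 21: 1 for MRS, 0 for MSR
  unsigned Op0, Op1, CRn, CRm, Op2, Rt;
};

static SysRegAccess decodeSysRegAccess(uint32_t Word) {
  // System-register moves have bits [31:22] = 0b1101010100 and op0 >= 2
  // (bit 20 set).
  assert((Word & 0xffc00000u) == 0xd5000000u && ((Word >> 20) & 1) == 1);
  SysRegAccess A;
  A.IsRead = ((Word >> 21) & 1) != 0;
  A.Op0 = (Word >> 19) & 0x3;
  A.Op1 = (Word >> 16) & 0x7;
  A.CRn = (Word >> 12) & 0xf;
  A.CRm = (Word >> 8) & 0xf;
  A.Op2 = (Word >> 5) & 0x7;
  A.Rt = Word & 0x1f;
  return A;
}

// Generic register name used when the register is not recognized.
static std::string genericSysRegName(const SysRegAccess &A) {
  return "S" + std::to_string(A.Op0) + "_" + std::to_string(A.Op1) + "_C" +
         std::to_string(A.CRn) + "_C" + std::to_string(A.CRm) + "_" +
         std::to_string(A.Op2);
}
```

For `mrs x3, RCWMASK_EL1`, the bytes [0xc3,0xd0,0x38,0xd5] form the word d538d0c3, which decodes to an MRS into x3 from S3_0_C13_C0_6, matching the CHECK-UNKNOWN line.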
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)

2025-06-30 Thread Tomas Matheson via llvm-branch-commits



@@ -16,28 +16,41 @@
 // RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,-clrbhb < %s | FileCheck %s --check-prefix=HINT_22
 
 // Optional, off by default, manually enabled
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8.8a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8.8a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST

tmatheson-arm wrote:

Why keep a different test format for CLRBHB?

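For context on the HINT_22 prefix used when the feature is disabled: CLRBHB is an alias of HINT #22, so both spellings share one encoding. HINT places its 7-bit immediate in bits [11:5] of the NOP encoding (HINT #0 = 0xd503201f). A minimal sketch with hypothetical helper names, computing the word and the little-endian bytes that -show-encoding would print:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// HINT #Imm (Imm in 0..127): the immediate occupies bits [11:5] of the
// NOP encoding.
static uint32_t hintWord(unsigned Imm) {
  assert(Imm < 128 && "HINT immediate is 7 bits");
  return 0xd503201fu | (uint32_t(Imm) << 5);
}

// Little-endian byte order, as listed after "encoding:" by llvm-mc.
static std::array<uint8_t, 4> encodingBytes(uint32_t Word) {
  return {uint8_t(Word), uint8_t(Word >> 8), uint8_t(Word >> 16),
          uint8_t(Word >> 24)};
}
```

`hintWord(22)` is 0xd50322df, that is, the bytes [0xdf,0x22,0x03,0xd5].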
https://github.com/llvm/llvm-project/pull/146330

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Tomas Matheson via llvm-branch-commits



@@ -1,55 +1,117 @@
-// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s
-
-  mrs x0, MECIDR_EL2
-// CHECK: mrs   x0, MECIDR_EL2   // encoding: [0xe0,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_P0_EL2
-// CHECK: mrs   x0, MECID_P0_EL2  // encoding: [0x00,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_A0_EL2
-// CHECK: mrs   x0, MECID_A0_EL2  // encoding: [0x20,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_P1_EL2
-// CHECK: mrs   x0, MECID_P1_EL2  // encoding: [0x40,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_A1_EL2
-// CHECK: mrs   x0, MECID_A1_EL2  // encoding: [0x60,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, VMECID_P_EL2
-// CHECK: mrs   x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, VMECID_A_EL2
-// CHECK: mrs   x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_RL_A_EL3
-// CHECK: mrs   x0, MECID_RL_A_EL3   // encoding: [0x20,0xaa,0x3e,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  msr MECID_P0_EL2,x0
-// CHECK: msr   MECID_P0_EL2, x0  // encoding: [0x00,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_A0_EL2,x0
-// CHECK: msr   MECID_A0_EL2, x0  // encoding: [0x20,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_P1_EL2,x0
-// CHECK: msr   MECID_P1_EL2, x0  // encoding: [0x40,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_A1_EL2,x0
-// CHECK: msr   MECID_A1_EL2, x0  // encoding: [0x60,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr VMECID_P_EL2,   x0
-// CHECK: msr   VMECID_P_EL2, x0 // encoding: [0x00,0xa9,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr VMECID_A_EL2,   x0
-// CHECK: msr   VMECID_A_EL2, x0 // encoding: [0x20,0xa9,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_RL_A_EL3, x0
-// CHECK: msr   MECID_RL_A_EL3, x0   // encoding: [0x20,0xaa,0x1e,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-
-  dc cigdpae, x0
-// CHECK: dc cigdpae, x0 // encoding: [0xe0,0x7e,0x0c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:14: error: DC CIGDPAE requires: mec
-  dc cipae, x0
-// CHECK: dc cipae, x0   // encoding: [0x00,0x7e,0x0c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:14: error: DC CIPAE requires: mec
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+mec < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+mec < %s \
+// RUN:| llvm-objdump -d --mattr=+mec --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+mec < %s \
+// RUN:   | llvm-objdump -d --mattr=-mec --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+mec < %s \
+// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:| llvm-mc -triple=aarch64 -mattr=+mec -disassemble -show-encoding \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
+
+mrs x0, MECIDR_EL2
+// CHECK-INST: mrs x0, MECIDR_EL2
+// CHECK-ENCODING: encoding: [0xe0,0xa8,0x3c,0xd5]
+// CHECK-ERROR: error: expected readable system register
+// CHECK-UNKNOWN:  d53ca8e0 mrs x0, S3_4_C10_C8_7
+
+mrs x0, MECID_P0_EL2
+// CHECK-INST: mrs x0, MECID_P0_EL2
+// CHECK-ENCODING: encoding: [0x00,0xa8,0x3c,0xd5]
+// CHECK-ERROR: error: expected readable system register
+// CHECK-UNKNOWN:  d53ca800 mrs x0, S3_4_C10_C8_0
+
+mrs x0, MECID_A0_EL2
+// CHECK-INST: mrs x0, MECID_A0_EL2
+// CHECK-ENCODING: encoding: [0x20,0xa8,0x3c,0xd5]
+// CHECK-ERROR: error: expected readable system register
+// CHECK-UNKNOWN:  d53ca820 mrs

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Tomas Matheson via llvm-branch-commits



@@ -1,115 +1,203 @@
-// RUN: llvm-mc -triple aarch64 -mattr +gcs -show-encoding %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64 -show-encoding %s 2>%t | FileCheck %s --check-prefix=NO-GCS
-// RUN: FileCheck --check-prefix=ERROR-NO-GCS %s < %t
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+gcs < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+gcs < %s \
+// RUN:| llvm-objdump -d --mattr=+gcs --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+gcs < %s \
+// RUN:   | llvm-objdump -d --mattr=-gcs --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+gcs < %s \
+// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:| llvm-mc -triple=aarch64 -mattr=+gcs -disassemble -show-encoding \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
 
 msr GCSCR_EL1, x0
+// CHECK-INST: msr GCSCR_EL1, x0
+// CHECK-ENCODING: encoding: [0x00,0x25,0x18,0xd5]
+// CHECK-UNKNOWN:  d5182500 msr GCSCR_EL1, x0
+
 mrs x1, GCSCR_EL1
-// CHECK: msr GCSCR_EL1, x0   // encoding: [0x00,0x25,0x18,0xd5]
-// CHECK: mrs x1, GCSCR_EL1   // encoding: [0x01,0x25,0x38,0xd5]
+// CHECK-INST: mrs x1, GCSCR_EL1
+// CHECK-ENCODING: encoding: [0x01,0x25,0x38,0xd5]
+// CHECK-UNKNOWN:  d5382501 mrs x1, GCSCR_EL1
 
 msr GCSPR_EL1, x2
+// CHECK-INST: msr GCSPR_EL1, x2
+// CHECK-ENCODING: encoding: [0x22,0x25,0x18,0xd5]
+// CHECK-UNKNOWN:  d5182522 msr GCSPR_EL1, x2
+
 mrs x3, GCSPR_EL1
-// CHECK: msr GCSPR_EL1, x2   // encoding: [0x22,0x25,0x18,0xd5]
-// CHECK: mrs x3, GCSPR_EL1   // encoding: [0x23,0x25,0x38,0xd5]
+// CHECK-INST: mrs x3, GCSPR_EL1
+// CHECK-ENCODING: encoding: [0x23,0x25,0x38,0xd5]
+// CHECK-UNKNOWN:  d5382523 mrs x3, GCSPR_EL1
 
 msr GCSCRE0_EL1, x4
+// CHECK-INST: msr GCSCRE0_EL1, x4
+// CHECK-ENCODING: encoding: [0x44,0x25,0x18,0xd5]
+// CHECK-UNKNOWN:  d5182544 msr GCSCRE0_EL1, x4
+
 mrs x5, GCSCRE0_EL1
-// CHECK: msr GCSCRE0_EL1, x4 // encoding: [0x44,0x25,0x18,0xd5]
-// CHECK: mrs x5, GCSCRE0_EL1 // encoding: [0x45,0x25,0x38,0xd5]
+// CHECK-INST: mrs x5, GCSCRE0_EL1
+// CHECK-ENCODING: encoding: [0x45,0x25,0x38,0xd5]
+// CHECK-UNKNOWN:  d5382545 mrs x5, GCSCRE0_EL1
 
 msr GCSPR_EL0, x6
+// CHECK-INST: msr GCSPR_EL0, x6
+// CHECK-ENCODING: encoding: [0x26,0x25,0x1b,0xd5]
+// CHECK-UNKNOWN:  d51b2526 msr GCSPR_EL0, x6
+
 mrs x7, GCSPR_EL0
-// CHECK: msr GCSPR_EL0, x6   // encoding: [0x26,0x25,0x1b,0xd5]
-// CHECK: mrs x7, GCSPR_EL0   // encoding: [0x27,0x25,0x3b,0xd5]
+// CHECK-INST: mrs x7, GCSPR_EL0
+// CHECK-ENCODING: encoding: [0x27,0x25,0x3b,0xd5]
+// CHECK-UNKNOWN:  d53b2527 mrs x7, GCSPR_EL0
 
 msr GCSCR_EL2, x10
+// CHECK-INST: msr GCSCR_EL2, x10
+// CHECK-ENCODING: encoding: [0x0a,0x25,0x1c,0xd5]
+// CHECK-UNKNOWN:  d51c250a msr GCSCR_EL2, x10
+
 mrs x11, GCSCR_EL2
-// CHECK: msr GCSCR_EL2, x10  // encoding: [0x0a,0x25,0x1c,0xd5]
-// CHECK: mrs x11, GCSCR_EL2  // encoding: [0x0b,0x25,0x3c,0xd5]
+// CHECK-INST: mrs x11, GCSCR_EL2
+// CHECK-ENCODING: encoding: [0x0b,0x25,0x3c,0xd5]
+// CHECK-UNKNOWN:  d53c250b mrs x11, GCSCR_EL2
 
 msr GCSPR_EL2, x12
+// CHECK-INST: msr GCSPR_EL2, x12
+// CHECK-ENCODING: encoding: [0x2c,0x25,0x1c,0xd5]
+// CHECK-UNKNOWN:  d51c252c msr GCSPR_EL2, x12
+
 mrs x13, GCSPR_EL2
-// CHECK: msr GCSPR_EL2, x12  // encoding: [0x2c,0x25,0x1c,0xd5]
-// CHECK: mrs x13, GCSPR_EL2  // encoding: [0x2d,0x25,0x3c,0xd5]
+// CHECK-INST: mrs x13, GCSPR_EL2
+// CHECK-ENCODING: encoding: [0x2d,0x25,0x3c,0xd5]
+// CHECK-UNKNOWN:  d53c252d mrs x13, GCSPR_EL2
 
 msr GCSCR_EL12, x14
+// CHECK-INST: msr GCSCR_EL12, x14
+// CHECK-ENCODING: encoding: [0x0e,0x25,0x1d,0xd5]
+// CHECK-UNKNOWN:  d51d250e msr GCSCR_EL12, x14
+
 mrs x15, GCSCR_EL12
-// CHECK: msr GCSCR_EL12, x14 // encoding: [0x0e,0x25,0x1d,0xd5]
-// CHECK: mrs x15, GCSCR_EL12 // encoding: [0x0f,0x25,0x3d,0xd5]
+// CHECK-INST: mrs x15, GCSCR_EL12
+// CHECK-ENCODING: encoding: [0x0f,0x25,0x3d,0xd5]
+// CHECK-UNKNOWN:  d53d250f mrs x15, GCSCR_EL12
 
 msr GCSPR_EL12, x16
+// CHECK-INST: msr GCSPR_EL12, x16
+// CHECK-ENCODING: encoding: [0x30,0x25,0x1d,0xd5]
+// CHECK-UNKNOWN:  d51d2530 msr GCSPR_EL12, x16
+
 mrs x17, GCSPR_EL12
-// CHECK: msr GCSPR_EL12, x16 // encoding: [0x30,0x25,0x1d,0xd5]
-// CHECK: mrs x17, GCSPR_EL12 // encoding: [0

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Tomas Matheson via llvm-branch-commits



@@ -0,0 +1,138 @@
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lse128 < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lse128 < %s \
+// RUN:| llvm-objdump -d --mattr=+lse128 - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lse128 < %s \
+// RUN:   | llvm-objdump -d --mattr=-lse128 - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lse128 < %s \
+// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:| llvm-mc -triple=aarch64 -mattr=+lse128 -disassemble -show-encoding \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
+
+ldclrp   x1, x2, [x11]
+// CHECK-INST: ldclrp x1, x2, [x11]
+// CHECK-ENCODING: encoding: [0x61,0x11,0x22,0x19]
+// CHECK-ERROR: :[[@LINE-3]]:1: error: instruction requires: lse128
+// CHECK-UNKNOWN:  19221161 
+ldclrp   x21, x22, [sp]

tmatheson-arm wrote:

No blank lines between cases?

https://github.com/llvm/llvm-project/pull/146329

[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_f16_bf8` on gfx1250 (PR #146305)

2025-06-30 Thread Shilei Tian via llvm-branch-commits


shiltian wrote:

### Merge activity

* **Jun 30, 11:47 AM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/146305).


https://github.com/llvm/llvm-project/pull/146305

[llvm-branch-commits] [llvm] release/20.x: [gtest] Fix building on OpenBSD/sparc64 (#145225) (PR #146155)

2025-06-30 Thread Aaron Ballman via llvm-branch-commits


https://github.com/AaronBallman approved this pull request.

LGTM!

https://github.com/llvm/llvm-project/pull/146155

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray edited 
https://github.com/llvm/llvm-project/pull/146330

[llvm-branch-commits] [llvm] [AMDGPU] Intrinsic for launching whole wave functions (PR #145859)

2025-06-30 Thread Diana Picus via llvm-branch-commits



@@ -297,8 +297,13 @@ namespace CallingConv {
 /// directly or indirectly via a call-like instruction.
 constexpr bool isCallableCC(CallingConv::ID CC) {
   switch (CC) {
+  // Called with special intrinsics:
+  // llvm.amdgcn.cs.chain
   case CallingConv::AMDGPU_CS_Chain:
   case CallingConv::AMDGPU_CS_ChainPreserve:
+  // llvm.amdgcn.call.whole.wave
+  case CallingConv::AMDGPU_Gfx_WholeWave:

rovka wrote:

Yeah, that's in the [previous patch](https://github.com/llvm/llvm-project/pull/145858) in this stack. I've added some more tests as you requested :)

https://github.com/llvm/llvm-project/pull/145859

[llvm-branch-commits] [clang] [llvm] WebAssembly: Stop changing MCAsmInfo's ExceptionsType based on flags (PR #146343)

2025-06-30 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/146343

Currently wasm adds an extra level of options that work backwards
from the standard options, and overwrites them. The ExceptionModel
field in TM->Options is the standard user configuration option for the
exception model to use. MCAsmInfo's ExceptionsType is a constant for the
default to use for the triple if the model is not explicitly set in the
TargetOptions ExceptionModel. This was adding 2 custom flags, changing the
MCAsmInfo default, and overwriting the ExceptionModel from the custom flags.

These comments about compiling bitcode with clang are describing a toolchain
bug or user error. TargetOptions is bad, and we should move toward eliminating
it. It is module state not captured in the IR. Ideally the exception model
should either be implied by the triple or be a module flag, rather than
depending on this side state. Currently it is the responsibility of the
toolchain and/or user to ensure the same command line flags are used at each
phase of the compilation. It is not the backend's responsibility to try to
second-guess these options.

-wasm-enable-eh and -wasm-enable-sjlj should also be removed in favor of the
standard exception control. I'm a bit confused by how all of these fields are
supposed to interact, but there are a few uses in the backend that look
directly at these flags instead of the already-parsed ExceptionModel, and
these need to be cleaned up.

Additionally, this was enforcing some rules about the combinations of flags at
a random point in the IR pass pipeline configuration. This is a module property
that should be handled at TargetMachine construction time at the latest. This
required adding flags to a few MIR and clang tests, which previously never got
this far, to avoid hitting the errors.

>From 9868f97f4e1f71dfbb2c12b3f6a9a0f04f5bd42c Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 30 Jun 2025 15:26:44 +0900
Subject: [PATCH] WebAssembly: Stop changing MCAsmInfo's ExceptionsType based
 on flags

---
 ...asm-exception-model-flag-parse-ir-input.ll |   7 +-
 clang/test/CodeGenCXX/builtins-eh-wasm.cpp|   2 +-
 clang/test/CodeGenCXX/wasm-eh.cpp |   6 +-
 .../MCTargetDesc/WebAssemblyMCAsmInfo.cpp |   9 +-
 .../MCTargetDesc/WebAssemblyMCTargetDesc.cpp  |  29 
 .../MCTargetDesc/WebAssemblyMCTargetDesc.h|   7 -
 .../WebAssembly/WebAssemblyAsmPrinter.cpp |  13 +-
 .../WebAssembly/WebAssemblyAsmPrinter.h   |   2 +-
 .../WebAssembly/WebAssemblyCFGStackify.cpp|   2 +-
 .../WebAssembly/WebAssemblyLateEHPrepare.cpp  |   2 +-
 .../WebAssembly/WebAssemblyMCInstLower.cpp|   4 +-
 .../WebAssembly/WebAssemblyTargetMachine.cpp  | 139 ++
 .../WebAssembly/WebAssemblyTargetMachine.h|   9 ++
 .../WebAssembly/cfg-stackify-eh-legacy.mir|   2 +-
 .../CodeGen/WebAssembly/exception-legacy.mir  |   2 +-
 .../CodeGen/WebAssembly/function-info.mir |   2 +-
 16 files changed, 113 insertions(+), 124 deletions(-)

diff --git a/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll b/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll
index 4a7eeece58717..85bfc7f74daed 100644
--- a/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse

[llvm-branch-commits] [clang] [llvm] WebAssembly: Stop changing MCAsmInfo's ExceptionsType based on flags (PR #146343)

2025-06-30 Thread Matt Arsenault via llvm-branch-commits


arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/146343).

* **#146343** 👈
* **#146342**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking (https://stacking.dev/).


https://github.com/llvm/llvm-project/pull/146343

[llvm-branch-commits] [clang] [llvm] WebAssembly: Stop changing MCAsmInfo's ExceptionsType based on flags (PR #146343)

2025-06-30 Thread via llvm-branch-commits


llvmbot wrote:



@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-backend-webassembly

Author: Matt Arsenault (arsenm)



---

Patch is 25.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146343.diff


16 Files Affected:

- (modified) clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll (+4-3) 
- (modified) clang/test/CodeGenCXX/builtins-eh-wasm.cpp (+1-1) 
- (modified) clang/test/CodeGenCXX/wasm-eh.cpp (+3-3) 
- (modified) llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCAsmInfo.cpp (+1-8) 
- (modified) llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.cpp (-29) 
- (modified) llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h (-7) 
- (modified) llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp (+7-6) 
- (modified) llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h (+1-1) 
- (modified) llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp (+1-1) 
- (modified) llvm/lib/Target/WebAssembly/WebAssemblyLateEHPrepare.cpp (+1-1) 
- (modified) llvm/lib/Target/WebAssembly/WebAssemblyMCInstLower.cpp (+1-3) 
- (modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp (+81-58) 
- (modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.h (+9) 
- (modified) llvm/test/CodeGen/WebAssembly/cfg-stackify-eh-legacy.mir (+1-1) 
- (modified) llvm/test/CodeGen/WebAssembly/exception-legacy.mir (+1-1) 
- (modified) llvm/test/CodeGen/WebAssembly/function-info.mir (+1-1) 


``diff
diff --git a/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll b/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll
index 4a7eeece58717..85bfc7f74daed 100644
--- a/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll
+++ b/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll
@@ -2,15 +2,16 @@
 
 ; Check all the options parse
 ; RUN: %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=none %s | FileCheck %s
-; RUN: %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=wasm %s | FileCheck %s
-; RUN: %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=dwarf %s | FileCheck %s
-; RUN: %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=sjlj %s | FileCheck %s
+; RUN: %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=wasm -mllvm -wasm-enable-eh %s | FileCheck %s
 
 ; RUN: not %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=invalid %s 2>&1 | FileCheck -check-prefix=ERR %s
+; RUN: not %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=dwarf %s 2>&1 | FileCheck -check-prefix=ERR-BE %s
+; RUN: not %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=sjlj %s 2>&1 | FileCheck -check-prefix=ERR-BE %s
 
 ; CHECK-LABEL: define void @test(
 
 ; ERR: error: invalid value 'invalid' in '-exception-model=invalid'
+; ERR-BE: fatal error: error in backend: -exception-model should be either 'none' or 'wasm'
 define void @test() {
   ret void
 }
diff --git a/clang/test/CodeGenCXX/builtins-eh-wasm.cpp b/clang/test/CodeGenCXX/builtins-eh-wasm.cpp
index b0f763d3e54dc..9a7134c48f208 100644
--- a/clang/test/CodeGenCXX/builtins-eh-wasm.cpp
+++ b/clang/test/CodeGenCXX/builtins-eh-wa

[llvm-branch-commits] [clang] [llvm] WebAssembly: Stop changing MCAsmInfo's ExceptionsType based on flags (PR #146343)

2025-06-30 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/146343

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add tests for missing readanylane combines (PR #145910)

2025-06-30 Thread Petar Avramovic via llvm-branch-commits



@@ -0,0 +1,166 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 -new-reg-bank-select < %s | FileCheck %s
+
+define amdgpu_ps void @readanylane_to_virtual_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) {
+; CHECK-LABEL: readanylane_to_virtual_vgpr:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:v_mov_b32_e32 v0, 0
+; CHECK-NEXT:global_load_dword v1, v0, s[0:1] glc dlc
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:global_store_dword v0, v1, s[2:3]
+; CHECK-NEXT:s_endpgm
+  %load = load volatile float, ptr addrspace(1) %ptr0
+  store float %load, ptr addrspace(1) %ptr1
+  ret void
+}
+
+define amdgpu_ps float @readanylane_to_physical_vgpr(ptr addrspace(1) inreg %ptr) {
+; CHECK-LABEL: readanylane_to_physical_vgpr:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:v_mov_b32_e32 v0, 0
+; CHECK-NEXT:global_load_dword v0, v0, s[0:1] glc dlc
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:v_readfirstlane_b32 s0, v0
+; CHECK-NEXT:v_mov_b32_e32 v0, s0
+; CHECK-NEXT:; return to shader part epilog
+  %load = load volatile float, ptr addrspace(1) %ptr
+  ret float %load
+}
+
+define amdgpu_ps void @readanylane_to_bitcast_to_virtual_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) {
+; CHECK-LABEL: readanylane_to_bitcast_to_virtual_vgpr:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:v_mov_b32_e32 v0, 0
+; CHECK-NEXT:global_load_dword v1, v0, s[0:1] glc dlc
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:v_readfirstlane_b32 s0, v1
+; CHECK-NEXT:v_mov_b32_e32 v1, s0
+; CHECK-NEXT:global_store_dword v0, v1, s[2:3]
+; CHECK-NEXT:s_endpgm
+  %load = load volatile <2 x i16>, ptr addrspace(1) %ptr0
+  %bitcast = bitcast <2 x i16> %load to i32
+  store i32 %bitcast, ptr addrspace(1) %ptr1
+  ret void
+}
+
+define amdgpu_ps float @readanylane_to_bitcast_to_physical_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) {
+; CHECK-LABEL: readanylane_to_bitcast_to_physical_vgpr:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:v_mov_b32_e32 v0, 0
+; CHECK-NEXT:global_load_dword v0, v0, s[0:1] glc dlc
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:v_readfirstlane_b32 s0, v0
+; CHECK-NEXT:v_mov_b32_e32 v0, s0
+; CHECK-NEXT:; return to shader part epilog
+  %load = load volatile <2 x i16>, ptr addrspace(1) %ptr0
+  %bitcast = bitcast <2 x i16> %load to float
+  ret float %bitcast
+}
+
+define amdgpu_ps void @unmerge_readanylane_merge_to_virtual_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) {
+; CHECK-LABEL: unmerge_readanylane_merge_to_virtual_vgpr:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:v_mov_b32_e32 v2, 0
+; CHECK-NEXT:global_load_dwordx2 v[0:1], v2, s[0:1] glc dlc
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:v_readfirstlane_b32 s0, v0
+; CHECK-NEXT:v_readfirstlane_b32 s1, v1
+; CHECK-NEXT:v_mov_b32_e32 v0, s0
+; CHECK-NEXT:v_mov_b32_e32 v1, s1
+; CHECK-NEXT:global_store_dwordx2 v2, v[0:1], s[2:3]
+; CHECK-NEXT:s_endpgm
+  %load = load volatile i64, ptr addrspace(1) %ptr0
+  store i64 %load, ptr addrspace(1) %ptr1
+  ret void
+}
+
+;define amdgpu_ps double @unmerge_readanylane_merge_to_physical_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) {
+;  %load = load volatile double, ptr addrspace(1) %ptr0
+;  ret double %load
+;}
+
+define amdgpu_ps void @unmerge_readanylane_merge_bitcast_to_virtual_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) {
+; CHECK-LABEL: unmerge_readanylane_merge_bitcast_to_virtual_vgpr:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:v_mov_b32_e32 v2, 0
+; CHECK-NEXT:global_load_dwordx2 v[0:1], v2, s[0:1] glc dlc
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:v_readfirstlane_b32 s0, v0
+; CHECK-NEXT:v_readfirstlane_b32 s1, v1
+; CHECK-NEXT:v_mov_b32_e32 v0, s0
+; CHECK-NEXT:v_mov_b32_e32 v1, s1
+; CHECK-NEXT:global_store_dwordx2 v2, v[0:1], s[2:3]
+; CHECK-NEXT:s_endpgm
+  %load = load volatile <2 x i32>, ptr addrspace(1) %ptr0
+  %bitcast = bitcast <2 x i32> %load to double
+  store double %bitcast, ptr addrspace(1) %ptr1
+  ret void
+}
+
+;define amdgpu_ps double @unmerge_readanylane_merge_bitcast_to_physical_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) {
+;  %load = load volatile <2 x i32>, ptr addrspace(1) %ptr0
+;  %bitcast = bitcast <2 x i32> %load to double
+;  ret double %bitcast
+;}

petar-avramovic wrote:

There is no combine happening in the commented-out tests. They cover the path where there is a copy to a physical VGPR, but the only case that produces one is the calling convention with a float return; there is none for larger physical VGPRs (they return in SGPRs). So I edited the MIR test with `SI_RETURN_TO_EPILOG implicit $vgpr0_vgpr1` to test that the combine works.

https://github.com/llvm/llvm-project/pull/145910

[llvm-branch-commits] [llvm] [AMDGPU] Use reverse iteration in CodeGenPrepare (PR #145484)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/145484

>From b031681978e2b356c2ae8e65d6e08515c0044ac1 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Tue, 24 Jun 2025 11:35:58 +0200
Subject: [PATCH 1/2] [AMDGPU] Use reverse iteration in CodeGenPrepare

In order to make this easier, I also removed all "eraseFromParent" calls from
the visitors, instead adding instructions to a set of instructions to delete
once the function has been visited. This avoids crashes due to functions
deleting their operands. In theory we could allow functions to delete the
instruction they visited (and only that one), but I think having one idiom
for everything is less error-prone.

Fixes #140219
---
 .../Target/AMDGPU/AMDGPUCodeGenPrepare.cpp|  82 ---
 ...egenprepare-break-large-phis-heuristics.ll |  42 +++---
 .../AMDGPU/amdgpu-codegenprepare-fdiv.ll  | 110 +-
 llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll| 138 +-
 llvm/test/CodeGen/AMDGPU/uniform-select.ll|  64 
 5 files changed, 185 insertions(+), 251 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
index 5f1983791cfae..2a3aa1ac672b6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
@@ -15,6 +15,7 @@
 #include "AMDGPU.h"
 #include "AMDGPUTargetMachine.h"
 #include "SIModeRegisterDefaults.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
@@ -109,6 +110,7 @@ class AMDGPUCodeGenPrepareImpl
   bool FlowChanged = false;
   mutable Function *SqrtF32 = nullptr;
   mutable Function *LdexpF32 = nullptr;
+  mutable SetVector DeadVals;
 
   DenseMap BreakPhiNodesCache;
 
@@ -285,28 +287,19 @@ bool AMDGPUCodeGenPrepareImpl::run() {
   BreakPhiNodesCache.clear();
   bool MadeChange = false;
 
-  Function::iterator NextBB;
-  for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE; FI = NextBB) {
-BasicBlock *BB = &*FI;
-NextBB = std::next(FI);
-
-BasicBlock::iterator Next;
-for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E;
- I = Next) {
-  Next = std::next(I);
-
-  MadeChange |= visit(*I);
-
-  if (Next != E) { // Control flow changed
-BasicBlock *NextInstBB = Next->getParent();
-if (NextInstBB != BB) {
-  BB = NextInstBB;
-  E = BB->end();
-  FE = F.end();
-}
-  }
+  for (BasicBlock &BB : reverse(F)) {
+for (Instruction &I : make_early_inc_range(reverse(BB))) {
+  if (!DeadVals.contains(&I))
+MadeChange |= visit(I);
 }
   }
+
+  while (!DeadVals.empty()) {
+RecursivelyDeleteTriviallyDeadInstructions(
+DeadVals.pop_back_val(), TLI, /*MSSAU*/ nullptr,
+[&](Value *V) { DeadVals.remove(V); });
+  }
+
   return MadeChange;
 }
 
@@ -426,7 +419,7 @@ bool AMDGPUCodeGenPrepareImpl::replaceMulWithMul24(BinaryOperator &I) const {
   Value *NewVal = insertValues(Builder, Ty, ResultVals);
   NewVal->takeName(&I);
   I.replaceAllUsesWith(NewVal);
-  I.eraseFromParent();
+  DeadVals.insert(&I);
 
   return true;
 }
@@ -500,10 +493,10 @@ bool AMDGPUCodeGenPrepareImpl::foldBinOpIntoSelect(BinaryOperator &BO) const {
   FoldedT, FoldedF);
   NewSelect->takeName(&BO);
   BO.replaceAllUsesWith(NewSelect);
-  BO.eraseFromParent();
+  DeadVals.insert(&BO);
   if (CastOp)
-CastOp->eraseFromParent();
-  Sel->eraseFromParent();
+DeadVals.insert(CastOp);
+  DeadVals.insert(Sel);
   return true;
 }
 
@@ -900,7 +893,7 @@ bool AMDGPUCodeGenPrepareImpl::visitFDiv(BinaryOperator &FDiv) {
   if (NewVal) {
 FDiv.replaceAllUsesWith(NewVal);
 NewVal->takeName(&FDiv);
-RecursivelyDeleteTriviallyDeadInstructions(&FDiv, TLI);
+DeadVals.insert(&FDiv);
   }
 
   return true;
@@ -1310,7 +1303,8 @@ within the byte are all 0.
 static bool tryNarrowMathIfNoOverflow(Instruction *I,
   const SITargetLowering *TLI,
   const TargetTransformInfo &TTI,
-  const DataLayout &DL) {
+  const DataLayout &DL,
+  SetVector &DeadVals) {
   unsigned Opc = I->getOpcode();
   Type *OldType = I->getType();
 
@@ -1365,7 +1359,7 @@ static bool tryNarrowMathIfNoOverflow(Instruction *I,
 
   Value *Zext = Builder.CreateZExt(Arith, OldType);
   I->replaceAllUsesWith(Zext);
-  I->eraseFromParent();
+  DeadVals.insert(I);
   return true;
 }
 
@@ -1376,7 +1370,7 @@ bool AMDGPUCodeGenPrepareImpl::visitBinaryOperator(BinaryOperator &I) {
   if (UseMul24Intrin && replaceMulWithMul24(I))
 return true;
   if (tryNarrowMathIfNoOverflow(&I, ST.getTargetLowering(),
-TM.getTargetTransformIn

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


Pierre-vh wrote:

> Why DAG and not InstCombine for this?

The intrinsics we want to optimize with this aren't lowered yet at the InstCombine stage.

https://github.com/llvm/llvm-project/pull/146054
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/146054

>From 17ac90ad1ee167f35321e01625a207f2b94ff523 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 26 Jun 2025 13:31:37 +0200
Subject: [PATCH 1/2] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences

Fold sequences where we extract a bunch of contiguous bits from a value,
merge them into the low bits and then check whether the low bits are zero.

It seems like a strange sequence at first, but it's an idiom used by device
libs to check workitem IDs for AMDGPU.

I put this in DAGCombiner instead of the target combiner because this is a
generic, valid transform that's also fairly niche, so I think there isn't
much risk of a combine loop.

See #136727
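The equivalence behind the fold can be checked with plain 32-bit integers. This models the arithmetic only, assuming a single source value and constant shift amounts; the helper names are made up for illustration and are not DAGCombiner APIs. Each term `(X >> S) & Mask` is zero exactly when `X & (Mask << S)` is zero, so OR-ing the shifted masks together gives one mask that replaces all the shifts.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Original idiom: (X | (X >> S0) | (X >> S1) | ...) & Mask, compared to zero.
bool mergedFieldsZero(uint32_t X, uint32_t Mask,
                      std::initializer_list<unsigned> Shifts) {
  uint32_t Merged = X;
  for (unsigned S : Shifts) {
    assert(S < 32 && "shift amount must be below the bit width");
    Merged |= X >> S;
  }
  return (Merged & Mask) == 0;
}

// Combined mask: Mask | (Mask << S0) | (Mask << S1) | ...
uint32_t partsMask(uint32_t Mask, std::initializer_list<unsigned> Shifts) {
  uint32_t Parts = Mask; // the unshifted X term contributes Mask itself
  for (unsigned S : Shifts) {
    assert(S < 32 && "shift amount must be below the bit width");
    Parts |= Mask << S;
  }
  return Parts;
}

// Folded form: a single AND against the combined mask, no shifts of X.
bool foldedFieldsZero(uint32_t X, uint32_t Mask,
                      std::initializer_list<unsigned> Shifts) {
  return (X & partsMask(Mask, Shifts)) == 0;
}
```

For a 10-bit mask with shifts of 10 and 20, `partsMask` covers bits 0 through 29, and both forms agree on every input.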
---
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 86 ++-
 .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 34 ++--
 2 files changed, 91 insertions(+), 29 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 08dab7c697b99..a189208d3a62e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
 
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+  const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //  (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();
+
+  if (Root.getOpcode() != ISD::AND)
+return SDValue();
+
+  SDValue N0 = Root.getOperand(0);
+  SDValue N1 = Root.getOperand(1);
+
+  if (N0.getOpcode() != ISD::OR || !isa(N1))
+return SDValue();
+
+  APInt RootMask = cast(N1)->getAsAPIntVal();
+  if (!RootMask.isMask())
+return SDValue();
+
+  SDValue Src;
+  const auto IsSrc = [&](SDValue V) {
+if (!Src) {
+  Src = V;
+  return true;
+}
+
+return Src == V;
+  };
+
+  SmallVector Worklist = {N0};
+  APInt PartsMask(VT.getSizeInBits(), 0);
+  while (!Worklist.empty()) {
+SDValue V = Worklist.pop_back_val();
+if (!V.hasOneUse() && Src != V)
+  return SDValue();
+
+if (V.getOpcode() == ISD::OR) {
+  Worklist.push_back(V.getOperand(0));
+  Worklist.push_back(V.getOperand(1));
+  continue;
+}
+
+if (V.getOpcode() == ISD::SRL) {
+  SDValue ShiftSrc = V.getOperand(0);
+  SDValue ShiftAmt = V.getOperand(1);
+
+  if (!IsSrc(ShiftSrc) || !isa(ShiftAmt))
+return SDValue();
+
+  PartsMask |= (RootMask << cast(ShiftAmt)->getAsZExtVal());
+  continue;
+}
+
+if (IsSrc(V)) {
+  PartsMask |= RootMask;
+  continue;
+}
+
+return SDValue();
+  }
+
+  if (!RootMask.isMask() || !Src)
+return SDValue();
+
+  SDLoc DL(Root);
+  return DAG.getNode(ISD::AND, DL, VT,
+ {Src, DAG.getConstant(PartsMask, DL, VT)});
+}
+
 /// This is a stub for TargetLowering::SimplifySetCC.
 SDValue DAGCombiner::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
ISD::CondCode Cond, const SDLoc &DL,
bool foldBooleans) {
   TargetLowering::DAGCombinerInfo
 DagCombineInfo(DAG, Level, false, this);
-  return TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL);
+  if (SDValue C =
+  TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL))
+return C;
+
+  if ((Cond == ISD::SETNE || Cond == ISD::SETEQ) &&
+  N0.getOpcode() == ISD::AND && isNullConstant(N1)) {
+
+if (SDValue Res = matchMergedBFX(N0, DAG, TLI))
+  return DAG.getSetCC(DL, VT, Res, N1, Cond);
+  }
+
+  return SDValue();
 }
 
 /// Given an ISD::SDIV node expressing a divide by constant, return
diff --git a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
index 07c4aeb1ac7df..64d055bc40e98 100644
--- a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
+++ b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
@@ -12,11 +12,7 @@ define i1 @workitem_zero() {
 ; DAGISEL-GFX8-LABEL: workitem_zero:
 ; DAGISEL-GFX8:   ; %bb.0: ; %entry
 ; DAGISEL-GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v1, 10, v31
-; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v0, 20, v31
-; DAGISEL-GFX8-NEXT:v_or_b32_e32 v1, v31, v1
-; DAGISEL-GFX8-NEXT:v_or_b32_e32 v0, v1, v0
-; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3fff, v31
 ; DAGISEL-GFX8-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0
 ; DAGISEL-GFX8-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc
 ; DAGISEL-GFX8-NEXT:s_setpc_b64 s[30:31]
@@ -

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #145911)

2025-06-30 Thread Petar Avramovic via llvm-branch-commits



@@ -115,126 +117,233 @@ class AMDGPURegBankLegalizeCombiner {
 VgprRB(&RBI.getRegBank(AMDGPU::VGPRRegBankID)),
 VccRB(&RBI.getRegBank(AMDGPU::VCCRegBankID)) {};
 
-  bool isLaneMask(Register Reg) {
-const RegisterBank *RB = MRI.getRegBankOrNull(Reg);
-if (RB && RB->getID() == AMDGPU::VCCRegBankID)
-  return true;
+  bool isLaneMask(Register Reg);
+  std::pair tryMatch(Register Src, unsigned Opcode);
+  std::pair tryMatchRALFromUnmerge(Register Src);
+  Register getReadAnyLaneSrc(Register Src);
+  void replaceRegWithOrBuildCopy(Register Dst, Register Src);
 
-const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg);
-return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1);
-  }
+  bool tryEliminateReadAnyLane(MachineInstr &Copy);
+  void tryCombineCopy(MachineInstr &MI);
+  void tryCombineS1AnyExt(MachineInstr &MI);
+};
 
-  void cleanUpAfterCombine(MachineInstr &MI, MachineInstr *Optional0) {
-MI.eraseFromParent();
-if (Optional0 && isTriviallyDead(*Optional0, MRI))
-  Optional0->eraseFromParent();
-  }
+bool AMDGPURegBankLegalizeCombiner::isLaneMask(Register Reg) {
+  const RegisterBank *RB = MRI.getRegBankOrNull(Reg);
+  if (RB && RB->getID() == AMDGPU::VCCRegBankID)
+return true;
 
-  std::pair tryMatch(Register Src, unsigned Opcode) {
-MachineInstr *MatchMI = MRI.getVRegDef(Src);
-if (MatchMI->getOpcode() != Opcode)
-  return {nullptr, Register()};
-return {MatchMI, MatchMI->getOperand(1).getReg()};
-  }
+  const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg);
+  return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1);
+}
 
-  void tryCombineCopy(MachineInstr &MI) {
-Register Dst = MI.getOperand(0).getReg();
-Register Src = MI.getOperand(1).getReg();
-// Skip copies of physical registers.
-if (!Dst.isVirtual() || !Src.isVirtual())
-  return;
-
-// This is a cross bank copy, sgpr S1 to lane mask.
-//
-// %Src:sgpr(s1) = G_TRUNC %TruncS32Src:sgpr(s32)
-// %Dst:lane-mask(s1) = COPY %Src:sgpr(s1)
-// ->
-// %Dst:lane-mask(s1) = G_AMDGPU_COPY_VCC_SCC %TruncS32Src:sgpr(s32)
-if (isLaneMask(Dst) && MRI.getRegBankOrNull(Src) == SgprRB) {
-  auto [Trunc, TruncS32Src] = tryMatch(Src, AMDGPU::G_TRUNC);
-  assert(Trunc && MRI.getType(TruncS32Src) == S32 &&
- "sgpr S1 must be result of G_TRUNC of sgpr S32");
-
-  B.setInstr(MI);
-  // Ensure that truncated bits in BoolSrc are 0.
-  auto One = B.buildConstant({SgprRB, S32}, 1);
-  auto BoolSrc = B.buildAnd({SgprRB, S32}, TruncS32Src, One);
-  B.buildInstr(AMDGPU::G_AMDGPU_COPY_VCC_SCC, {Dst}, {BoolSrc});
-  cleanUpAfterCombine(MI, Trunc);
-  return;
-}
+std::pair
+AMDGPURegBankLegalizeCombiner::tryMatch(Register Src, unsigned Opcode) {
+  MachineInstr *MatchMI = MRI.getVRegDef(Src);
+  if (MatchMI->getOpcode() != Opcode)
+return {nullptr, Register()};
+  return {MatchMI, MatchMI->getOperand(1).getReg()};
+}
+
+std::pair
+AMDGPURegBankLegalizeCombiner::tryMatchRALFromUnmerge(Register Src) {
+  MachineInstr *ReadAnyLane = MRI.getVRegDef(Src);
+  if (ReadAnyLane->getOpcode() != AMDGPU::G_AMDGPU_READANYLANE)
+return {nullptr, -1};
+
+  Register RALSrc = ReadAnyLane->getOperand(1).getReg();
+  if (auto *UnMerge = getOpcodeDef(RALSrc, MRI))
+return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)};
 
-// Src = G_AMDGPU_READANYLANE RALSrc
-// Dst = COPY Src
-// ->
-// Dst = RALSrc
-if (MRI.getRegBankOrNull(Dst) == VgprRB &&
-MRI.getRegBankOrNull(Src) == SgprRB) {
-  auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE);
-  if (!RAL)
-return;
-
-  assert(MRI.getRegBank(RALSrc) == VgprRB);
-  MRI.replaceRegWith(Dst, RALSrc);
-  cleanUpAfterCombine(MI, RAL);
-  return;
+  return {nullptr, -1};
+}
+
+Register AMDGPURegBankLegalizeCombiner::getReadAnyLaneSrc(Register Src) {
+  // Src = G_AMDGPU_READANYLANE RALSrc
+  auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE);
+  if (RAL)
+return RALSrc;
+
+  // LoVgpr, HiVgpr = G_UNMERGE_VALUES UnmergeSrc
+  // LoSgpr = G_AMDGPU_READANYLANE LoVgpr
+  // HiSgpr = G_AMDGPU_READANYLANE HiVgpr
+  // Src G_MERGE_VALUES LoSgpr, HiSgpr
+  auto *Merge = getOpcodeDef(Src, MRI);
+  if (Merge) {
+unsigned NumElts = Merge->getNumSources();
+auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0));
+if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0)
+  return {};
+
+// Check if all elements are from same unmerge and there is no shuffling.
+for (unsigned i = 1; i < NumElts; ++i) {
+  auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i));
+  if (UnmergeI != Unmerge || (unsigned)IdxI != i)
+return {};
 }
+return Unmerge->getSourceReg();
   }
 
-  void tryCombineS1AnyExt(MachineInstr &MI) {
-// %Src:sgpr(S1) = G_TRUNC %TruncSrc
-

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)

2025-06-30 Thread Fabian Ritter via llvm-branch-commits



@@ -15136,6 +15136,41 @@ SDValue SITargetLowering::performPtrAddCombine(SDNode *N,
   return Folded;
   }
 
+  // Transform (ptradd a, b) -> (or disjoint a, b) if it is equivalent and if
+  // that transformation can't block an offset folding at any use of the ptradd.
+  // This should be done late, after legalization, so that it doesn't block
+  // other ptradd combines that could enable more offset folding.
+  bool HasIntermediateAssertAlign =
+  N0->getOpcode() == ISD::AssertAlign && N0->getOperand(0)->isAnyAdd();
+  // This is a hack to work around an ordering problem for DAGs like this:
+  //   (ptradd (AssertAlign (ptradd p, c1), k), c2)
+  // If the outer ptradd is handled first by the DAGCombiner, it can be
+  // transformed into a disjoint or. Then, when the generic AssertAlign combine
+  // pushes the AssertAlign through the inner ptradd, it's too late for the
+  // ptradd reassociation to trigger.
+  if (!DCI.isBeforeLegalizeOps() && !HasIntermediateAssertAlign &&
+  DAG.haveNoCommonBitsSet(N0, N1)) {
+bool TransformCanBreakAddrMode = any_of(N->users(), [&](SDNode *User) {
+  if (auto *LoadStore = dyn_cast(User);
+  LoadStore && LoadStore->getBasePtr().getNode() == N) {
+unsigned AS = LoadStore->getAddressSpace();
+// Currently, we only really need ptradds to fold offsets into flat
+// memory instructions.
+if (AS != AMDGPUAS::FLAT_ADDRESS)
+  return false;
+TargetLoweringBase::AddrMode AM;
+AM.HasBaseReg = true;
+EVT VT = LoadStore->getMemoryVT();
+Type *AccessTy = VT.getTypeForEVT(*DAG.getContext());
+return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS);
+  }
+  return false;

ritter-x2a wrote:

I'm not sure if every backend that could want to use ptradd nodes would want to 
transform them to ORs. However, there is probably not much point in developing 
for hypothetical backends, so I moved it to the generic combines for now, 
behind the `canTransformPtrArithOutOfBounds()` check (I also fixed it to 
actually check the intended addressing mode).

Dropping the target-specific `AS != AMDGPUAS::FLAT_ADDRESS` check affects the 
generated code in two lit tests 
([identical-subrange-spill-infloop.ll](https://github.com/llvm/llvm-project/pull/146076/files#diff-517b7174ca71caeed2dd13ec440ee963e4db61f01911ff1cbc86ab0e60f16721)
 and 
[store-weird-sizes.ll](https://github.com/llvm/llvm-project/pull/146076/files#diff-32010dfaf8188291719044adb5c6e927b17fe3e3657e0f27ebe2e2a10a020889)).
But, looking more into it, I find that
- the new code for `identical-subrange-spill-infloop.ll` could be argued to be 
an improvement (offsets for SMRD instructions are folded where they weren't 
before) and
- the problem for `store-weird-sizes.ll` seems to be that 
`SITargetLowering::isLegalAddressingMode` is overly optimistic when asked if 
"register + constant offset" is a valid addressing mode for AS4 on 
architectures predating `global_*` instructions. So this should be fixed there. 

We already have a [generic 
combine](https://github.com/llvm/llvm-project/blob/629126ed44bd3ce5b6f476459c805be4e4e0c2ca/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L15173-L15196)
 that pushes the AssertAlign through the (ptr)add.
The handling here was necessary because that combine would only be applied 
after the outer PTRADD was already visited and combined into an OR. However, 
that doesn't seem to happen anymore in the tests when it's a generic combine, 
so I dropped this handling as well.
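For reference, the arithmetic fact the combine depends on can be sketched with concrete 64-bit integers. The real code asks `DAG.haveNoCommonBitsSet(N0, N1)` about known bits rather than concrete values, and the helper names below are hypothetical. If the operands share no set bits, no bit position produces a carry, so the sum equals the bitwise OR. The `gep_disjoint_or` test in this PR is exactly that shape: a pointer masked with -16 plus 4, which lowers to a single `v_and_or_b32`.

```cpp
#include <cassert>
#include <cstdint>

// Concrete-value analogue of the "no common bits set" query.
bool haveNoCommonBits(uint64_t A, uint64_t B) { return (A & B) == 0; }

// (add A, B) -> (or disjoint A, B): valid only when no bit position carries.
uint64_t addAsDisjointOr(uint64_t A, uint64_t B) {
  assert(haveNoCommonBits(A, B) && "rewrite is only valid for disjoint operands");
  return A | B; // equals A + B under the precondition above
}
```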

https://github.com/llvm/llvm-project/pull/146075

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)

2025-06-30 Thread Fabian Ritter via llvm-branch-commits



@@ -416,6 +416,60 @@ entry:
   ret void
 }
 
+; Check that ptradds can be lowered to disjoint ORs.
+define ptr @gep_disjoint_or(ptr %base) {
+; GFX942-LABEL: gep_disjoint_or:
+; GFX942:   ; %bb.0:
+; GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:v_and_or_b32 v0, v0, -16, 4
+; GFX942-NEXT:s_setpc_b64 s[30:31]
+  %p = call ptr @llvm.ptrmask(ptr %base, i64 s0xf0)
+  %gep = getelementptr nuw inbounds i8, ptr %p, i64 4
+  ret ptr %gep
+}
+
+; Check that AssertAlign nodes between ptradd nodes don't block offset folding,
+; taken from preload-implicit-kernargs.ll
+define amdgpu_kernel void @random_incorrect_offset(ptr addrspace(1) inreg %out) #0 {
+; GFX942_PTRADD-LABEL: random_incorrect_offset:
+; GFX942_PTRADD:   ; %bb.1:
+; GFX942_PTRADD-NEXT:s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX942_PTRADD-NEXT:s_waitcnt lgkmcnt(0)
+; GFX942_PTRADD-NEXT:s_branch .LBB21_0
+; GFX942_PTRADD-NEXT:.p2align 8
+; GFX942_PTRADD-NEXT:  ; %bb.2:
+; GFX942_PTRADD-NEXT:  .LBB21_0:
+; GFX942_PTRADD-NEXT:s_load_dword s0, s[0:1], 0xa
+; GFX942_PTRADD-NEXT:v_mov_b32_e32 v0, 0
+; GFX942_PTRADD-NEXT:s_waitcnt lgkmcnt(0)
+; GFX942_PTRADD-NEXT:v_mov_b32_e32 v1, s0
+; GFX942_PTRADD-NEXT:global_store_dword v0, v1, s[2:3]
+; GFX942_PTRADD-NEXT:s_endpgm
+;
+; GFX942_LEGACY-LABEL: random_incorrect_offset:
+; GFX942_LEGACY:   ; %bb.1:
+; GFX942_LEGACY-NEXT:s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX942_LEGACY-NEXT:s_waitcnt lgkmcnt(0)
+; GFX942_LEGACY-NEXT:s_branch .LBB21_0
+; GFX942_LEGACY-NEXT:.p2align 8
+; GFX942_LEGACY-NEXT:  ; %bb.2:
+; GFX942_LEGACY-NEXT:  .LBB21_0:
+; GFX942_LEGACY-NEXT:s_mov_b32 s4, 8
+; GFX942_LEGACY-NEXT:s_load_dword s0, s[0:1], s4 offset:0x2
+; GFX942_LEGACY-NEXT:v_mov_b32_e32 v0, 0
+; GFX942_LEGACY-NEXT:s_waitcnt lgkmcnt(0)
+; GFX942_LEGACY-NEXT:v_mov_b32_e32 v1, s0
+; GFX942_LEGACY-NEXT:global_store_dword v0, v1, s[2:3]
+; GFX942_LEGACY-NEXT:s_endpgm
+  %imp_arg_ptr = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+  %gep = getelementptr i8, ptr addrspace(4) %imp_arg_ptr, i32 2
+  %load = load i32, ptr addrspace(4) %gep
+  store i32 %load, ptr addrspace(1) %out
+  ret void
+}
+
 declare void @llvm.memcpy.p0.p4.i64(ptr noalias nocapture writeonly, ptr addrspace(4) noalias nocapture readonly, i64, i1 immarg)
 
 !0 = !{}
+
+attributes #0 = { "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }

ritter-x2a wrote:

Removed.

https://github.com/llvm/llvm-project/pull/146075

[llvm-branch-commits] [clang] [llvm] [OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (PR #146405)

2025-06-30 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Robert Imschweiler (ro-i)


Changes

OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and severity
clauses. This commit implements the necessary codegen changes.
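A simplified sketch of the call selection in `emitNumThreadsClause` from the diff below, using strings in place of IR values (the `RuntimeCall` and `SeverityClause` types and the `buildPushNumThreads` function are hypothetical): without `strict`, the plain `__kmpc_push_num_threads` entry point is used; with `strict`, a severity code (1 = warning, 2 = fatal, defaulting to fatal) and the message pointer, or null, are appended.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model types; these are not Clang's CodeGen classes.
enum class SeverityClause { Absent, Warning, Fatal };

struct RuntimeCall {
  std::string Callee;
  std::vector<std::string> Args; // textual stand-ins for the IR arguments
};

RuntimeCall buildPushNumThreads(int32_t NumThreads, bool Strict,
                                SeverityClause Sev,
                                const std::string *Message) {
  RuntimeCall Call{"__kmpc_push_num_threads",
                   {"&loc", "global_tid", std::to_string(NumThreads)}};
  if (!Strict)
    return Call;

  Call.Callee = "__kmpc_push_num_threads_strict";
  // No severity clause behaves as if sev-level were fatal (1 = warning, 2 = fatal).
  Call.Args.push_back(Sev == SeverityClause::Warning ? "1" : "2");
  // Without a message clause a null pointer is passed instead.
  Call.Args.push_back(Message ? "\"" + *Message + "\"" : "null");
  // The strict entry point takes exactly two extra arguments.
  assert(Call.Args.size() == 5);
  return Call;
}
```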

---

Patch is 223.05 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/146405.diff


10 Files Affected:

- (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+38-23) 
- (modified) clang/lib/CodeGen/CGOpenMPRuntime.h (+44-10) 
- (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp (+35-26) 
- (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.h (+21-5) 
- (modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+9-1) 
- (modified) clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp (+703-35)
- (modified) clang/test/OpenMP/parallel_num_threads_codegen.cpp (+33) 
- (modified) clang/test/OpenMP/target_parallel_num_threads_messages.cpp (+72-3) 
- (added) clang/test/OpenMP/target_parallel_num_threads_strict_codegen.cpp (+1642)
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPKinds.def (+12) 


``diff
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 8ccc37ef98a74..13a2d77bc156a 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1845,11 +1845,11 @@ void CGOpenMPRuntime::emitIfClause(CodeGenFunction &CGF, const Expr *Cond,
   CGF.EmitBlock(ContBlock, /*IsFinished=*/true);
 }
 
-void CGOpenMPRuntime::emitParallelCall(CodeGenFunction &CGF, SourceLocation Loc,
-   llvm::Function *OutlinedFn,
-   ArrayRef CapturedVars,
-   const Expr *IfCond,
-   llvm::Value *NumThreads) {
+void CGOpenMPRuntime::emitParallelCall(
+CodeGenFunction &CGF, SourceLocation Loc, llvm::Function *OutlinedFn,
+ArrayRef CapturedVars, const Expr *IfCond,
+llvm::Value *NumThreads, OpenMPNumThreadsClauseModifier NumThreadsModifier,
+OpenMPSeverityClauseKind Severity, const Expr *Message) {
   if (!CGF.HaveInsertPoint())
 return;
   llvm::Value *RTLoc = emitUpdateLocation(CGF, Loc);
@@ -2699,18 +2699,33 @@ llvm::Value *CGOpenMPRuntime::emitForNext(CodeGenFunction &CGF,
   CGF.getContext().BoolTy, Loc);
 }
 
-void CGOpenMPRuntime::emitNumThreadsClause(CodeGenFunction &CGF,
-   llvm::Value *NumThreads,
-   SourceLocation Loc) {
+void CGOpenMPRuntime::emitNumThreadsClause(
+CodeGenFunction &CGF, llvm::Value *NumThreads, SourceLocation Loc,
+OpenMPNumThreadsClauseModifier Modifier, OpenMPSeverityClauseKind Severity,
+const Expr *Message) {
   if (!CGF.HaveInsertPoint())
 return;
+  llvm::SmallVector Args(
+  {emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc),
+   CGF.Builder.CreateIntCast(NumThreads, CGF.Int32Ty, /*isSigned*/ true)});
   // Build call __kmpc_push_num_threads(&loc, global_tid, num_threads)
-  llvm::Value *Args[] = {
-  emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc),
-  CGF.Builder.CreateIntCast(NumThreads, CGF.Int32Ty, /*isSigned*/ true)};
-  CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
-  CGM.getModule(), OMPRTL___kmpc_push_num_threads),
-  Args);
+  // or __kmpc_push_num_threads_strict(&loc, global_tid, num_threads, severity,
+  // messsage) if strict modifier is used.
+  RuntimeFunction FnID = OMPRTL___kmpc_push_num_threads;
+  if (Modifier == OMPC_NUMTHREADS_strict) {
+FnID = OMPRTL___kmpc_push_num_threads_strict;
+// OpenMP 6.0, 10.4: "If no severity clause is specified then the effect is
+// as if sev-level is fatal."
+Args.push_back(llvm::ConstantInt::get(
+CGM.Int32Ty, Severity == OMPC_SEVERITY_warning ? 1 : 2));
+if (Message)
+  Args.push_back(CGF.EmitStringLiteralLValue(cast(Message))
+ .getPointer(CGF));
+else
+  Args.push_back(llvm::ConstantPointerNull::get(CGF.VoidPtrTy));
+  }
+  CGF.EmitRuntimeCall(
+  OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), FnID), Args);
 }
 
 void CGOpenMPRuntime::emitProcBindClause(CodeGenFunction &CGF,
@@ -11986,12 +12001,11 @@ llvm::Function *CGOpenMPSIMDRuntime::emitTaskOutlinedFunction(
   llvm_unreachable("Not supported in SIMD-only mode");
 }
 
-void CGOpenMPSIMDRuntime::emitParallelCall(CodeGenFunction &CGF,
-   SourceLocation Loc,
-   llvm::Function *OutlinedFn,
-   ArrayRef CapturedVars,
-   const Expr *IfCond,
-   llvm::Value *NumThreads) {
+void CGOpenMPSIMDRuntime::emitParallelCall(
+CodeGenFunction &CGF, SourceLocation L

[llvm-branch-commits] [clang] [llvm] [OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (PR #146405)

2025-06-30 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-clang-codegen

Author: Robert Imschweiler (ro-i)



[llvm-branch-commits] [llvm] [openmp] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (PR #146404)

2025-06-30 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-offload

Author: Robert Imschweiler (ro-i)


Changes

OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and severity
clauses. This commit implements the necessary device runtime changes.
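A rough model of the thread-count decision in `determineNumberOfThreads` after this change, written as a standalone function with hypothetical names. It assumes the requested value is first clamped to the team size (those lines are elided in the diff below) and ignores the `icv::NThreads` fallback used when the clause value is -1. Generic mode rounds down to a multiple of the warp size, or to 1, and the strict check then flags any mismatch with the requested count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch.
struct ThreadDecision {
  uint32_t NumThreads;
  bool StrictViolation; // strict modifier given and the count differs
};

ThreadDecision decideNumThreads(int32_t NumThreadsClause,
                                uint32_t MaxTeamThreads, uint32_t WarpSize,
                                bool SPMD, bool Strict) {
  assert(WarpSize != 0 && (WarpSize & (WarpSize - 1)) == 0 &&
         "rounding with a bit mask needs a power-of-two warp size");
  uint32_t NumThreads = MaxTeamThreads;
  // Assumed clamp of the requested count to the team size (elided in the
  // diff); a clause value of -1 means "not specified" here.
  if (NumThreadsClause > 0 &&
      static_cast<uint32_t>(NumThreadsClause) < NumThreads)
    NumThreads = static_cast<uint32_t>(NumThreadsClause);

  // Generic mode rounds down to a multiple of the warp size, or to 1.
  if (!SPMD)
    NumThreads = NumThreads < WarpSize ? 1u : (NumThreads & ~(WarpSize - 1));

  // Strict check: only meaningful when a num_threads clause was given.
  bool Violation = Strict && NumThreadsClause != -1 &&
                   NumThreads != static_cast<uint32_t>(NumThreadsClause);
  return {NumThreads, Violation};
}
```

For example, asking for 100 threads in generic mode with a warp size of 32 yields 96, which is exactly the case where `strict` reports an error; in SPMD mode the same request is honored as-is.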

---
Full diff: https://github.com/llvm/llvm-project/pull/146404.diff


3 Files Affected:

- (modified) offload/DeviceRTL/include/DeviceTypes.h (+6) 
- (modified) offload/DeviceRTL/src/Parallelism.cpp (+60-18) 
- (modified) openmp/runtime/src/kmp.h (+1) 


``diff
diff --git a/offload/DeviceRTL/include/DeviceTypes.h b/offload/DeviceRTL/include/DeviceTypes.h
index 2e5d92380f040..43a5578f1 100644
--- a/offload/DeviceRTL/include/DeviceTypes.h
+++ b/offload/DeviceRTL/include/DeviceTypes.h
@@ -136,6 +136,12 @@ struct omp_lock_t {
   void *Lock;
 };
 
+// see definition in openmp/runtime kmp.h
+typedef enum omp_severity_t {
+  severity_warning = 1,
+  severity_fatal = 2
+} omp_severity_t;
+
 using InterWarpCopyFnTy = void (*)(void *src, int32_t warp_num);
 using ShuffleReductFnTy = void (*)(void *rhsData, int16_t lane_id,
int16_t lane_offset, int16_t shortCircuit);
diff --git a/offload/DeviceRTL/src/Parallelism.cpp b/offload/DeviceRTL/src/Parallelism.cpp
index 08ce616aee1c4..78438a60454b8 100644
--- a/offload/DeviceRTL/src/Parallelism.cpp
+++ b/offload/DeviceRTL/src/Parallelism.cpp
@@ -45,7 +45,24 @@ using namespace ompx;
 
 namespace {
 
-uint32_t determineNumberOfThreads(int32_t NumThreadsClause) {
+void num_threads_strict_error(int32_t nt_strict, int32_t nt_severity,
+  const char *nt_message, int32_t requested,
+  int32_t actual) {
+  if (nt_message)
+printf("%s\n", nt_message);
+  else
+printf("The computed number of threads (%u) does not match the requested "
+   "number of threads (%d). Consider that it might not be supported "
+   "to select exactly %d threads on this target device.\n",
+   actual, requested, requested);
+  if (nt_severity == severity_fatal)
+__builtin_trap();
+}
+
+uint32_t determineNumberOfThreads(int32_t NumThreadsClause,
+  int32_t nt_strict = false,
+  int32_t nt_severity = severity_fatal,
+  const char *nt_message = nullptr) {
   uint32_t NThreadsICV =
   NumThreadsClause != -1 ? NumThreadsClause : icv::NThreads;
   uint32_t NumThreads = mapping::getMaxTeamThreads();
@@ -55,13 +72,17 @@ uint32_t determineNumberOfThreads(int32_t NumThreadsClause) {
 
   // SPMD mode allows any number of threads, for generic mode we round down to a
   // multiple of WARPSIZE since it is legal to do so in OpenMP.
-  if (mapping::isSPMDMode())
-return NumThreads;
+  if (!mapping::isSPMDMode()) {
+if (NumThreads < mapping::getWarpSize())
+  NumThreads = 1;
+else
+  NumThreads = (NumThreads & ~((uint32_t)mapping::getWarpSize() - 1));
+  }
 
-  if (NumThreads < mapping::getWarpSize())
-NumThreads = 1;
-  else
-NumThreads = (NumThreads & ~((uint32_t)mapping::getWarpSize() - 1));
+  if (NumThreadsClause != -1 && nt_strict &&
+  NumThreads != static_cast(NumThreadsClause))
+num_threads_strict_error(nt_strict, nt_severity, nt_message,
+ NumThreadsClause, NumThreads);
 
   return NumThreads;
 }
@@ -82,12 +103,14 @@ uint32_t determineNumberOfThreads(int32_t NumThreadsClause) {
 
 extern "C" {
 
-[[clang::always_inline]] void __kmpc_parallel_spmd(IdentTy *ident,
-                                                   int32_t num_threads,
-                                                   void *fn, void **args,
-                                                   const int64_t nargs) {
+[[clang::always_inline]] void
+__kmpc_parallel_spmd(IdentTy *ident, int32_t num_threads, void *fn, void **args,
+                     const int64_t nargs, int32_t nt_strict = false,
+                     int32_t nt_severity = severity_fatal,
+                     const char *nt_message = nullptr) {
   uint32_t TId = mapping::getThreadIdInBlock();
-  uint32_t NumThreads = determineNumberOfThreads(num_threads);
+  uint32_t NumThreads =
+      determineNumberOfThreads(num_threads, nt_strict, nt_severity, nt_message);
   uint32_t PTeamSize =
       NumThreads == mapping::getMaxTeamThreads() ? 0 : NumThreads;
   // Avoid the race between the read of the `icv::Level` above and the write
@@ -140,10 +163,11 @@ extern "C" {
   return;
 }
 
-[[clang::always_inline]] void
-__kmpc_parallel_51(IdentTy *ident, int32_t, int32_t if_expr,
-                   int32_t num_threads, int proc_bind, void *fn,
-                   void *wrapper_fn, void **args, int64_t nargs) {
+[[clang::always_inline]] void __kmpc_parallel_51(
+    IdentTy *ident, int32_t, int32_t if_expr, int32_t num_threads,
+    int proc_bind, void *fn, void *wrapper_fn, void

[llvm-branch-commits] [llvm] [openmp] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (PR #146404)

2025-06-30 Thread Robert Imschweiler via llvm-branch-commits


https://github.com/ro-i created https://github.com/llvm/llvm-project/pull/146404

OpenMP 6.0, section 12.1.2, specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and severity
clauses. This commit implements the necessary device runtime changes.
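A minimal host-side sketch of the check described above, under stated assumptions: the names `roundToWarpMultiple`, `checkStrict`, `Severity` and `StrictOutcome` are hypothetical, the clamp against the team size that `determineNumberOfThreads` performs earlier is omitted, and instead of printing and calling `__builtin_trap()` on the device, the sketch returns an outcome so it can run on a host. The default message is a shortened stand-in, not the patch's text.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical host-side mirror of the omp_severity_t values the patch adds.
enum Severity : int32_t { SeverityWarning = 1, SeverityFatal = 2 };

// Outcome of the strict check; the device runtime prints and may call
// __builtin_trap() instead of returning a value.
enum class StrictOutcome { Ok, Warn, Fatal };

// Generic-mode rounding as in determineNumberOfThreads: fewer threads than a
// warp become 1; otherwise round down to a multiple of the warp size.
uint32_t roundToWarpMultiple(uint32_t NumThreads, uint32_t WarpSize) {
  // The mask trick below is only correct for power-of-two warp sizes.
  assert(WarpSize != 0 && (WarpSize & (WarpSize - 1)) == 0);
  if (NumThreads < WarpSize)
    return 1;
  return NumThreads & ~(WarpSize - 1);
}

// Requested == -1 stands for "no num_threads clause", as in the patch.
StrictOutcome checkStrict(int32_t Requested, uint32_t Actual, bool Strict,
                          int32_t Sev, const char *Message) {
  if (Requested == -1 || !Strict ||
      Actual == static_cast<uint32_t>(Requested))
    return StrictOutcome::Ok;
  if (Message)
    std::printf("%s\n", Message); // user-supplied message clause text
  else
    std::printf("computed %u threads, but %d were requested\n",
                static_cast<unsigned>(Actual), static_cast<int>(Requested));
  return Sev == SeverityFatal ? StrictOutcome::Fatal : StrictOutcome::Warn;
}
```

For example, with a warp size of 32 a generic-mode request for 100 threads is rounded to 96, so a strict request for 100 reports a mismatch. In the patch the rounding step is skipped in SPMD mode, so it cannot cause such a mismatch there.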

>From cf566c60db9eef81c39a45082645c9d44992bec5 Mon Sep 17 00:00:00 2001
From: Robert Imschweiler 
Date: Fri, 27 Jun 2025 07:54:07 -0500
Subject: [PATCH] [OpenMP][clang] 6.0: num_threads strict (part 2: device
 runtime)

OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary device runtime
changes.
---
 offload/DeviceRTL/include/DeviceTypes.h |  6 ++
 offload/DeviceRTL/src/Parallelism.cpp   | 78 +++--
 openmp/runtime/src/kmp.h                |  1 +
 3 files changed, 67 insertions(+), 18 deletions(-)

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)

2025-06-30 Thread Fabian Ritter via llvm-branch-commits


https://github.com/ritter-x2a edited 
https://github.com/llvm/llvm-project/pull/146075
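The PR title names a DAG combine from ISD::PTRADD to a disjoint OR. The patch body is not shown in this message, so the following only sketches the general identity such a rewrite relies on, using plain 64-bit integers rather than SelectionDAG nodes; all function names are hypothetical. When base and offset share no set bits, no carries occur, so addition and bitwise OR agree, and the OR can be computed as two independent 32-bit halves.

```cpp
#include <cassert>
#include <cstdint>

// True when Base and Offset have no set bits in common ("disjoint").
bool isDisjoint(uint64_t Base, uint64_t Offset) {
  return (Base & Offset) == 0;
}

// With no overlapping bits, no bit position generates a carry, so the sum
// equals the bitwise OR.
uint64_t addViaOr(uint64_t Base, uint64_t Offset) {
  assert(isDisjoint(Base, Offset) && "only valid for disjoint operands");
  return Base | Offset;
}

// OR never carries, so a 64-bit OR splits into two independent 32-bit ORs;
// a 64-bit add would instead need the low half's carry-out in the high half.
uint64_t orAsTwoHalves(uint64_t A, uint64_t B) {
  uint32_t Lo = static_cast<uint32_t>(A) | static_cast<uint32_t>(B);
  uint32_t Hi = static_cast<uint32_t>(A >> 32) | static_cast<uint32_t>(B >> 32);
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```

For instance, a 4 KiB-aligned base (low 12 bits clear) plus any offset below 0x1000 is always disjoint.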
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray updated 
https://github.com/llvm/llvm-project/pull/146329

>From 69c97078a3e7ee1592e5e5c4b2f4eba6455dd96e Mon Sep 17 00:00:00 2001
From: Jonathan Thackray 
Date: Wed, 25 Jun 2025 21:22:43 +0100
Subject: [PATCH 1/2] [AArch64][llvm] Unify AArch64 tests into a single file
 (2/4) (NFC)

This is a series of patches (2/4) to unify assembly/disassembly of
recent AArch64 tests into a single file. The aim is to improve
consistency, so that all instructions and system registers are
thoroughly tested, and future test cases will be in a unified format.

This patch:
 * removes .txt tests which have only one feature required
 * makes the .s tests use a round-trip RUN line to test both encoding and assembly
 * creates diagnostic tests when needed
 * fixes naming convention of tests

Co-authored-by: Virginia Cangelosi 
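The unified .s tests check both the printed assembly and the `// encoding:` bytes. As a cross-check of the MEC CHECK lines in the diff below, this sketch packs system-register fields into an MRS/MSR word using the A64 system-instruction layout (bits[31:22] = 0b1101010100, bit 21 = L, then op0, op1, CRn, CRm, op2, Rt). The helper names are hypothetical and this is not LLVM's encoder; the field values used in the checks were recovered by decoding the CHECK-line bytes (for example, `MECIDR_EL2` decodes to op0=3, op1=4, CRn=10, CRm=8, op2=7).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Fields of a system-register operand, i.e. S<op0>_<op1>_C<CRn>_C<CRm>_<op2>.
struct SysReg {
  uint32_t Op0, Op1, CRn, CRm, Op2;
};

// Encode MRS Xt, <sysreg> (IsRead, L = 1) or MSR <sysreg>, Xt (L = 0).
uint32_t encodeSysRegMove(bool IsRead, SysReg R, uint32_t Rt) {
  assert((R.Op0 == 2 || R.Op0 == 3) && R.Op1 < 8 && R.CRn < 16 &&
         R.CRm < 16 && R.Op2 < 8 && Rt < 32);
  return 0xD5000000u | (static_cast<uint32_t>(IsRead) << 21) | (R.Op0 << 19) |
         (R.Op1 << 16) | (R.CRn << 12) | (R.CRm << 8) | (R.Op2 << 5) | Rt;
}

// Little-endian byte order, as printed in llvm-mc "// encoding: [...]" comments.
std::array<uint8_t, 4> toLittleEndianBytes(uint32_t Insn) {
  return {static_cast<uint8_t>(Insn), static_cast<uint8_t>(Insn >> 8),
          static_cast<uint8_t>(Insn >> 16), static_cast<uint8_t>(Insn >> 24)};
}
```

For example, `encodeSysRegMove(true, {3, 4, 10, 8, 7}, 0)` yields `0xD53CA8E0`, whose bytes `[0xe0,0xa8,0x3c,0xd5]` match the `mrs x0, MECIDR_EL2` CHECK line; clearing the L bit with the `MECID_P0_EL2` fields (op2=0) gives the `msr MECID_P0_EL2, x0` bytes `[0x00,0xa8,0x1c,0xd5]`.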
---
 llvm/test/MC/AArch64/armv9.2a-mec.s           | 172 ++-
 llvm/test/MC/AArch64/armv9.4-lse128.s         |  98 -
 llvm/test/MC/AArch64/armv9.4a-gcs.s           | 198 +-
 .../MC/AArch64/armv9.4a-lse128-diagnostics.s  |  17 ++
 llvm/test/MC/AArch64/armv9.4a-lse128.s        | 138
 llvm/test/MC/AArch64/armv9.5a-cpa.s           |  89 +---
 .../MC/AArch64/armv9.6a-mpam-diagnostics.s    |   5 +
 llvm/test/MC/AArch64/armv9.6a-mpam.s          |  80 +--
 .../MC/Disassembler/AArch64/armv9.4a-gcs.txt  |  90
 .../Disassembler/AArch64/armv9.4a-lse128.txt  |  98 -
 .../MC/Disassembler/AArch64/armv9.5a-cpa.txt  |  42
 .../MC/Disassembler/AArch64/armv9.6a-mpam.txt |  50 -
 .../MC/Disassembler/AArch64/armv9a-mec.txt    |  54 -
 delete mode 100644 llvm/test/MC/AArch64/armv9.4-lse128.s
 create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128-diagnostics.s
 create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128.s
 create mode 100644 llvm/test/MC/AArch64/armv9.6a-mpam-diagnostics.s
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-gcs.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-lse128.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.5a-cpa.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-mpam.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9a-mec.txt

diff --git a/llvm/test/MC/AArch64/armv9.2a-mec.s b/llvm/test/MC/AArch64/armv9.2a-mec.s
index 42e4bf732086e..c747886f7ec3b 100644
--- a/llvm/test/MC/AArch64/armv9.2a-mec.s
+++ b/llvm/test/MC/AArch64/armv9.2a-mec.s
@@ -1,55 +1,117 @@
-// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s
-
-          mrs x0, MECIDR_EL2
-// CHECK: mrs   x0, MECIDR_EL2   // encoding: [0xe0,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-          mrs x0, MECID_P0_EL2
-// CHECK: mrs   x0, MECID_P0_EL2  // encoding: [0x00,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-          mrs x0, MECID_A0_EL2
-// CHECK: mrs   x0, MECID_A0_EL2  // encoding: [0x20,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-          mrs x0, MECID_P1_EL2
-// CHECK: mrs   x0, MECID_P1_EL2  // encoding: [0x40,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-          mrs x0, MECID_A1_EL2
-// CHECK: mrs   x0, MECID_A1_EL2  // encoding: [0x60,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-          mrs x0, VMECID_P_EL2
-// CHECK: mrs   x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-          mrs x0, VMECID_A_EL2
-// CHECK: mrs   x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-          mrs x0, MECID_RL_A_EL3
-// CHECK: mrs   x0, MECID_RL_A_EL3   // encoding: [0x20,0xaa,0x3e,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-          msr MECID_P0_EL2,x0
-// CHECK: msr   MECID_P0_EL2, x0  // encoding: [0x00,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-          msr MECID_A0_EL2,x0
-// CHECK: msr   MECID_A0_EL2, x0  // encoding: [0x20,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-          msr MECID_P1_EL2,x0
-// CHECK: msr   MECID_P1_EL2, x0  // encoding: [0x40,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-          msr MECID_A1_EL2,x0
-// CHECK: msr   MECID_A1_EL2, x0  // encoding: [0x60,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  m

[llvm-branch-commits] [clang-tools-extra] [clang-doc] serialize friends (PR #146165)

2025-06-30 Thread Erick Velez via llvm-branch-commits


https://github.com/evelez7 updated 
https://github.com/llvm/llvm-project/pull/146165

>From 318f0c85b9f984ba22873ee76a0e610b07d443e9 Mon Sep 17 00:00:00 2001
From: Erick Velez 
Date: Thu, 26 Jun 2025 20:54:03 -0700
Subject: [PATCH] [clang-doc] serialize friends

---
 clang-tools-extra/clang-doc/BitcodeReader.cpp | 46 +++
 clang-tools-extra/clang-doc/BitcodeWriter.cpp | 27 ++-
 clang-tools-extra/clang-doc/BitcodeWriter.h   |  6 +-
 clang-tools-extra/clang-doc/HTMLGenerator.cpp |  3 +
 .../clang-doc/HTMLMustacheGenerator.cpp       |  1 +
 clang-tools-extra/clang-doc/JSONGenerator.cpp | 23 +-
 clang-tools-extra/clang-doc/MDGenerator.cpp   |  4 +
 .../clang-doc/Representation.cpp              | 16
 clang-tools-extra/clang-doc/Representation.h  | 21 -
 clang-tools-extra/clang-doc/Serialize.cpp     | 55 ++
 clang-tools-extra/clang-doc/YAMLGenerator.cpp |  1 +
 .../test/clang-doc/json/class.cpp             | 76 +--
 .../unittests/clang-doc/BitcodeTest.cpp       |  2 +
 13 files changed, 236 insertions(+), 45 deletions(-)
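The commit carries no description. Judging only from the reader changes in the diff that follows, a friend record keeps its parameter list and return type optional and creates the parameter list lazily on first use. A simplified sketch of that pattern, with hypothetical types that stand in for clang-doc's much richer `FieldTypeInfo` and `FriendInfo`:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins; not clang-doc's real types.
struct ParamSketch {
  std::string Type;
  std::string Name;
};

struct FriendSketch {
  std::string Name;     // the befriended class or function
  bool IsClass = false; // true for `friend class X;`
  std::optional<std::vector<ParamSketch>> Params; // friend functions only
  std::optional<std::string> ReturnType;          // friend functions only
};

// Same shape as the reader's addTypeInfo(FriendInfo *, FieldTypeInfo &&):
// the list is created on first use, so a friend class never gets an empty one.
void addParam(FriendSketch &F, ParamSketch P) {
  if (!F.Params)
    F.Params.emplace();
  F.Params->push_back(std::move(P));
}

void setReturnType(FriendSketch &F, std::string T) {
  F.ReturnType.emplace(std::move(T));
}

// Example declaration: `friend int compare(const Widget &, const Widget &);`
FriendSketch exampleFriendFunction() {
  FriendSketch F{"compare"};
  setReturnType(F, "int");
  addParam(F, {"const Widget &", ""});
  addParam(F, {"const Widget &", ""});
  assert(F.Params->size() == 2);
  return F;
}
```

Keeping the fields optional lets a generator distinguish "friend function with no parameters" (an engaged but empty list) from "friend class" (no list at all).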

diff --git a/clang-tools-extra/clang-doc/BitcodeReader.cpp b/clang-tools-extra/clang-doc/BitcodeReader.cpp
index fd6f40cff1a4e..2cbf8bf6b2879 100644
--- a/clang-tools-extra/clang-doc/BitcodeReader.cpp
+++ b/clang-tools-extra/clang-doc/BitcodeReader.cpp
@@ -94,6 +94,7 @@ static llvm::Error decodeRecord(const Record &R, InfoType &Field,
   case InfoType::IT_typedef:
   case InfoType::IT_concept:
   case InfoType::IT_variable:
+  case InfoType::IT_friend:
     Field = IT;
     return llvm::Error::success();
   }
@@ -111,6 +112,7 @@ static llvm::Error decodeRecord(const Record &R, FieldId &Field,
   case FieldId::F_child_namespace:
   case FieldId::F_child_record:
   case FieldId::F_concept:
+  case FieldId::F_friend:
   case FieldId::F_default:
     Field = F;
     return llvm::Error::success();
@@ -450,6 +452,15 @@ static llvm::Error parseRecord(const Record &R, unsigned ID,
   }
 }
 
+static llvm::Error parseRecord(const Record &R, unsigned ID, StringRef Blob,
+                               FriendInfo *F) {
+  if (ID == FRIEND_IS_CLASS) {
+    return decodeRecord(R, F->IsClass, Blob);
+  }
+  return llvm::createStringError(llvm::inconvertibleErrorCode(),
+                                 "invalid field for Friend");
+}
+
 template <typename T> static llvm::Expected<CommentInfo *> getCommentInfo(T I) {
   return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                  "invalid type cannot contain CommentInfo");
@@ -525,6 +536,18 @@ template <> llvm::Error addTypeInfo(FunctionInfo *I, FieldTypeInfo &&T) {
   return llvm::Error::success();
 }
 
+template <> llvm::Error addTypeInfo(FriendInfo *I, FieldTypeInfo &&T) {
+  if (!I->Params)
+    I->Params.emplace();
+  I->Params->emplace_back(std::move(T));
+  return llvm::Error::success();
+}
+
+template <> llvm::Error addTypeInfo(FriendInfo *I, TypeInfo &&T) {
+  I->ReturnType.emplace(std::move(T));
+  return llvm::Error::success();
+}
+
 template <> llvm::Error addTypeInfo(EnumInfo *I, TypeInfo &&T) {
   I->BaseType = std::move(T);
   return llvm::Error::success();
@@ -667,6 +690,16 @@ llvm::Error addReference(ConstraintInfo *I, Reference &&R, FieldId F) {
   "ConstraintInfo cannot contain this Reference");
 }
 
+template <>
+llvm::Error addReference(FriendInfo *Friend, Reference &&R, FieldId F) {
+  if (F == FieldId::F_friend) {
+    Friend->Ref = std::move(R);
+    return llvm::Error::success();
+  }
+  return llvm::createStringError(llvm::inconvertibleErrorCode(),
+                                 "Friend cannot contain this Reference");
+}
+
 template <typename T, typename ChildInfoType>
 static void addChild(T I, ChildInfoType &&R) {
   llvm::errs() << "invalid child type for info";
@@ -700,6 +733,9 @@ template <> void addChild(RecordInfo *I, EnumInfo &&R) {
 template <> void addChild(RecordInfo *I, TypedefInfo &&R) {
   I->Children.Typedefs.emplace_back(std::move(R));
 }
+template <> void addChild(RecordInfo *I, FriendInfo &&R) {
+  I->Friends.emplace_back(std::move(R));
+}
 
 // Other types of children:
 template <> void addChild(EnumInfo *I, EnumValueInfo &&R) {
@@ -741,6 +777,9 @@ template <> void addTemplate(FunctionInfo *I, TemplateInfo &&P) {
 template <> void addTemplate(ConceptInfo *I, TemplateInfo &&P) {
   I->Template = std::move(P);
 }
+template <> void addTemplate(FriendInfo *I, TemplateInfo &&P) {
+  I->Template.emplace(std::move(P));
+}
 
 // Template specializations go only into template records.
 template 
@@ -921,6 +960,10 @@ llvm::Error ClangDocBitcodeReader::readSubBlock(unsigned ID, T I) {
   case BI_VAR_BLOCK_ID: {
 return handleSubBlock(ID, I, CreateAddFunc(addChild));
   }
+  case BI_FRIEND_BLOCK_ID: {
+return handleSubBlock(ID, I,
+  CreateAddFunc(addChild));
+  }
   default:
     return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                    "invalid subblock type");
@@ -1032,6 +1075,8 @@ ClangDocBitcodeRe

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (PR #146076)

2025-06-30 Thread Fabian Ritter via llvm-branch-commits


https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/146076

>From 2d8d232729769a3ca274789dee2fe542d0045ef2 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Fri, 27 Jun 2025 05:38:52 -0400
Subject: [PATCH] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default

Also removes the command line option to control this feature.

There seem to be mainly two kinds of test changes:
- Some operands of addition instructions are swapped; that is to be expected
  since PTRADD is not commutative.
- Improvements in code generation, probably because the legacy lowering enabled
  some transformations that were sometimes harmful.
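One way to picture the first kind of change is a toy model, not LLVM code: a canonicalization may freely reorder the operands of a commutative add, but a pointer add's operands have fixed roles (base pointer, then offset), so the same two values can end up in a different order depending on which node built them. The node type, opcode names and the sort-by-name rule below are hypothetical stand-ins for whatever ordering real combines apply.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy DAG node; the opcode names echo ISD::ADD and ISD::PTRADD.
enum class Opc { Add, PtrAdd };

struct ToyNode {
  Opc Op;
  std::string LHS; // for PtrAdd: the base pointer
  std::string RHS; // for PtrAdd: the integer offset
};

// Hypothetical rule: order the operands of a commutative node by name.
// PtrAdd is not commutative, so its operands are never swapped.
ToyNode canonicalize(ToyNode N) {
  assert(!N.LHS.empty() && !N.RHS.empty() && "operands must be named");
  if (N.Op == Opc::Add && N.RHS < N.LHS)
    std::swap(N.LHS, N.RHS);
  return N;
}
```

With operands `v1` and `v0`, the add comes out as `(v0, v1)` while the pointer add stays `(v1, v0)`, which is the kind of operand swap the test updates show.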

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp     |  10 +-
 .../identical-subrange-spill-infloop.ll       | 354 +++---
 .../AMDGPU/infer-addrspace-flat-atomic.ll     |  14 +-
 llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll  |   8 +-
 .../AMDGPU/lower-module-lds-via-hybrid.ll     |   4 +-
 .../AMDGPU/lower-module-lds-via-table.ll      |  16 +-
 .../match-perm-extract-vector-elt-bug.ll      |  22 +-
 llvm/test/CodeGen/AMDGPU/memmove-var-size.ll  |  16 +-
 .../AMDGPU/preload-implicit-kernargs.ll       |   6 +-
 .../AMDGPU/promote-constOffset-to-imm.ll      |   8 +-
 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll |   7 +-
 .../AMDGPU/ptradd-sdag-optimizations.ll       |  94 ++---
 .../AMDGPU/ptradd-sdag-undef-poison.ll        |   6 +-
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll       |  27 +-
 llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll |  29 +-
 15 files changed, 311 insertions(+), 310 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 822bab88c8a09..79981007c13af 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -63,14 +63,6 @@ static cl::opt<bool> UseDivergentRegisterIndexing(
     cl::desc("Use indirect register addressing for divergent indexes"),
     cl::init(false));
 
-// TODO: This option should be removed once we switch to always using PTRADD in
-// the SelectionDAG.
-static cl::opt<bool> UseSelectionDAGPTRADD(
-    "amdgpu-use-sdag-ptradd", cl::Hidden,
-    cl::desc("Generate ISD::PTRADD nodes for 64-bit pointer arithmetic in the "
-             "SelectionDAG ISel"),
-    cl::init(false));
-
 static bool denormalModeIsFlushAllF32(const MachineFunction &MF) {
   const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
   return Info->getMode().FP32Denormals == DenormalMode::getPreserveSign();
@@ -10599,7 +10591,7 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 
 bool SITargetLowering::shouldPreservePtrArith(const Function &F,
                                               EVT PtrVT) const {
-  return UseSelectionDAGPTRADD && PtrVT == MVT::i64;
+  return PtrVT == MVT::i64;
 }
 
 bool SITargetLowering::canTransformPtrArithOutOfBounds(const Function &F,
diff --git a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
index 56ceba258f471..f9fcf489bd389 100644
--- a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
+++ b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
@@ -6,97 +6,151 @@ define void @main(i1 %arg) #0 {
 ; CHECK:       ; %bb.0: ; %bb
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CHECK-NEXT:    s_xor_saveexec_b64 s[4:5], -1
-; CHECK-NEXT:    buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill
-; CHECK-NEXT:    buffer_store_dword v6, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
+; CHECK-NEXT:    buffer_store_dword v6, off, s[0:3], s32 ; 4-byte Folded Spill
+; CHECK-NEXT:    buffer_store_dword v7, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
 ; CHECK-NEXT:    s_mov_b64 exec, s[4:5]
-; CHECK-NEXT:    v_writelane_b32 v5, s30, 0
-; CHECK-NEXT:    v_writelane_b32 v5, s31, 1
-; CHECK-NEXT:    v_writelane_b32 v5, s36, 2
-; CHECK-NEXT:    v_writelane_b32 v5, s37, 3
-; CHECK-NEXT:    v_writelane_b32 v5, s38, 4
-; CHECK-NEXT:    v_writelane_b32 v5, s39, 5
-; CHECK-NEXT:    v_writelane_b32 v5, s48, 6
-; CHECK-NEXT:    v_writelane_b32 v5, s49, 7
-; CHECK-NEXT:    v_writelane_b32 v5, s50, 8
-; CHECK-NEXT:    v_writelane_b32 v5, s51, 9
-; CHECK-NEXT:    v_writelane_b32 v5, s52, 10
-; CHECK-NEXT:    v_writelane_b32 v5, s53, 11
-; CHECK-NEXT:    v_writelane_b32 v5, s54, 12
-; CHECK-NEXT:    v_writelane_b32 v5, s55, 13
-; CHECK-NEXT:    s_getpc_b64 s[24:25]
-; CHECK-NEXT:    v_writelane_b32 v5, s64, 14
-; CHECK-NEXT:    s_movk_i32 s4, 0xf0
-; CHECK-NEXT:    s_mov_b32 s5, s24
-; CHECK-NEXT:    v_writelane_b32 v5, s65, 15
-; CHECK-NEXT:    s_load_dwordx16 s[8:23], s[4:5], 0x0
-; CHECK-NEXT:    s_mov_b64 s[4:5], 0
-; CHECK-NEXT:    v_writelane_b32 v5, s66, 16
-; CHECK-NEXT:    s_load_dwordx4 s[4:7], s[4:5], 0x0
-; CHECK-NEXT:    v_writelane_b32 v5, s67, 17
-; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
-; CHECK-NEXT:    s_movk_i32 s6, 0x130
-; CHECK-NEXT:    s_mov_b32 s7, s24
-; CHECK-NEXT:    v_writela

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (PR #146076)

2025-06-30 Thread Fabian Ritter via llvm-branch-commits


https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/146076

>From 2d8d232729769a3ca274789dee2fe542d0045ef2 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Fri, 27 Jun 2025 05:38:52 -0400
Subject: [PATCH] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default

Also removes the command line option to control this feature.

There seem to be mainly two kinds of test changes:
- Some operands of addition instructions are swapped; that is to be expected
  since PTRADD is not commutative.
- Improvements in code generation, probably because the legacy lowering enabled
  some transformations that were sometimes harmful.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  10 +-
 .../identical-subrange-spill-infloop.ll   | 354 +++---
 .../AMDGPU/infer-addrspace-flat-atomic.ll |  14 +-
 llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll  |   8 +-
 .../AMDGPU/lower-module-lds-via-hybrid.ll |   4 +-
 .../AMDGPU/lower-module-lds-via-table.ll  |  16 +-
 .../match-perm-extract-vector-elt-bug.ll  |  22 +-
 llvm/test/CodeGen/AMDGPU/memmove-var-size.ll  |  16 +-
 .../AMDGPU/preload-implicit-kernargs.ll   |   6 +-
 .../AMDGPU/promote-constOffset-to-imm.ll  |   8 +-
 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll |   7 +-
 .../AMDGPU/ptradd-sdag-optimizations.ll   |  94 ++---
 .../AMDGPU/ptradd-sdag-undef-poison.ll|   6 +-
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   |  27 +-
 llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll |  29 +-
 15 files changed, 311 insertions(+), 310 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 822bab88c8a09..79981007c13af 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -63,14 +63,6 @@ static cl::opt UseDivergentRegisterIndexing(
 cl::desc("Use indirect register addressing for divergent indexes"),
 cl::init(false));
 
-// TODO: This option should be removed once we switch to always using PTRADD in
-// the SelectionDAG.
-static cl::opt UseSelectionDAGPTRADD(
-"amdgpu-use-sdag-ptradd", cl::Hidden,
-cl::desc("Generate ISD::PTRADD nodes for 64-bit pointer arithmetic in the "
- "SelectionDAG ISel"),
-cl::init(false));
-
 static bool denormalModeIsFlushAllF32(const MachineFunction &MF) {
   const SIMachineFunctionInfo *Info = MF.getInfo();
   return Info->getMode().FP32Denormals == DenormalMode::getPreserveSign();
@@ -10599,7 +10591,7 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 
 bool SITargetLowering::shouldPreservePtrArith(const Function &F,
   EVT PtrVT) const {
-  return UseSelectionDAGPTRADD && PtrVT == MVT::i64;
+  return PtrVT == MVT::i64;
 }
 
 bool SITargetLowering::canTransformPtrArithOutOfBounds(const Function &F,
diff --git a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
index 56ceba258f471..f9fcf489bd389 100644
--- a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
+++ b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
@@ -6,97 +6,151 @@ define void @main(i1 %arg) #0 {
 ; CHECK:   ; %bb.0: ; %bb
 ; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1
-; CHECK-NEXT:buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill
-; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
+; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 ; 4-byte Folded Spill
+; CHECK-NEXT:buffer_store_dword v7, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
 ; CHECK-NEXT:s_mov_b64 exec, s[4:5]
-; CHECK-NEXT:v_writelane_b32 v5, s30, 0
-; CHECK-NEXT:v_writelane_b32 v5, s31, 1
-; CHECK-NEXT:v_writelane_b32 v5, s36, 2
-; CHECK-NEXT:v_writelane_b32 v5, s37, 3
-; CHECK-NEXT:v_writelane_b32 v5, s38, 4
-; CHECK-NEXT:v_writelane_b32 v5, s39, 5
-; CHECK-NEXT:v_writelane_b32 v5, s48, 6
-; CHECK-NEXT:v_writelane_b32 v5, s49, 7
-; CHECK-NEXT:v_writelane_b32 v5, s50, 8
-; CHECK-NEXT:v_writelane_b32 v5, s51, 9
-; CHECK-NEXT:v_writelane_b32 v5, s52, 10
-; CHECK-NEXT:v_writelane_b32 v5, s53, 11
-; CHECK-NEXT:v_writelane_b32 v5, s54, 12
-; CHECK-NEXT:v_writelane_b32 v5, s55, 13
-; CHECK-NEXT:s_getpc_b64 s[24:25]
-; CHECK-NEXT:v_writelane_b32 v5, s64, 14
-; CHECK-NEXT:s_movk_i32 s4, 0xf0
-; CHECK-NEXT:s_mov_b32 s5, s24
-; CHECK-NEXT:v_writelane_b32 v5, s65, 15
-; CHECK-NEXT:s_load_dwordx16 s[8:23], s[4:5], 0x0
-; CHECK-NEXT:s_mov_b64 s[4:5], 0
-; CHECK-NEXT:v_writelane_b32 v5, s66, 16
-; CHECK-NEXT:s_load_dwordx4 s[4:7], s[4:5], 0x0
-; CHECK-NEXT:v_writelane_b32 v5, s67, 17
-; CHECK-NEXT:s_waitcnt lgkmcnt(0)
-; CHECK-NEXT:s_movk_i32 s6, 0x130
-; CHECK-NEXT:s_mov_b32 s7, s24
-; CHECK-NEXT:v_writela

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)

2025-06-30 Thread Fabian Ritter via llvm-branch-commits


https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/146075

>From 452008111a34c815b38242272063654393261921 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Fri, 27 Jun 2025 04:23:50 -0400
Subject: [PATCH 1/3] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR

If we can't fold a PTRADD's offset into its users, lowering them to
disjoint ORs is preferable: Often, a 32-bit OR instruction suffices
where we'd otherwise use a pair of 32-bit additions with carry.

This needs to be a DAGCombine (and not a selection rule) because its
main purpose is to enable subsequent DAGCombines for bitwise operations.
We don't want to just turn PTRADDs into disjoint ORs whenever that's
sound because this transform loses the information that the operation
implements pointer arithmetic, which we will soon need to fold offsets
into FLAT instructions. Currently, disjoint ORs can still be used for
offset folding, so that part of the logic can't be tested.
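The equivalence this combine relies on can be checked with plain integers. When two values share no set bits, no bit position can produce a carry, so addition and OR give the same result. The following is a minimal C++ sketch of that check, modelled on the `gep_disjoint_or` test in this patch. In that test, `llvm.ptrmask` with `i64 s0xf0` (that is, -16) clears the low four bits before the constant 4 is added. The helper names are hypothetical and are not LLVM APIs; the DAG combine itself uses `DAG.haveNoCommonBitsSet`.

```cpp
#include <cassert>
#include <cstdint>

// True when A and B have no set bit in common. In that case A + B == (A | B),
// because no bit position can generate a carry.
constexpr bool haveNoCommonBits(uint64_t A, uint64_t B) { return (A & B) == 0; }

// Models the gep_disjoint_or test: mask the low four bits away (ptrmask with
// -16), then add 4. Since the masked value ends in four zero bits and 4 < 16,
// the addition can be done as an OR.
constexpr uint64_t maskAndAddAsOr(uint64_t Base) {
  const uint64_t Masked = Base & ~uint64_t{0xF}; // same as Base & -16
  const uint64_t Offset = 4;
  return Masked | Offset; // equivalent to Masked + Offset here
}
```

For example, `maskAndAddAsOr(0x123F)` and `(0x123F & -16) + 4` both yield `0x1234`. This matches the single `v_and_or_b32 v0, v0, -16, 4` that the test expects on GFX942.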

The PR contains a hacky workaround for a situation where an AssertAlign
operand of a PTRADD is not DAGCombined before the PTRADD, causing the
PTRADD to be turned into a disjoint OR although reassociating it with
the operand of the AssertAlign would be better. This wouldn't be a
problem if the DAGCombiner ensured that a node is only processed after
all its operands have been processed.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 35 
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 56 ++-
 2 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 822bab88c8a09..71230078edc69 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -15136,6 +15136,41 @@ SDValue SITargetLowering::performPtrAddCombine(SDNode *N,
   return Folded;
   }
 
+  // Transform (ptradd a, b) -> (or disjoint a, b) if it is equivalent and if
+  // that transformation can't block an offset folding at any use of the ptradd.
+  // This should be done late, after legalization, so that it doesn't block
+  // other ptradd combines that could enable more offset folding.
+  bool HasIntermediateAssertAlign =
+  N0->getOpcode() == ISD::AssertAlign && N0->getOperand(0)->isAnyAdd();
+  // This is a hack to work around an ordering problem for DAGs like this:
+  //   (ptradd (AssertAlign (ptradd p, c1), k), c2)
+  // If the outer ptradd is handled first by the DAGCombiner, it can be
+  // transformed into a disjoint or. Then, when the generic AssertAlign combine
+  // pushes the AssertAlign through the inner ptradd, it's too late for the
+  // ptradd reassociation to trigger.
+  if (!DCI.isBeforeLegalizeOps() && !HasIntermediateAssertAlign &&
+  DAG.haveNoCommonBitsSet(N0, N1)) {
+bool TransformCanBreakAddrMode = any_of(N->users(), [&](SDNode *User) {
+  if (auto *LoadStore = dyn_cast(User);
+  LoadStore && LoadStore->getBasePtr().getNode() == N) {
+unsigned AS = LoadStore->getAddressSpace();
+// Currently, we only really need ptradds to fold offsets into flat
+// memory instructions.
+if (AS != AMDGPUAS::FLAT_ADDRESS)
+  return false;
+TargetLoweringBase::AddrMode AM;
+AM.HasBaseReg = true;
+EVT VT = LoadStore->getMemoryVT();
+Type *AccessTy = VT.getTypeForEVT(*DAG.getContext());
+return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS);
+  }
+  return false;
+});
+
+if (!TransformCanBreakAddrMode)
+  return DAG.getNode(ISD::OR, DL, VT, N0, N1, SDNodeFlags::Disjoint);
+  }
+
   if (N1.getOpcode() != ISD::ADD || !N1.hasOneUse())
 return SDValue();
 
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index 893deb35fe822..64e041103a563 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -100,7 +100,7 @@ define void @baseptr_null(i64 %offset, i8 %v) {
 
 ; Taken from implicit-kernarg-backend-usage.ll, tests the PTRADD handling in the
 ; assertalign DAG combine.
-define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr)  #0 {
+define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr) {
 ; GFX942-LABEL: llvm_amdgcn_queue_ptr:
 ; GFX942:   ; %bb.0:
 ; GFX942-NEXT:v_mov_b32_e32 v2, 0
@@ -416,6 +416,60 @@ entry:
   ret void
 }
 
+; Check that ptradds can be lowered to disjoint ORs.
+define ptr @gep_disjoint_or(ptr %base) {
+; GFX942-LABEL: gep_disjoint_or:
+; GFX942:   ; %bb.0:
+; GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:v_and_or_b32 v0, v0, -16, 4
+; GFX942-NEXT:s_setpc_b64 s[30:31]
+  %p = call ptr @llvm.ptrmask(ptr %base, i64 s0xf0)
+  %gep = getelementptr nuw inbounds i8, ptr %p, i64 4
+  ret ptr %gep
+}
+
+; Check that AssertAlign no

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits



@@ -1,592 +1,697 @@
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+the -mattr=+d128 < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+v8.9a -mattr=+the -mattr=+d128 < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+v9.4a -mattr=+the -mattr=+d128 < %s | FileCheck %s
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128 < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128,v8.9a < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128,v9.4a < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+the,+d128 < %s \
+// RUN:| llvm-objdump -d --mattr=+the,+d128 - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+the,+d128 < %s \
+// RUN:   | llvm-objdump -d --mattr=-the,-d128 - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128 < %s \
+// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:| llvm-mc -triple=aarch64 -mattr=+the,+d128 -disassemble -show-encoding \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
 
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.9a < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v9.4a < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s
 
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.9a -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v9.4a -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s
+mrs x3, RCWMASK_EL1
+// CHECK-INST: mrs x3, RCWMASK_EL1
+// CHECK-ENCODING: encoding: [0xc3,0xd0,0x38,0xd5]
+// CHECK-ERROR: error: expected readable system register
+// CHECK-UNKNOWN:  d538d0c3  mrs x3, S3_0_C13_C0_6
 
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+the -mattr=+d128 < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-ZXR %s
+msr RCWMASK_EL1, x1
+// CHECK-INST: msr RCWMASK_EL1, x1
+// CHECK-ENCODING: encoding: [0xc1,0xd0,0x18,0xd5]
+// CHECK-ERROR: error: expected writable system register or pstate
+// CHECK-UNKNOWN:  d518d0c1  msr S3_0_C13_C0_6, x1
 
-mrs x3, RCWMASK_EL1
-// CHECK:   mrs x3, RCWMASK_EL1   // encoding: [0xc3,0xd0,0x38,0xd5]
-// ERROR-NO-THE: [[@LINE-2]]:21: error: expected readable system register
-msr RCWMASK_EL1, x1
-// CHECK:   msr RCWMASK_EL1, x1   // encoding: [0xc1,0xd0,0x18,0xd5]
-// ERROR-NO-THE: [[@LINE-2]]:17: error: expected writable system register or pstate
-mrs x3, RCWSMASK_EL1
-// CHECK:   mrs x3, RCWSMASK_EL1  // encoding: [0x63,0xd0,0x38,0xd5]
-// ERROR-NO-THE: [[@LINE-2]]:21: error: expected readable system register
-msr RCWSMASK_EL1, x1
-// CHECK:   msr RCWSMASK_EL1, x1  // encoding: [0x61,0xd0,0x18,0xd5]
-// ERROR-NO-THE: [[@LINE-2]]:17: error: expected writable system register or pstate
+mrs x3, RCWSMASK_EL1
+// CHECK-INST: mrs x3, RCWSMASK_EL1
+// CHECK-ENCODING: encoding: [0x63,0xd0,0x38,0xd5]
+// CHECK-ERROR: error: expected readable system register
+// CHECK-UNKNOWN:  d538d063  mrs x3, S3_0_C13_C0_3
+msr RCWSMASK_EL1, x1
+// CHECK-INST: msr RCWSMASK_EL1, x1
+// CHECK-ENCODING: encoding: [0x61,0xd0,0x18,0xd5]
+// CHECK-ERROR: error: expected writable system register or pstate
+// CHECK-UNKNOWN:  d518d061  msr S3_0_C13_C0_3, x1
 
-rcwcas   x0, x1, [x4]
-// CHECK:   rcwcas   x0, x1, [x4] // encoding: [0x81,0x08,0x20,0x19]
-// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the
-rcwcasa  x0, x1, [x4]
-// CHECK:   rcwcasa  x0, x1, [x4] // encoding: [0x81,0x08,0xa0,0x19]
-// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the
-rcwcasal x0, x1, [x4]
-// CHECK:   rcwcasal x0, x1, [x4] // encoding: [0x81,0x08,0xe0,0x19]
-// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the
-rcwcasl  x0, x1, [x4]
-// CHECK:   rcwcasl  x0, x1, [x4] // encoding: [0x81,0x08,0x60,0x19]
-// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits



@@ -16,28 +16,41 @@
 // RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,-clrbhb < %s | FileCheck %s --check-prefix=HINT_22
 
 // Optional, off by default, manually enabled
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8.8a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8.8a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST

jthackray wrote:

I'm not sure (Virginia converted this file), but I suspect it's because it has fairly
extensive version-dependent tests at the top.

https://github.com/llvm/llvm-project/pull/146330
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC) (PR #146331)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray updated 
https://github.com/llvm/llvm-project/pull/146331

>From 8c9eccdc95e465fdbfe833080afb1ad1099c224c Mon Sep 17 00:00:00 2001
From: Jonathan Thackray 
Date: Fri, 27 Jun 2025 20:16:06 +0100
Subject: [PATCH 1/2] [AArch64][llvm] Unify AArch64 tests into a single file
 (4/4) (NFC)

This is a series of patches (4/4) to unify assembly/disassembly of
recent AArch64 tests into a single file. The aim is to improve
consistency, so that all instructions and system registers are
thoroughly tested, and future test cases will be in a unified format.

This patch:
 * removes .txt tests whose .s tests have functions
 * makes the .s tests have a roundabout run line to test both encoding and assembly
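The round-trip RUN line added in this patch pipes `llvm-mc -show-encoding` output through `sed '/.text/d' | sed 's/.*encoding: //g'`. Only the byte lists (for example `[0xe9,0x7f,0x5f,0xc9]`) then reach `llvm-mc -disassemble`. The following is a minimal C++ sketch of what those two sed steps do to a single output line, using `std::regex` as a stand-in for sed. Both patterns mean the same thing in sed's basic regular expressions and in ECMAScript. The function name is hypothetical, and the sample line format is assumed from the CHECK lines in these tests.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Applies the two sed steps of the round-trip RUN line to one line of
// `llvm-mc -show-encoding` output. Returns std::nullopt if sed would delete it.
std::optional<std::string> roundTripFilter(const std::string &Line) {
  // sed '/.text/d': delete any line matching the regex ".text". The '.' is
  // unescaped, so it matches any character followed by "text"; in practice
  // this removes the ".text" section directive.
  static const std::regex TextLine(".text");
  if (std::regex_search(Line, TextLine))
    return std::nullopt;

  // sed 's/.*encoding: //g': drop everything up to and including
  // "encoding: ", leaving only the byte list. Lines without "encoding: "
  // pass through unchanged, as they would with sed.
  static const std::regex UpToEncoding(".*encoding: ");
  return std::regex_replace(Line, UpToEncoding, "");
}
```

The surviving byte lists are disassembled again with `-show-encoding`. The same CHECK-ENCODING/CHECK-INST prefixes then confirm that re-encoding reproduces the original bytes.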

Co-authored-by: Virginia Cangelosi 
---
 llvm/test/MC/AArch64/armv9.6a-lsui.s  | 1073 +++--
 llvm/test/MC/AArch64/armv9.6a-occmo.s |   54 +-
 llvm/test/MC/AArch64/armv9.6a-pcdphint.s  |   37 +-
 llvm/test/MC/AArch64/armv9.6a-rme-gpc3.s  |   46 +-
 .../MC/Disassembler/AArch64/armv9.6a-lsui.txt |  323 -
 .../Disassembler/AArch64/armv9.6a-occmo.txt   |   11 -
 .../AArch64/armv9.6a-pcdphint.txt |8 -
 .../AArch64/armv9.6a-rme-gpc3.txt |   18 -
 8 files changed, 805 insertions(+), 765 deletions(-)
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-lsui.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-occmo.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-pcdphint.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-rme-gpc3.txt

diff --git a/llvm/test/MC/AArch64/armv9.6a-lsui.s b/llvm/test/MC/AArch64/armv9.6a-lsui.s
index d4a5e1f980560..264a869b6d286 100644
--- a/llvm/test/MC/AArch64/armv9.6a-lsui.s
+++ b/llvm/test/MC/AArch64/armv9.6a-lsui.s
@@ -1,408 +1,751 @@
-// RUN: llvm-mc -triple aarch64 -mattr=+lsui -show-encoding %s  | FileCheck %s
-// RUN: not llvm-mc -triple aarch64 -show-encoding %s 2>&1  | FileCheck %s --check-prefix=ERROR
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \
+// RUN:  | llvm-objdump -d --mattr=+lsui --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \
+// RUN:   | llvm-objdump -d --mattr=-lsui --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \
+// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:| llvm-mc -triple=aarch64 -mattr=+lsui -disassemble -show-encoding \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
 
-_func:
-// CHECK: _func:
 
//--
 // Unprivileged load/store operations
 
//--
-  ldtxr   x9, [sp]
-// CHECK: ldtxr x9, [sp] // encoding: [0xe9,0x7f,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr   x9, [sp, #0]
-// CHECK: ldtxr x9, [sp] // encoding: [0xe9,0x7f,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr   x10, [x11]
-// CHECK: ldtxr x10, [x11] // encoding: [0x6a,0x7d,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr   x10, [x11, #0]
-// CHECK: ldtxr x10, [x11] // encoding: [0x6a,0x7d,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-
-  ldatxr  x9, [sp]
-// CHECK: ldatxr x9, [sp] // encoding: [0xe9,0xff,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldatxr  x10, [x11]
-// CHECK: ldatxr x10, [x11] // encoding: [0x6a,0xfd,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-
-  sttxr   wzr, w4, [sp]
-// CHECK: sttxr wzr, w4, [sp] // encoding: [0xe4,0x7f,0x1f,0x89]
-// ERROR: error: instruction requires: lsui
-  sttxr   wzr, w4, [sp, #0]
-// CHECK: sttxr wzr, w4, [sp] // encoding: [0xe4,0x7f,0x1f,0x89]
-// ERROR: error: instruction requires: lsui
-  sttxr   w5, x6, [x7]
-// CHECK: sttxr w5, x6, [x7] // encoding: [0xe6,0x7c,0x05,0xc9]
-// ERROR: error: instruction requires: lsui
-  sttxr   w5, x6, [x7, #0]
-// CHECK: sttxr w5, x6, [x7] // encoding: [0xe6,0x7c,0x05,0xc9]
-// ERROR: error: instruction requires: lsui
-
-  stltxr  w2, w4, [sp]
-// CHECK: stltxr w2, w4, [sp] // encoding: [0xe4,0xff,0x02,0x89]

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray updated 
https://github.com/llvm/llvm-project/pull/146329

>From be8bcdead883ec9bac8bebf6b3382974fc988c28 Mon Sep 17 00:00:00 2001
From: Jonathan Thackray 
Date: Wed, 25 Jun 2025 21:22:43 +0100
Subject: [PATCH 1/2] [AArch64][llvm] Unify AArch64 tests into a single file
 (2/4) (NFC)

This is a series of patches (2/4) to unify assembly/disassembly of
recent AArch64 tests into a single file. The aim is to improve
consistency, so that all instructions and system registers are
thoroughly tested, and future test cases will be in a unified format.

This patch:
 * removes .txt tests which have only one feature required
 * makes the .s tests have a roundabout run line to test both encoding and assembly
 * creates diagnostic tests when needed
 * fixes naming convention of tests
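In the converted tests, each instruction carries both a CHECK-ENCODING byte list and a CHECK-UNKNOWN word printed by llvm-objdump. For example, `mrs x0, MECIDR_EL2` has `[0xe0,0xa8,0x3c,0xd5]` and `d53ca8e0`. The two agree because A64 instructions are always stored little-endian, so the first listed byte is the least significant byte of the word. The following is a minimal C++ sketch of the conversion. The helper is hypothetical and is not an LLVM API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Converts a CHECK-ENCODING byte list (memory order) into the 32-bit
// instruction word that llvm-objdump prints. Byte 0 is the least significant
// byte because A64 instructions are little-endian.
constexpr uint32_t wordFromEncodingBytes(const std::array<uint8_t, 4> &Bytes) {
  return uint32_t{Bytes[0]} | (uint32_t{Bytes[1]} << 8) |
         (uint32_t{Bytes[2]} << 16) | (uint32_t{Bytes[3]} << 24);
}
```

Applied to the lse128 test's `[0x61,0x11,0x22,0x19]`, this gives `0x19221161`, the word on that test's CHECK-UNKNOWN line.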

Co-authored-by: Virginia Cangelosi 
---
 llvm/test/MC/AArch64/armv9.2a-mec.s   | 172 ++-
 llvm/test/MC/AArch64/armv9.4-lse128.s |  98 -
 llvm/test/MC/AArch64/armv9.4a-gcs.s   | 198 +-
 .../MC/AArch64/armv9.4a-lse128-diagnostics.s  |  17 ++
 llvm/test/MC/AArch64/armv9.4a-lse128.s| 138 
 llvm/test/MC/AArch64/armv9.5a-cpa.s   |  89 +---
 .../MC/AArch64/armv9.6a-mpam-diagnostics.s|   5 +
 llvm/test/MC/AArch64/armv9.6a-mpam.s  |  80 +--
 .../MC/Disassembler/AArch64/armv9.4a-gcs.txt  |  90 
 .../Disassembler/AArch64/armv9.4a-lse128.txt  |  98 -
 .../MC/Disassembler/AArch64/armv9.5a-cpa.txt  |  42 
 .../MC/Disassembler/AArch64/armv9.6a-mpam.txt |  50 -
 .../MC/Disassembler/AArch64/armv9a-mec.txt|  54 -
 13 files changed, 541 insertions(+), 590 deletions(-)
 delete mode 100644 llvm/test/MC/AArch64/armv9.4-lse128.s
 create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128-diagnostics.s
 create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128.s
 create mode 100644 llvm/test/MC/AArch64/armv9.6a-mpam-diagnostics.s
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-gcs.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-lse128.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.5a-cpa.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-mpam.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9a-mec.txt

diff --git a/llvm/test/MC/AArch64/armv9.2a-mec.s b/llvm/test/MC/AArch64/armv9.2a-mec.s
index 42e4bf732086e..c747886f7ec3b 100644
--- a/llvm/test/MC/AArch64/armv9.2a-mec.s
+++ b/llvm/test/MC/AArch64/armv9.2a-mec.s
@@ -1,55 +1,117 @@
-// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s
-
-  mrs x0, MECIDR_EL2
-// CHECK: mrs   x0, MECIDR_EL2   // encoding: [0xe0,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_P0_EL2
-// CHECK: mrs   x0, MECID_P0_EL2  // encoding: [0x00,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_A0_EL2
-// CHECK: mrs   x0, MECID_A0_EL2  // encoding: [0x20,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_P1_EL2
-// CHECK: mrs   x0, MECID_P1_EL2  // encoding: [0x40,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_A1_EL2
-// CHECK: mrs   x0, MECID_A1_EL2  // encoding: [0x60,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, VMECID_P_EL2
-// CHECK: mrs   x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, VMECID_A_EL2
-// CHECK: mrs   x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_RL_A_EL3
-// CHECK: mrs   x0, MECID_RL_A_EL3   // encoding: [0x20,0xaa,0x3e,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  msr MECID_P0_EL2,x0
-// CHECK: msr   MECID_P0_EL2, x0  // encoding: [0x00,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_A0_EL2,x0
-// CHECK: msr   MECID_A0_EL2, x0  // encoding: [0x20,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_P1_EL2,x0
-// CHECK: msr   MECID_P1_EL2, x0  // encoding: [0x40,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_A1_EL2,x0
-// CHECK: msr   MECID_A1_EL2, x0  // encoding: [0x60,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  m

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits



@@ -1,55 +1,117 @@
-// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s
-
-  mrs x0, MECIDR_EL2
-// CHECK: mrs   x0, MECIDR_EL2   // encoding: [0xe0,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_P0_EL2
-// CHECK: mrs   x0, MECID_P0_EL2  // encoding: [0x00,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_A0_EL2
-// CHECK: mrs   x0, MECID_A0_EL2  // encoding: [0x20,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_P1_EL2
-// CHECK: mrs   x0, MECID_P1_EL2  // encoding: [0x40,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_A1_EL2
-// CHECK: mrs   x0, MECID_A1_EL2  // encoding: [0x60,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, VMECID_P_EL2
-// CHECK: mrs   x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, VMECID_A_EL2
-// CHECK: mrs   x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_RL_A_EL3
-// CHECK: mrs   x0, MECID_RL_A_EL3   // encoding: [0x20,0xaa,0x3e,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  msr MECID_P0_EL2,x0
-// CHECK: msr   MECID_P0_EL2, x0  // encoding: [0x00,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_A0_EL2,x0
-// CHECK: msr   MECID_A0_EL2, x0  // encoding: [0x20,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_P1_EL2,x0
-// CHECK: msr   MECID_P1_EL2, x0  // encoding: [0x40,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_A1_EL2,x0
-// CHECK: msr   MECID_A1_EL2, x0  // encoding: [0x60,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr VMECID_P_EL2,   x0
-// CHECK: msr   VMECID_P_EL2, x0 // encoding: [0x00,0xa9,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr VMECID_A_EL2,   x0
-// CHECK: msr   VMECID_A_EL2, x0 // encoding: [0x20,0xa9,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_RL_A_EL3, x0
-// CHECK: msr   MECID_RL_A_EL3, x0   // encoding: [0x20,0xaa,0x1e,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-
-  dc cigdpae, x0
-// CHECK: dc cigdpae, x0 // encoding: [0xe0,0x7e,0x0c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:14: error: DC CIGDPAE requires: mec
-  dc cipae, x0
-// CHECK: dc cipae, x0   // encoding: [0x00,0x7e,0x0c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:14: error: DC CIPAE requires: mec
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+mec < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+mec < %s \
+// RUN:| llvm-objdump -d --mattr=+mec --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+mec < %s \
+// RUN:   | llvm-objdump -d --mattr=-mec --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+mec < %s \
+// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:| llvm-mc -triple=aarch64 -mattr=+mec -disassemble -show-encoding \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
+
+mrs x0, MECIDR_EL2
+// CHECK-INST: mrs x0, MECIDR_EL2
+// CHECK-ENCODING: encoding: [0xe0,0xa8,0x3c,0xd5]
+// CHECK-ERROR: error: expected readable system register
+// CHECK-UNKNOWN:  d53ca8e0 mrs x0, S3_4_C10_C8_7
+
+mrs x0, MECID_P0_EL2
+// CHECK-INST: mrs x0, MECID_P0_EL2
+// CHECK-ENCODING: encoding: [0x00,0xa8,0x3c,0xd5]
+// CHECK-ERROR: error: expected readable system register
+// CHECK-UNKNOWN:  d53ca800 mrs x0, S3_4_C10_C8_0
+
+mrs x0, MECID_A0_EL2
+// CHECK-INST: mrs x0, MECID_A0_EL2
+// CHECK-ENCODING: encoding: [0x20,0xa8,0x3c,0xd5]
+// CHECK-ERROR: error: expected readable system register
+// CHECK-UNKNOWN:  d53ca820 mrs

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits



@@ -0,0 +1,138 @@
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lse128 < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lse128 < %s \
+// RUN:| llvm-objdump -d --mattr=+lse128 - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lse128 < %s \
+// RUN:   | llvm-objdump -d --mattr=-lse128 - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lse128 < %s \
+// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:| llvm-mc -triple=aarch64 -mattr=+lse128 -disassemble -show-encoding \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
+
+ldclrp   x1, x2, [x11]
+// CHECK-INST: ldclrp x1, x2, [x11]
+// CHECK-ENCODING: encoding: [0x61,0x11,0x22,0x19]
+// CHECK-ERROR: :[[@LINE-3]]:1: error: instruction requires: lse128
+// CHECK-UNKNOWN:  19221161 
+ldclrp   x21, x22, [sp]

jthackray wrote:

Thanks, now fixed.

https://github.com/llvm/llvm-project/pull/146329

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray edited 
https://github.com/llvm/llvm-project/pull/146329

[llvm-branch-commits] [clang-tools-extra] [clang-doc] serialize friends (PR #146165)

2025-06-30 Thread via llvm-branch-commits


llvmbot wrote:

@llvm/pr-subscribers-clang-tools-extra

Author: Erick Velez (evelez7)

Changes

---

Patch is 24.39 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/146165.diff


13 Files Affected:

- (modified) clang-tools-extra/clang-doc/BitcodeReader.cpp (+46) 
- (modified) clang-tools-extra/clang-doc/BitcodeWriter.cpp (+24-3) 
- (modified) clang-tools-extra/clang-doc/BitcodeWriter.h (+5-1) 
- (modified) clang-tools-extra/clang-doc/HTMLGenerator.cpp (+3) 
- (modified) clang-tools-extra/clang-doc/HTMLMustacheGenerator.cpp (+1) 
- (modified) clang-tools-extra/clang-doc/JSONGenerator.cpp (+21-2) 
- (modified) clang-tools-extra/clang-doc/MDGenerator.cpp (+4) 
- (modified) clang-tools-extra/clang-doc/Representation.cpp (+16) 
- (modified) clang-tools-extra/clang-doc/Representation.h (+20-1) 
- (modified) clang-tools-extra/clang-doc/Serialize.cpp (+53) 
- (modified) clang-tools-extra/clang-doc/YAMLGenerator.cpp (+1) 
- (modified) clang-tools-extra/test/clang-doc/json/class.cpp (+38-38) 
- (modified) clang-tools-extra/unittests/clang-doc/BitcodeTest.cpp (+2) 

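The patch adds a `FriendInfo` record. Judging from the fields visible in the diff (`IsClass`, `ReturnType`, `Params`, `Template`, `Ref`), it appears to cover friend classes as well as friend functions, which may be templated. The snippet below is ordinary C++ showing those three declaration shapes. It is illustrative input and is not taken from the patch's tests.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

class Widget {
  int Secret = 42;

  // Friend class: only the befriended type is named.
  friend class WidgetInspector;

  // Friend function: has a return type and a parameter list.
  friend std::ostream &operator<<(std::ostream &OS, const Widget &W);

  // Friend function template: additionally carries template parameters.
  template <typename T> friend void poke(Widget &W, T Value);
};

class WidgetInspector {
public:
  static int peek(const Widget &W) { return W.Secret; }
};

std::ostream &operator<<(std::ostream &OS, const Widget &W) {
  return OS << "Widget(" << W.Secret << ")";
}

template <typename T> void poke(Widget &W, T Value) {
  W.Secret = static_cast<int>(Value);
}

// Convenience wrapper that exercises the friend operator<<.
inline std::string describe(const Widget &W) {
  std::ostringstream OS;
  OS << W;
  return OS.str();
}
```
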

``diff
diff --git a/clang-tools-extra/clang-doc/BitcodeReader.cpp b/clang-tools-extra/clang-doc/BitcodeReader.cpp
index fd6f40cff1a4e..2cbf8bf6b2879 100644
--- a/clang-tools-extra/clang-doc/BitcodeReader.cpp
+++ b/clang-tools-extra/clang-doc/BitcodeReader.cpp
@@ -94,6 +94,7 @@ static llvm::Error decodeRecord(const Record &R, InfoType &Field,
   case InfoType::IT_typedef:
   case InfoType::IT_concept:
   case InfoType::IT_variable:
+  case InfoType::IT_friend:
     Field = IT;
     return llvm::Error::success();
   }
@@ -111,6 +112,7 @@ static llvm::Error decodeRecord(const Record &R, FieldId &Field,
   case FieldId::F_child_namespace:
   case FieldId::F_child_record:
   case FieldId::F_concept:
+  case FieldId::F_friend:
   case FieldId::F_default:
     Field = F;
     return llvm::Error::success();
@@ -450,6 +452,15 @@ static llvm::Error parseRecord(const Record &R, unsigned ID,
   }
 }
 
+static llvm::Error parseRecord(const Record &R, unsigned ID, StringRef Blob,
+                               FriendInfo *F) {
+  if (ID == FRIEND_IS_CLASS) {
+    return decodeRecord(R, F->IsClass, Blob);
+  }
+  return llvm::createStringError(llvm::inconvertibleErrorCode(),
+                                 "invalid field for Friend");
+}
+
 template <typename T> static llvm::Expected<CommentInfo *> getCommentInfo(T I) {
   return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                  "invalid type cannot contain CommentInfo");
@@ -525,6 +536,18 @@ template <> llvm::Error addTypeInfo(FunctionInfo *I, FieldTypeInfo &&T) {
   return llvm::Error::success();
 }
 
+template <> llvm::Error addTypeInfo(FriendInfo *I, FieldTypeInfo &&T) {
+  if (!I->Params)
+    I->Params.emplace();
+  I->Params->emplace_back(std::move(T));
+  return llvm::Error::success();
+}
+
+template <> llvm::Error addTypeInfo(FriendInfo *I, TypeInfo &&T) {
+  I->ReturnType.emplace(std::move(T));
+  return llvm::Error::success();
+}
+
 template <> llvm::Error addTypeInfo(EnumInfo *I, TypeInfo &&T) {
   I->BaseType = std::move(T);
   return llvm::Error::success();
@@ -667,6 +690,16 @@ llvm::Error addReference(ConstraintInfo *I, Reference &&R, FieldId F) {
                                  "ConstraintInfo cannot contain this Reference");
 }
 
+template <>
+llvm::Error addReference(FriendInfo *Friend, Reference &&R, FieldId F) {
+  if (F == FieldId::F_friend) {
+    Friend->Ref = std::move(R);
+    return llvm::Error::success();
+  }
+  return llvm::createStringError(llvm::inconvertibleErrorCode(),
+                                 "Friend cannot contain this Reference");
+}
+
 template <typename T, typename ChildInfoType>
 static void addChild(T I, ChildInfoType &&R) {
   llvm::errs() << "invalid child type for info";
@@ -700,6 +733,9 @@ template <> void addChild(RecordInfo *I, EnumInfo &&R) {
 template <> void addChild(RecordInfo *I, TypedefInfo &&R) {
   I->Children.Typedefs.emplace_back(std::move(R));
 }
+template <> void addChild(RecordInfo *I, FriendInfo &&R) {
+  I->Friends.emplace_back(std::move(R));
+}
 
 // Other types of children:
 template <> void addChild(EnumInfo *I, EnumValueInfo &&R) {
@@ -741,6 +777,9 @@ template <> void addTemplate(FunctionInfo *I, TemplateInfo &&P) {
 template <> void addTemplate(ConceptInfo *I, TemplateInfo &&P) {
   I->Template = std::move(P);
 }
+template <> void addTemplate(FriendInfo *I, TemplateInfo &&P) {
+  I->Template.emplace(std::move(P));
+}
 
 // Template specializations go only into template records.
 template <typename T>
@@ -921,6 +960,10 @@ llvm::Error ClangDocBitcodeReader::readSubBlock(unsigned ID, T I) {
   case BI_VAR_BLOCK_ID: {
     return handleSubBlock(ID, I, CreateAddFunc(addChild));
   }
+  case BI_FRIEND_BLOCK_ID: {
+    return handleSubBlock(ID, I,
+                          CreateAddFunc(addChild));
+  }
   default:
     return llvm::createStringError(llvm::inconvertibleErrorCode(),
"invalid su

[llvm-branch-commits] [clang-tools-extra] [clang-doc] serialize friends (PR #146165)

2025-06-30 Thread Erick Velez via llvm-branch-commits


https://github.com/evelez7 ready_for_review 
https://github.com/llvm/llvm-project/pull/146165

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits



@@ -1,115 +1,203 @@
-// RUN: llvm-mc -triple aarch64 -mattr +gcs -show-encoding %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64 -show-encoding %s 2>%t | FileCheck %s --check-prefix=NO-GCS
-// RUN: FileCheck --check-prefix=ERROR-NO-GCS %s < %t
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+gcs < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+gcs < %s \
+// RUN:| llvm-objdump -d --mattr=+gcs --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+gcs < %s \
+// RUN:   | llvm-objdump -d --mattr=-gcs --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+gcs < %s \
+// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:| llvm-mc -triple=aarch64 -mattr=+gcs -disassemble -show-encoding \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
 
 msr GCSCR_EL1, x0
+// CHECK-INST: msr GCSCR_EL1, x0
+// CHECK-ENCODING: encoding: [0x00,0x25,0x18,0xd5]
+// CHECK-UNKNOWN:  d5182500 msr GCSCR_EL1, x0
+
 mrs x1, GCSCR_EL1
-// CHECK: msr GCSCR_EL1, x0   // encoding: [0x00,0x25,0x18,0xd5]
-// CHECK: mrs x1, GCSCR_EL1   // encoding: [0x01,0x25,0x38,0xd5]
+// CHECK-INST: mrs x1, GCSCR_EL1
+// CHECK-ENCODING: encoding: [0x01,0x25,0x38,0xd5]
+// CHECK-UNKNOWN:  d5382501 mrs x1, GCSCR_EL1
 
 msr GCSPR_EL1, x2
+// CHECK-INST: msr GCSPR_EL1, x2
+// CHECK-ENCODING: encoding: [0x22,0x25,0x18,0xd5]
+// CHECK-UNKNOWN:  d5182522 msr GCSPR_EL1, x2
+
 mrs x3, GCSPR_EL1
-// CHECK: msr GCSPR_EL1, x2   // encoding: [0x22,0x25,0x18,0xd5]
-// CHECK: mrs x3, GCSPR_EL1   // encoding: [0x23,0x25,0x38,0xd5]
+// CHECK-INST: mrs x3, GCSPR_EL1
+// CHECK-ENCODING: encoding: [0x23,0x25,0x38,0xd5]
+// CHECK-UNKNOWN:  d5382523 mrs x3, GCSPR_EL1
 
 msr GCSCRE0_EL1, x4
+// CHECK-INST: msr GCSCRE0_EL1, x4
+// CHECK-ENCODING: encoding: [0x44,0x25,0x18,0xd5]
+// CHECK-UNKNOWN:  d5182544 msr GCSCRE0_EL1, x4
+
 mrs x5, GCSCRE0_EL1
-// CHECK: msr GCSCRE0_EL1, x4 // encoding: [0x44,0x25,0x18,0xd5]
-// CHECK: mrs x5, GCSCRE0_EL1 // encoding: [0x45,0x25,0x38,0xd5]
+// CHECK-INST: mrs x5, GCSCRE0_EL1
+// CHECK-ENCODING: encoding: [0x45,0x25,0x38,0xd5]
+// CHECK-UNKNOWN:  d5382545 mrs x5, GCSCRE0_EL1
 
 msr GCSPR_EL0, x6
+// CHECK-INST: msr GCSPR_EL0, x6
+// CHECK-ENCODING: encoding: [0x26,0x25,0x1b,0xd5]
+// CHECK-UNKNOWN:  d51b2526 msr GCSPR_EL0, x6
+
 mrs x7, GCSPR_EL0
-// CHECK: msr GCSPR_EL0, x6   // encoding: [0x26,0x25,0x1b,0xd5]
-// CHECK: mrs x7, GCSPR_EL0   // encoding: [0x27,0x25,0x3b,0xd5]
+// CHECK-INST: mrs x7, GCSPR_EL0
+// CHECK-ENCODING: encoding: [0x27,0x25,0x3b,0xd5]
+// CHECK-UNKNOWN:  d53b2527 mrs x7, GCSPR_EL0
 
 msr GCSCR_EL2, x10
+// CHECK-INST: msr GCSCR_EL2, x10
+// CHECK-ENCODING: encoding: [0x0a,0x25,0x1c,0xd5]
+// CHECK-UNKNOWN:  d51c250a msr GCSCR_EL2, x10
+
 mrs x11, GCSCR_EL2
-// CHECK: msr GCSCR_EL2, x10  // encoding: [0x0a,0x25,0x1c,0xd5]
-// CHECK: mrs x11, GCSCR_EL2  // encoding: [0x0b,0x25,0x3c,0xd5]
+// CHECK-INST: mrs x11, GCSCR_EL2
+// CHECK-ENCODING: encoding: [0x0b,0x25,0x3c,0xd5]
+// CHECK-UNKNOWN:  d53c250b mrs x11, GCSCR_EL2
 
 msr GCSPR_EL2, x12
+// CHECK-INST: msr GCSPR_EL2, x12
+// CHECK-ENCODING: encoding: [0x2c,0x25,0x1c,0xd5]
+// CHECK-UNKNOWN:  d51c252c msr GCSPR_EL2, x12
+
 mrs x13, GCSPR_EL2
-// CHECK: msr GCSPR_EL2, x12  // encoding: [0x2c,0x25,0x1c,0xd5]
-// CHECK: mrs x13, GCSPR_EL2  // encoding: [0x2d,0x25,0x3c,0xd5]
+// CHECK-INST: mrs x13, GCSPR_EL2
+// CHECK-ENCODING: encoding: [0x2d,0x25,0x3c,0xd5]
+// CHECK-UNKNOWN:  d53c252d mrs x13, GCSPR_EL2
 
 msr GCSCR_EL12, x14
+// CHECK-INST: msr GCSCR_EL12, x14
+// CHECK-ENCODING: encoding: [0x0e,0x25,0x1d,0xd5]
+// CHECK-UNKNOWN:  d51d250e msr GCSCR_EL12, x14
+
 mrs x15, GCSCR_EL12
-// CHECK: msr GCSCR_EL12, x14 // encoding: [0x0e,0x25,0x1d,0xd5]
-// CHECK: mrs x15, GCSCR_EL12 // encoding: [0x0f,0x25,0x3d,0xd5]
+// CHECK-INST: mrs x15, GCSCR_EL12
+// CHECK-ENCODING: encoding: [0x0f,0x25,0x3d,0xd5]
+// CHECK-UNKNOWN:  d53d250f mrs x15, GCSCR_EL12
 
 msr GCSPR_EL12, x16
+// CHECK-INST: msr GCSPR_EL12, x16
+// CHECK-ENCODING: encoding: [0x30,0x25,0x1d,0xd5]
+// CHECK-UNKNOWN:  d51d2530 msr GCSPR_EL12, x16
+
 mrs x17, GCSPR_EL12
-// CHECK: msr GCSPR_EL12, x16 // encoding: [0x30,0x25,0x1d,0xd5]
-// CHECK: mrs x17, GCSPR_EL12 // encoding: [0

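Each `msr`/`mrs` pair in this GCS test differs only in the read bit (bit 21) and the `Rt` field, which is why `d5182500` and `d5382501` look so similar. The small encoder sketch below rebuilds the expected words from the generic register fields. It is a hypothetical helper that follows the Arm system-register-move field layout. For example, `GCSCR_EL1` decodes from its test encoding to op0=3, op1=0, CRn=2, CRm=5, op2=0.

```cpp
#include <cassert>
#include <cstdint>

// Builds "MSR <sysreg>, Xt" (IsRead = false) or "MRS Xt, <sysreg>"
// (IsRead = true) from the S<op0>_<op1>_C<CRn>_C<CRm>_<op2> fields.
inline uint32_t encodeSysRegMove(bool IsRead, unsigned Op0, unsigned Op1,
                                 unsigned CRn, unsigned CRm, unsigned Op2,
                                 unsigned Rt) {
  assert((Op0 == 2 || Op0 == 3) && Op1 < 8 && CRn < 16 && CRm < 16 &&
         Op2 < 8 && Rt < 32 && "field out of range");
  return 0xd5000000u | (uint32_t(IsRead) << 21) | (Op0 << 19) | (Op1 << 16) |
         (CRn << 12) | (CRm << 8) | (Op2 << 5) | Rt;
}
```

With these inputs, `encodeSysRegMove(false, 3, 0, 2, 5, 0, 0)` reproduces `d5182500` (`msr GCSCR_EL1, x0`), and setting `IsRead` with `Rt = 1` reproduces `d5382501`.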
[llvm-branch-commits] [clang] [llvm] [OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (PR #146405)

2025-06-30 Thread Shilei Tian via llvm-branch-commits


https://github.com/shiltian commented:

There doesn't seem to be any test case for the newly added `__kmpc_parallel_60`.
If it is orthogonal to the `__kmpc_push_num_threads_strict` change, I'd prefer
to make it a separate PR and have tests there.

https://github.com/llvm/llvm-project/pull/146405

[llvm-branch-commits] [llvm] [openmp] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (PR #146404)

2025-06-30 Thread Robert Imschweiler via llvm-branch-commits



@@ -45,7 +45,24 @@ using namespace ompx;
 
 namespace {
 
-uint32_t determineNumberOfThreads(int32_t NumThreadsClause) {
+void num_threads_strict_error(int32_t nt_strict, int32_t nt_severity,

ro-i wrote:

sorry, done

https://github.com/llvm/llvm-project/pull/146404

[llvm-branch-commits] [llvm] [openmp] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (PR #146404)

2025-06-30 Thread Shilei Tian via llvm-branch-commits



@@ -45,7 +45,24 @@ using namespace ompx;
 
 namespace {
 
-uint32_t determineNumberOfThreads(int32_t NumThreadsClause) {
+void num_threads_strict_error(int32_t nt_strict, int32_t nt_severity,

shiltian wrote:

Please use LLVM code style for device runtime.

https://github.com/llvm/llvm-project/pull/146404

[llvm-branch-commits] [llvm] [openmp] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (PR #146404)

2025-06-30 Thread Robert Imschweiler via llvm-branch-commits


https://github.com/ro-i updated https://github.com/llvm/llvm-project/pull/146404

>From cf566c60db9eef81c39a45082645c9d44992bec5 Mon Sep 17 00:00:00 2001
From: Robert Imschweiler 
Date: Fri, 27 Jun 2025 07:54:07 -0500
Subject: [PATCH 1/2] [OpenMP][clang] 6.0: num_threads strict (part 2: device
 runtime)

OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary device runtime
changes.
---
 offload/DeviceRTL/include/DeviceTypes.h |  6 ++
 offload/DeviceRTL/src/Parallelism.cpp   | 78 +++--
 openmp/runtime/src/kmp.h|  1 +
 3 files changed, 67 insertions(+), 18 deletions(-)

diff --git a/offload/DeviceRTL/include/DeviceTypes.h b/offload/DeviceRTL/include/DeviceTypes.h
index 2e5d92380f040..43a5578f1 100644
--- a/offload/DeviceRTL/include/DeviceTypes.h
+++ b/offload/DeviceRTL/include/DeviceTypes.h
@@ -136,6 +136,12 @@ struct omp_lock_t {
   void *Lock;
 };
 
+// see definition in openmp/runtime kmp.h
+typedef enum omp_severity_t {
+  severity_warning = 1,
+  severity_fatal = 2
+} omp_severity_t;
+
 using InterWarpCopyFnTy = void (*)(void *src, int32_t warp_num);
 using ShuffleReductFnTy = void (*)(void *rhsData, int16_t lane_id,
int16_t lane_offset, int16_t shortCircuit);
diff --git a/offload/DeviceRTL/src/Parallelism.cpp b/offload/DeviceRTL/src/Parallelism.cpp
index 08ce616aee1c4..78438a60454b8 100644
--- a/offload/DeviceRTL/src/Parallelism.cpp
+++ b/offload/DeviceRTL/src/Parallelism.cpp
@@ -45,7 +45,24 @@ using namespace ompx;
 
 namespace {
 
-uint32_t determineNumberOfThreads(int32_t NumThreadsClause) {
+void num_threads_strict_error(int32_t nt_strict, int32_t nt_severity,
+                              const char *nt_message, int32_t requested,
+                              int32_t actual) {
+  if (nt_message)
+    printf("%s\n", nt_message);
+  else
+    printf("The computed number of threads (%u) does not match the requested "
+           "number of threads (%d). Consider that it might not be supported "
+           "to select exactly %d threads on this target device.\n",
+           actual, requested, requested);
+  if (nt_severity == severity_fatal)
+    __builtin_trap();
+}
+
+uint32_t determineNumberOfThreads(int32_t NumThreadsClause,
+                                  int32_t nt_strict = false,
+                                  int32_t nt_severity = severity_fatal,
+                                  const char *nt_message = nullptr) {
   uint32_t NThreadsICV =
       NumThreadsClause != -1 ? NumThreadsClause : icv::NThreads;
   uint32_t NumThreads = mapping::getMaxTeamThreads();
@@ -55,13 +72,17 @@ uint32_t determineNumberOfThreads(int32_t NumThreadsClause) {
 
   // SPMD mode allows any number of threads, for generic mode we round down to a
   // multiple of WARPSIZE since it is legal to do so in OpenMP.
-  if (mapping::isSPMDMode())
-    return NumThreads;
+  if (!mapping::isSPMDMode()) {
+    if (NumThreads < mapping::getWarpSize())
+      NumThreads = 1;
+    else
+      NumThreads = (NumThreads & ~((uint32_t)mapping::getWarpSize() - 1));
+  }
 
-  if (NumThreads < mapping::getWarpSize())
-    NumThreads = 1;
-  else
-    NumThreads = (NumThreads & ~((uint32_t)mapping::getWarpSize() - 1));
+  if (NumThreadsClause != -1 && nt_strict &&
+      NumThreads != static_cast<uint32_t>(NumThreadsClause))
+    num_threads_strict_error(nt_strict, nt_severity, nt_message,
+                             NumThreadsClause, NumThreads);
 
   return NumThreads;
 }
@@ -82,12 +103,14 @@ uint32_t determineNumberOfThreads(int32_t NumThreadsClause) {
 
 extern "C" {
 
-[[clang::always_inline]] void __kmpc_parallel_spmd(IdentTy *ident,
-                                                   int32_t num_threads,
-                                                   void *fn, void **args,
-                                                   const int64_t nargs) {
+[[clang::always_inline]] void
+__kmpc_parallel_spmd(IdentTy *ident, int32_t num_threads, void *fn, void **args,
+                     const int64_t nargs, int32_t nt_strict = false,
+                     int32_t nt_severity = severity_fatal,
+                     const char *nt_message = nullptr) {
   uint32_t TId = mapping::getThreadIdInBlock();
-  uint32_t NumThreads = determineNumberOfThreads(num_threads);
+  uint32_t NumThreads =
+      determineNumberOfThreads(num_threads, nt_strict, nt_severity, nt_message);
   uint32_t PTeamSize =
       NumThreads == mapping::getMaxTeamThreads() ? 0 : NumThreads;
   // Avoid the race between the read of the `icv::Level` above and the write
@@ -140,10 +163,11 @@ extern "C" {
   return;
 }
 
-[[clang::always_inline]] void
-__kmpc_parallel_51(IdentTy *ident, int32_t, int32_t if_expr,
-                   int32_t num_threads, int proc_bind, void *fn,
-                   void *wrapper_fn, void **args, int64_t nargs) {
+[[clang

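The sketch below is a simplified, host-side model of the thread-count logic in the `determineNumberOfThreads` hunk above. It rests on a few assumptions. The value before rounding (called `Candidate` here) is assumed to have already been clamped by the lines the excerpt elides. The warp size is assumed to be a power of two, which the `& ~(WarpSize - 1)` rounding relies on. All names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

struct ThreadDecision {
  uint32_t NumThreads; // threads the parallel region would get
  bool StrictMismatch; // true if the strict num_threads error would fire
};

// Candidate: thread count after the (elided) clamping step.
// Requested: the num_threads clause value, or -1 if there is none.
inline ThreadDecision modelNumThreads(uint32_t Candidate, bool IsSPMD,
                                      uint32_t WarpSize, int32_t Requested,
                                      bool Strict) {
  assert(WarpSize != 0 && (WarpSize & (WarpSize - 1)) == 0 &&
         "rounding with a mask needs a power-of-two warp size");
  uint32_t N = Candidate;
  // Generic mode: round down to a multiple of the warp size, or use a single
  // thread if fewer than a warp are available. SPMD mode keeps N unchanged.
  if (!IsSPMD)
    N = N < WarpSize ? 1 : (N & ~(WarpSize - 1));
  bool Mismatch =
      Requested != -1 && Strict && N != static_cast<uint32_t>(Requested);
  return {N, Mismatch};
}
```

For example, a strict `num_threads(100)` in generic mode with a warp size of 32 rounds down to 96 threads and would trigger the strict error, whereas SPMD mode keeps all 100.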
[llvm-branch-commits] [clang] [llvm] [OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (PR #146405)

2025-06-30 Thread Shilei Tian via llvm-branch-commits


shiltian wrote:

Even after I expanded all folded files, when I search for `__kmpc_parallel_60`, 
my browser only shows three matches. Did I miss anything here?

https://github.com/llvm/llvm-project/pull/146405

[llvm-branch-commits] [llvm] [AMDGPU] Add tests for workgroup/workitem intrinsic optimizations (PR #146053)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/146053

>From 3f62ab3beb30abbf8c8c32dd79c0133f7ca122e0 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 26 Jun 2025 13:08:31 +0200
Subject: [PATCH 1/2] [AMDGPU] Add tests for workgroup/workitem intrinsic
 optimizations

---
 .../AMDGPU/workitems-intrinsics-opts.ll   | 553 ++
 1 file changed, 553 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll

diff --git a/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll 
b/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll
new file mode 100644
index 0..14120680216fc
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll
@@ -0,0 +1,553 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -O3 -mtriple=amdgcn -mcpu=fiji %s -o - | FileCheck %s --check-prefixes=GFX8,DAGISEL-GFX9
+; RUN: llc -O3 -mtriple=amdgcn -mcpu=gfx942 %s -o - | FileCheck %s --check-prefixes=GFX942,DAGISEL-GFX942
+; RUN: llc -O3 -mtriple=amdgcn -mcpu=gfx1200 %s -o - | FileCheck %s --check-prefixes=GFX12,DAGISEL-GFX12
+
+; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=fiji %s -o - | FileCheck %s --check-prefixes=GFX8,GISEL-GFX8
+; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=gfx942 %s -o - | FileCheck %s --check-prefixes=GFX942,GISEL-GFX942
+; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=gfx1200 %s -o - | FileCheck %s --check-prefixes=GFX12,GISEL-GFX12
+
+; (workitem_id_x | workitem_id_y | workitem_id_z) == 0
+define i1 @workitem_zero() {
+; DAGISEL-GFX9-LABEL: workitem_zero:
+; DAGISEL-GFX9:   ; %bb.0: ; %entry
+; DAGISEL-GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; DAGISEL-GFX9-NEXT:v_lshrrev_b32_e32 v1, 10, v31
+; DAGISEL-GFX9-NEXT:v_lshrrev_b32_e32 v0, 20, v31
+; DAGISEL-GFX9-NEXT:v_or_b32_e32 v1, v31, v1
+; DAGISEL-GFX9-NEXT:v_or_b32_e32 v0, v1, v0
+; DAGISEL-GFX9-NEXT:v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX9-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0
+; DAGISEL-GFX9-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc
+; DAGISEL-GFX9-NEXT:s_setpc_b64 s[30:31]
+;
+; DAGISEL-GFX942-LABEL: workitem_zero:
+; DAGISEL-GFX942:   ; %bb.0: ; %entry
+; DAGISEL-GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; DAGISEL-GFX942-NEXT:v_lshrrev_b32_e32 v0, 20, v31
+; DAGISEL-GFX942-NEXT:v_lshrrev_b32_e32 v1, 10, v31
+; DAGISEL-GFX942-NEXT:v_or3_b32 v0, v31, v1, v0
+; DAGISEL-GFX942-NEXT:v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX942-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0
+; DAGISEL-GFX942-NEXT:s_nop 1
+; DAGISEL-GFX942-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc
+; DAGISEL-GFX942-NEXT:s_setpc_b64 s[30:31]
+;
+; DAGISEL-GFX12-LABEL: workitem_zero:
+; DAGISEL-GFX12:   ; %bb.0: ; %entry
+; DAGISEL-GFX12-NEXT:s_wait_loadcnt_dscnt 0x0
+; DAGISEL-GFX12-NEXT:s_wait_expcnt 0x0
+; DAGISEL-GFX12-NEXT:s_wait_samplecnt 0x0
+; DAGISEL-GFX12-NEXT:s_wait_bvhcnt 0x0
+; DAGISEL-GFX12-NEXT:s_wait_kmcnt 0x0
+; DAGISEL-GFX12-NEXT:v_lshrrev_b32_e32 v0, 20, v31
+; DAGISEL-GFX12-NEXT:v_lshrrev_b32_e32 v1, 10, v31
+; DAGISEL-GFX12-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; DAGISEL-GFX12-NEXT:v_or3_b32 v0, v31, v1, v0
+; DAGISEL-GFX12-NEXT:v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX12-NEXT:s_delay_alu instid0(VALU_DEP_1)
+; DAGISEL-GFX12-NEXT:v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; DAGISEL-GFX12-NEXT:s_wait_alu 0xfffd
+; DAGISEL-GFX12-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc_lo
+; DAGISEL-GFX12-NEXT:s_setpc_b64 s[30:31]
+;
+; GISEL-GFX8-LABEL: workitem_zero:
+; GISEL-GFX8:   ; %bb.0: ; %entry
+; GISEL-GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3ff, v31
+; GISEL-GFX8-NEXT:v_bfe_u32 v1, v31, 10, 10
+; GISEL-GFX8-NEXT:v_or_b32_e32 v0, v0, v1
+; GISEL-GFX8-NEXT:v_bfe_u32 v1, v31, 20, 10
+; GISEL-GFX8-NEXT:v_or_b32_e32 v0, v0, v1
+; GISEL-GFX8-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0
+; GISEL-GFX8-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc
+; GISEL-GFX8-NEXT:s_setpc_b64 s[30:31]
+;
+; GISEL-GFX942-LABEL: workitem_zero:
+; GISEL-GFX942:   ; %bb.0: ; %entry
+; GISEL-GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GISEL-GFX942-NEXT:v_and_b32_e32 v0, 0x3ff, v31
+; GISEL-GFX942-NEXT:v_bfe_u32 v1, v31, 10, 10
+; GISEL-GFX942-NEXT:v_bfe_u32 v2, v31, 20, 10
+; GISEL-GFX942-NEXT:v_or3_b32 v0, v0, v1, v2
+; GISEL-GFX942-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0
+; GISEL-GFX942-NEXT:s_nop 1
+; GISEL-GFX942-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc
+; GISEL-GFX942-NEXT:s_setpc_b64 s[30:31]
+;
+; GISEL-GFX12-LABEL: workitem_zero:
+; GISEL-GFX12:   ; %bb.0: ; %entry
+; GISEL-GFX12-NEXT:s_wait_loadcnt_dscnt 0x0
+; GISEL-GFX12-NEXT:s_wait_expcnt 0x0
+; GISEL-GFX12-NEXT:s_wait_samplecnt 0x0
+; GISEL-GFX12-NEXT:s_wait_bvhcnt 0x0
+; GISEL-GFX1

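Judging from the `workitem_zero` checks above, the three workitem IDs appear to be packed into `v31` as 10-bit fields at bits 0, 10 and 20. The GISel output extracts them with `v_bfe_u32 v1, v31, 10, 10` and `v_bfe_u32 v1, v31, 20, 10`. Under that assumption, the sketch below checks three formulations against each other: the source-level test, the DAGISEL shift-and-OR sequence, and a single masked test of bits 0 to 29. It shows that all three agree for any register value. The helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: X in bits [9:0], Y in bits [19:10], Z in bits [29:20].
inline uint32_t packWorkitemIds(uint32_t X, uint32_t Y, uint32_t Z) {
  assert(X < 1024 && Y < 1024 && Z < 1024 && "each ID is a 10-bit field");
  return X | (Y << 10) | (Z << 20);
}

// Source-level check: (workitem_id_x | workitem_id_y | workitem_id_z) == 0.
inline bool workitemZeroReference(uint32_t Packed) {
  uint32_t X = Packed & 0x3ff;
  uint32_t Y = (Packed >> 10) & 0x3ff;
  uint32_t Z = (Packed >> 20) & 0x3ff;
  return (X | Y | Z) == 0;
}

// What the DAGISEL sequence computes: ((V | V >> 10 | V >> 20) & 0x3ff) == 0.
inline bool workitemZeroShifted(uint32_t V) {
  return ((V | (V >> 10) | (V >> 20)) & 0x3ff) == 0;
}

// Equivalent single test: no bit set anywhere in bits [29:0].
inline bool workitemZeroMasked(uint32_t V) { return (V & 0x3fffffffu) == 0; }
```
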
[llvm-branch-commits] [llvm] [GISel] Combine compare of bitfield extracts or'd together. (PR #146055)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/146055

>From da05cc2d920917f0cb6f171b0d9e2e535836ca3c Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 27 Jun 2025 12:04:53 +0200
Subject: [PATCH] [GISel] Combine compare of bitfield extracts or'd together.

Equivalent of the previous DAG patch for GISel.
The shifts are BFXs in GISel, so the canonical form of the entire expression
differs from the DAG form: the mask is not at the root of the expression;
it remains on the leaves instead.

See #136727
---
 .../llvm/CodeGen/GlobalISel/CombinerHelper.h  |   2 +
 .../include/llvm/Target/GlobalISel/Combine.td |  11 +-
 .../GlobalISel/CombinerHelperCompares.cpp |  89 +
 .../GlobalISel/combine-cmp-merged-bfx.mir | 326 ++
 .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 194 +++
 5 files changed, 483 insertions(+), 139 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-cmp-merged-bfx.mir

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index c15263e0b06f8..5ec82c30f268f 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -641,6 +641,8 @@ class CombinerHelper {
   /// KnownBits information.
   bool matchICmpToLHSKnownBits(MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
+  bool combineMergedBFXCompare(MachineInstr &MI) const;
+
   /// \returns true if (and (or x, c1), c2) can be replaced with (and x, c2)
   bool matchAndOrDisjointMask(MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 4a92dc16c1bf4..cba46a5edf9ec 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1085,6 +1085,14 @@ def double_icmp_zero_or_combine: GICombineRule<
  (G_ICMP $root, $p, $ordst, 0))
 >;
 
+// Transform ((X | (G_UBFX X, ...) | ...) == 0) (or != 0)
+// into a compare of an extract/mask of X
+def icmp_merged_bfx_combine: GICombineRule<
+  (defs root:$root),
+  (combine (G_ICMP $dst, $p, $src, 0):$root,
+   [{ return Helper.combineMergedBFXCompare(*${root}); }])
+>;
+
 def and_or_disjoint_mask : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$info),
   (match (wip_match_opcode G_AND):$root,
@@ -2052,7 +2060,8 @@ def all_combines : GICombineGroup<[integer_reassoc_combines, trivial_combines,
 fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors,
 simplify_neg_minmax, combine_concat_vector,
 sext_trunc, zext_trunc, prefer_sign_combines, shuffle_combines,
-combine_use_vector_truncate, merge_combines, overflow_combines]>;
+combine_use_vector_truncate, merge_combines, overflow_combines,
+icmp_merged_bfx_combine]>;
 
 // A combine group used to for prelegalizer combiners at -O0. The combines in
 // this group have been selected based on experiments to balance code size and
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
index fc40533cf3dc9..e1d43f37bac13 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
@@ -140,3 +140,92 @@ bool CombinerHelper::matchCanonicalizeFCmp(const MachineInstr &MI,
 
   return false;
 }
+
+bool CombinerHelper::combineMergedBFXCompare(MachineInstr &MI) const {
+  const GICmp *Cmp = cast<GICmp>(&MI);
+
+  ICmpInst::Predicate CC = Cmp->getCond();
+  if (CC != CmpInst::ICMP_EQ && CC != CmpInst::ICMP_NE)
+    return false;
+
+  Register CmpLHS = Cmp->getLHSReg();
+  Register CmpRHS = Cmp->getRHSReg();
+
+  LLT OpTy = MRI.getType(CmpLHS);
+  if (!OpTy.isScalar() || OpTy.isPointer())
+    return false;
+
+  assert(isZeroOrZeroSplat(CmpRHS, /*AllowUndefs=*/false));
+
+  Register Src;
+  const auto IsSrc = [&](Register R) {
+    if (!Src) {
+      Src = R;
+      return true;
+    }
+
+    return Src == R;
+  };
+
+  MachineInstr *CmpLHSDef = MRI.getVRegDef(CmpLHS);
+  if (CmpLHSDef->getOpcode() != TargetOpcode::G_OR)
+    return false;
+
+  APInt PartsMask(OpTy.getSizeInBits(), 0);
+  SmallVector Worklist = {CmpLHSDef};
+  while (!Worklist.empty()) {
+    MachineInstr *Cur = Worklist.pop_back_val();
+
+    Register Dst = Cur->getOperand(0).getReg();
+    if (!MRI.hasOneUse(Dst) && Dst != Src)
+      return false;
+
+    if (Cur->getOpcode() == TargetOpcode::G_OR) {
+      Worklist.push_back(MRI.getVRegDef(Cur->getOperand(1).getReg()));
+      Worklist.push_back(MRI.getVRegDef(Cur->getOperand(2).getReg()));
+      continue;
+    }
+
+    if (Cur->getOpcode() == TargetOpcode::G_UBFX) {
+      Register Op = Cur->getOperand(1).getReg();
+      Register Width = Cur->getOperand(2).getReg();
+      Register Off = Cur->getOperand(3).getReg();
+
+      auto WidthCst = getIConstantVRegVal(Width, MRI);
+  auto

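The sketch below illustrates the idea behind `PartsMask` in the combine above. The excerpt is truncated, so this is not the patch's code. Each `G_UBFX X, Offset, Width` leaf contributes the bits it extracts. The OR of the leaves compares equal to zero exactly when `X` has no set bits under the union of those fields. The names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One leaf of the G_OR tree, modelling (X >> Offset) & ((1 << Width) - 1).
// Offset == 0 also covers a plain mask of the low bits of X.
struct BfxLeaf {
  unsigned Offset;
  unsigned Width;
};

inline uint32_t lowBits(unsigned Width) {
  return Width >= 32 ? ~0u : ((1u << Width) - 1);
}

// Union of the bit ranges of X that the leaves extract.
inline uint32_t mergedBfxMask(const std::vector<BfxLeaf> &Leaves) {
  uint32_t Mask = 0;
  for (const BfxLeaf &L : Leaves) {
    assert(L.Width >= 1 && L.Offset + L.Width <= 32 && "field must fit");
    Mask |= lowBits(L.Width) << L.Offset;
  }
  return Mask;
}

// Reference semantics: OR the extracted fields together and compare with 0.
inline bool orOfFieldsIsZero(uint32_t X, const std::vector<BfxLeaf> &Leaves) {
  uint32_t Acc = 0;
  for (const BfxLeaf &L : Leaves)
    Acc |= (X >> L.Offset) & lowBits(L.Width);
  return Acc == 0;
}
```

For the workitem-ID leaves `{0, 10}`, `{10, 10}` and `{20, 10}`, the merged mask is `0x3fffffff`, so the whole tree reduces to a single `(X & 0x3fffffff) == 0` test.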
[llvm-branch-commits] [llvm] [GISel] Combine compare of bitfield extracts or'd together. (PR #146055)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/146055

>From da05cc2d920917f0cb6f171b0d9e2e535836ca3c Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 27 Jun 2025 12:04:53 +0200
Subject: [PATCH] [GISel] Combine compare of bitfield extracts or'd together.

Equivalent of the previous DAG patch for GISel.
The shifts are BFXs in GISel, so the canonical form of the entire expression
is different than in the DAG. The mask is not at the root of the expression, it
remains on the leaves instead.

See #136727
---
 .../llvm/CodeGen/GlobalISel/CombinerHelper.h  |   2 +
 .../include/llvm/Target/GlobalISel/Combine.td |  11 +-
 .../GlobalISel/CombinerHelperCompares.cpp |  89 +
 .../GlobalISel/combine-cmp-merged-bfx.mir | 326 ++
 .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 194 +++
 5 files changed, 483 insertions(+), 139 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-cmp-merged-bfx.mir

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h 
b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index c15263e0b06f8..5ec82c30f268f 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -641,6 +641,8 @@ class CombinerHelper {
   /// KnownBits information.
   bool matchICmpToLHSKnownBits(MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
+  bool combineMergedBFXCompare(MachineInstr &MI) const;
+
   /// \returns true if (and (or x, c1), c2) can be replaced with (and x, c2)
   bool matchAndOrDisjointMask(MachineInstr &MI, BuildFnTy &MatchInfo) const;
 
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td 
b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 4a92dc16c1bf4..cba46a5edf9ec 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1085,6 +1085,14 @@ def double_icmp_zero_or_combine: GICombineRule<
  (G_ICMP $root, $p, $ordst, 0))
 >;
 
+// Transform ((X | (G_UBFX X, ...) | ...) == 0) (or != 0)
+// into a compare of a extract/mask of X
+def icmp_merged_bfx_combine: GICombineRule<
+  (defs root:$root),
+  (combine (G_ICMP $dst, $p, $src, 0):$root,
+   [{ return Helper.combineMergedBFXCompare(*${root}); }])
+>;
+
 def and_or_disjoint_mask : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$info),
   (match (wip_match_opcode G_AND):$root,
@@ -2052,7 +2060,8 @@ def all_combines : 
GICombineGroup<[integer_reassoc_combines, trivial_combines,
 fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors,
 simplify_neg_minmax, combine_concat_vector,
 sext_trunc, zext_trunc, prefer_sign_combines, shuffle_combines,
-combine_use_vector_truncate, merge_combines, overflow_combines]>;
+combine_use_vector_truncate, merge_combines, overflow_combines,
+icmp_merged_bfx_combine]>;
 
 // A combine group used to for prelegalizer combiners at -O0. The combines in
 // this group have been selected based on experiments to balance code size and
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp 
b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
index fc40533cf3dc9..e1d43f37bac13 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
@@ -140,3 +140,92 @@ bool CombinerHelper::matchCanonicalizeFCmp(const 
MachineInstr &MI,
 
   return false;
 }
+
+bool CombinerHelper::combineMergedBFXCompare(MachineInstr &MI) const {
+  const GICmp *Cmp = cast(&MI);
+
+  ICmpInst::Predicate CC = Cmp->getCond();
+  if (CC != CmpInst::ICMP_EQ && CC != CmpInst::ICMP_NE)
+return false;
+
+  Register CmpLHS = Cmp->getLHSReg();
+  Register CmpRHS = Cmp->getRHSReg();
+
+  LLT OpTy = MRI.getType(CmpLHS);
+  if (!OpTy.isScalar() || OpTy.isPointer())
+return false;
+
+  assert(isZeroOrZeroSplat(CmpRHS, /*AllowUndefs=*/false));
+
+  Register Src;
+  const auto IsSrc = [&](Register R) {
+if (!Src) {
+  Src = R;
+  return true;
+}
+
+return Src == R;
+  };
+
+  MachineInstr *CmpLHSDef = MRI.getVRegDef(CmpLHS);
+  if (CmpLHSDef->getOpcode() != TargetOpcode::G_OR)
+return false;
+
+  APInt PartsMask(OpTy.getSizeInBits(), 0);
+  SmallVector Worklist = {CmpLHSDef};
+  while (!Worklist.empty()) {
+MachineInstr *Cur = Worklist.pop_back_val();
+
+Register Dst = Cur->getOperand(0).getReg();
+if (!MRI.hasOneUse(Dst) && Dst != Src)
+  return false;
+
+if (Cur->getOpcode() == TargetOpcode::G_OR) {
+  Worklist.push_back(MRI.getVRegDef(Cur->getOperand(1).getReg()));
+  Worklist.push_back(MRI.getVRegDef(Cur->getOperand(2).getReg()));
+  continue;
+}
+
+    if (Cur->getOpcode() == TargetOpcode::G_UBFX) {
+      Register Op = Cur->getOperand(1).getReg();
+      Register Off = Cur->getOperand(2).getReg();
+      Register Width = Cur->getOperand(3).getReg();
+
+      auto WidthCst = getIConstantVRegVal(Width, MRI);
+      auto
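The GlobalISel combine is cut off above, so its exact checks are not visible. Judging from the opcodes it walks (a tree of `G_OR`s whose leaves are `G_UBFX`), it appears to apply the same idea as the DAG fold to bitfield extracts with explicit offsets and widths: the OR of several unsigned extracts of one source is zero exactly when the source ANDed with the union of the field masks is zero. The C++ sketch below illustrates that identity on plain 32-bit integers under this assumption; `BitField`, `fieldMask`, `ubfx`, `anyFieldNonZero` and `combinedMask` are hypothetical names, not GlobalISel APIs. Note that `G_UBFX` takes its operands in the order source, lsb, width.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the constant operands of one G_UBFX
// (GlobalISel operand order: dst, src, lsb, width).
struct BitField {
  unsigned Lsb;
  unsigned Width;
};

// The bits of the source that a field reads, left in place.
uint32_t fieldMask(BitField F) {
  assert(F.Width >= 1 && F.Lsb + F.Width <= 32 && "field must fit in 32 bits");
  uint32_t Low = F.Width == 32 ? ~0u : ((1u << F.Width) - 1);
  return Low << F.Lsb;
}

// Unsigned bitfield extract: bits [Lsb, Lsb + Width) of Src, zero-extended.
uint32_t ubfx(uint32_t Src, BitField F) {
  return (Src & fieldMask(F)) >> F.Lsb;
}

// Naive form: OR all extracted fields together and compare with zero.
bool anyFieldNonZero(uint32_t Src, const std::vector<BitField> &Fields) {
  uint32_t Merged = 0;
  for (BitField F : Fields)
    Merged |= ubfx(Src, F);
  return Merged != 0;
}

// Folded form: one AND with the union of the field masks decides the same
// question, because each extract is non-zero iff Src & fieldMask(F) != 0.
uint32_t combinedMask(const std::vector<BitField> &Fields) {
  uint32_t M = 0;
  for (BitField F : Fields)
    M |= fieldMask(F);
  return M;
}
```

For three 10-bit fields at bits 0, 10 and 20, `combinedMask` returns 0x3fffffff. The fields need not be adjacent: the union mask is simply non-contiguous in that case.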

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/146054

>From 17ac90ad1ee167f35321e01625a207f2b94ff523 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 26 Jun 2025 13:31:37 +0200
Subject: [PATCH 1/2] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences

Fold sequences where we extract a bunch of contiguous bits from a value,
merge them into the low bit and then check if the low bits are zero or not.

It seems like a strange sequence at first, but it's an idiom used by device
libs to check workitem IDs for AMDGPU.
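The fold rests on a simple identity: `((X >> C) & Mask) != 0` exactly when `(X & (Mask << C)) != 0`, so OR-ing several shifted copies of `X` and testing the masked result against zero is the same as testing `X` against the union of the shifted masks. The following C++ sketch checks this on plain 32-bit integers rather than SelectionDAG nodes; the helper names (`mergedParts`, `partsMask`, `foldAgrees`) are hypothetical. The example values (shifts of 10 and 20 with mask 0x3ff) are taken from the shifts and mask in the removed `workitem_zero` check lines.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Naive form of the idiom: (X | (X >> C0) | (X >> C1) | ...) & Mask.
uint32_t mergedParts(uint32_t X, const std::vector<unsigned> &Shifts,
                     uint32_t Mask) {
  uint32_t Merged = X;
  for (unsigned C : Shifts) {
    assert(C < 32 && "shift amount must be below the bit width");
    Merged |= X >> C;
  }
  return Merged & Mask;
}

// Folded mask: Mask | (Mask << C0) | (Mask << C1) | ...
// Bits shifted past bit 31 are dropped, matching the zero bits that a
// logical right shift brings into the naive form.
uint32_t partsMask(const std::vector<unsigned> &Shifts, uint32_t Mask) {
  uint32_t PM = Mask;
  for (unsigned C : Shifts) {
    assert(C < 32 && "shift amount must be below the bit width");
    PM |= Mask << C;
  }
  return PM;
}

// True if "naive == 0" and "(X & PartsMask) == 0" give the same answer.
bool foldAgrees(uint32_t X, const std::vector<unsigned> &Shifts,
                uint32_t Mask) {
  bool NaiveIsZero = mergedParts(X, Shifts, Mask) == 0;
  bool FoldedIsZero = (X & partsMask(Shifts, Mask)) == 0;
  return NaiveIsZero == FoldedIsZero;
}
```

With shifts of 10 and 20 and mask 0x3ff, `partsMask` yields 0x3fffffff, so all three 10-bit fields are tested with a single AND. The identity itself does not require the combined mask to be contiguous.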

I put this in DAGCombiner instead of the target combiner because this is a
generic, valid transform that's also fairly niche, so I don't think there is
much risk of a combine loop.

See #136727
---
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 86 ++-
 .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 34 ++--
 2 files changed, 91 insertions(+), 29 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 08dab7c697b99..a189208d3a62e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
 
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //  (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();
+
+  if (Root.getOpcode() != ISD::AND)
+    return SDValue();
+
+  SDValue N0 = Root.getOperand(0);
+  SDValue N1 = Root.getOperand(1);
+
+  if (N0.getOpcode() != ISD::OR || !isa<ConstantSDNode>(N1))
+    return SDValue();
+
+  APInt RootMask = cast<ConstantSDNode>(N1)->getAsAPIntVal();
+  if (!RootMask.isMask())
+    return SDValue();
+
+  SDValue Src;
+  const auto IsSrc = [&](SDValue V) {
+    if (!Src) {
+      Src = V;
+      return true;
+    }
+
+    return Src == V;
+  };
+
+  SmallVector<SDValue> Worklist = {N0};
+  APInt PartsMask(VT.getSizeInBits(), 0);
+  while (!Worklist.empty()) {
+    SDValue V = Worklist.pop_back_val();
+    if (!V.hasOneUse() && Src != V)
+      return SDValue();
+
+    if (V.getOpcode() == ISD::OR) {
+      Worklist.push_back(V.getOperand(0));
+      Worklist.push_back(V.getOperand(1));
+      continue;
+    }
+
+    if (V.getOpcode() == ISD::SRL) {
+      SDValue ShiftSrc = V.getOperand(0);
+      SDValue ShiftAmt = V.getOperand(1);
+
+      if (!IsSrc(ShiftSrc) || !isa<ConstantSDNode>(ShiftAmt))
+        return SDValue();
+
+      PartsMask |= (RootMask << cast<ConstantSDNode>(ShiftAmt)->getAsZExtVal());
+      continue;
+    }
+
+    if (IsSrc(V)) {
+      PartsMask |= RootMask;
+      continue;
+    }
+
+    return SDValue();
+  }
+
+  if (!RootMask.isMask() || !Src)
+    return SDValue();
+
+  SDLoc DL(Root);
+  return DAG.getNode(ISD::AND, DL, VT,
+                     {Src, DAG.getConstant(PartsMask, DL, VT)});
+}
+
 /// This is a stub for TargetLowering::SimplifySetCC.
 SDValue DAGCombiner::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
                                    ISD::CondCode Cond, const SDLoc &DL,
                                    bool foldBooleans) {
   TargetLowering::DAGCombinerInfo
       DagCombineInfo(DAG, Level, false, this);
-  return TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL);
+  if (SDValue C =
+          TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL))
+    return C;
+
+  if ((Cond == ISD::SETNE || Cond == ISD::SETEQ) &&
+      N0.getOpcode() == ISD::AND && isNullConstant(N1)) {
+
+    if (SDValue Res = matchMergedBFX(N0, DAG, TLI))
+      return DAG.getSetCC(DL, VT, Res, N1, Cond);
+  }
+
+  return SDValue();
 }
 
 /// Given an ISD::SDIV node expressing a divide by constant, return
diff --git a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
index 07c4aeb1ac7df..64d055bc40e98 100644
--- a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
+++ b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
@@ -12,11 +12,7 @@ define i1 @workitem_zero() {
 ; DAGISEL-GFX8-LABEL: workitem_zero:
 ; DAGISEL-GFX8:   ; %bb.0: ; %entry
 ; DAGISEL-GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v1, 10, v31
-; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v0, 20, v31
-; DAGISEL-GFX8-NEXT:v_or_b32_e32 v1, v31, v1
-; DAGISEL-GFX8-NEXT:v_or_b32_e32 v0, v1, v0
-; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3fff, v31
 ; DAGISEL-GFX8-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0
 ; DAGISEL-GFX8-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc
 ; DAGISEL-GFX8-NEXT:s_setpc_b64 s[30:31]
@@ -

[llvm-branch-commits] [mlir] [mlir][tblgen] Fix test definition names to reflect expected valid results (NFC) (PR #146243)

2025-06-30 Thread Markus Böck via llvm-branch-commits


https://github.com/zero9178 approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/146243
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits



@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
 
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //  (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();
+
+  if (Root.getOpcode() != ISD::AND)
+    return SDValue();
+
+  SDValue N0 = Root.getOperand(0);
+  SDValue N1 = Root.getOperand(1);
+
+  if (N0.getOpcode() != ISD::OR || !isa<ConstantSDNode>(N1))
+    return SDValue();
Pierre-vh wrote:

I don't think so, except maybe for the shift part, but even then it doesn't make
the code much shorter. I don't check for a tree of nodes, just one node at a time.

https://github.com/llvm/llvm-project/pull/146054

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits



@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
 
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //  (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();

Pierre-vh wrote:

I'll update it. Should I bother supporting vector types here? I think nothing's 
stopping it except testing coverage. On AMDGPU we scalarize the vector compares

https://github.com/llvm/llvm-project/pull/146054

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)

2025-06-30 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-mc

Author: Jonathan Thackray (jthackray)


Changes

This is a series of patches (3/4) to unify assembly/disassembly of recent 
AArch64 tests into a single file. The aim is to improve consistency, so that 
all instructions and system registers are thoroughly tested, and future test 
cases will be in a unified format.

This patch:
 * removes .txt tests which have multiple feature dependencies
 * makes the .s tests have a round-trip RUN line to test both encoding and assembly
 * creates diagnostic tests when needed
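In the updated tests shown here, the last RUN line performs the round trip: the `llvm-mc -show-encoding` output has its `.text` line deleted and everything up to `encoding: ` stripped by `sed`, and the remaining byte lists are fed back into `llvm-mc -disassemble -show-encoding`. The C++ sketch below mimics the `sed` step and shows how a byte list relates to the 32-bit word that `llvm-objdump` prints in the CHECK-UNKNOWN lines (for example, `[0x80,0x11,0x1c,0xd5]` and `d51c1180`). It is an illustration only; the function names are hypothetical and nothing here is part of lit or llvm-mc.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Mimics `sed 's/.*encoding: //g'`: the greedy `.*` matches up to the last
// "encoding: ", so keep only what follows it. Other lines pass through.
std::string stripToEncoding(const std::string &Line) {
  const std::string Marker = "encoding: ";
  std::string::size_type Pos = Line.rfind(Marker);
  return Pos == std::string::npos ? Line : Line.substr(Pos + Marker.size());
}

// Parses a byte list such as "[0x80,0x11,0x1c,0xd5]".
std::vector<uint8_t> parseByteList(const std::string &List) {
  std::vector<uint8_t> Bytes;
  std::string::size_type I = List.find('[');
  if (I == std::string::npos)
    return Bytes;
  ++I;
  while (I < List.size() && List[I] != ']') {
    std::string::size_type End = List.find_first_of(",]", I);
    if (End == std::string::npos)
      break;
    // strtoul with base 16 accepts an optional "0x" prefix.
    std::string Tok = List.substr(I, End - I);
    Bytes.push_back(static_cast<uint8_t>(std::strtoul(Tok.c_str(), nullptr, 16)));
    I = List[End] == ',' ? End + 1 : End;
  }
  return Bytes;
}

// A64 instructions are stored little-endian: the first byte holds bits 7:0.
uint32_t toInstructionWord(const std::vector<uint8_t> &Bytes) {
  assert(Bytes.size() == 4 && "A64 instructions are 4 bytes");
  return uint32_t(Bytes[0]) | uint32_t(Bytes[1]) << 8 |
         uint32_t(Bytes[2]) << 16 | uint32_t(Bytes[3]) << 24;
}
```

Feeding the stripped byte list back through the disassembler and re-encoding it checks that assembly and disassembly agree, which is what the separate `.txt` disassembler tests previously covered.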

---

Patch is 461.48 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/146330.diff


29 Files Affected:

- (modified) llvm/test/MC/AArch64/armv8.6a-fgt.s (+105-47) 
- (added) llvm/test/MC/AArch64/armv8.8a-mops-diagnostics.s (+227) 
- (modified) llvm/test/MC/AArch64/armv8.8a-mops.s (+569-488) 
- (modified) llvm/test/MC/AArch64/armv8.9a-clrbhb.s (+29-16) 
- (modified) llvm/test/MC/AArch64/armv8.9a-debug-pmu.s (+1560-467) 
- (modified) llvm/test/MC/AArch64/armv8.9a-lrcpc3.s (+237-138) 
- (modified) llvm/test/MC/AArch64/armv8.9a-specres2.s (+27-8) 
- (added) llvm/test/MC/AArch64/armv8.9a-the-diagnostics.s (+103) 
- (modified) llvm/test/MC/AArch64/armv8.9a-the.s (+677-572) 
- (added) llvm/test/MC/AArch64/armv9-mrrs-diagnostics.s (+30) 
- (modified) llvm/test/MC/AArch64/armv9-mrrs.s (+235-92) 
- (added) llvm/test/MC/AArch64/armv9-msrr-diagnostics.s (+30) 
- (modified) llvm/test/MC/AArch64/armv9-msrr.s (+125-95) 
- (added) llvm/test/MC/AArch64/armv9-sysp-diagnostics.s (+35) 
- (removed) llvm/test/MC/AArch64/armv9-sysp.s (-538) 
- (modified) llvm/test/MC/AArch64/armv9.4a-chk.s (+26-9) 
- (modified) llvm/test/MC/AArch64/armv9.5a-tlbiw.s (+38-15) 
- (added) llvm/test/MC/AArch64/armv9a-sysp.s (+834) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.6a-fgt.txt (-75) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.8a-mops.txt (-434) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-clrbhb.txt (-16) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-debug-pmu.txt (-730) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-lrcpc3.txt (-113) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-specres2.txt (-16) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-the.txt (-482) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9-sysp.txt (-562) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9-sysreg128.txt (-147) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.4a-chk.txt (-8) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.5a-tlbiw.txt (-27) 


``diff
diff --git a/llvm/test/MC/AArch64/armv8.6a-fgt.s 
b/llvm/test/MC/AArch64/armv8.6a-fgt.s
index 11002aca5e1a0..4b825ea191a68 100644
--- a/llvm/test/MC/AArch64/armv8.6a-fgt.s
+++ b/llvm/test/MC/AArch64/armv8.6a-fgt.s
@@ -1,75 +1,133 @@
-// RUN: llvm-mc -triple aarch64 -show-encoding -mattr=+fgt   < %s | FileCheck %s
-// RUN: llvm-mc -triple aarch64 -show-encoding -mattr=+v8.6a < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64 -show-encoding  < %s 2>&1 | FileCheck %s --check-prefix=NOFGT
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+v8.6a < %s \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+fgt < %s \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+fgt < %s \
+// RUN:   | llvm-objdump -d --mattr=+fgt - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+fgt < %s \
+// RUN:   | llvm-objdump -d --mattr=-fgt - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+fgt < %s \
+// RUN:   | sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:   | llvm-mc -triple=aarch64 -mattr=+fgt -disassemble -show-encoding \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
 
 msr HFGRTR_EL2, x0
+// CHECK-INST: msr HFGRTR_EL2, x0
+// CHECK-ENCODING: encoding: [0x80,0x11,0x1c,0xd5]
+// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate
+// CHECK-UNKNOWN:  d51c1180  msr S3_4_C1_C1_4, x0
 msr HFGWTR_EL2, x5
+// CHECK-INST: msr HFGWTR_EL2, x5
+// CHECK-ENCODING: encoding: [0xa5,0x11,0x1c,0xd5]
+// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate
+// CHECK-UNKNOWN:  d51c11a5  msr S3_4_C1_C1_5, x5
 msr HFGITR_EL2, x10
+// CHECK-INST: msr HFGITR_EL2, x10
+// CHECK-ENCODING: encoding: [0xca,0x11,0x1c,0xd5]
+// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate
+// CHECK-UNKNOWN:  d51c11ca  msr S3_4_C1_C1_6, x10
 msr HDFGRTR_EL2, x15
+/

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC) (PR #146331)

2025-06-30 Thread via llvm-branch-commits


llvmbot wrote:



@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-aarch64

Author: Jonathan Thackray (jthackray)


Changes

This is a series of patches (4/4) to unify assembly/disassembly of recent 
AArch64 tests into a single file. The aim is to improve consistency, so that 
all instructions and system registers are thoroughly tested, and future test 
cases will be in a unified format.

This patch:
 * removes .txt tests whose .s tests have functions
 * makes the .s tests have a round-trip RUN line to test both encoding and assembly

---

Patch is 67.28 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/146331.diff


8 Files Affected:

- (modified) llvm/test/MC/AArch64/armv9.6a-lsui.s (+708-365) 
- (modified) llvm/test/MC/AArch64/armv9.6a-occmo.s (+38-16) 
- (modified) llvm/test/MC/AArch64/armv9.6a-pcdphint.s (+25-12) 
- (modified) llvm/test/MC/AArch64/armv9.6a-rme-gpc3.s (+34-12) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.6a-lsui.txt (-323) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.6a-occmo.txt (-11) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.6a-pcdphint.txt (-8) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.6a-rme-gpc3.txt (-18) 


``diff
diff --git a/llvm/test/MC/AArch64/armv9.6a-lsui.s 
b/llvm/test/MC/AArch64/armv9.6a-lsui.s
index d4a5e1f980560..264a869b6d286 100644
--- a/llvm/test/MC/AArch64/armv9.6a-lsui.s
+++ b/llvm/test/MC/AArch64/armv9.6a-lsui.s
@@ -1,408 +1,751 @@
-// RUN: llvm-mc -triple aarch64 -mattr=+lsui -show-encoding %s  | FileCheck %s
-// RUN: not llvm-mc -triple aarch64 -show-encoding %s 2>&1  | FileCheck %s --check-prefix=ERROR
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \
+// RUN:   | llvm-objdump -d --mattr=+lsui --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \
+// RUN:   | llvm-objdump -d --mattr=-lsui --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \
+// RUN:   | sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:   | llvm-mc -triple=aarch64 -mattr=+lsui -disassemble -show-encoding \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
 
-_func:
-// CHECK: _func:
 
//--
 // Unprivileged load/store operations
 
//--
-  ldtxr   x9, [sp]
-// CHECK: ldtxr x9, [sp]           // encoding: [0xe9,0x7f,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr   x9, [sp, #0]
-// CHECK: ldtxr x9, [sp]           // encoding: [0xe9,0x7f,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr   x10, [x11]
-// CHECK: ldtxr x10, [x11]         // encoding: [0x6a,0x7d,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr   x10, [x11, #0]
-// CHECK: ldtxr x10, [x11]         // encoding: [0x6a,0x7d,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-
-  ldatxr  x9, [sp]
-// CHECK: ldatxr x9, [sp]          // encoding: [0xe9,0xff,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldatxr  x10, [x11]
-// CHECK: ldatxr x10, [x11]        // encoding: [0x6a,0xfd,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-
-  sttxr   wzr, w4, [sp]
-// CHECK: sttxr wzr, w4, [sp]      // encoding: [0xe4,0x7f,0x1f,0x89]
-// ERROR: error: instruction requires: lsui
-  sttxr   wzr, w4, [sp, #0]
-// CHECK: sttxr wzr, w4, [sp]      // encoding: [0xe4,0x7f,0x1f,0x89]
-// ERROR: error: instruction requires: lsui
-  sttxr   w5, x6, [x7]
-// CHECK: sttxr w5, x6, [x7]       // encoding: [0xe6,0x7c,0x05,0xc9]
-// ERROR: error: instruction requires: lsui
-  sttxr   w5, x6, [x7, #0]
-// CHECK: sttxr w5, x6, [x7]       // encoding: [0xe6,0x7c,0x05,0xc9]
-// ERROR: error: instruction requires: lsui
-
-  stltxr  w2, w4, [sp]
-// CHECK: stltxr w2, w4, [sp]      // encoding: [0xe4,0xff,0x02,0x89]
-// ERROR: error: instruction requires: lsui
-  stltxr  w5, x6, [x7]
-// CHECK: stltxr w5, x6, [x7]      // encoding: [0xe6,0xfc,0x05,0xc9]
-// ERROR: error: instruction requires: lsui
+ldtxr x9, [sp]
+// CHECK-INST: ldtxr x9, [sp]
+// CHECK-ENCODING: encoding: [0xe9,0x7f,0x5f,0xc9]
+// CHECK-ERROR: error: ins

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)

2025-06-30 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: Jonathan Thackray (jthackray)


Changes

This is a series of patches (3/4) to unify assembly/disassembly of recent 
AArch64 tests into a single file. The aim is to improve consistency, so that 
all instructions and system registers are thoroughly tested, and future test 
cases will be in a unified format.

This patch:
 * removes .txt tests which have multiple feature dependencies
 * makes the .s tests have a round-trip RUN line to test both encoding and assembly
 * creates diagnostic tests when needed

---

Patch is 461.48 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/146330.diff


29 Files Affected:

- (modified) llvm/test/MC/AArch64/armv8.6a-fgt.s (+105-47) 
- (added) llvm/test/MC/AArch64/armv8.8a-mops-diagnostics.s (+227) 
- (modified) llvm/test/MC/AArch64/armv8.8a-mops.s (+569-488) 
- (modified) llvm/test/MC/AArch64/armv8.9a-clrbhb.s (+29-16) 
- (modified) llvm/test/MC/AArch64/armv8.9a-debug-pmu.s (+1560-467) 
- (modified) llvm/test/MC/AArch64/armv8.9a-lrcpc3.s (+237-138) 
- (modified) llvm/test/MC/AArch64/armv8.9a-specres2.s (+27-8) 
- (added) llvm/test/MC/AArch64/armv8.9a-the-diagnostics.s (+103) 
- (modified) llvm/test/MC/AArch64/armv8.9a-the.s (+677-572) 
- (added) llvm/test/MC/AArch64/armv9-mrrs-diagnostics.s (+30) 
- (modified) llvm/test/MC/AArch64/armv9-mrrs.s (+235-92) 
- (added) llvm/test/MC/AArch64/armv9-msrr-diagnostics.s (+30) 
- (modified) llvm/test/MC/AArch64/armv9-msrr.s (+125-95) 
- (added) llvm/test/MC/AArch64/armv9-sysp-diagnostics.s (+35) 
- (removed) llvm/test/MC/AArch64/armv9-sysp.s (-538) 
- (modified) llvm/test/MC/AArch64/armv9.4a-chk.s (+26-9) 
- (modified) llvm/test/MC/AArch64/armv9.5a-tlbiw.s (+38-15) 
- (added) llvm/test/MC/AArch64/armv9a-sysp.s (+834) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.6a-fgt.txt (-75) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.8a-mops.txt (-434) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-clrbhb.txt (-16) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-debug-pmu.txt (-730) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-lrcpc3.txt (-113) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-specres2.txt (-16) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-the.txt (-482) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9-sysp.txt (-562) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9-sysreg128.txt (-147) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.4a-chk.txt (-8) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.5a-tlbiw.txt (-27) 


``diff
diff --git a/llvm/test/MC/AArch64/armv8.6a-fgt.s 
b/llvm/test/MC/AArch64/armv8.6a-fgt.s
index 11002aca5e1a0..4b825ea191a68 100644
--- a/llvm/test/MC/AArch64/armv8.6a-fgt.s
+++ b/llvm/test/MC/AArch64/armv8.6a-fgt.s
@@ -1,75 +1,133 @@
-// RUN: llvm-mc -triple aarch64 -show-encoding -mattr=+fgt   < %s | FileCheck %s
-// RUN: llvm-mc -triple aarch64 -show-encoding -mattr=+v8.6a < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64 -show-encoding  < %s 2>&1 | FileCheck %s --check-prefix=NOFGT
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+v8.6a < %s \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+fgt < %s \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+fgt < %s \
+// RUN:   | llvm-objdump -d --mattr=+fgt - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+fgt < %s \
+// RUN:   | llvm-objdump -d --mattr=-fgt - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+fgt < %s \
+// RUN:   | sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:   | llvm-mc -triple=aarch64 -mattr=+fgt -disassemble -show-encoding \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
 
 msr HFGRTR_EL2, x0
+// CHECK-INST: msr HFGRTR_EL2, x0
+// CHECK-ENCODING: encoding: [0x80,0x11,0x1c,0xd5]
+// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate
+// CHECK-UNKNOWN:  d51c1180  msr S3_4_C1_C1_4, x0
 msr HFGWTR_EL2, x5
+// CHECK-INST: msr HFGWTR_EL2, x5
+// CHECK-ENCODING: encoding: [0xa5,0x11,0x1c,0xd5]
+// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate
+// CHECK-UNKNOWN:  d51c11a5  msr S3_4_C1_C1_5, x5
 msr HFGITR_EL2, x10
+// CHECK-INST: msr HFGITR_EL2, x10
+// CHECK-ENCODING: encoding: [0xca,0x11,0x1c,0xd5]
+// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate
+// CHECK-UNKNOWN:  d51c11ca  msr S3_4_C1_C1_6, x10
 msr HDFGRT

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray created 
https://github.com/llvm/llvm-project/pull/146329

This is a series of patches (2/4) to unify assembly/disassembly of recent 
AArch64 tests into a single file. The aim is to improve consistency, so that 
all instructions and system registers are thoroughly tested, and future test 
cases will be in a unified format.

This patch:
 * removes .txt tests which have only one feature required
 * makes the .s tests have a round-trip RUN line to test both encoding and assembly
 * creates diagnostic tests when needed
 * fixes naming convention of tests
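When a test's feature is disabled, `llvm-objdump` cannot name the system register, so the CHECK-UNKNOWN lines in this series expect the generic `S<op0>_<op1>_C<n>_C<m>_<op2>` spelling (for example `d51c1180  msr S3_4_C1_C1_4, x0`). The C++ sketch below shows how those fields come out of a 32-bit MSR/MRS (register) word, following the A64 field layout; the helper names are hypothetical, and `isSysRegMove` rejects other system-instruction classes such as SYS/SYSL and MSR (immediate). Applied to the `mrs x0, MECIDR_EL2` bytes `[0xe0,0xa8,0x3c,0xd5]` from the MEC test (word 0xd53ca8e0), it gives `S3_4_C10_C8_7`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// A64 MSR/MRS (register) forms: bits 31:22 = 0b1101010100 and bit 20 = 1,
// so op0 is 2 or 3.
bool isSysRegMove(uint32_t Word) { return (Word & 0xffd00000u) == 0xd5100000u; }

// Bit 21 (L) is 1 for MRS (read) and 0 for MSR (write).
bool isMrs(uint32_t Word) { return (Word >> 21) & 1u; }

// Bits 4:0 hold the general-purpose register number Rt.
unsigned transferRegister(uint32_t Word) { return Word & 0x1fu; }

// Builds the generic S<op0>_<op1>_C<CRn>_C<CRm>_<op2> spelling from
// op0 (bits 20:19), op1 (18:16), CRn (15:12), CRm (11:8) and op2 (7:5).
std::string genericSysRegName(uint32_t Word) {
  unsigned Op0 = (Word >> 19) & 0x3u;
  unsigned Op1 = (Word >> 16) & 0x7u;
  unsigned CRn = (Word >> 12) & 0xfu;
  unsigned CRm = (Word >> 8) & 0xfu;
  unsigned Op2 = (Word >> 5) & 0x7u;
  return "S" + std::to_string(Op0) + "_" + std::to_string(Op1) + "_C" +
         std::to_string(CRn) + "_C" + std::to_string(CRm) + "_" +
         std::to_string(Op2);
}
```

For `d51c11ca` the same decoding yields `S3_4_C1_C1_6` with Rt = 10, matching the `msr S3_4_C1_C1_6, x10` CHECK-UNKNOWN line.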

>From be8bcdead883ec9bac8bebf6b3382974fc988c28 Mon Sep 17 00:00:00 2001
From: Jonathan Thackray 
Date: Wed, 25 Jun 2025 21:22:43 +0100
Subject: [PATCH] [AArch64][llvm] Unify AArch64 tests into a single file (2/4)
 (NFC)

This is a series of patches (2/4) to unify assembly/disassembly of
recent AArch64 tests into a single file. The aim is to improve
consistency, so that all instructions and system registers are
thoroughly tested, and future test cases will be in a unified format.

This patch:
 * removes .txt tests which have only one feature required
 * makes the .s tests have a round-trip RUN line to test both encoding and assembly
 * creates diagnostic tests when needed
 * fixes naming convention of tests

Co-authored-by: Virginia Cangelosi 
---
 llvm/test/MC/AArch64/armv9.2a-mec.s   | 172 ++-
 llvm/test/MC/AArch64/armv9.4-lse128.s |  98 -
 llvm/test/MC/AArch64/armv9.4a-gcs.s   | 198 +-
 .../MC/AArch64/armv9.4a-lse128-diagnostics.s  |  17 ++
 llvm/test/MC/AArch64/armv9.4a-lse128.s| 138 
 llvm/test/MC/AArch64/armv9.5a-cpa.s   |  89 +---
 .../MC/AArch64/armv9.6a-mpam-diagnostics.s|   5 +
 llvm/test/MC/AArch64/armv9.6a-mpam.s  |  80 +--
 .../MC/Disassembler/AArch64/armv9.4a-gcs.txt  |  90 
 .../Disassembler/AArch64/armv9.4a-lse128.txt  |  98 -
 .../MC/Disassembler/AArch64/armv9.5a-cpa.txt  |  42 
 .../MC/Disassembler/AArch64/armv9.6a-mpam.txt |  50 -
 .../MC/Disassembler/AArch64/armv9a-mec.txt|  54 -
 13 files changed, 541 insertions(+), 590 deletions(-)
 delete mode 100644 llvm/test/MC/AArch64/armv9.4-lse128.s
 create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128-diagnostics.s
 create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128.s
 create mode 100644 llvm/test/MC/AArch64/armv9.6a-mpam-diagnostics.s
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-gcs.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-lse128.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.5a-cpa.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-mpam.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9a-mec.txt

diff --git a/llvm/test/MC/AArch64/armv9.2a-mec.s 
b/llvm/test/MC/AArch64/armv9.2a-mec.s
index 42e4bf732086e..c747886f7ec3b 100644
--- a/llvm/test/MC/AArch64/armv9.2a-mec.s
+++ b/llvm/test/MC/AArch64/armv9.2a-mec.s
@@ -1,55 +1,117 @@
-// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s
-
-  mrs x0, MECIDR_EL2
-// CHECK: mrs   x0, MECIDR_EL2   // encoding: [0xe0,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_P0_EL2
-// CHECK: mrs   x0, MECID_P0_EL2  // encoding: [0x00,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_A0_EL2
-// CHECK: mrs   x0, MECID_A0_EL2  // encoding: [0x20,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_P1_EL2
-// CHECK: mrs   x0, MECID_P1_EL2  // encoding: [0x40,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_A1_EL2
-// CHECK: mrs   x0, MECID_A1_EL2  // encoding: [0x60,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, VMECID_P_EL2
-// CHECK: mrs   x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, VMECID_A_EL2
-// CHECK: mrs   x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_RL_A_EL3
-// CHECK: mrs   x0, MECID_RL_A_EL3   // encoding: [0x20,0xaa,0x3e,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  msr MECID_P0_EL2,x0
-// CHECK: msr   MECID_P0_EL2, x0  // encoding: [0x00,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_A0_EL2,x0
-// CHECK: msr   MECID_A0_EL2, x0  // encoding: [0x20,0xa8,0x1

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: Jonathan Thackray (jthackray)


Changes

This is a series of patches (2/4) to unify assembly/disassembly of recent 
AArch64 tests into a single file. The aim is to improve consistency, so that 
all instructions and system registers are thoroughly tested, and future test 
cases will be in a unified format.

This patch:
 * removes .txt tests which have only one feature required
 * makes the .s tests have a round-trip RUN line to test both encoding and assembly
 * creates diagnostic tests when needed
 * fixes naming convention of tests

---

Patch is 52.94 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/146329.diff


13 Files Affected:

- (modified) llvm/test/MC/AArch64/armv9.2a-mec.s (+117-55) 
- (removed) llvm/test/MC/AArch64/armv9.4-lse128.s (-98) 
- (modified) llvm/test/MC/AArch64/armv9.4a-gcs.s (+143-55) 
- (added) llvm/test/MC/AArch64/armv9.4a-lse128-diagnostics.s (+17) 
- (added) llvm/test/MC/AArch64/armv9.4a-lse128.s (+138) 
- (modified) llvm/test/MC/AArch64/armv9.5a-cpa.s (+63-26) 
- (added) llvm/test/MC/AArch64/armv9.6a-mpam-diagnostics.s (+5) 
- (modified) llvm/test/MC/AArch64/armv9.6a-mpam.s (+58-22) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.4a-gcs.txt (-90) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.4a-lse128.txt (-98) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.5a-cpa.txt (-42) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.6a-mpam.txt (-50) 
- (removed) llvm/test/MC/Disassembler/AArch64/armv9a-mec.txt (-54) 


``diff
diff --git a/llvm/test/MC/AArch64/armv9.2a-mec.s 
b/llvm/test/MC/AArch64/armv9.2a-mec.s
index 42e4bf732086e..c747886f7ec3b 100644
--- a/llvm/test/MC/AArch64/armv9.2a-mec.s
+++ b/llvm/test/MC/AArch64/armv9.2a-mec.s
@@ -1,55 +1,117 @@
-// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s
-
-  mrs x0, MECIDR_EL2
-// CHECK: mrs   x0, MECIDR_EL2   // encoding: [0xe0,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_P0_EL2
-// CHECK: mrs   x0, MECID_P0_EL2  // encoding: [0x00,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_A0_EL2
-// CHECK: mrs   x0, MECID_A0_EL2  // encoding: [0x20,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_P1_EL2
-// CHECK: mrs   x0, MECID_P1_EL2  // encoding: [0x40,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_A1_EL2
-// CHECK: mrs   x0, MECID_A1_EL2  // encoding: [0x60,0xa8,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, VMECID_P_EL2
-// CHECK: mrs   x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, VMECID_A_EL2
-// CHECK: mrs   x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  mrs x0, MECID_RL_A_EL3
-// CHECK: mrs   x0, MECID_RL_A_EL3   // encoding: [0x20,0xaa,0x3e,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register
-  msr MECID_P0_EL2,x0
-// CHECK: msr   MECID_P0_EL2, x0  // encoding: [0x00,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_A0_EL2,x0
-// CHECK: msr   MECID_A0_EL2, x0  // encoding: [0x20,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_P1_EL2,x0
-// CHECK: msr   MECID_P1_EL2, x0  // encoding: [0x40,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_A1_EL2,x0
-// CHECK: msr   MECID_A1_EL2, x0  // encoding: [0x60,0xa8,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr VMECID_P_EL2,   x0
-// CHECK: msr   VMECID_P_EL2, x0 // encoding: [0x00,0xa9,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr VMECID_A_EL2,   x0
-// CHECK: msr   VMECID_A_EL2, x0 // encoding: [0x20,0xa9,0x1c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-  msr MECID_RL_A_EL3, x0
-// CHECK: msr   MECID_RL_A_EL3, x0   // encoding: [0x20,0xaa,0x1e,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate
-
-  dc cigdpae, x0
-// CHECK: dc cigdpae, x0 // encoding: [0xe0,0x7e,0x0c,0xd5]
-// CHECK-NO-MEC: [[@LINE-2]]:14: error: DC CIGDPA

[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)

2025-06-30 Thread Pierre van Houtryve via llvm-branch-commits


Pierre-vh wrote:

ping

https://github.com/llvm/llvm-project/pull/141589

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray edited 
https://github.com/llvm/llvm-project/pull/146329

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC) (PR #146331)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray edited 
https://github.com/llvm/llvm-project/pull/146331

[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)

2025-06-30 Thread Jonathan Thackray via llvm-branch-commits


https://github.com/jthackray edited 
https://github.com/llvm/llvm-project/pull/146330
