[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC) (PR #146331)
https://github.com/jthackray created https://github.com/llvm/llvm-project/pull/146331

This is a series of patches (4/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format.

This patch:
* removes .txt tests whose .s tests have functions
* makes the .s tests have a round-trip run line to test both encoding and assembly

>From 8c9eccdc95e465fdbfe833080afb1ad1099c224c Mon Sep 17 00:00:00 2001
From: Jonathan Thackray
Date: Fri, 27 Jun 2025 20:16:06 +0100
Subject: [PATCH] [AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC)

This is a series of patches (4/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format.

This patch:
* removes .txt tests whose .s tests have functions
* makes the .s tests have a round-trip run line to test both encoding and assembly

Co-authored-by: Virginia Cangelosi
---
 llvm/test/MC/AArch64/armv9.6a-lsui.s          | 1073 +++--
 llvm/test/MC/AArch64/armv9.6a-occmo.s         |   54 +-
 llvm/test/MC/AArch64/armv9.6a-pcdphint.s      |   37 +-
 llvm/test/MC/AArch64/armv9.6a-rme-gpc3.s      |   46 +-
 .../MC/Disassembler/AArch64/armv9.6a-lsui.txt |  323 -
 .../Disassembler/AArch64/armv9.6a-occmo.txt   |   11 -
 .../AArch64/armv9.6a-pcdphint.txt             |    8 -
 .../AArch64/armv9.6a-rme-gpc3.txt             |   18 -
 8 files changed, 805 insertions(+), 765 deletions(-)
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-lsui.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-occmo.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-pcdphint.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-rme-gpc3.txt

diff --git a/llvm/test/MC/AArch64/armv9.6a-lsui.s b/llvm/test/MC/AArch64/armv9.6a-lsui.s
index d4a5e1f980560..264a869b6d286 100644
--- a/llvm/test/MC/AArch64/armv9.6a-lsui.s
+++ b/llvm/test/MC/AArch64/armv9.6a-lsui.s
@@ -1,408 +1,751 @@
-// RUN: llvm-mc -triple aarch64 -mattr=+lsui -show-encoding %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64 -show-encoding %s 2>&1 | FileCheck %s --check-prefix=ERROR
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \
+// RUN:   | llvm-objdump -d --mattr=+lsui --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \
+// RUN:   | llvm-objdump -d --mattr=-lsui --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \
+// RUN:   | sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:   | llvm-mc -triple=aarch64 -mattr=+lsui -disassemble -show-encoding \
+// RUN:   | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
-_func:
-// CHECK: _func:
 //--
 // Unprivileged load/store operations
 //--
-  ldtxr x9, [sp]
-// CHECK: ldtxr x9, [sp] // encoding: [0xe9,0x7f,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr x9, [sp, #0]
-// CHECK: ldtxr x9, [sp] // encoding: [0xe9,0x7f,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr x10, [x11]
-// CHECK: ldtxr x10, [x11] // encoding: [0x6a,0x7d,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldtxr x10, [x11, #0]
-// CHECK: ldtxr x10, [x11] // encoding: [0x6a,0x7d,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-
-  ldatxr x9, [sp]
-// CHECK: ldatxr x9, [sp] // encoding: [0xe9,0xff,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-  ldatxr x10, [x11]
-// CHECK: ldatxr x10, [x11] // encoding: [0x6a,0xfd,0x5f,0xc9]
-// ERROR: error: instruction requires: lsui
-
-  sttxr wzr, w4, [sp]
-// CHECK: sttxr wzr, w4, [sp] // encoding: [0xe4,0x7f,0x1f,0x89]
-// ERROR: error: instruction requires: lsui
-  sttxr wzr, w4, [sp, #0]
-// CHECK: sttxr wzr, w4, [sp] // encoding: [0xe4,0x7f,0x1f,0x89]
-// ERROR: error: instruction requires: lsui
-  sttxr w5, x6, [x7]
-// CHECK: stt
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
https://github.com/jthackray edited https://github.com/llvm/llvm-project/pull/146329 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GISel] Combine compare of bitfield extracts or'd together. (PR #146055)
@@ -140,3 +140,92 @@ bool CombinerHelper::matchCanonicalizeFCmp(const MachineInstr &MI,
   return false;
 }
+
+bool CombinerHelper::combineMergedBFXCompare(MachineInstr &MI) const {
+  const GICmp *Cmp = cast<GICmp>(&MI);
+
+  ICmpInst::Predicate CC = Cmp->getCond();
+  if (CC != CmpInst::ICMP_EQ && CC != CmpInst::ICMP_NE)

jayfoad wrote:

Use `CmpInst::isEquality`

https://github.com/llvm/llvm-project/pull/146055
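
For context, the shape this combine targets can be modelled outside of LLVM. The following is a minimal sketch in plain C++ under stated assumptions: `ubfx`, `fieldMask`, `bfxOrIsZero` and `maskedIsZero` are hypothetical helpers, not GlobalISel APIs, and the 10-bit fields at offsets 0, 10 and 20 simply mirror the `v_bfe_u32 ..., 10, 10` and `0x3ff` pattern seen in the AMDGPU workitem ID tests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an unsigned bitfield extract on 32-bit values:
// returns `width` bits of x starting at bit `lsb`.
// Assumes 0 < width < 32 and lsb + width <= 32.
constexpr uint32_t ubfx(uint32_t x, unsigned lsb, unsigned width) {
  return (x >> lsb) & ((1u << width) - 1u);
}

// Bits of the source covered by the field [lsb, lsb + width).
constexpr uint32_t fieldMask(unsigned lsb, unsigned width) {
  return ((1u << width) - 1u) << lsb;
}

// Unfolded form: the mask lives on the leaves, one extract per field.
constexpr bool bfxOrIsZero(uint32_t x) {
  return ((x & 0x3ffu) | ubfx(x, 10, 10) | ubfx(x, 20, 10)) == 0;
}

// Folded form: a single AND of the source with the union of the field masks.
constexpr bool maskedIsZero(uint32_t x) {
  return (x & (fieldMask(0, 10) | fieldMask(10, 10) | fieldMask(20, 10))) == 0;
}

// Compile-time spot checks that the two forms agree.
static_assert(bfxOrIsZero(0) && maskedIsZero(0), "all fields zero");
static_assert(!bfxOrIsZero(1u << 25) && !maskedIsZero(1u << 25),
              "bit set inside the third field");
static_assert(bfxOrIsZero(0xC0000000u) && maskedIsZero(0xC0000000u),
              "bits set only outside all fields");
```

The union of the three field masks here is `0x3fffffff`, so the folded compare needs one AND instead of two extracts and two ORs.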
[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)
@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //   (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();

jayfoad wrote:

> Should I bother supporting vector types here?

I don't have a strong opinion on that. You could leave a TODO?

https://github.com/llvm/llvm-project/pull/146054
[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)
@@ -28909,13 +28909,99 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //   (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();
+
+  if (!VT.isScalarInteger() || Root.getOpcode() != ISD::AND)
+    return SDValue();
+
+  SDValue N0 = Root.getOperand(0);
+  SDValue N1 = Root.getOperand(1);
+
+  if (N0.getOpcode() != ISD::OR || !isa<ConstantSDNode>(N1))
+    return SDValue();
+
+  APInt RootMask = cast<ConstantSDNode>(N1)->getAsAPIntVal();
+
+  SDValue Src;
+  const auto IsSrc = [&](SDValue V) {
+    if (!Src) {
+      Src = V;
+      return true;
+    }
+
+    return Src == V;
+  };
+
+  SmallVector<SDValue> Worklist = {N0};
+  APInt PartsMask(VT.getSizeInBits(), 0);
+  while (!Worklist.empty()) {
+    SDValue V = Worklist.pop_back_val();
+    if (!V.hasOneUse() && (Src && Src != V))
+      return SDValue();
+
+    if (V.getOpcode() == ISD::OR) {
+      Worklist.push_back(V.getOperand(0));
+      Worklist.push_back(V.getOperand(1));
+      continue;
+    }
+
+    if (V.getOpcode() == ISD::SRL) {
+      SDValue ShiftSrc = V.getOperand(0);
+      SDValue ShiftAmt = V.getOperand(1);
+
+      if (!IsSrc(ShiftSrc) || !isa<ConstantSDNode>(ShiftAmt))
+        return SDValue();
+
+      auto ShiftAmtVal = cast<ConstantSDNode>(ShiftAmt)->getAsZExtVal();
+      if (ShiftAmtVal > RootMask.getBitWidth())
+        return SDValue();
+
+      PartsMask |= (RootMask << ShiftAmtVal);
+      continue;
+    }
+
+    if (IsSrc(V)) {
+      PartsMask |= RootMask;
+      continue;
+    }
+
+    return SDValue();
+  }
+
+  if (!Src)
+    return SDValue();
+
+  SDLoc DL(Root);
+  return DAG.getNode(ISD::AND, DL, VT,
+                     {Src, DAG.getConstant(PartsMask, DL, VT)});
+}
+
 /// This is a stub for TargetLowering::SimplifySetCC.
 SDValue DAGCombiner::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
                                    ISD::CondCode Cond, const SDLoc &DL,
                                    bool foldBooleans) {
   TargetLowering::DAGCombinerInfo DagCombineInfo(DAG, Level, false, this);
-  return TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL);
+  if (SDValue C =
+          TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL))
+    return C;
+
+  if ((Cond == ISD::SETNE || Cond == ISD::SETEQ) &&

jayfoad wrote:

Use `isIntEqualitySetCC`

https://github.com/llvm/llvm-project/pull/146054
[llvm-branch-commits] [llvm] [AMDGPU] Add tests for workgroup/workitem intrinsic optimizations (PR #146053)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/146053

>From f137136b2f527aaf1b2f2847e821085aabfc299e Mon Sep 17 00:00:00 2001
From: pvanhout
Date: Thu, 26 Jun 2025 13:08:31 +0200
Subject: [PATCH 1/2] [AMDGPU] Add tests for workgroup/workitem intrinsic optimizations

---
 .../AMDGPU/workitems-intrinsics-opts.ll | 553 ++
 1 file changed, 553 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll

diff --git a/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll b/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll
new file mode 100644
index 0..14120680216fc
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll
@@ -0,0 +1,553 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -O3 -mtriple=amdgcn -mcpu=fiji %s -o - | FileCheck %s --check-prefixes=GFX8,DAGISEL-GFX9
+; RUN: llc -O3 -mtriple=amdgcn -mcpu=gfx942 %s -o - | FileCheck %s --check-prefixes=GFX942,DAGISEL-GFX942
+; RUN: llc -O3 -mtriple=amdgcn -mcpu=gfx1200 %s -o - | FileCheck %s --check-prefixes=GFX12,DAGISEL-GFX12
+
+; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=fiji %s -o - | FileCheck %s --check-prefixes=GFX8,GISEL-GFX8
+; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=gfx942 %s -o - | FileCheck %s --check-prefixes=GFX942,GISEL-GFX942
+; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=gfx1200 %s -o - | FileCheck %s --check-prefixes=GFX12,GISEL-GFX12
+
+; (workitem_id_x | workitem_id_y | workitem_id_z) == 0
+define i1 @workitem_zero() {
+; DAGISEL-GFX9-LABEL: workitem_zero:
+; DAGISEL-GFX9:       ; %bb.0: ; %entry
+; DAGISEL-GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; DAGISEL-GFX9-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
+; DAGISEL-GFX9-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
+; DAGISEL-GFX9-NEXT:    v_or_b32_e32 v1, v31, v1
+; DAGISEL-GFX9-NEXT:    v_or_b32_e32 v0, v1, v0
+; DAGISEL-GFX9-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; DAGISEL-GFX9-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
+; DAGISEL-GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; DAGISEL-GFX942-LABEL: workitem_zero:
+; DAGISEL-GFX942:       ; %bb.0: ; %entry
+; DAGISEL-GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; DAGISEL-GFX942-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
+; DAGISEL-GFX942-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
+; DAGISEL-GFX942-NEXT:    v_or3_b32 v0, v31, v1, v0
+; DAGISEL-GFX942-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX942-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; DAGISEL-GFX942-NEXT:    s_nop 1
+; DAGISEL-GFX942-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
+; DAGISEL-GFX942-NEXT:    s_setpc_b64 s[30:31]
+;
+; DAGISEL-GFX12-LABEL: workitem_zero:
+; DAGISEL-GFX12:       ; %bb.0: ; %entry
+; DAGISEL-GFX12-NEXT:    s_wait_loadcnt_dscnt 0x0
+; DAGISEL-GFX12-NEXT:    s_wait_expcnt 0x0
+; DAGISEL-GFX12-NEXT:    s_wait_samplecnt 0x0
+; DAGISEL-GFX12-NEXT:    s_wait_bvhcnt 0x0
+; DAGISEL-GFX12-NEXT:    s_wait_kmcnt 0x0
+; DAGISEL-GFX12-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
+; DAGISEL-GFX12-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
+; DAGISEL-GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; DAGISEL-GFX12-NEXT:    v_or3_b32 v0, v31, v1, v0
+; DAGISEL-GFX12-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; DAGISEL-GFX12-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; DAGISEL-GFX12-NEXT:    s_wait_alu 0xfffd
+; DAGISEL-GFX12-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc_lo
+; DAGISEL-GFX12-NEXT:    s_setpc_b64 s[30:31]
+;
+; GISEL-GFX8-LABEL: workitem_zero:
+; GISEL-GFX8:       ; %bb.0: ; %entry
+; GISEL-GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GISEL-GFX8-NEXT:    v_and_b32_e32 v0, 0x3ff, v31
+; GISEL-GFX8-NEXT:    v_bfe_u32 v1, v31, 10, 10
+; GISEL-GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
+; GISEL-GFX8-NEXT:    v_bfe_u32 v1, v31, 20, 10
+; GISEL-GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
+; GISEL-GFX8-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GISEL-GFX8-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
+; GISEL-GFX8-NEXT:    s_setpc_b64 s[30:31]
+;
+; GISEL-GFX942-LABEL: workitem_zero:
+; GISEL-GFX942:       ; %bb.0: ; %entry
+; GISEL-GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GISEL-GFX942-NEXT:    v_and_b32_e32 v0, 0x3ff, v31
+; GISEL-GFX942-NEXT:    v_bfe_u32 v1, v31, 10, 10
+; GISEL-GFX942-NEXT:    v_bfe_u32 v2, v31, 20, 10
+; GISEL-GFX942-NEXT:    v_or3_b32 v0, v0, v1, v2
+; GISEL-GFX942-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GISEL-GFX942-NEXT:    s_nop 1
+; GISEL-GFX942-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
+; GISEL-GFX942-NEXT:    s_setpc_b64 s[30:31]
+;
+; GISEL-GFX12-LABEL: workitem_zero:
+; GISEL-GFX12:       ; %bb.0: ; %entry
+; GISEL-GFX12-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GISEL-GFX12-NEXT:    s_wait_expcnt 0x0
+; GISEL-GFX12-NEXT:    s_wait_samplecnt 0x0
+; GISEL-GFX12-NEXT:    s_wait_bvhcnt 0x0
+; GISEL-GFX1
[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)
jayfoad wrote:

Does this also handle the case where _all_ of the values ORed together are shifted, like `(setcc ((x >> c0 | x >> c1 | ...) & mask))` ?

https://github.com/llvm/llvm-project/pull/146054
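
As a standalone illustration of the question, here is a minimal sketch in plain C++ on 32-bit integers. It is not LLVM code: the helper names `foldedMask`, `originalIsZero`, `foldedIsZero` and `checkAgree` are hypothetical, and every shift amount is assumed to be below the bit width. An unshifted operand can be viewed as a shift by 0, so the all-shifted form is the same identity with no 0 in the shift list.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helpers modelling the fold on 32-bit values (not LLVM APIs).
// All shift amounts are assumed to be < 32.

// PartsMask: OR of (mask << c) over every shift amount c.
constexpr uint32_t foldedMask(uint32_t mask,
                              std::initializer_list<unsigned> shifts) {
  uint32_t parts = 0;
  for (unsigned c : shifts)
    parts |= mask << c;
  return parts;
}

// Original form: (((x >> c0) | (x >> c1) | ...) & mask) == 0.
constexpr bool originalIsZero(uint32_t x, uint32_t mask,
                              std::initializer_list<unsigned> shifts) {
  uint32_t merged = 0;
  for (unsigned c : shifts)
    merged |= x >> c;
  return (merged & mask) == 0;
}

// Folded form: (x & PartsMask) == 0, with no shifts left.
constexpr bool foldedIsZero(uint32_t x, uint32_t mask,
                            std::initializer_list<unsigned> shifts) {
  return (x & foldedMask(mask, shifts)) == 0;
}

// Spot-check that both forms agree for one input.
inline void checkAgree(uint32_t x, uint32_t mask,
                       std::initializer_list<unsigned> shifts) {
  assert(originalIsZero(x, mask, shifts) == foldedIsZero(x, mask, shifts));
}
```

For example, `foldedMask(0x3ff, {0, 10, 20})` gives `0x3fffffff`, while the all-shifted `foldedMask(0x3ff, {10, 20})` gives `0x3ffffc00`, which leaves the low 10 bits of `x` unchecked.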
[llvm-branch-commits] [llvm] [GISel] Combine compare of bitfield extracts or'd together. (PR #146055)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/146055

>From d97992ef24abae69878fd1e49270bf0f7372ca39 Mon Sep 17 00:00:00 2001
From: pvanhout
Date: Fri, 27 Jun 2025 12:04:53 +0200
Subject: [PATCH] [GISel] Combine compare of bitfield extracts or'd together.

Equivalent of the previous DAG patch for GISel.
The shifts are BFXs in GISel, so the canonical form of the entire expression is different than in the DAG. The mask is not at the root of the expression, it remains on the leaves instead.

See #136727
---
 .../llvm/CodeGen/GlobalISel/CombinerHelper.h  |   2 +
 .../include/llvm/Target/GlobalISel/Combine.td |  11 +-
 .../GlobalISel/CombinerHelperCompares.cpp     |  89 +
 .../GlobalISel/combine-cmp-merged-bfx.mir     | 326 ++
 .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 194 +++
 5 files changed, 483 insertions(+), 139 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-cmp-merged-bfx.mir

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index c15263e0b06f8..5ec82c30f268f 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -641,6 +641,8 @@ class CombinerHelper {
   /// KnownBits information.
   bool matchICmpToLHSKnownBits(MachineInstr &MI, BuildFnTy &MatchInfo) const;
+  bool combineMergedBFXCompare(MachineInstr &MI) const;
+
   /// \returns true if (and (or x, c1), c2) can be replaced with (and x, c2)
   bool matchAndOrDisjointMask(MachineInstr &MI, BuildFnTy &MatchInfo) const;
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 4a92dc16c1bf4..cba46a5edf9ec 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1085,6 +1085,14 @@ def double_icmp_zero_or_combine: GICombineRule<
          (G_ICMP $root, $p, $ordst, 0))
 >;
+// Transform ((X | (G_UBFX X, ...) | ...) == 0) (or != 0)
+// into a compare of a extract/mask of X
+def icmp_merged_bfx_combine: GICombineRule<
+  (defs root:$root),
+  (combine (G_ICMP $dst, $p, $src, 0):$root,
+           [{ return Helper.combineMergedBFXCompare(*${root}); }])
+>;
+
 def and_or_disjoint_mask : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$info),
   (match (wip_match_opcode G_AND):$root,
@@ -2052,7 +2060,8 @@ def all_combines : GICombineGroup<[integer_reassoc_combines, trivial_combines,
     fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors,
     simplify_neg_minmax, combine_concat_vector,
     sext_trunc, zext_trunc, prefer_sign_combines, shuffle_combines,
-    combine_use_vector_truncate, merge_combines, overflow_combines]>;
+    combine_use_vector_truncate, merge_combines, overflow_combines,
+    icmp_merged_bfx_combine]>;
 // A combine group used to for prelegalizer combiners at -O0. The combines in
 // this group have been selected based on experiments to balance code size and
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
index fc40533cf3dc9..e1d43f37bac13 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp
@@ -140,3 +140,92 @@ bool CombinerHelper::matchCanonicalizeFCmp(const MachineInstr &MI,
   return false;
 }
+
+bool CombinerHelper::combineMergedBFXCompare(MachineInstr &MI) const {
+  const GICmp *Cmp = cast<GICmp>(&MI);
+
+  ICmpInst::Predicate CC = Cmp->getCond();
+  if (CC != CmpInst::ICMP_EQ && CC != CmpInst::ICMP_NE)
+    return false;
+
+  Register CmpLHS = Cmp->getLHSReg();
+  Register CmpRHS = Cmp->getRHSReg();
+
+  LLT OpTy = MRI.getType(CmpLHS);
+  if (!OpTy.isScalar() || OpTy.isPointer())
+    return false;
+
+  assert(isZeroOrZeroSplat(CmpRHS, /*AllowUndefs=*/false));
+
+  Register Src;
+  const auto IsSrc = [&](Register R) {
+    if (!Src) {
+      Src = R;
+      return true;
+    }
+
+    return Src == R;
+  };
+
+  MachineInstr *CmpLHSDef = MRI.getVRegDef(CmpLHS);
+  if (CmpLHSDef->getOpcode() != TargetOpcode::G_OR)
+    return false;
+
+  APInt PartsMask(OpTy.getSizeInBits(), 0);
+  SmallVector<MachineInstr *> Worklist = {CmpLHSDef};
+  while (!Worklist.empty()) {
+    MachineInstr *Cur = Worklist.pop_back_val();
+
+    Register Dst = Cur->getOperand(0).getReg();
+    if (!MRI.hasOneUse(Dst) && Dst != Src)
+      return false;
+
+    if (Cur->getOpcode() == TargetOpcode::G_OR) {
+      Worklist.push_back(MRI.getVRegDef(Cur->getOperand(1).getReg()));
+      Worklist.push_back(MRI.getVRegDef(Cur->getOperand(2).getReg()));
+      continue;
+    }
+
+    if (Cur->getOpcode() == TargetOpcode::G_UBFX) {
+      Register Op = Cur->getOperand(1).getReg();
+      Register Width = Cur->getOperand(2).getReg();
+      Register Off = Cur->getOperand(3).getReg();
+
+      auto WidthCst = getIConstantVRegVal(Width, MRI);
+      auto
[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/146054

>From 26615132899d40b8d245fd98d093ef8c26cdc3e1 Mon Sep 17 00:00:00 2001
From: pvanhout
Date: Thu, 26 Jun 2025 13:31:37 +0200
Subject: [PATCH 1/2] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences

Fold sequences where we extract a bunch of contiguous bits from a value, merge them into the low bits, and then check whether the low bits are zero.

It seems like a strange sequence at first, but it's an idiom used in device libs to check workitem IDs for AMDGPU.

I put this in DAGCombiner instead of the target combiner because this is a generic, valid transform that's also fairly niche, so I think there isn't much risk of a combine loop.

See #136727
---
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 86 ++-
 .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 34 ++--
 2 files changed, 91 insertions(+), 29 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 6ca243990c468..a6eb214762fcb 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -28912,13 +28912,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //   (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();
+
+  if (Root.getOpcode() != ISD::AND)
+    return SDValue();
+
+  SDValue N0 = Root.getOperand(0);
+  SDValue N1 = Root.getOperand(1);
+
+  if (N0.getOpcode() != ISD::OR || !isa<ConstantSDNode>(N1))
+    return SDValue();
+
+  APInt RootMask = cast<ConstantSDNode>(N1)->getAsAPIntVal();
+  if (!RootMask.isMask())
+    return SDValue();
+
+  SDValue Src;
+  const auto IsSrc = [&](SDValue V) {
+    if (!Src) {
+      Src = V;
+      return true;
+    }
+
+    return Src == V;
+  };
+
+  SmallVector<SDValue> Worklist = {N0};
+  APInt PartsMask(VT.getSizeInBits(), 0);
+  while (!Worklist.empty()) {
+    SDValue V = Worklist.pop_back_val();
+    if (!V.hasOneUse() && Src != V)
+      return SDValue();
+
+    if (V.getOpcode() == ISD::OR) {
+      Worklist.push_back(V.getOperand(0));
+      Worklist.push_back(V.getOperand(1));
+      continue;
+    }
+
+    if (V.getOpcode() == ISD::SRL) {
+      SDValue ShiftSrc = V.getOperand(0);
+      SDValue ShiftAmt = V.getOperand(1);
+
+      if (!IsSrc(ShiftSrc) || !isa<ConstantSDNode>(ShiftAmt))
+        return SDValue();
+
+      PartsMask |= (RootMask << cast<ConstantSDNode>(ShiftAmt)->getAsZExtVal());
+      continue;
+    }
+
+    if (IsSrc(V)) {
+      PartsMask |= RootMask;
+      continue;
+    }
+
+    return SDValue();
+  }
+
+  if (!RootMask.isMask() || !Src)
+    return SDValue();
+
+  SDLoc DL(Root);
+  return DAG.getNode(ISD::AND, DL, VT,
+                     {Src, DAG.getConstant(PartsMask, DL, VT)});
+}
+
 /// This is a stub for TargetLowering::SimplifySetCC.
 SDValue DAGCombiner::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
                                    ISD::CondCode Cond, const SDLoc &DL,
                                    bool foldBooleans) {
   TargetLowering::DAGCombinerInfo DagCombineInfo(DAG, Level, false, this);
-  return TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL);
+  if (SDValue C =
+          TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL))
+    return C;
+
+  if ((Cond == ISD::SETNE || Cond == ISD::SETEQ) &&
+      N0.getOpcode() == ISD::AND && isNullConstant(N1)) {
+
+    if (SDValue Res = matchMergedBFX(N0, DAG, TLI))
+      return DAG.getSetCC(DL, VT, Res, N1, Cond);
+  }
+
+  return SDValue();
 }
 /// Given an ISD::SDIV node expressing a divide by constant, return
diff --git a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
index 07c4aeb1ac7df..64d055bc40e98 100644
--- a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
+++ b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll
@@ -12,11 +12,7 @@ define i1 @workitem_zero() {
 ; DAGISEL-GFX8-LABEL: workitem_zero:
 ; DAGISEL-GFX8:       ; %bb.0: ; %entry
 ; DAGISEL-GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; DAGISEL-GFX8-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
-; DAGISEL-GFX8-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
-; DAGISEL-GFX8-NEXT:    v_or_b32_e32 v1, v31, v1
-; DAGISEL-GFX8-NEXT:    v_or_b32_e32 v0, v1, v0
-; DAGISEL-GFX8-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX8-NEXT:    v_and_b32_e32 v0, 0x3fffffff, v31
 ; DAGISEL-GFX8-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
 ; DAGISEL-GFX8-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
 ; DAGISEL-GFX8-NEXT:    s_setpc_b64 s[30:31]
@@ -
[llvm-branch-commits] [llvm] [GISel] Combine compare of bitfield extracts or'd together. (PR #146055)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/146055 >From d97992ef24abae69878fd1e49270bf0f7372ca39 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 27 Jun 2025 12:04:53 +0200 Subject: [PATCH] [GISel] Combine compare of bitfield extracts or'd together. Equivalent of the previous DAG patch for GISel. The shifts are BFXs in GISel, so the canonical form of the entire expression is different than in the DAG. The mask is not at the root of the expression, it remains on the leaves instead. See #136727 --- .../llvm/CodeGen/GlobalISel/CombinerHelper.h | 2 + .../include/llvm/Target/GlobalISel/Combine.td | 11 +- .../GlobalISel/CombinerHelperCompares.cpp | 89 + .../GlobalISel/combine-cmp-merged-bfx.mir | 326 ++ .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 194 +++ 5 files changed, 483 insertions(+), 139 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-cmp-merged-bfx.mir diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h index c15263e0b06f8..5ec82c30f268f 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h @@ -641,6 +641,8 @@ class CombinerHelper { /// KnownBits information. bool matchICmpToLHSKnownBits(MachineInstr &MI, BuildFnTy &MatchInfo) const; + bool combineMergedBFXCompare(MachineInstr &MI) const; + /// \returns true if (and (or x, c1), c2) can be replaced with (and x, c2) bool matchAndOrDisjointMask(MachineInstr &MI, BuildFnTy &MatchInfo) const; diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td index 4a92dc16c1bf4..cba46a5edf9ec 100644 --- a/llvm/include/llvm/Target/GlobalISel/Combine.td +++ b/llvm/include/llvm/Target/GlobalISel/Combine.td @@ -1085,6 +1085,14 @@ def double_icmp_zero_or_combine: GICombineRule< (G_ICMP $root, $p, $ordst, 0)) >; +// Transform ((X | (G_UBFX X, ...) | ...) 
== 0) (or != 0) +// into a compare of a extract/mask of X +def icmp_merged_bfx_combine: GICombineRule< + (defs root:$root), + (combine (G_ICMP $dst, $p, $src, 0):$root, + [{ return Helper.combineMergedBFXCompare(*${root}); }]) +>; + def and_or_disjoint_mask : GICombineRule< (defs root:$root, build_fn_matchinfo:$info), (match (wip_match_opcode G_AND):$root, @@ -2052,7 +2060,8 @@ def all_combines : GICombineGroup<[integer_reassoc_combines, trivial_combines, fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors, simplify_neg_minmax, combine_concat_vector, sext_trunc, zext_trunc, prefer_sign_combines, shuffle_combines, -combine_use_vector_truncate, merge_combines, overflow_combines]>; +combine_use_vector_truncate, merge_combines, overflow_combines, +icmp_merged_bfx_combine]>; // A combine group used to for prelegalizer combiners at -O0. The combines in // this group have been selected based on experiments to balance code size and diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp index fc40533cf3dc9..e1d43f37bac13 100644 --- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp +++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp @@ -140,3 +140,92 @@ bool CombinerHelper::matchCanonicalizeFCmp(const MachineInstr &MI, return false; } + +bool CombinerHelper::combineMergedBFXCompare(MachineInstr &MI) const { + const GICmp *Cmp = cast(&MI); + + ICmpInst::Predicate CC = Cmp->getCond(); + if (CC != CmpInst::ICMP_EQ && CC != CmpInst::ICMP_NE) +return false; + + Register CmpLHS = Cmp->getLHSReg(); + Register CmpRHS = Cmp->getRHSReg(); + + LLT OpTy = MRI.getType(CmpLHS); + if (!OpTy.isScalar() || OpTy.isPointer()) +return false; + + assert(isZeroOrZeroSplat(CmpRHS, /*AllowUndefs=*/false)); + + Register Src; + const auto IsSrc = [&](Register R) { +if (!Src) { + Src = R; + return true; +} + +return Src == R; + }; + + MachineInstr *CmpLHSDef = MRI.getVRegDef(CmpLHS); + if 
(CmpLHSDef->getOpcode() != TargetOpcode::G_OR) +return false; + + APInt PartsMask(OpTy.getSizeInBits(), 0); + SmallVector<MachineInstr *> Worklist = {CmpLHSDef}; + while (!Worklist.empty()) { +MachineInstr *Cur = Worklist.pop_back_val(); + +Register Dst = Cur->getOperand(0).getReg(); +if (!MRI.hasOneUse(Dst) && Dst != Src) + return false; + +if (Cur->getOpcode() == TargetOpcode::G_OR) { + Worklist.push_back(MRI.getVRegDef(Cur->getOperand(1).getReg())); + Worklist.push_back(MRI.getVRegDef(Cur->getOperand(2).getReg())); + continue; +} + +if (Cur->getOpcode() == TargetOpcode::G_UBFX) { + Register Op = Cur->getOperand(1).getReg(); + Register Width = Cur->getOperand(2).getReg(); + Register Off = Cur->getOperand(3).getReg(); + + auto WidthCst = getIConstantVRegVal(Width, MRI); + auto
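As a rough illustration of the GISel form described in the commit message above (the mask stays on the leaves, and the shifts show up as bitfield extracts), here is a small standalone sketch in plain C++ rather than MIR. It assumes that an unsigned bitfield extract yields `Width` bits of X starting at bit `Lsb`. The part of the patch that accumulates the mask is cut off above, so `fieldMask` and the 10-bit field layout are assumptions for illustration only, not the patch's code.

```cpp
#include <cstdint>

// Low `Width` bits set; guards the Width == 32 case to avoid an
// out-of-range shift.
constexpr uint32_t lowBits(unsigned Width) {
  return Width >= 32 ? ~0u : ((1u << Width) - 1u);
}

// Unsigned bitfield extract: Width bits of X starting at bit Lsb.
constexpr uint32_t ubfx(uint32_t X, unsigned Lsb, unsigned Width) {
  return (X >> Lsb) & lowBits(Width);
}

// Bits of X observed by ubfx(X, Lsb, Width). Hypothetical helper.
constexpr uint32_t fieldMask(unsigned Lsb, unsigned Width) {
  return lowBits(Width) << Lsb;
}

// Leaf-masked form: every leaf of the OR tree reads one field of X.
constexpr bool leavesAreZero(uint32_t X) {
  return ((X & 0x3ffu) | ubfx(X, 10, 10) | ubfx(X, 20, 10)) == 0;
}

// Combined form: one AND against the union of all observed fields.
constexpr uint32_t kParts = 0x3ffu | fieldMask(10, 10) | fieldMask(20, 10);
constexpr bool combinedIsZero(uint32_t X) { return (X & kParts) == 0; }

static_assert(kParts == 0x3fffffffu, "three adjacent 10-bit fields");
static_assert(leavesAreZero(0xc0000000u) == combinedIsZero(0xc0000000u),
              "bits outside every field are ignored by both forms");
static_assert(leavesAreZero(0x00000400u) == combinedIsZero(0x00000400u),
              "a bit in the second field is seen by both forms");
```

The point of the sketch is only that OR-ing disjoint extracts and comparing against zero is the same as testing the union of the extracted bit ranges in X directly.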
[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/146054 >From 26615132899d40b8d245fd98d093ef8c26cdc3e1 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 26 Jun 2025 13:31:37 +0200 Subject: [PATCH 1/2] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences Fold sequences where we extract a bunch of contiguous bits from a value, merge them into the low bits and then check if the low bits are zero or not. It seems like a strange sequence at first but it's an idiom used by device libs to check workitem IDs for AMDGPU. The reason I put this in DAGCombiner instead of the target combiner is that this is a generic, valid transform that's also fairly niche, so I think there isn't much risk of a combine loop. See #136727 --- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 86 ++- .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 34 ++-- 2 files changed, 91 insertions(+), 29 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 6ca243990c468..a6eb214762fcb 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -28912,13 +28912,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1, return SDValue(); } +static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG, + const TargetLowering &TLI) { + // Match a pattern such as: + // (X | (X >> C0) | (X >> C1) | ...) & Mask + // This extracts contiguous parts of X and ORs them together before comparing. + // We can optimize this so that we directly check (X & SomeMask) instead, + // eliminating the shifts. 
+ + EVT VT = Root.getValueType(); + + if (Root.getOpcode() != ISD::AND) +return SDValue(); + + SDValue N0 = Root.getOperand(0); + SDValue N1 = Root.getOperand(1); + + if (N0.getOpcode() != ISD::OR || !isa<ConstantSDNode>(N1)) +return SDValue(); + + APInt RootMask = cast<ConstantSDNode>(N1)->getAsAPIntVal(); + if (!RootMask.isMask()) +return SDValue(); + + SDValue Src; + const auto IsSrc = [&](SDValue V) { +if (!Src) { + Src = V; + return true; +} + +return Src == V; + }; + + SmallVector<SDValue> Worklist = {N0}; + APInt PartsMask(VT.getSizeInBits(), 0); + while (!Worklist.empty()) { +SDValue V = Worklist.pop_back_val(); +if (!V.hasOneUse() && Src != V) + return SDValue(); + +if (V.getOpcode() == ISD::OR) { + Worklist.push_back(V.getOperand(0)); + Worklist.push_back(V.getOperand(1)); + continue; +} + +if (V.getOpcode() == ISD::SRL) { + SDValue ShiftSrc = V.getOperand(0); + SDValue ShiftAmt = V.getOperand(1); + + if (!IsSrc(ShiftSrc) || !isa<ConstantSDNode>(ShiftAmt)) +return SDValue(); + + PartsMask |= (RootMask << cast<ConstantSDNode>(ShiftAmt)->getAsZExtVal()); + continue; +} + +if (IsSrc(V)) { + PartsMask |= RootMask; + continue; +} + +return SDValue(); + } + + if (!RootMask.isMask() || !Src) +return SDValue(); + + SDLoc DL(Root); + return DAG.getNode(ISD::AND, DL, VT, + {Src, DAG.getConstant(PartsMask, DL, VT)}); +} + /// This is a stub for TargetLowering::SimplifySetCC. 
SDValue DAGCombiner::SimplifySetCC(EVT VT, SDValue N0, SDValue N1, ISD::CondCode Cond, const SDLoc &DL, bool foldBooleans) { TargetLowering::DAGCombinerInfo DagCombineInfo(DAG, Level, false, this); - return TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL); + if (SDValue C = + TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL)) +return C; + + if ((Cond == ISD::SETNE || Cond == ISD::SETEQ) && + N0.getOpcode() == ISD::AND && isNullConstant(N1)) { + +if (SDValue Res = matchMergedBFX(N0, DAG, TLI)) + return DAG.getSetCC(DL, VT, Res, N1, Cond); + } + + return SDValue(); } /// Given an ISD::SDIV node expressing a divide by constant, return diff --git a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll index 07c4aeb1ac7df..64d055bc40e98 100644 --- a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll +++ b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll @@ -12,11 +12,7 @@ define i1 @workitem_zero() { ; DAGISEL-GFX8-LABEL: workitem_zero: ; DAGISEL-GFX8: ; %bb.0: ; %entry ; DAGISEL-GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v1, 10, v31 -; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v0, 20, v31 -; DAGISEL-GFX8-NEXT:v_or_b32_e32 v1, v31, v1 -; DAGISEL-GFX8-NEXT:v_or_b32_e32 v0, v1, v0 -; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3ff, v0 +; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3fff, v31 ; DAGISEL-GFX8-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0 ; DAGISEL-GFX8-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc ; DAGISEL-GFX8-NEXT:s_setpc_b64 s[30:31] @@ -
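To make the fold above concrete, here is a minimal standalone sketch in plain C++ (no SelectionDAG types). It uses the shift amounts 10 and 20 and the root mask 0x3ff from the removed check lines of the AMDGPU test, and mirrors the way matchMergedBFX ORs RootMask into PartsMask once per use of X. The names `buildPartsMask`, `kShifts` and `kPartsMask` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Original shape: (X | (X >> C0) | (X >> C1)) & Mask, compared against zero.
constexpr bool mergedFieldsAreZero(uint32_t X) {
  return ((X | (X >> 10) | (X >> 20)) & 0x3ffu) == 0;
}

// Accumulate the root mask once for the unshifted use of X and once per
// shifted use, moved left by that use's shift amount.
constexpr uint32_t buildPartsMask(uint32_t RootMask, const unsigned *Shifts,
                                  unsigned NumShifts) {
  uint32_t Parts = RootMask; // the unshifted use of X
  for (unsigned I = 0; I < NumShifts; ++I)
    Parts |= RootMask << Shifts[I];
  return Parts;
}

constexpr unsigned kShifts[] = {10, 20};
constexpr uint32_t kPartsMask = buildPartsMask(0x3ffu, kShifts, 2);

// Folded shape: a single AND against the accumulated mask, no shifts.
constexpr bool foldedIsZero(uint32_t X) { return (X & kPartsMask) == 0; }

static_assert(kPartsMask == 0x3fffffffu, "0x3ff at bit offsets 0, 10 and 20");
static_assert(mergedFieldsAreZero(0x00000400u) == foldedIsZero(0x00000400u),
              "a bit in the middle field is seen by both forms");
static_assert(mergedFieldsAreZero(0x40000000u) == foldedIsZero(0x40000000u),
              "bits above the top field are ignored by both forms");
```

Both predicates agree because the low bits of X >> C are exactly the bits of X starting at offset C, so testing them after the shift or in place gives the same answer.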
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #145911)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/145911 >From 046418f7ccd46a2b0c2ea3c9ab15e659de709b27 Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 5 Jun 2025 12:17:13 +0200 Subject: [PATCH] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize --- .../Target/AMDGPU/AMDGPURegBankLegalize.cpp | 317 -- .../AMDGPU/GlobalISel/readanylane-combines.ll | 25 +- .../GlobalISel/readanylane-combines.mir | 78 ++--- .../GlobalISel/regbankselect-and-s1.mir | 6 + .../GlobalISel/regbankselect-anyext.mir | 4 + .../AMDGPU/GlobalISel/regbankselect-trunc.mir | 2 + 6 files changed, 246 insertions(+), 186 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp index ba661348ca5b5..e1879598f098a 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp @@ -23,6 +23,8 @@ #include "GCNSubtarget.h" #include "llvm/CodeGen/GlobalISel/CSEInfo.h" #include "llvm/CodeGen/GlobalISel/CSEMIRBuilder.h" +#include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h" +#include "llvm/CodeGen/GlobalISel/Utils.h" #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/CodeGen/MachineUniformityAnalysis.h" #include "llvm/CodeGen/TargetPassConfig.h" @@ -115,126 +117,233 @@ class AMDGPURegBankLegalizeCombiner { VgprRB(&RBI.getRegBank(AMDGPU::VGPRRegBankID)), VccRB(&RBI.getRegBank(AMDGPU::VCCRegBankID)) {}; - bool isLaneMask(Register Reg) { -const RegisterBank *RB = MRI.getRegBankOrNull(Reg); -if (RB && RB->getID() == AMDGPU::VCCRegBankID) - return true; + bool isLaneMask(Register Reg); + std::pair tryMatch(Register Src, unsigned Opcode); + std::pair tryMatchRALFromUnmerge(Register Src); + Register getReadAnyLaneSrc(Register Src); + void replaceRegWithOrBuildCopy(Register Dst, Register Src); -const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg); -return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1); 
- } + bool tryEliminateReadAnyLane(MachineInstr &Copy); + void tryCombineCopy(MachineInstr &MI); + void tryCombineS1AnyExt(MachineInstr &MI); +}; - void cleanUpAfterCombine(MachineInstr &MI, MachineInstr *Optional0) { -MI.eraseFromParent(); -if (Optional0 && isTriviallyDead(*Optional0, MRI)) - Optional0->eraseFromParent(); - } +bool AMDGPURegBankLegalizeCombiner::isLaneMask(Register Reg) { + const RegisterBank *RB = MRI.getRegBankOrNull(Reg); + if (RB && RB->getID() == AMDGPU::VCCRegBankID) +return true; - std::pair tryMatch(Register Src, unsigned Opcode) { -MachineInstr *MatchMI = MRI.getVRegDef(Src); -if (MatchMI->getOpcode() != Opcode) - return {nullptr, Register()}; -return {MatchMI, MatchMI->getOperand(1).getReg()}; - } + const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg); + return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1); +} - void tryCombineCopy(MachineInstr &MI) { -Register Dst = MI.getOperand(0).getReg(); -Register Src = MI.getOperand(1).getReg(); -// Skip copies of physical registers. -if (!Dst.isVirtual() || !Src.isVirtual()) - return; - -// This is a cross bank copy, sgpr S1 to lane mask. -// -// %Src:sgpr(s1) = G_TRUNC %TruncS32Src:sgpr(s32) -// %Dst:lane-mask(s1) = COPY %Src:sgpr(s1) -// -> -// %Dst:lane-mask(s1) = G_AMDGPU_COPY_VCC_SCC %TruncS32Src:sgpr(s32) -if (isLaneMask(Dst) && MRI.getRegBankOrNull(Src) == SgprRB) { - auto [Trunc, TruncS32Src] = tryMatch(Src, AMDGPU::G_TRUNC); - assert(Trunc && MRI.getType(TruncS32Src) == S32 && - "sgpr S1 must be result of G_TRUNC of sgpr S32"); - - B.setInstr(MI); - // Ensure that truncated bits in BoolSrc are 0. 
- auto One = B.buildConstant({SgprRB, S32}, 1); - auto BoolSrc = B.buildAnd({SgprRB, S32}, TruncS32Src, One); - B.buildInstr(AMDGPU::G_AMDGPU_COPY_VCC_SCC, {Dst}, {BoolSrc}); - cleanUpAfterCombine(MI, Trunc); - return; -} +std::pair +AMDGPURegBankLegalizeCombiner::tryMatch(Register Src, unsigned Opcode) { + MachineInstr *MatchMI = MRI.getVRegDef(Src); + if (MatchMI->getOpcode() != Opcode) +return {nullptr, Register()}; + return {MatchMI, MatchMI->getOperand(1).getReg()}; +} + +std::pair +AMDGPURegBankLegalizeCombiner::tryMatchRALFromUnmerge(Register Src) { + MachineInstr *ReadAnyLane = MRI.getVRegDef(Src); + if (ReadAnyLane->getOpcode() != AMDGPU::G_AMDGPU_READANYLANE) +return {nullptr, -1}; + + Register RALSrc = ReadAnyLane->getOperand(1).getReg(); + if (auto *UnMerge = getOpcodeDef(RALSrc, MRI)) +return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)}; -// Src = G_AMDGPU_READANYLANE RALSrc -// Dst = COPY Src -// -> -// Dst = RALSrc -if (MRI.getRegBankOrNull(Dst) == VgprRB && -MRI.getRegBankOrNull(Src) == SgprRB) { -
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize (PR #145912)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/145912 >From 7c5c7bf98afe91f015b36e42536a8a700b27b686 Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 26 Jun 2025 16:03:56 +0200 Subject: [PATCH] AMDGPU/GlobalISel: Add waterfall lowering in regbanklegalize Add rules for G_AMDGPU_BUFFER_LOAD and implement waterfall lowering for divergent operands that must be sgpr. --- .../Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | 61 +++-- .../lib/Target/AMDGPU/AMDGPUGlobalISelUtils.h | 2 + .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 239 +- .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 1 + .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 22 +- .../AMDGPU/AMDGPURegBankLegalizeRules.h | 6 +- .../AMDGPU/GlobalISel/buffer-schedule.ll | 2 +- .../llvm.amdgcn.make.buffer.rsrc.ll | 2 +- .../regbankselect-amdgcn.raw.buffer.load.ll | 59 ++--- ...egbankselect-amdgcn.raw.ptr.buffer.load.ll | 59 ++--- ...regbankselect-amdgcn.struct.buffer.load.ll | 59 ++--- ...ankselect-amdgcn.struct.ptr.buffer.load.ll | 59 ++--- .../llvm.amdgcn.buffer.load-last-use.ll | 2 +- .../llvm.amdgcn.raw.atomic.buffer.load.ll | 42 +-- .../llvm.amdgcn.raw.ptr.atomic.buffer.load.ll | 42 +-- .../llvm.amdgcn.struct.atomic.buffer.load.ll | 48 ++-- ...vm.amdgcn.struct.ptr.atomic.buffer.load.ll | 48 ++-- .../CodeGen/AMDGPU/swizzle.bit.extract.ll | 4 +- 18 files changed, 514 insertions(+), 243 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp index 00979f44f9d34..f36935d8c0e8f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp @@ -117,45 +117,72 @@ static LLT getReadAnyLaneSplitTy(LLT Ty) { return LLT::scalar(32); } -static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc, - const RegisterBankInfo &RBI); - -static void unmergeReadAnyLane(MachineIRBuilder &B, - SmallVectorImpl &SgprDstParts, - LLT UnmergeTy, Register VgprSrc, - const 
RegisterBankInfo &RBI) { +template +static Register buildReadLane(MachineIRBuilder &, Register, + const RegisterBankInfo &, ReadLaneFnTy); + +template +static void +unmergeReadAnyLane(MachineIRBuilder &B, SmallVectorImpl &SgprDstParts, + LLT UnmergeTy, Register VgprSrc, const RegisterBankInfo &RBI, + ReadLaneFnTy BuildRL) { const RegisterBank *VgprRB = &RBI.getRegBank(AMDGPU::VGPRRegBankID); auto Unmerge = B.buildUnmerge({VgprRB, UnmergeTy}, VgprSrc); for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) { -SgprDstParts.push_back(buildReadAnyLane(B, Unmerge.getReg(i), RBI)); +SgprDstParts.push_back(buildReadLane(B, Unmerge.getReg(i), RBI, BuildRL)); } } -static Register buildReadAnyLane(MachineIRBuilder &B, Register VgprSrc, - const RegisterBankInfo &RBI) { +template +static Register buildReadLane(MachineIRBuilder &B, Register VgprSrc, + const RegisterBankInfo &RBI, + ReadLaneFnTy BuildRL) { LLT Ty = B.getMRI()->getType(VgprSrc); const RegisterBank *SgprRB = &RBI.getRegBank(AMDGPU::SGPRRegBankID); if (Ty.getSizeInBits() == 32) { -return B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {{SgprRB, Ty}}, {VgprSrc}) -.getReg(0); +Register SgprDst = B.getMRI()->createVirtualRegister({SgprRB, Ty}); +return BuildRL(B, SgprDst, VgprSrc).getReg(0); } SmallVector SgprDstParts; - unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI); + unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI, + BuildRL); return B.buildMergeLikeInstr({SgprRB, Ty}, SgprDstParts).getReg(0); } -void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst, - Register VgprSrc, const RegisterBankInfo &RBI) { +template +static void buildReadLane(MachineIRBuilder &B, Register SgprDst, + Register VgprSrc, const RegisterBankInfo &RBI, + ReadLaneFnTy BuildReadLane) { LLT Ty = B.getMRI()->getType(VgprSrc); if (Ty.getSizeInBits() == 32) { -B.buildInstr(AMDGPU::G_AMDGPU_READANYLANE, {SgprDst}, {VgprSrc}); +BuildReadLane(B, SgprDst, VgprSrc); return; } 
SmallVector SgprDstParts; - unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI); + unmergeReadAnyLane(B, SgprDstParts, getReadAnyLaneSplitTy(Ty), VgprSrc, RBI, + BuildReadLane); B.buildMergeLikeInstr(SgprDst, SgprDstParts).getReg(0); } + +void AMDGPU::buildReadAnyLane(MachineIRBuilder &B, Register SgprDst, + Register VgprSrc
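The refactoring in this diff turns the read-lane helpers into templates that take the per-part builder as a callable, so the same split/build/merge logic can serve more than one read-lane flavour. Below is a minimal sketch of that shape in plain C++, with 64-bit integers standing in for registers. It is not the LLVM code: `buildReadLane32`, `buildReadLane64`, `PassThrough` and `TagLowBit` are hypothetical names, and the two functors only model "some operation applied to each 32-bit part".

```cpp
#include <cstdint>

// 32-bit case: hand the part straight to the caller-supplied builder.
template <typename ReadLaneFnTy>
constexpr uint32_t buildReadLane32(uint32_t VgprPart, ReadLaneFnTy BuildRL) {
  return BuildRL(VgprPart);
}

// Wider case: split into 32-bit parts, build each part with the same
// callable, then merge the results back into one value.
template <typename ReadLaneFnTy>
constexpr uint64_t buildReadLane64(uint64_t VgprSrc, ReadLaneFnTy BuildRL) {
  const uint32_t Lo = static_cast<uint32_t>(VgprSrc);
  const uint32_t Hi = static_cast<uint32_t>(VgprSrc >> 32);
  const uint64_t SgprLo = buildReadLane32(Lo, BuildRL);
  const uint64_t SgprHi = buildReadLane32(Hi, BuildRL);
  return (SgprHi << 32) | SgprLo;
}

// Two stand-in builders. In the real code these would emit instructions;
// here they just transform the part so the plumbing is observable.
struct PassThrough {
  constexpr uint32_t operator()(uint32_t Part) const { return Part; }
};
struct TagLowBit {
  constexpr uint32_t operator()(uint32_t Part) const { return Part | 1u; }
};

static_assert(buildReadLane64(0x1122334455667788ull, PassThrough{}) ==
                  0x1122334455667788ull,
              "split and merge round-trips the value");
static_assert(buildReadLane64(0x1000000020000000ull, TagLowBit{}) ==
                  0x1000000120000001ull,
              "the callable is applied to each 32-bit part");
```

Passing the builder as a template parameter keeps the helper free of any knowledge about which instruction is emitted per part, which appears to be the motivation for the change above.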
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #145911)
@@ -115,126 +117,233 @@ class AMDGPURegBankLegalizeCombiner { VgprRB(&RBI.getRegBank(AMDGPU::VGPRRegBankID)), VccRB(&RBI.getRegBank(AMDGPU::VCCRegBankID)) {}; - bool isLaneMask(Register Reg) { -const RegisterBank *RB = MRI.getRegBankOrNull(Reg); -if (RB && RB->getID() == AMDGPU::VCCRegBankID) - return true; + bool isLaneMask(Register Reg); + std::pair tryMatch(Register Src, unsigned Opcode); + std::pair tryMatchRALFromUnmerge(Register Src); + Register getReadAnyLaneSrc(Register Src); + void replaceRegWithOrBuildCopy(Register Dst, Register Src); -const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg); -return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1); - } + bool tryEliminateReadAnyLane(MachineInstr &Copy); + void tryCombineCopy(MachineInstr &MI); + void tryCombineS1AnyExt(MachineInstr &MI); +}; - void cleanUpAfterCombine(MachineInstr &MI, MachineInstr *Optional0) { -MI.eraseFromParent(); -if (Optional0 && isTriviallyDead(*Optional0, MRI)) - Optional0->eraseFromParent(); - } +bool AMDGPURegBankLegalizeCombiner::isLaneMask(Register Reg) { + const RegisterBank *RB = MRI.getRegBankOrNull(Reg); + if (RB && RB->getID() == AMDGPU::VCCRegBankID) +return true; - std::pair tryMatch(Register Src, unsigned Opcode) { -MachineInstr *MatchMI = MRI.getVRegDef(Src); -if (MatchMI->getOpcode() != Opcode) - return {nullptr, Register()}; -return {MatchMI, MatchMI->getOperand(1).getReg()}; - } + const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg); + return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1); +} - void tryCombineCopy(MachineInstr &MI) { -Register Dst = MI.getOperand(0).getReg(); -Register Src = MI.getOperand(1).getReg(); -// Skip copies of physical registers. -if (!Dst.isVirtual() || !Src.isVirtual()) - return; - -// This is a cross bank copy, sgpr S1 to lane mask. 
-// -// %Src:sgpr(s1) = G_TRUNC %TruncS32Src:sgpr(s32) -// %Dst:lane-mask(s1) = COPY %Src:sgpr(s1) -// -> -// %Dst:lane-mask(s1) = G_AMDGPU_COPY_VCC_SCC %TruncS32Src:sgpr(s32) -if (isLaneMask(Dst) && MRI.getRegBankOrNull(Src) == SgprRB) { - auto [Trunc, TruncS32Src] = tryMatch(Src, AMDGPU::G_TRUNC); - assert(Trunc && MRI.getType(TruncS32Src) == S32 && - "sgpr S1 must be result of G_TRUNC of sgpr S32"); - - B.setInstr(MI); - // Ensure that truncated bits in BoolSrc are 0. - auto One = B.buildConstant({SgprRB, S32}, 1); - auto BoolSrc = B.buildAnd({SgprRB, S32}, TruncS32Src, One); - B.buildInstr(AMDGPU::G_AMDGPU_COPY_VCC_SCC, {Dst}, {BoolSrc}); - cleanUpAfterCombine(MI, Trunc); - return; -} +std::pair +AMDGPURegBankLegalizeCombiner::tryMatch(Register Src, unsigned Opcode) { + MachineInstr *MatchMI = MRI.getVRegDef(Src); + if (MatchMI->getOpcode() != Opcode) +return {nullptr, Register()}; + return {MatchMI, MatchMI->getOperand(1).getReg()}; +} petar-avramovic wrote: Can use mi_match, this is shorter because we use auto instead of declaring what we want to capture. To me at least, this has nicer formatting. How about matchInstAndGetSrc? https://github.com/llvm/llvm-project/pull/145911 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)
@@ -1,592 +1,697 @@ -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+the -mattr=+d128 < %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+v8.9a -mattr=+the -mattr=+d128 < %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+v9.4a -mattr=+the -mattr=+d128 < %s | FileCheck %s +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128 < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128,v8.9a < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128,v9.4a < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+the,+d128 < %s \ +// RUN:| llvm-objdump -d --mattr=+the,+d128 - | FileCheck %s --check-prefix=CHECK-INST +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+the,+d128 < %s \ +// RUN: | llvm-objdump -d --mattr=-the,-d128 - | FileCheck %s --check-prefix=CHECK-UNKNOWN +// Disassemble encoding and check the re-encoding (-show-encoding) matches. 
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128 < %s \ +// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \ +// RUN:| llvm-mc -triple=aarch64 -mattr=+the,+d128 -disassemble -show-encoding \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST -// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.9a < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v9.4a < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.9a -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v9.4a -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s +mrs x3, RCWMASK_EL1 +// CHECK-INST: mrs x3, RCWMASK_EL1 +// CHECK-ENCODING: encoding: [0xc3,0xd0,0x38,0xd5] +// CHECK-ERROR: error: expected readable system register +// CHECK-UNKNOWN: d538d0c3 mrs x3, S3_0_C13_C0_6 -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+the -mattr=+d128 < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-ZXR %s +msr RCWMASK_EL1, x1 +// CHECK-INST: msr RCWMASK_EL1, x1 +// CHECK-ENCODING: encoding: [0xc1,0xd0,0x18,0xd5] +// CHECK-ERROR: error: expected writable system register or pstate +// CHECK-UNKNOWN: d518d0c1 msr S3_0_C13_C0_6, x1 -mrs x3, RCWMASK_EL1 -// CHECK: mrs x3, RCWMASK_EL1 // encoding: [0xc3,0xd0,0x38,0xd5] -// ERROR-NO-THE: [[@LINE-2]]:21: error: expected readable system register -msr RCWMASK_EL1, x1 -// CHECK: msr RCWMASK_EL1, x1 // encoding: [0xc1,0xd0,0x18,0xd5] -// ERROR-NO-THE: [[@LINE-2]]:17: error: expected writable system register or pstate -mrs x3, RCWSMASK_EL1 -// CHECK: mrs x3, RCWSMASK_EL1 // encoding: 
[0x63,0xd0,0x38,0xd5] -// ERROR-NO-THE: [[@LINE-2]]:21: error: expected readable system register -msr RCWSMASK_EL1, x1 -// CHECK: msr RCWSMASK_EL1, x1 // encoding: [0x61,0xd0,0x18,0xd5] -// ERROR-NO-THE: [[@LINE-2]]:17: error: expected writable system register or pstate +mrs x3, RCWSMASK_EL1 +// CHECK-INST: mrs x3, RCWSMASK_EL1 +// CHECK-ENCODING: encoding: [0x63,0xd0,0x38,0xd5] +// CHECK-ERROR: error: expected readable system register +// CHECK-UNKNOWN: d538d063 mrs x3, S3_0_C13_C0_3 +msr RCWSMASK_EL1, x1 +// CHECK-INST: msr RCWSMASK_EL1, x1 +// CHECK-ENCODING: encoding: [0x61,0xd0,0x18,0xd5] +// CHECK-ERROR: error: expected writable system register or pstate +// CHECK-UNKNOWN: d518d061 msr S3_0_C13_C0_3, x1 -rcwcas x0, x1, [x4] -// CHECK: rcwcas x0, x1, [x4] // encoding: [0x81,0x08,0x20,0x19] -// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the -rcwcasa x0, x1, [x4] -// CHECK: rcwcasa x0, x1, [x4] // encoding: [0x81,0x08,0xa0,0x19] -// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the -rcwcasal x0, x1, [x4] -// CHECK: rcwcasal x0, x1, [x4] // encoding: [0x81,0x08,0xe0,0x19] -// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the -rcwcasl x0, x1, [x4] -// CHECK: rcwcasl x0, x1, [x4] // encoding: [0x81,0x08,0x60,0x19] -// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the
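The CHECK-ENCODING and CHECK-UNKNOWN lines in this test describe the same instruction word in two ways: llvm-mc lists the bytes in memory order, while llvm-objdump prints the 32-bit word and, with the feature disabled, falls back to the generic S<op0>_<op1>_C<n>_C<m>_<op2> register name. The sketch below shows how the two relate. It assumes the A64 MRS/MSR (register) field layout (L in bit 21, op0 in bits 20:19, op1 in 18:16, CRn in 15:12, CRm in 11:8, op2 in 7:5, Rt in 4:0); the struct and function names are hypothetical.

```cpp
#include <cstdint>

// Decoded fields of an MRS/MSR (register) instruction word.
struct SysRegAccess {
  bool IsRead; // true for MRS, false for MSR
  uint32_t Op0, Op1, CRn, CRm, Op2;
  uint32_t Rt; // general-purpose register number
};

// llvm-mc prints encodings byte by byte in memory order (little endian),
// so the first listed byte is the least significant one.
constexpr uint32_t wordFromEncoding(uint8_t B0, uint8_t B1, uint8_t B2,
                                    uint8_t B3) {
  return uint32_t(B0) | (uint32_t(B1) << 8) | (uint32_t(B2) << 16) |
         (uint32_t(B3) << 24);
}

constexpr SysRegAccess decodeSysRegAccess(uint32_t Insn) {
  return SysRegAccess{((Insn >> 21) & 0x1u) != 0,
                      (Insn >> 19) & 0x3u,
                      (Insn >> 16) & 0x7u,
                      (Insn >> 12) & 0xfu,
                      (Insn >> 8) & 0xfu,
                      (Insn >> 5) & 0x7u,
                      Insn & 0x1fu};
}

// mrs x3, RCWMASK_EL1: encoding [0xc3,0xd0,0x38,0xd5], word d538d0c3,
// generic name S3_0_C13_C0_6.
constexpr uint32_t kMrsWord = wordFromEncoding(0xc3, 0xd0, 0x38, 0xd5);
static_assert(kMrsWord == 0xd538d0c3u, "bytes are in little-endian order");
static_assert(decodeSysRegAccess(kMrsWord).Op0 == 3 &&
                  decodeSysRegAccess(kMrsWord).Op1 == 0 &&
                  decodeSysRegAccess(kMrsWord).CRn == 13 &&
                  decodeSysRegAccess(kMrsWord).CRm == 0 &&
                  decodeSysRegAccess(kMrsWord).Op2 == 6,
              "matches S3_0_C13_C0_6");
static_assert(decodeSysRegAccess(kMrsWord).IsRead &&
                  decodeSysRegAccess(kMrsWord).Rt == 3,
              "mrs into x3");
```

The same decoding applied to d518d0c1 gives the MSR form with Rt = 1 and the same register fields, matching the `msr S3_0_C13_C0_6, x1` line above.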
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)
@@ -16,28 +16,41 @@ // RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,-clrbhb < %s | FileCheck %s --check-prefix=HINT_22 // Optional, off by default, manually enabled -// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB -// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB -// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8.8a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB -// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB -// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB +// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8.8a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST tmatheson-arm wrote: Why keep a different test format for CLRBHB? https://github.com/llvm/llvm-project/pull/146330
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
@@ -1,55 +1,117 @@ -// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s - - mrs x0, MECIDR_EL2 -// CHECK: mrs x0, MECIDR_EL2 // encoding: [0xe0,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P0_EL2 -// CHECK: mrs x0, MECID_P0_EL2 // encoding: [0x00,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A0_EL2 -// CHECK: mrs x0, MECID_A0_EL2 // encoding: [0x20,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P1_EL2 -// CHECK: mrs x0, MECID_P1_EL2 // encoding: [0x40,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A1_EL2 -// CHECK: mrs x0, MECID_A1_EL2 // encoding: [0x60,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_P_EL2 -// CHECK: mrs x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_A_EL2 -// CHECK: mrs x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_RL_A_EL3 -// CHECK: mrs x0, MECID_RL_A_EL3 // encoding: [0x20,0xaa,0x3e,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - msr MECID_P0_EL2,x0 -// CHECK: msr MECID_P0_EL2, x0 // encoding: [0x00,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_A0_EL2,x0 -// CHECK: msr MECID_A0_EL2, x0 // encoding: [0x20,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_P1_EL2,x0 -// CHECK: msr MECID_P1_EL2, x0 // encoding: [0x40,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: 
[[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_A1_EL2,x0 -// CHECK: msr MECID_A1_EL2, x0 // encoding: [0x60,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr VMECID_P_EL2, x0 -// CHECK: msr VMECID_P_EL2, x0 // encoding: [0x00,0xa9,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr VMECID_A_EL2, x0 -// CHECK: msr VMECID_A_EL2, x0 // encoding: [0x20,0xa9,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_RL_A_EL3, x0 -// CHECK: msr MECID_RL_A_EL3, x0 // encoding: [0x20,0xaa,0x1e,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - - dc cigdpae, x0 -// CHECK: dc cigdpae, x0 // encoding: [0xe0,0x7e,0x0c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:14: error: DC CIGDPAE requires: mec - dc cipae, x0 -// CHECK: dc cipae, x0 // encoding: [0x00,0x7e,0x0c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:14: error: DC CIPAE requires: mec +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+mec < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+mec < %s \ +// RUN:| llvm-objdump -d --mattr=+mec --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+mec < %s \ +// RUN: | llvm-objdump -d --mattr=-mec --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN +// Disassemble encoding and check the re-encoding (-show-encoding) matches. 
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+mec < %s \ +// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \ +// RUN:| llvm-mc -triple=aarch64 -mattr=+mec -disassemble -show-encoding \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST + + + +mrs x0, MECIDR_EL2 +// CHECK-INST: mrs x0, MECIDR_EL2 +// CHECK-ENCODING: encoding: [0xe0,0xa8,0x3c,0xd5] +// CHECK-ERROR: error: expected readable system register +// CHECK-UNKNOWN: d53ca8e0 mrs x0, S3_4_C10_C8_7 + +mrs x0, MECID_P0_EL2 +// CHECK-INST: mrs x0, MECID_P0_EL2 +// CHECK-ENCODING: encoding: [0x00,0xa8,0x3c,0xd5] +// CHECK-ERROR: error: expected readable system register +// CHECK-UNKNOWN: d53ca800 mrs x0, S3_4_C10_C8_0 + +mrs x0, MECID_A0_EL2 +// CHECK-INST: mrs x0, MECID_A0_EL2 +// CHECK-ENCODING: encoding: [0x20,0xa8,0x3c,0xd5] +// CHECK-ERROR: error: expected readable system register +// CHECK-UNKNOWN: d53ca820 mrs
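A note on reading the new CHECK lines: the CHECK-ENCODING bytes and the CHECK-UNKNOWN word describe the same instruction, and the generic register name printed without +mec is just the encoding fields spelled out. Below is a minimal sketch, not part of the patch, that shows the relationship. The function names are made up for illustration; the field positions are the standard AArch64 MRS/MSR layout (op0 in bits 20:19, op1 in 18:16, CRn in 15:12, CRm in 11:8, op2 in 7:5).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: combine the four bytes printed by -show-encoding
// (memory order, little-endian) into the 32-bit word llvm-objdump prints.
constexpr uint32_t wordFromEncoding(std::array<uint8_t, 4> Bytes) {
  return uint32_t(Bytes[0]) | uint32_t(Bytes[1]) << 8 |
         uint32_t(Bytes[2]) << 16 | uint32_t(Bytes[3]) << 24;
}

// Hypothetical helper: spell out the generic S<op0>_<op1>_C<CRn>_C<CRm>_<op2>
// name from an MRS/MSR instruction word.
std::string genericSysRegName(uint32_t Insn) {
  // Bits 31:22 are 0b1101010100 for the system instruction class.
  assert((Insn >> 22) == 0x354 && "expects an MRS/MSR-style instruction word");
  unsigned Op0 = (Insn >> 19) & 0x3;
  unsigned Op1 = (Insn >> 16) & 0x7;
  unsigned CRn = (Insn >> 12) & 0xf;
  unsigned CRm = (Insn >> 8) & 0xf;
  unsigned Op2 = (Insn >> 5) & 0x7;
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "S%u_%u_C%u_C%u_%u", Op0, Op1, CRn, CRm, Op2);
  return Buf;
}
```

Feeding the bytes from the first CHECK-ENCODING line through wordFromEncoding and then genericSysRegName should reproduce the word and the register spelling on the matching CHECK-UNKNOWN line.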
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
@@ -1,115 +1,203 @@ -// RUN: llvm-mc -triple aarch64 -mattr +gcs -show-encoding %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64 -show-encoding %s 2>%t | FileCheck %s --check-prefix=NO-GCS -// RUN: FileCheck --check-prefix=ERROR-NO-GCS %s < %t +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+gcs < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+gcs < %s \ +// RUN:| llvm-objdump -d --mattr=+gcs --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+gcs < %s \ +// RUN: | llvm-objdump -d --mattr=-gcs --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN +// Disassemble encoding and check the re-encoding (-show-encoding) matches. +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+gcs < %s \ +// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \ +// RUN:| llvm-mc -triple=aarch64 -mattr=+gcs -disassemble -show-encoding \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST + + msr GCSCR_EL1, x0 +// CHECK-INST: msr GCSCR_EL1, x0 +// CHECK-ENCODING: encoding: [0x00,0x25,0x18,0xd5] +// CHECK-UNKNOWN: d5182500 msr GCSCR_EL1, x0 + mrs x1, GCSCR_EL1 -// CHECK: msr GCSCR_EL1, x0 // encoding: [0x00,0x25,0x18,0xd5] -// CHECK: mrs x1, GCSCR_EL1 // encoding: [0x01,0x25,0x38,0xd5] +// CHECK-INST: mrs x1, GCSCR_EL1 +// CHECK-ENCODING: encoding: [0x01,0x25,0x38,0xd5] +// CHECK-UNKNOWN: d5382501 mrs x1, GCSCR_EL1 msr GCSPR_EL1, x2 +// CHECK-INST: msr GCSPR_EL1, x2 +// CHECK-ENCODING: encoding: [0x22,0x25,0x18,0xd5] +// CHECK-UNKNOWN: d5182522 msr GCSPR_EL1, x2 + mrs x3, GCSPR_EL1 -// CHECK: msr GCSPR_EL1, x2 // encoding: [0x22,0x25,0x18,0xd5] -// CHECK: mrs x3, GCSPR_EL1 // encoding: [0x23,0x25,0x38,0xd5] +// CHECK-INST: mrs x3, GCSPR_EL1 +// CHECK-ENCODING: encoding: [0x23,0x25,0x38,0xd5] +// 
CHECK-UNKNOWN: d5382523 mrs x3, GCSPR_EL1 msr GCSCRE0_EL1, x4 +// CHECK-INST: msr GCSCRE0_EL1, x4 +// CHECK-ENCODING: encoding: [0x44,0x25,0x18,0xd5] +// CHECK-UNKNOWN: d5182544 msr GCSCRE0_EL1, x4 + mrs x5, GCSCRE0_EL1 -// CHECK: msr GCSCRE0_EL1, x4 // encoding: [0x44,0x25,0x18,0xd5] -// CHECK: mrs x5, GCSCRE0_EL1 // encoding: [0x45,0x25,0x38,0xd5] +// CHECK-INST: mrs x5, GCSCRE0_EL1 +// CHECK-ENCODING: encoding: [0x45,0x25,0x38,0xd5] +// CHECK-UNKNOWN: d5382545 mrs x5, GCSCRE0_EL1 msr GCSPR_EL0, x6 +// CHECK-INST: msr GCSPR_EL0, x6 +// CHECK-ENCODING: encoding: [0x26,0x25,0x1b,0xd5] +// CHECK-UNKNOWN: d51b2526 msr GCSPR_EL0, x6 + mrs x7, GCSPR_EL0 -// CHECK: msr GCSPR_EL0, x6 // encoding: [0x26,0x25,0x1b,0xd5] -// CHECK: mrs x7, GCSPR_EL0 // encoding: [0x27,0x25,0x3b,0xd5] +// CHECK-INST: mrs x7, GCSPR_EL0 +// CHECK-ENCODING: encoding: [0x27,0x25,0x3b,0xd5] +// CHECK-UNKNOWN: d53b2527 mrs x7, GCSPR_EL0 msr GCSCR_EL2, x10 +// CHECK-INST: msr GCSCR_EL2, x10 +// CHECK-ENCODING: encoding: [0x0a,0x25,0x1c,0xd5] +// CHECK-UNKNOWN: d51c250a msr GCSCR_EL2, x10 + mrs x11, GCSCR_EL2 -// CHECK: msr GCSCR_EL2, x10 // encoding: [0x0a,0x25,0x1c,0xd5] -// CHECK: mrs x11, GCSCR_EL2 // encoding: [0x0b,0x25,0x3c,0xd5] +// CHECK-INST: mrs x11, GCSCR_EL2 +// CHECK-ENCODING: encoding: [0x0b,0x25,0x3c,0xd5] +// CHECK-UNKNOWN: d53c250b mrs x11, GCSCR_EL2 msr GCSPR_EL2, x12 +// CHECK-INST: msr GCSPR_EL2, x12 +// CHECK-ENCODING: encoding: [0x2c,0x25,0x1c,0xd5] +// CHECK-UNKNOWN: d51c252c msr GCSPR_EL2, x12 + mrs x13, GCSPR_EL2 -// CHECK: msr GCSPR_EL2, x12 // encoding: [0x2c,0x25,0x1c,0xd5] -// CHECK: mrs x13, GCSPR_EL2 // encoding: [0x2d,0x25,0x3c,0xd5] +// CHECK-INST: mrs x13, GCSPR_EL2 +// CHECK-ENCODING: encoding: [0x2d,0x25,0x3c,0xd5] +// CHECK-UNKNOWN: d53c252d mrs x13, GCSPR_EL2 msr GCSCR_EL12, x14 +// CHECK-INST: msr GCSCR_EL12, x14 +// CHECK-ENCODING: encoding: [0x0e,0x25,0x1d,0xd5] +// CHECK-UNKNOWN: d51d250e msr GCSCR_EL12, x14 + mrs x15, GCSCR_EL12 -// CHECK: msr GCSCR_EL12, 
x14 // encoding: [0x0e,0x25,0x1d,0xd5] -// CHECK: mrs x15, GCSCR_EL12 // encoding: [0x0f,0x25,0x3d,0xd5] +// CHECK-INST: mrs x15, GCSCR_EL12 +// CHECK-ENCODING: encoding: [0x0f,0x25,0x3d,0xd5] +// CHECK-UNKNOWN: d53d250f mrs x15, GCSCR_EL12 msr GCSPR_EL12, x16 +// CHECK-INST: msr GCSPR_EL12, x16 +// CHECK-ENCODING: encoding: [0x30,0x25,0x1d,0xd5] +// CHECK-UNKNOWN: d51d2530 msr GCSPR_EL12, x16 + mrs x17, GCSPR_EL12 -// CHECK: msr GCSPR_EL12, x16 // encoding: [0x30,0x25,0x1d,0xd5] -// CHECK: mrs x17, GCSPR_EL12 // encoding: [0
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
@@ -0,0 +1,138 @@ +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lse128 < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lse128 < %s \ +// RUN:| llvm-objdump -d --mattr=+lse128 - | FileCheck %s --check-prefix=CHECK-INST +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lse128 < %s \ +// RUN: | llvm-objdump -d --mattr=-lse128 - | FileCheck %s --check-prefix=CHECK-UNKNOWN +// Disassemble encoding and check the re-encoding (-show-encoding) matches. +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lse128 < %s \ +// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \ +// RUN:| llvm-mc -triple=aarch64 -mattr=+lse128 -disassemble -show-encoding \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST + + + +ldclrp x1, x2, [x11] +// CHECK-INST: ldclrp x1, x2, [x11] +// CHECK-ENCODING: encoding: [0x61,0x11,0x22,0x19] +// CHECK-ERROR: :[[@LINE-3]]:1: error: instruction requires: lse128 +// CHECK-UNKNOWN: 19221161 +ldclrp x21, x22, [sp] tmatheson-arm wrote: No spaces between cases? https://github.com/llvm/llvm-project/pull/146329 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [AMDGPU] Add support for `v_cvt_f16_bf8` on gfx1250 (PR #146305)
shiltian wrote: ### Merge activity * **Jun 30, 11:47 AM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/146305). https://github.com/llvm/llvm-project/pull/146305 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [gtest] Fix building on OpenBSD/sparc64 (#145225) (PR #146155)
https://github.com/AaronBallman approved this pull request. LGTM! https://github.com/llvm/llvm-project/pull/146155 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)
https://github.com/jthackray edited https://github.com/llvm/llvm-project/pull/146330 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Intrinsic for launching whole wave functions (PR #145859)
@@ -297,8 +297,13 @@ namespace CallingConv { /// directly or indirectly via a call-like instruction. constexpr bool isCallableCC(CallingConv::ID CC) { switch (CC) { + // Called with special intrinsics: + // llvm.amdgcn.cs.chain case CallingConv::AMDGPU_CS_Chain: case CallingConv::AMDGPU_CS_ChainPreserve: + // llvm.amdgcn.call.whole.wave + case CallingConv::AMDGPU_Gfx_WholeWave: rovka wrote: Yeah, that's in the [previous patch](https://github.com/llvm/llvm-project/pull/145858) in this stack. I've added some more tests like you requested :) https://github.com/llvm/llvm-project/pull/145859 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] WebAssembly: Stop changing MCAsmInfo's ExceptionsType based on flags (PR #146343)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/146343

Currently wasm adds an extra level of options that work backwards from the standard options, and overwrites them. The ExceptionModel field in TM->Options is the standard user configuration option for the exception model to use. MCAsmInfo's ExceptionsType is a constant for the default to use for the triple if not explicitly set in the TargetOptions ExceptionModel. This was adding 2 custom flags, changing the MCAsmInfo default, and overwriting the ExceptionModel from the custom flags.

These comments about compiling bitcode with clang are describing a toolchain bug or user error. TargetOptions is bad, and we should move to eliminating it. It is module state not captured in the IR. Ideally the exception model should either come implied from the triple, or a module flag and not depend on this side state. Currently it is the responsibility of the toolchain and/or user to ensure the same command line flags are used at each phase of the compilation. It is not the backend's responsibility to try to second-guess these options.

-wasm-enable-eh and -wasm-enable-sjlj should also be removed in favor of the standard exception control. I'm a bit confused by how all of these fields are supposed to interact, but there are a few uses in the backend that are directly looking at these flags instead of the already parsed ExceptionModel which need to be cleaned up.

Additionally, this was enforcing some rules about the combinations of flags at a random point in the IR pass pipeline configuration. This is a module property that should be handled at TargetMachine construction time at the latest.

This required adding flags to a few MIR and clang tests, which previously never got this far, to avoid hitting the errors.

>From 9868f97f4e1f71dfbb2c12b3f6a9a0f04f5bd42c Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Mon, 30 Jun 2025 15:26:44 +0900
Subject: [PATCH] WebAssembly: Stop changing MCAsmInfo's ExceptionsType based on flags

Currently wasm adds an extra level of options that work backwards from the standard options, and overwrites them. The ExceptionModel field in TM->Options is the standard user configuration option for the exception model to use. MCAsmInfo's ExceptionsType is a constant for the default to use for the triple if not explicitly set in the TargetOptions ExceptionModel. This was adding 2 custom flags, changing the MCAsmInfo default, and overwriting the ExceptionModel from the custom flags.

These comments about compiling bitcode with clang are describing a toolchain bug or user error. TargetOptions is bad, and we should move to eliminating it. It is module state not captured in the IR. Ideally the exception model should either come implied from the triple, or a module flag and not depend on this side state. Currently it is the responsibility of the toolchain and/or user to ensure the same command line flags are used at each phase of the compilation. It is not the backend's responsibility to try to second-guess these options.

-wasm-enable-eh and -wasm-enable-sjlj should also be removed in favor of the standard exception control. I'm a bit confused by how all of these fields are supposed to interact, but there are a few uses in the backend that are directly looking at these flags instead of the already parsed ExceptionModel which need to be cleaned up.

Additionally, this was enforcing some rules about the combinations of flags at a random point in the IR pass pipeline configuration. This is a module property that should be handled at TargetMachine construction time at the latest.

This required adding flags to a few MIR and clang tests, which previously never got this far, to avoid hitting the errors.
--- ...asm-exception-model-flag-parse-ir-input.ll | 7 +- clang/test/CodeGenCXX/builtins-eh-wasm.cpp| 2 +- clang/test/CodeGenCXX/wasm-eh.cpp | 6 +- .../MCTargetDesc/WebAssemblyMCAsmInfo.cpp | 9 +- .../MCTargetDesc/WebAssemblyMCTargetDesc.cpp | 29 .../MCTargetDesc/WebAssemblyMCTargetDesc.h| 7 - .../WebAssembly/WebAssemblyAsmPrinter.cpp | 13 +- .../WebAssembly/WebAssemblyAsmPrinter.h | 2 +- .../WebAssembly/WebAssemblyCFGStackify.cpp| 2 +- .../WebAssembly/WebAssemblyLateEHPrepare.cpp | 2 +- .../WebAssembly/WebAssemblyMCInstLower.cpp| 4 +- .../WebAssembly/WebAssemblyTargetMachine.cpp | 139 ++ .../WebAssembly/WebAssemblyTargetMachine.h| 9 ++ .../WebAssembly/cfg-stackify-eh-legacy.mir| 2 +- .../CodeGen/WebAssembly/exception-legacy.mir | 2 +- .../CodeGen/WebAssembly/function-info.mir | 2 +- 16 files changed, 113 insertions(+), 124 deletions(-) diff --git a/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll b/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll index 4a7eeece58717..85bfc7f74daed 100644 --- a/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse
[llvm-branch-commits] [clang] [llvm] WebAssembly: Stop changing MCAsmInfo's ExceptionsType based on flags (PR #146343)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/146343).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#146343** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/146343)
* **#146342** (https://app.graphite.dev/github/pr/llvm/llvm-project/146342)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/146343 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] WebAssembly: Stop changing MCAsmInfo's ExceptionsType based on flags (PR #146343)
llvmbot wrote:

@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-backend-webassembly

Author: Matt Arsenault (arsenm)

Changes

Currently wasm adds an extra level of options that work backwards from the standard options, and overwrites them. The ExceptionModel field in TM->Options is the standard user configuration option for the exception model to use. MCAsmInfo's ExceptionsType is a constant for the default to use for the triple if not explicitly set in the TargetOptions ExceptionModel. This was adding 2 custom flags, changing the MCAsmInfo default, and overwriting the ExceptionModel from the custom flags.

These comments about compiling bitcode with clang are describing a toolchain bug or user error. TargetOptions is bad, and we should move to eliminating it. It is module state not captured in the IR. Ideally the exception model should either come implied from the triple, or a module flag and not depend on this side state. Currently it is the responsibility of the toolchain and/or user to ensure the same command line flags are used at each phase of the compilation. It is not the backend's responsibility to try to second-guess these options.

-wasm-enable-eh and -wasm-enable-sjlj should also be removed in favor of the standard exception control. I'm a bit confused by how all of these fields are supposed to interact, but there are a few uses in the backend that are directly looking at these flags instead of the already parsed ExceptionModel which need to be cleaned up.

Additionally, this was enforcing some rules about the combinations of flags at a random point in the IR pass pipeline configuration. This is a module property that should be handled at TargetMachine construction time at the latest.

This required adding flags to a few MIR and clang tests, which previously never got this far, to avoid hitting the errors.
--- Patch is 25.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146343.diff 16 Files Affected: - (modified) clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll (+4-3) - (modified) clang/test/CodeGenCXX/builtins-eh-wasm.cpp (+1-1) - (modified) clang/test/CodeGenCXX/wasm-eh.cpp (+3-3) - (modified) llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCAsmInfo.cpp (+1-8) - (modified) llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.cpp (-29) - (modified) llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h (-7) - (modified) llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp (+7-6) - (modified) llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h (+1-1) - (modified) llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp (+1-1) - (modified) llvm/lib/Target/WebAssembly/WebAssemblyLateEHPrepare.cpp (+1-1) - (modified) llvm/lib/Target/WebAssembly/WebAssemblyMCInstLower.cpp (+1-3) - (modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp (+81-58) - (modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.h (+9) - (modified) llvm/test/CodeGen/WebAssembly/cfg-stackify-eh-legacy.mir (+1-1) - (modified) llvm/test/CodeGen/WebAssembly/exception-legacy.mir (+1-1) - (modified) llvm/test/CodeGen/WebAssembly/function-info.mir (+1-1) ``diff diff --git a/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll b/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll index 4a7eeece58717..85bfc7f74daed 100644 --- a/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll +++ b/clang/test/CodeGen/WebAssembly/wasm-exception-model-flag-parse-ir-input.ll @@ -2,15 +2,16 @@ ; Check all the options parse ; RUN: %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=none %s | FileCheck %s -; RUN: %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=wasm %s | FileCheck %s -; RUN: %clang_cc1 -triple wasm32 -o - 
-emit-llvm -exception-model=dwarf %s | FileCheck %s -; RUN: %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=sjlj %s | FileCheck %s +; RUN: %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=wasm -mllvm -wasm-enable-eh %s | FileCheck %s ; RUN: not %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=invalid %s 2>&1 | FileCheck -check-prefix=ERR %s +; RUN: not %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=dwarf %s 2>&1 | FileCheck -check-prefix=ERR-BE %s +; RUN: not %clang_cc1 -triple wasm32 -o - -emit-llvm -exception-model=sjlj %s 2>&1 | FileCheck -check-prefix=ERR-BE %s ; CHECK-LABEL: define void @test( ; ERR: error: invalid value 'invalid' in '-exception-model=invalid' +; ERR-BE: fatal error: error in backend: -exception-model should be either 'none' or 'wasm' define void @test() { ret void } diff --git a/clang/test/CodeGenCXX/builtins-eh-wasm.cpp b/clang/test/CodeGenCXX/builtins-eh-wasm.cpp index b0f763d3e54dc..9a7134c48f208 100644 --- a/clang/test/CodeGenCXX/builtins-eh-wasm.cpp +++ b/clang/test/CodeGenCXX/builtins-eh-wa
[llvm-branch-commits] [clang] [llvm] WebAssembly: Stop changing MCAsmInfo's ExceptionsType based on flags (PR #146343)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/146343 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
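To make the intended precedence from the PR description concrete, here is a rough sketch of the rule as described: an explicitly requested exception model wins, the per-triple default is used only when nothing was requested, and for wasm only 'none' and 'wasm' are accepted. This is not the code from the patch. The enum, both function names, and the use of std::optional to represent "not set" are assumptions for illustration; only the diagnostic text is taken from the updated clang test.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the real exception-handling enum; only the values
// that appear in the test RUN lines (none, wasm, dwarf, sjlj) are modelled.
enum class ExceptionModel { None, Wasm, DwarfCFI, SjLj };

// Assumed rule: the explicitly requested model wins; the triple's default is
// only a fallback when the user did not ask for anything.
ExceptionModel resolveExceptionModel(std::optional<ExceptionModel> Requested,
                                     ExceptionModel TripleDefault) {
  return Requested.value_or(TripleDefault);
}

// Returns an empty string when the model is acceptable for wasm, otherwise the
// diagnostic text that the updated test expects from the backend.
std::string checkWasmExceptionModel(ExceptionModel M) {
  if (M == ExceptionModel::None || M == ExceptionModel::Wasm)
    return "";
  return "-exception-model should be either 'none' or 'wasm'";
}
```

The point of the sketch is only the ordering: validation happens once, on the resolved value, rather than custom flags rewriting the default and the user option after the fact.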
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Add tests for missing readanylane combines (PR #145910)
@@ -0,0 +1,166 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 -new-reg-bank-select < %s | FileCheck %s + +define amdgpu_ps void @readanylane_to_virtual_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) { +; CHECK-LABEL: readanylane_to_virtual_vgpr: +; CHECK: ; %bb.0: +; CHECK-NEXT:v_mov_b32_e32 v0, 0 +; CHECK-NEXT:global_load_dword v1, v0, s[0:1] glc dlc +; CHECK-NEXT:s_waitcnt vmcnt(0) +; CHECK-NEXT:global_store_dword v0, v1, s[2:3] +; CHECK-NEXT:s_endpgm + %load = load volatile float, ptr addrspace(1) %ptr0 + store float %load, ptr addrspace(1) %ptr1 + ret void +} + +define amdgpu_ps float @readanylane_to_physical_vgpr(ptr addrspace(1) inreg %ptr) { +; CHECK-LABEL: readanylane_to_physical_vgpr: +; CHECK: ; %bb.0: +; CHECK-NEXT:v_mov_b32_e32 v0, 0 +; CHECK-NEXT:global_load_dword v0, v0, s[0:1] glc dlc +; CHECK-NEXT:s_waitcnt vmcnt(0) +; CHECK-NEXT:v_readfirstlane_b32 s0, v0 +; CHECK-NEXT:v_mov_b32_e32 v0, s0 +; CHECK-NEXT:; return to shader part epilog + %load = load volatile float, ptr addrspace(1) %ptr + ret float %load +} + +define amdgpu_ps void @readanylane_to_bitcast_to_virtual_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) { +; CHECK-LABEL: readanylane_to_bitcast_to_virtual_vgpr: +; CHECK: ; %bb.0: +; CHECK-NEXT:v_mov_b32_e32 v0, 0 +; CHECK-NEXT:global_load_dword v1, v0, s[0:1] glc dlc +; CHECK-NEXT:s_waitcnt vmcnt(0) +; CHECK-NEXT:v_readfirstlane_b32 s0, v1 +; CHECK-NEXT:v_mov_b32_e32 v1, s0 +; CHECK-NEXT:global_store_dword v0, v1, s[2:3] +; CHECK-NEXT:s_endpgm + %load = load volatile <2 x i16>, ptr addrspace(1) %ptr0 + %bitcast = bitcast <2 x i16> %load to i32 + store i32 %bitcast, ptr addrspace(1) %ptr1 + ret void +} + +define amdgpu_ps float @readanylane_to_bitcast_to_physical_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) { +; CHECK-LABEL: readanylane_to_bitcast_to_physical_vgpr: 
+; CHECK: ; %bb.0: +; CHECK-NEXT:v_mov_b32_e32 v0, 0 +; CHECK-NEXT:global_load_dword v0, v0, s[0:1] glc dlc +; CHECK-NEXT:s_waitcnt vmcnt(0) +; CHECK-NEXT:v_readfirstlane_b32 s0, v0 +; CHECK-NEXT:v_mov_b32_e32 v0, s0 +; CHECK-NEXT:; return to shader part epilog + %load = load volatile <2 x i16>, ptr addrspace(1) %ptr0 + %bitcast = bitcast <2 x i16> %load to float + ret float %bitcast +} + +define amdgpu_ps void @unmerge_readanylane_merge_to_virtual_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) { +; CHECK-LABEL: unmerge_readanylane_merge_to_virtual_vgpr: +; CHECK: ; %bb.0: +; CHECK-NEXT:v_mov_b32_e32 v2, 0 +; CHECK-NEXT:global_load_dwordx2 v[0:1], v2, s[0:1] glc dlc +; CHECK-NEXT:s_waitcnt vmcnt(0) +; CHECK-NEXT:v_readfirstlane_b32 s0, v0 +; CHECK-NEXT:v_readfirstlane_b32 s1, v1 +; CHECK-NEXT:v_mov_b32_e32 v0, s0 +; CHECK-NEXT:v_mov_b32_e32 v1, s1 +; CHECK-NEXT:global_store_dwordx2 v2, v[0:1], s[2:3] +; CHECK-NEXT:s_endpgm + %load = load volatile i64, ptr addrspace(1) %ptr0 + store i64 %load, ptr addrspace(1) %ptr1 + ret void +} + +;define amdgpu_ps double @unmerge_readanylane_merge_to_physical_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) { +; %load = load volatile double, ptr addrspace(1) %ptr0 +; ret double %load +;} + +define amdgpu_ps void @unmerge_readanylane_merge_bitcast_to_virtual_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) { +; CHECK-LABEL: unmerge_readanylane_merge_bitcast_to_virtual_vgpr: +; CHECK: ; %bb.0: +; CHECK-NEXT:v_mov_b32_e32 v2, 0 +; CHECK-NEXT:global_load_dwordx2 v[0:1], v2, s[0:1] glc dlc +; CHECK-NEXT:s_waitcnt vmcnt(0) +; CHECK-NEXT:v_readfirstlane_b32 s0, v0 +; CHECK-NEXT:v_readfirstlane_b32 s1, v1 +; CHECK-NEXT:v_mov_b32_e32 v0, s0 +; CHECK-NEXT:v_mov_b32_e32 v1, s1 +; CHECK-NEXT:global_store_dwordx2 v2, v[0:1], s[2:3] +; CHECK-NEXT:s_endpgm + %load = load volatile <2 x i32>, ptr addrspace(1) %ptr0 + %bitcast = bitcast <2 x i32> %load to double + store double %bitcast, ptr 
addrspace(1) %ptr1 + ret void +} + +;define amdgpu_ps double @unmerge_readanylane_merge_bitcast_to_physical_vgpr(ptr addrspace(1) inreg %ptr0, ptr addrspace(1) inreg %ptr1) { +; %load = load volatile <2 x i32>, ptr addrspace(1) %ptr0 +; %bitcast = bitcast <2 x i32> %load to double +; ret double %bitcast +;} petar-avramovic wrote: No combine happens in the commented-out tests. They would exercise the path with a copy to a physical VGPR, but the only calling convention that provides one is the float return; there is none for larger physical VGPRs (those return in SGPRs). So I edited the MIR test to use `SI_RETURN_TO_EPILOG implicit $vgpr0_vgpr1` to check that the combine works. https://github.com/llvm/llvm-project/pull/145910 ___
[llvm-branch-commits] [llvm] [AMDGPU] Use reverse iteration in CodeGenPrepare (PR #145484)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/145484 >From b031681978e2b356c2ae8e65d6e08515c0044ac1 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Tue, 24 Jun 2025 11:35:58 +0200 Subject: [PATCH 1/2] [AMDGPU] Use reverse iteration in CodeGenPrepare In order to make this easier, I also removed all "removeFromParent" calls from the visitors, instead adding instructions to a set of instructions to delete once the function has been visited. This avoids crashes due to functions deleting their operands. In theory we could allow functions to delete the instruction they visited (and only that one) but I think having one idiom for everything is less error-prone. Fixes #140219 --- .../Target/AMDGPU/AMDGPUCodeGenPrepare.cpp| 82 --- ...egenprepare-break-large-phis-heuristics.ll | 42 +++--- .../AMDGPU/amdgpu-codegenprepare-fdiv.ll | 110 +- llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll| 138 +- llvm/test/CodeGen/AMDGPU/uniform-select.ll| 64 5 files changed, 185 insertions(+), 251 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp index 5f1983791cfae..2a3aa1ac672b6 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp @@ -15,6 +15,7 @@ #include "AMDGPU.h" #include "AMDGPUTargetMachine.h" #include "SIModeRegisterDefaults.h" +#include "llvm/ADT/SetVector.h" #include "llvm/Analysis/AssumptionCache.h" #include "llvm/Analysis/ConstantFolding.h" #include "llvm/Analysis/TargetLibraryInfo.h" @@ -109,6 +110,7 @@ class AMDGPUCodeGenPrepareImpl bool FlowChanged = false; mutable Function *SqrtF32 = nullptr; mutable Function *LdexpF32 = nullptr; + mutable SetVector DeadVals; DenseMap BreakPhiNodesCache; @@ -285,28 +287,19 @@ bool AMDGPUCodeGenPrepareImpl::run() { BreakPhiNodesCache.clear(); bool MadeChange = false; - Function::iterator NextBB; - for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE; FI = NextBB) { 
-BasicBlock *BB = &*FI; -NextBB = std::next(FI); - -BasicBlock::iterator Next; -for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; - I = Next) { - Next = std::next(I); - - MadeChange |= visit(*I); - - if (Next != E) { // Control flow changed -BasicBlock *NextInstBB = Next->getParent(); -if (NextInstBB != BB) { - BB = NextInstBB; - E = BB->end(); - FE = F.end(); -} - } + for (BasicBlock &BB : reverse(F)) { +for (Instruction &I : make_early_inc_range(reverse(BB))) { + if (!DeadVals.contains(&I)) +MadeChange |= visit(I); } } + + while (!DeadVals.empty()) { +RecursivelyDeleteTriviallyDeadInstructions( +DeadVals.pop_back_val(), TLI, /*MSSAU*/ nullptr, +[&](Value *V) { DeadVals.remove(V); }); + } + return MadeChange; } @@ -426,7 +419,7 @@ bool AMDGPUCodeGenPrepareImpl::replaceMulWithMul24(BinaryOperator &I) const { Value *NewVal = insertValues(Builder, Ty, ResultVals); NewVal->takeName(&I); I.replaceAllUsesWith(NewVal); - I.eraseFromParent(); + DeadVals.insert(&I); return true; } @@ -500,10 +493,10 @@ bool AMDGPUCodeGenPrepareImpl::foldBinOpIntoSelect(BinaryOperator &BO) const { FoldedT, FoldedF); NewSelect->takeName(&BO); BO.replaceAllUsesWith(NewSelect); - BO.eraseFromParent(); + DeadVals.insert(&BO); if (CastOp) -CastOp->eraseFromParent(); - Sel->eraseFromParent(); +DeadVals.insert(CastOp); + DeadVals.insert(Sel); return true; } @@ -900,7 +893,7 @@ bool AMDGPUCodeGenPrepareImpl::visitFDiv(BinaryOperator &FDiv) { if (NewVal) { FDiv.replaceAllUsesWith(NewVal); NewVal->takeName(&FDiv); -RecursivelyDeleteTriviallyDeadInstructions(&FDiv, TLI); +DeadVals.insert(&FDiv); } return true; @@ -1310,7 +1303,8 @@ within the byte are all 0. 
static bool tryNarrowMathIfNoOverflow(Instruction *I, const SITargetLowering *TLI, const TargetTransformInfo &TTI, - const DataLayout &DL) { + const DataLayout &DL, + SetVector &DeadVals) { unsigned Opc = I->getOpcode(); Type *OldType = I->getType(); @@ -1365,7 +1359,7 @@ static bool tryNarrowMathIfNoOverflow(Instruction *I, Value *Zext = Builder.CreateZExt(Arith, OldType); I->replaceAllUsesWith(Zext); - I->eraseFromParent(); + DeadVals.insert(I); return true; } @@ -1376,7 +1370,7 @@ bool AMDGPUCodeGenPrepareImpl::visitBinaryOperator(BinaryOperator &I) { if (UseMul24Intrin && replaceMulWithMul24(I)) return true; if (tryNarrowMathIfNoOverflow(&I, ST.getTargetLowering(), -TM.getTargetTransformIn
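For anyone skimming the patch, the idiom it switches to (walk the block in reverse, never erase during the walk, record dead values in a set, skip anything already recorded, delete at the end) can be illustrated without LLVM types. The following is only a sketch under simplifying assumptions: Inst, makeExampleBlock and visitInReverse are hypothetical, each instruction has at most one operand, and a plain erase stands in for the RecursivelyDeleteTriviallyDeadInstructions call used in the real code.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, heavily simplified instruction.
struct Inst {
  std::string Name;
  Inst *Operand = nullptr; // at most one operand, defined earlier in the block
  bool FoldsAway = false;  // visiting it replaces both it and its operand
};

// Builds the block [a, b, c] where c uses b and folds away when visited.
std::list<Inst> makeExampleBlock() {
  std::list<Inst> Block;
  Block.push_back({"a", nullptr, false});
  Block.push_back({"b", nullptr, false});
  Inst *B = &Block.back(); // std::list keeps element addresses stable
  Block.push_back({"c", B, true});
  return Block;
}

// Visits the block bottom-up and returns the names actually visited.
std::vector<std::string> visitInReverse(std::list<Inst> &Block) {
  std::unordered_set<const Inst *> Dead;
  std::vector<std::string> Visited;
  const std::size_t SizeBefore = Block.size();

  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    Inst &I = *It;
    if (Dead.count(&I))
      continue; // already replaced by an earlier visit; do not touch it
    Visited.push_back(I.Name);
    if (I.FoldsAway) {
      // Record instead of erasing, so the iteration stays valid.
      Dead.insert(&I);
      if (I.Operand)
        Dead.insert(I.Operand);
    }
  }

  // Deferred deletion once the whole block has been visited.
  Block.remove_if([&](const Inst &I) { return Dead.count(&I) != 0; });
  assert(Block.size() + Dead.size() == SizeBefore);
  return Visited;
}
```

Because users are visited before their operands, an operand that gets marked dead is simply skipped when the walk reaches it, which is the situation that crashed with eager erasing.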
[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)
Pierre-vh wrote: > Why DAG and not InstCombine for this? The intrinsics we want to optimize with this aren't lowered yet at IC https://github.com/llvm/llvm-project/pull/146054 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/146054 >From 17ac90ad1ee167f35321e01625a207f2b94ff523 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 26 Jun 2025 13:31:37 +0200 Subject: [PATCH 1/2] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences Fold sequences where we extract a bunch of contiguous bits from a value, merge them into the low bit and then check if the low bits are zero or not. It seems like a strange sequence at first but it's an idiom used by device libs in device libs to check workitem IDs for AMDGPU. The reason I put this in DAGCombiner instead of the target combiner is because this is a generic, valid transform that's also fairly niche, so there isn't much risk of a combine loop I think. See #136727 --- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 86 ++- .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 34 ++-- 2 files changed, 91 insertions(+), 29 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 08dab7c697b99..a189208d3a62e 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1, return SDValue(); } +static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG, + const TargetLowering &TLI) { + // Match a pattern such as: + // (X | (X >> C0) | (X >> C1) | ...) & Mask + // This extracts contiguous parts of X and ORs them together before comparing. + // We can optimize this so that we directly check (X & SomeMask) instead, + // eliminating the shifts. 
+ + EVT VT = Root.getValueType(); + + if (Root.getOpcode() != ISD::AND) +return SDValue(); + + SDValue N0 = Root.getOperand(0); + SDValue N1 = Root.getOperand(1); + + if (N0.getOpcode() != ISD::OR || !isa(N1)) +return SDValue(); + + APInt RootMask = cast(N1)->getAsAPIntVal(); + if (!RootMask.isMask()) +return SDValue(); + + SDValue Src; + const auto IsSrc = [&](SDValue V) { +if (!Src) { + Src = V; + return true; +} + +return Src == V; + }; + + SmallVector Worklist = {N0}; + APInt PartsMask(VT.getSizeInBits(), 0); + while (!Worklist.empty()) { +SDValue V = Worklist.pop_back_val(); +if (!V.hasOneUse() && Src != V) + return SDValue(); + +if (V.getOpcode() == ISD::OR) { + Worklist.push_back(V.getOperand(0)); + Worklist.push_back(V.getOperand(1)); + continue; +} + +if (V.getOpcode() == ISD::SRL) { + SDValue ShiftSrc = V.getOperand(0); + SDValue ShiftAmt = V.getOperand(1); + + if (!IsSrc(ShiftSrc) || !isa(ShiftAmt)) +return SDValue(); + + PartsMask |= (RootMask << cast(ShiftAmt)->getAsZExtVal()); + continue; +} + +if (IsSrc(V)) { + PartsMask |= RootMask; + continue; +} + +return SDValue(); + } + + if (!RootMask.isMask() || !Src) +return SDValue(); + + SDLoc DL(Root); + return DAG.getNode(ISD::AND, DL, VT, + {Src, DAG.getConstant(PartsMask, DL, VT)}); +} + /// This is a stub for TargetLowering::SimplifySetCC. 
SDValue DAGCombiner::SimplifySetCC(EVT VT, SDValue N0, SDValue N1, ISD::CondCode Cond, const SDLoc &DL, bool foldBooleans) { TargetLowering::DAGCombinerInfo DagCombineInfo(DAG, Level, false, this); - return TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL); + if (SDValue C = + TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL)) +return C; + + if ((Cond == ISD::SETNE || Cond == ISD::SETEQ) && + N0.getOpcode() == ISD::AND && isNullConstant(N1)) { + +if (SDValue Res = matchMergedBFX(N0, DAG, TLI)) + return DAG.getSetCC(DL, VT, Res, N1, Cond); + } + + return SDValue(); } /// Given an ISD::SDIV node expressing a divide by constant, return diff --git a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll index 07c4aeb1ac7df..64d055bc40e98 100644 --- a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll +++ b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll @@ -12,11 +12,7 @@ define i1 @workitem_zero() { ; DAGISEL-GFX8-LABEL: workitem_zero: ; DAGISEL-GFX8: ; %bb.0: ; %entry ; DAGISEL-GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v1, 10, v31 -; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v0, 20, v31 -; DAGISEL-GFX8-NEXT:v_or_b32_e32 v1, v31, v1 -; DAGISEL-GFX8-NEXT:v_or_b32_e32 v0, v1, v0 -; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3ff, v0 +; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3fff, v31 ; DAGISEL-GFX8-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0 ; DAGISEL-GFX8-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc ; DAGISEL-GFX8-NEXT:s_setpc_b64 s[30:31] @@ -
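The equivalence behind `matchMergedBFX` can be checked with plain integers, outside of LLVM. The following is a minimal sketch and not code from the patch: the helper names `mergedIsZero`, `partsMask` and `foldedIsZero` are made up for illustration, values are assumed to be 32 bits wide, and every shift amount is assumed to be smaller than 32. It compares the original "OR the shifted copies, then mask" form against the single AND with an accumulated mask.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Original form: ((X | (X >> C0) | (X >> C1) | ...) & Mask) == 0.
bool mergedIsZero(uint32_t X, uint32_t Mask,
                  const std::vector<unsigned> &Shifts) {
  uint32_t Merged = X;
  for (unsigned C : Shifts) {
    assert(C < 32 && "shift amount must be smaller than the bit width");
    Merged |= X >> C;
  }
  return (Merged & Mask) == 0;
}

// Accumulate Mask | (Mask << C0) | (Mask << C1) | ..., which is how the
// patch builds PartsMask from RootMask and the shift amounts.
uint32_t partsMask(uint32_t Mask, const std::vector<unsigned> &Shifts) {
  uint32_t Parts = Mask;
  for (unsigned C : Shifts) {
    assert(C < 32 && "shift amount must be smaller than the bit width");
    Parts |= Mask << C;
  }
  return Parts;
}

// Folded form: a single AND against the accumulated mask, no shifts.
bool foldedIsZero(uint32_t X, uint32_t Mask,
                  const std::vector<unsigned> &Shifts) {
  return (X & partsMask(Mask, Shifts)) == 0;
}
```

For a 10-bit mask and shifts of 10 and 20 (the shape of the workitem ID check), both predicates agree for any input, because a bit of `X >> C` survives the mask exactly when the corresponding bit of `X` lies inside `Mask << C`.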
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Improve readanylane combines in regbanklegalize (PR #145911)
@@ -115,126 +117,233 @@ class AMDGPURegBankLegalizeCombiner { VgprRB(&RBI.getRegBank(AMDGPU::VGPRRegBankID)), VccRB(&RBI.getRegBank(AMDGPU::VCCRegBankID)) {}; - bool isLaneMask(Register Reg) { -const RegisterBank *RB = MRI.getRegBankOrNull(Reg); -if (RB && RB->getID() == AMDGPU::VCCRegBankID) - return true; + bool isLaneMask(Register Reg); + std::pair tryMatch(Register Src, unsigned Opcode); + std::pair tryMatchRALFromUnmerge(Register Src); + Register getReadAnyLaneSrc(Register Src); + void replaceRegWithOrBuildCopy(Register Dst, Register Src); -const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg); -return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1); - } + bool tryEliminateReadAnyLane(MachineInstr &Copy); + void tryCombineCopy(MachineInstr &MI); + void tryCombineS1AnyExt(MachineInstr &MI); +}; - void cleanUpAfterCombine(MachineInstr &MI, MachineInstr *Optional0) { -MI.eraseFromParent(); -if (Optional0 && isTriviallyDead(*Optional0, MRI)) - Optional0->eraseFromParent(); - } +bool AMDGPURegBankLegalizeCombiner::isLaneMask(Register Reg) { + const RegisterBank *RB = MRI.getRegBankOrNull(Reg); + if (RB && RB->getID() == AMDGPU::VCCRegBankID) +return true; - std::pair tryMatch(Register Src, unsigned Opcode) { -MachineInstr *MatchMI = MRI.getVRegDef(Src); -if (MatchMI->getOpcode() != Opcode) - return {nullptr, Register()}; -return {MatchMI, MatchMI->getOperand(1).getReg()}; - } + const TargetRegisterClass *RC = MRI.getRegClassOrNull(Reg); + return RC && TRI.isSGPRClass(RC) && MRI.getType(Reg) == LLT::scalar(1); +} - void tryCombineCopy(MachineInstr &MI) { -Register Dst = MI.getOperand(0).getReg(); -Register Src = MI.getOperand(1).getReg(); -// Skip copies of physical registers. -if (!Dst.isVirtual() || !Src.isVirtual()) - return; - -// This is a cross bank copy, sgpr S1 to lane mask. 
-// -// %Src:sgpr(s1) = G_TRUNC %TruncS32Src:sgpr(s32) -// %Dst:lane-mask(s1) = COPY %Src:sgpr(s1) -// -> -// %Dst:lane-mask(s1) = G_AMDGPU_COPY_VCC_SCC %TruncS32Src:sgpr(s32) -if (isLaneMask(Dst) && MRI.getRegBankOrNull(Src) == SgprRB) { - auto [Trunc, TruncS32Src] = tryMatch(Src, AMDGPU::G_TRUNC); - assert(Trunc && MRI.getType(TruncS32Src) == S32 && - "sgpr S1 must be result of G_TRUNC of sgpr S32"); - - B.setInstr(MI); - // Ensure that truncated bits in BoolSrc are 0. - auto One = B.buildConstant({SgprRB, S32}, 1); - auto BoolSrc = B.buildAnd({SgprRB, S32}, TruncS32Src, One); - B.buildInstr(AMDGPU::G_AMDGPU_COPY_VCC_SCC, {Dst}, {BoolSrc}); - cleanUpAfterCombine(MI, Trunc); - return; -} +std::pair +AMDGPURegBankLegalizeCombiner::tryMatch(Register Src, unsigned Opcode) { + MachineInstr *MatchMI = MRI.getVRegDef(Src); + if (MatchMI->getOpcode() != Opcode) +return {nullptr, Register()}; + return {MatchMI, MatchMI->getOperand(1).getReg()}; +} + +std::pair +AMDGPURegBankLegalizeCombiner::tryMatchRALFromUnmerge(Register Src) { + MachineInstr *ReadAnyLane = MRI.getVRegDef(Src); + if (ReadAnyLane->getOpcode() != AMDGPU::G_AMDGPU_READANYLANE) +return {nullptr, -1}; + + Register RALSrc = ReadAnyLane->getOperand(1).getReg(); + if (auto *UnMerge = getOpcodeDef(RALSrc, MRI)) +return {UnMerge, UnMerge->findRegisterDefOperandIdx(RALSrc, nullptr)}; -// Src = G_AMDGPU_READANYLANE RALSrc -// Dst = COPY Src -// -> -// Dst = RALSrc -if (MRI.getRegBankOrNull(Dst) == VgprRB && -MRI.getRegBankOrNull(Src) == SgprRB) { - auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE); - if (!RAL) -return; - - assert(MRI.getRegBank(RALSrc) == VgprRB); - MRI.replaceRegWith(Dst, RALSrc); - cleanUpAfterCombine(MI, RAL); - return; + return {nullptr, -1}; +} + +Register AMDGPURegBankLegalizeCombiner::getReadAnyLaneSrc(Register Src) { + // Src = G_AMDGPU_READANYLANE RALSrc + auto [RAL, RALSrc] = tryMatch(Src, AMDGPU::G_AMDGPU_READANYLANE); + if (RAL) +return RALSrc; + + // LoVgpr, HiVgpr = 
G_UNMERGE_VALUES UnmergeSrc + // LoSgpr = G_AMDGPU_READANYLANE LoVgpr + // HiSgpr = G_AMDGPU_READANYLANE HiVgpr + // Src G_MERGE_VALUES LoSgpr, HiSgpr + auto *Merge = getOpcodeDef(Src, MRI); + if (Merge) { +unsigned NumElts = Merge->getNumSources(); +auto [Unmerge, Idx] = tryMatchRALFromUnmerge(Merge->getSourceReg(0)); +if (!Unmerge || Unmerge->getNumDefs() != NumElts || Idx != 0) + return {}; + +// Check if all elements are from same unmerge and there is no shuffling. +for (unsigned i = 1; i < NumElts; ++i) { + auto [UnmergeI, IdxI] = tryMatchRALFromUnmerge(Merge->getSourceReg(i)); + if (UnmergeI != Unmerge || (unsigned)IdxI != i) +return {}; } +return Unmerge->getSourceReg(); } - void tryCombineS1AnyExt(MachineInstr &MI) { -// %Src:sgpr(S1) = G_TRUNC %TruncSrc -
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)
@@ -15136,6 +15136,41 @@ SDValue SITargetLowering::performPtrAddCombine(SDNode *N, return Folded; } + // Transform (ptradd a, b) -> (or disjoint a, b) if it is equivalent and if + // that transformation can't block an offset folding at any use of the ptradd. + // This should be done late, after legalization, so that it doesn't block + // other ptradd combines that could enable more offset folding. + bool HasIntermediateAssertAlign = + N0->getOpcode() == ISD::AssertAlign && N0->getOperand(0)->isAnyAdd(); + // This is a hack to work around an ordering problem for DAGs like this: + // (ptradd (AssertAlign (ptradd p, c1), k), c2) + // If the outer ptradd is handled first by the DAGCombiner, it can be + // transformed into a disjoint or. Then, when the generic AssertAlign combine + // pushes the AssertAlign through the inner ptradd, it's too late for the + // ptradd reassociation to trigger. + if (!DCI.isBeforeLegalizeOps() && !HasIntermediateAssertAlign && + DAG.haveNoCommonBitsSet(N0, N1)) { +bool TransformCanBreakAddrMode = any_of(N->users(), [&](SDNode *User) { + if (auto *LoadStore = dyn_cast(User); + LoadStore && LoadStore->getBasePtr().getNode() == N) { +unsigned AS = LoadStore->getAddressSpace(); +// Currently, we only really need ptradds to fold offsets into flat +// memory instructions. +if (AS != AMDGPUAS::FLAT_ADDRESS) + return false; +TargetLoweringBase::AddrMode AM; +AM.HasBaseReg = true; +EVT VT = LoadStore->getMemoryVT(); +Type *AccessTy = VT.getTypeForEVT(*DAG.getContext()); +return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS); + } + return false; ritter-x2a wrote: I'm not sure if every backend that could want to use ptradd nodes would want to transform them to ORs. However, there is probably not much point in developing for hypothetical backends, so I moved it to the generic combines for now, behind the `canTransformPtrArithOutOfBounds()` check (I also fixed it to actually check the intended addressing mode). 
Dropping the target-specific `AS != AMDGPUAS::FLAT_ADDRESS` check affects the generated code in two lit tests ([identical-subrange-spill-infloop.ll](https://github.com/llvm/llvm-project/pull/146076/files#diff-517b7174ca71caeed2dd13ec440ee963e4db61f01911ff1cbc86ab0e60f16721) and [store-weird-sizes.ll](https://github.com/llvm/llvm-project/pull/146076/files#diff-32010dfaf8188291719044adb5c6e927b17fe3e3657e0f27ebe2e2a10a020889)). But, looking more into it, I find that
- the new code for `identical-subrange-spill-infloop.ll` could be argued to be an improvement (offsets for SMRD instructions are folded where they weren't before) and
- the problem for `store-weird-sizes.ll` seems to be that `SITargetLowering::isLegalAddressingMode` is overly optimistic when asked if "register + constant offset" is a valid addressing mode for AS4 on architectures predating `global_*` instructions. So this should be fixed there.

We already have a [generic combine](https://github.com/llvm/llvm-project/blob/629126ed44bd3ce5b6f476459c805be4e4e0c2ca/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L15173-L15196) that pushes the AssertAlign through the (ptr)add. The handling here was necessary because that combine would only be applied after the outer PTRADD was already visited and combined into an OR. However, that doesn't seem to happen anymore in the tests when it's a generic combine, so I dropped this handling as well.

https://github.com/llvm/llvm-project/pull/146075
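For readers following the thread, the arithmetic fact behind the PTRADD to disjoint OR transform can be sketched with plain integers. This is an illustration only, not the combine itself: `haveNoCommonBits` and `addOrDisjointOr` are hypothetical names, and the sketch inspects concrete runtime values, whereas the patch relies on `DAG.haveNoCommonBitsSet`, which reasons about bits known at compile time.

```cpp
#include <cassert>
#include <cstdint>

// True when A and B share no set bits, so adding them produces no carries.
bool haveNoCommonBits(uint64_t A, uint64_t B) { return (A & B) == 0; }

// Hypothetical helper: compute Base + Offset, taking the OR path when the
// operands are disjoint. Both paths return the same value by construction.
uint64_t addOrDisjointOr(uint64_t Base, uint64_t Offset) {
  if (haveNoCommonBits(Base, Offset)) {
    // Without overlapping bits there are no carries, so OR equals ADD.
    assert((Base | Offset) == Base + Offset);
    return Base | Offset;
  }
  return Base + Offset;
}
```

A pointer whose low four bits were cleared (as `llvm.ptrmask` with a mask of -16 does in the `gep_disjoint_or` test) plus an offset of 4 takes the OR path.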
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)
@@ -416,6 +416,60 @@ entry: ret void } +; Check that ptradds can be lowered to disjoint ORs. +define ptr @gep_disjoint_or(ptr %base) { +; GFX942-LABEL: gep_disjoint_or: +; GFX942: ; %bb.0: +; GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX942-NEXT:v_and_or_b32 v0, v0, -16, 4 +; GFX942-NEXT:s_setpc_b64 s[30:31] + %p = call ptr @llvm.ptrmask(ptr %base, i64 s0xf0) + %gep = getelementptr nuw inbounds i8, ptr %p, i64 4 + ret ptr %gep +} + +; Check that AssertAlign nodes between ptradd nodes don't block offset folding, +; taken from preload-implicit-kernargs.ll +define amdgpu_kernel void @random_incorrect_offset(ptr addrspace(1) inreg %out) #0 { +; GFX942_PTRADD-LABEL: random_incorrect_offset: +; GFX942_PTRADD: ; %bb.1: +; GFX942_PTRADD-NEXT:s_load_dwordx2 s[2:3], s[0:1], 0x0 +; GFX942_PTRADD-NEXT:s_waitcnt lgkmcnt(0) +; GFX942_PTRADD-NEXT:s_branch .LBB21_0 +; GFX942_PTRADD-NEXT:.p2align 8 +; GFX942_PTRADD-NEXT: ; %bb.2: +; GFX942_PTRADD-NEXT: .LBB21_0: +; GFX942_PTRADD-NEXT:s_load_dword s0, s[0:1], 0xa +; GFX942_PTRADD-NEXT:v_mov_b32_e32 v0, 0 +; GFX942_PTRADD-NEXT:s_waitcnt lgkmcnt(0) +; GFX942_PTRADD-NEXT:v_mov_b32_e32 v1, s0 +; GFX942_PTRADD-NEXT:global_store_dword v0, v1, s[2:3] +; GFX942_PTRADD-NEXT:s_endpgm +; +; GFX942_LEGACY-LABEL: random_incorrect_offset: +; GFX942_LEGACY: ; %bb.1: +; GFX942_LEGACY-NEXT:s_load_dwordx2 s[2:3], s[0:1], 0x0 +; GFX942_LEGACY-NEXT:s_waitcnt lgkmcnt(0) +; GFX942_LEGACY-NEXT:s_branch .LBB21_0 +; GFX942_LEGACY-NEXT:.p2align 8 +; GFX942_LEGACY-NEXT: ; %bb.2: +; GFX942_LEGACY-NEXT: .LBB21_0: +; GFX942_LEGACY-NEXT:s_mov_b32 s4, 8 +; GFX942_LEGACY-NEXT:s_load_dword s0, s[0:1], s4 offset:0x2 +; GFX942_LEGACY-NEXT:v_mov_b32_e32 v0, 0 +; GFX942_LEGACY-NEXT:s_waitcnt lgkmcnt(0) +; GFX942_LEGACY-NEXT:v_mov_b32_e32 v1, s0 +; GFX942_LEGACY-NEXT:global_store_dword v0, v1, s[2:3] +; GFX942_LEGACY-NEXT:s_endpgm + %imp_arg_ptr = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr() + %gep = getelementptr i8, ptr addrspace(4) %imp_arg_ptr, 
i32 2 + %load = load i32, ptr addrspace(4) %gep + store i32 %load, ptr addrspace(1) %out + ret void +} + declare void @llvm.memcpy.p0.p4.i64(ptr noalias nocapture writeonly, ptr addrspace(4) noalias nocapture readonly, i64, i1 immarg) !0 = !{} + +attributes #0 = { "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" } ritter-x2a wrote: Removed. https://github.com/llvm/llvm-project/pull/146075
[llvm-branch-commits] [clang] [llvm] [OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (PR #146405)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Robert Imschweiler (ro-i) Changes OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the num_threads clause on parallel directives, along with the message and severity clauses. This commit implements necessary codegen changes. --- Patch is 223.05 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146405.diff 10 Files Affected: - (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+38-23) - (modified) clang/lib/CodeGen/CGOpenMPRuntime.h (+44-10) - (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp (+35-26) - (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.h (+21-5) - (modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+9-1) - (modified) clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp (+703-35) - (modified) clang/test/OpenMP/parallel_num_threads_codegen.cpp (+33) - (modified) clang/test/OpenMP/target_parallel_num_threads_messages.cpp (+72-3) - (added) clang/test/OpenMP/target_parallel_num_threads_strict_codegen.cpp (+1642) - (modified) llvm/include/llvm/Frontend/OpenMP/OMPKinds.def (+12) ``diff diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index 8ccc37ef98a74..13a2d77bc156a 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1845,11 +1845,11 @@ void CGOpenMPRuntime::emitIfClause(CodeGenFunction &CGF, const Expr *Cond, CGF.EmitBlock(ContBlock, /*IsFinished=*/true); } -void CGOpenMPRuntime::emitParallelCall(CodeGenFunction &CGF, SourceLocation Loc, - llvm::Function *OutlinedFn, - ArrayRef CapturedVars, - const Expr *IfCond, - llvm::Value *NumThreads) { +void CGOpenMPRuntime::emitParallelCall( +CodeGenFunction &CGF, SourceLocation Loc, llvm::Function *OutlinedFn, +ArrayRef CapturedVars, const Expr *IfCond, +llvm::Value *NumThreads, OpenMPNumThreadsClauseModifier NumThreadsModifier, +OpenMPSeverityClauseKind Severity, const Expr *Message) { if 
(!CGF.HaveInsertPoint()) return; llvm::Value *RTLoc = emitUpdateLocation(CGF, Loc); @@ -2699,18 +2699,33 @@ llvm::Value *CGOpenMPRuntime::emitForNext(CodeGenFunction &CGF, CGF.getContext().BoolTy, Loc); } -void CGOpenMPRuntime::emitNumThreadsClause(CodeGenFunction &CGF, - llvm::Value *NumThreads, - SourceLocation Loc) { +void CGOpenMPRuntime::emitNumThreadsClause( +CodeGenFunction &CGF, llvm::Value *NumThreads, SourceLocation Loc, +OpenMPNumThreadsClauseModifier Modifier, OpenMPSeverityClauseKind Severity, +const Expr *Message) { if (!CGF.HaveInsertPoint()) return; + llvm::SmallVector Args( + {emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc), + CGF.Builder.CreateIntCast(NumThreads, CGF.Int32Ty, /*isSigned*/ true)}); // Build call __kmpc_push_num_threads(&loc, global_tid, num_threads) - llvm::Value *Args[] = { - emitUpdateLocation(CGF, Loc), getThreadID(CGF, Loc), - CGF.Builder.CreateIntCast(NumThreads, CGF.Int32Ty, /*isSigned*/ true)}; - CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___kmpc_push_num_threads), - Args); + // or __kmpc_push_num_threads_strict(&loc, global_tid, num_threads, severity, + // messsage) if strict modifier is used. + RuntimeFunction FnID = OMPRTL___kmpc_push_num_threads; + if (Modifier == OMPC_NUMTHREADS_strict) { +FnID = OMPRTL___kmpc_push_num_threads_strict; +// OpenMP 6.0, 10.4: "If no severity clause is specified then the effect is +// as if sev-level is fatal." +Args.push_back(llvm::ConstantInt::get( +CGM.Int32Ty, Severity == OMPC_SEVERITY_warning ? 
1 : 2)); +if (Message) + Args.push_back(CGF.EmitStringLiteralLValue(cast(Message)) + .getPointer(CGF)); +else + Args.push_back(llvm::ConstantPointerNull::get(CGF.VoidPtrTy)); + } + CGF.EmitRuntimeCall( + OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), FnID), Args); } void CGOpenMPRuntime::emitProcBindClause(CodeGenFunction &CGF, @@ -11986,12 +12001,11 @@ llvm::Function *CGOpenMPSIMDRuntime::emitTaskOutlinedFunction( llvm_unreachable("Not supported in SIMD-only mode"); } -void CGOpenMPSIMDRuntime::emitParallelCall(CodeGenFunction &CGF, - SourceLocation Loc, - llvm::Function *OutlinedFn, - ArrayRef CapturedVars, - const Expr *IfCond, - llvm::Value *NumThreads) { +void CGOpenMPSIMDRuntime::emitParallelCall( +CodeGenFunction &CGF, SourceLocation L
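The selection logic in `emitNumThreadsClause` can be modelled without any Clang types. The sketch below is an assumption-laden illustration, not the patch's code: `SeverityKind`, `RuntimeCall` and `buildPushNumThreads` are hypothetical stand-ins, and the location and thread-ID arguments are left out. It only shows which runtime entry point is chosen and how the severity is encoded (1 for warning, 2 for fatal, with fatal as the default when no severity clause is given).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for OpenMPSeverityClauseKind.
enum class SeverityKind { Unspecified, Warning, Fatal };

// Hypothetical, simplified description of the emitted runtime call.
struct RuntimeCall {
  std::string Callee;
  std::vector<int32_t> IntArgs; // num_threads, then severity if strict
  const char *Message;          // null unless strict and a message was given
};

RuntimeCall buildPushNumThreads(int32_t NumThreads, bool Strict,
                                SeverityKind Sev, const char *Msg) {
  RuntimeCall Call{"__kmpc_push_num_threads", {NumThreads}, nullptr};
  if (Strict) {
    Call.Callee = "__kmpc_push_num_threads_strict";
    // Anything other than an explicit warning is treated as fatal.
    Call.IntArgs.push_back(Sev == SeverityKind::Warning ? 1 : 2);
    Call.Message = Msg; // may stay null, like the ConstantPointerNull case
    assert(Call.IntArgs.size() == 2);
  }
  return Call;
}
```

Without the strict modifier the severity and message are ignored and the existing entry point is used unchanged.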
[llvm-branch-commits] [llvm] [openmp] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (PR #146404)
llvmbot wrote: @llvm/pr-subscribers-offload Author: Robert Imschweiler (ro-i) Changes OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the num_threads clause on parallel directives, along with the message and severity clauses. This commit implements necessary device runtime changes. --- Full diff: https://github.com/llvm/llvm-project/pull/146404.diff 3 Files Affected: - (modified) offload/DeviceRTL/include/DeviceTypes.h (+6) - (modified) offload/DeviceRTL/src/Parallelism.cpp (+60-18) - (modified) openmp/runtime/src/kmp.h (+1) ``diff diff --git a/offload/DeviceRTL/include/DeviceTypes.h b/offload/DeviceRTL/include/DeviceTypes.h index 2e5d92380f040..43a5578f1 100644 --- a/offload/DeviceRTL/include/DeviceTypes.h +++ b/offload/DeviceRTL/include/DeviceTypes.h @@ -136,6 +136,12 @@ struct omp_lock_t { void *Lock; }; +// see definition in openmp/runtime kmp.h +typedef enum omp_severity_t { + severity_warning = 1, + severity_fatal = 2 +} omp_severity_t; + using InterWarpCopyFnTy = void (*)(void *src, int32_t warp_num); using ShuffleReductFnTy = void (*)(void *rhsData, int16_t lane_id, int16_t lane_offset, int16_t shortCircuit); diff --git a/offload/DeviceRTL/src/Parallelism.cpp b/offload/DeviceRTL/src/Parallelism.cpp index 08ce616aee1c4..78438a60454b8 100644 --- a/offload/DeviceRTL/src/Parallelism.cpp +++ b/offload/DeviceRTL/src/Parallelism.cpp @@ -45,7 +45,24 @@ using namespace ompx; namespace { -uint32_t determineNumberOfThreads(int32_t NumThreadsClause) { +void num_threads_strict_error(int32_t nt_strict, int32_t nt_severity, + const char *nt_message, int32_t requested, + int32_t actual) { + if (nt_message) +printf("%s\n", nt_message); + else +printf("The computed number of threads (%u) does not match the requested " + "number of threads (%d). 
Consider that it might not be supported " + "to select exactly %d threads on this target device.\n", + actual, requested, requested); + if (nt_severity == severity_fatal) +__builtin_trap(); +} + +uint32_t determineNumberOfThreads(int32_t NumThreadsClause, + int32_t nt_strict = false, + int32_t nt_severity = severity_fatal, + const char *nt_message = nullptr) { uint32_t NThreadsICV = NumThreadsClause != -1 ? NumThreadsClause : icv::NThreads; uint32_t NumThreads = mapping::getMaxTeamThreads(); @@ -55,13 +72,17 @@ uint32_t determineNumberOfThreads(int32_t NumThreadsClause) { // SPMD mode allows any number of threads, for generic mode we round down to a // multiple of WARPSIZE since it is legal to do so in OpenMP. - if (mapping::isSPMDMode()) -return NumThreads; + if (!mapping::isSPMDMode()) { +if (NumThreads < mapping::getWarpSize()) + NumThreads = 1; +else + NumThreads = (NumThreads & ~((uint32_t)mapping::getWarpSize() - 1)); + } - if (NumThreads < mapping::getWarpSize()) -NumThreads = 1; - else -NumThreads = (NumThreads & ~((uint32_t)mapping::getWarpSize() - 1)); + if (NumThreadsClause != -1 && nt_strict && + NumThreads != static_cast(NumThreadsClause)) +num_threads_strict_error(nt_strict, nt_severity, nt_message, + NumThreadsClause, NumThreads); return NumThreads; } @@ -82,12 +103,14 @@ uint32_t determineNumberOfThreads(int32_t NumThreadsClause) { extern "C" { -[[clang::always_inline]] void __kmpc_parallel_spmd(IdentTy *ident, - int32_t num_threads, - void *fn, void **args, - const int64_t nargs) { +[[clang::always_inline]] void +__kmpc_parallel_spmd(IdentTy *ident, int32_t num_threads, void *fn, void **args, + const int64_t nargs, int32_t nt_strict = false, + int32_t nt_severity = severity_fatal, + const char *nt_message = nullptr) { uint32_t TId = mapping::getThreadIdInBlock(); - uint32_t NumThreads = determineNumberOfThreads(num_threads); + uint32_t NumThreads = + determineNumberOfThreads(num_threads, nt_strict, nt_severity, nt_message); uint32_t PTeamSize = 
NumThreads == mapping::getMaxTeamThreads() ? 0 : NumThreads; // Avoid the race between the read of the `icv::Level` above and the write @@ -140,10 +163,11 @@ extern "C" { return; } -[[clang::always_inline]] void -__kmpc_parallel_51(IdentTy *ident, int32_t, int32_t if_expr, - int32_t num_threads, int proc_bind, void *fn, - void *wrapper_fn, void **args, int64_t nargs) { +[[clang::always_inline]] void __kmpc_parallel_51( +IdentTy *ident, int32_t, int32_t if_expr, int32_t num_threads, +int proc_bind, void *fn, void *wrapper_fn, void
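The two pieces of logic this hunk changes in `determineNumberOfThreads` can be sketched in isolation. This is a simplified model under stated assumptions, not the device runtime code: `roundForGenericMode` and `strictMismatch` are hypothetical names, the warp size is assumed to be a power of two, and the clamping against the ICV and the team maximum that sits in the elided context lines is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Generic-mode rounding as in the patch: fewer threads than one warp collapse
// to a single thread, otherwise round down to a multiple of WarpSize.
uint32_t roundForGenericMode(uint32_t NumThreads, uint32_t WarpSize) {
  assert(WarpSize != 0 && (WarpSize & (WarpSize - 1)) == 0 &&
         "sketch assumes a power-of-two warp size");
  if (NumThreads < WarpSize)
    return 1;
  return NumThreads & ~(WarpSize - 1);
}

// Mirrors the condition guarding num_threads_strict_error: a clause was given
// (-1 means absent), strict was requested, and the computed count differs.
bool strictMismatch(int32_t Requested, bool Strict, uint32_t Actual) {
  return Requested != -1 && Strict &&
         Actual != static_cast<uint32_t>(Requested);
}
```

With a warp size of 32, a request for 100 threads in generic mode is rounded down to 96, which is exactly the kind of mismatch the strict modifier reports.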
[llvm-branch-commits] [llvm] [openmp] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (PR #146404)
https://github.com/ro-i created https://github.com/llvm/llvm-project/pull/146404 OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the num_threads clause on parallel directives, along with the message and severity clauses. This commit implements necessary device runtime changes. >From cf566c60db9eef81c39a45082645c9d44992bec5 Mon Sep 17 00:00:00 2001 From: Robert Imschweiler Date: Fri, 27 Jun 2025 07:54:07 -0500 Subject: [PATCH] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the num_threads clause on parallel directives, along with the message and severity clauses. This commit implements necessary device runtime changes. --- offload/DeviceRTL/include/DeviceTypes.h | 6 ++ offload/DeviceRTL/src/Parallelism.cpp | 78 +++-- openmp/runtime/src/kmp.h| 1 + 3 files changed, 67 insertions(+), 18 deletions(-) diff --git a/offload/DeviceRTL/include/DeviceTypes.h b/offload/DeviceRTL/include/DeviceTypes.h index 2e5d92380f040..43a5578f1 100644 --- a/offload/DeviceRTL/include/DeviceTypes.h +++ b/offload/DeviceRTL/include/DeviceTypes.h @@ -136,6 +136,12 @@ struct omp_lock_t { void *Lock; }; +// see definition in openmp/runtime kmp.h +typedef enum omp_severity_t { + severity_warning = 1, + severity_fatal = 2 +} omp_severity_t; + using InterWarpCopyFnTy = void (*)(void *src, int32_t warp_num); using ShuffleReductFnTy = void (*)(void *rhsData, int16_t lane_id, int16_t lane_offset, int16_t shortCircuit); diff --git a/offload/DeviceRTL/src/Parallelism.cpp b/offload/DeviceRTL/src/Parallelism.cpp index 08ce616aee1c4..78438a60454b8 100644 --- a/offload/DeviceRTL/src/Parallelism.cpp +++ b/offload/DeviceRTL/src/Parallelism.cpp @@ -45,7 +45,24 @@ using namespace ompx; namespace { -uint32_t determineNumberOfThreads(int32_t NumThreadsClause) { +void num_threads_strict_error(int32_t nt_strict, int32_t nt_severity, + const char *nt_message, int32_t requested, + int32_t actual) { + if 
(nt_message) +printf("%s\n", nt_message); + else +printf("The computed number of threads (%u) does not match the requested " + "number of threads (%d). Consider that it might not be supported " + "to select exactly %d threads on this target device.\n", + actual, requested, requested); + if (nt_severity == severity_fatal) +__builtin_trap(); +} + +uint32_t determineNumberOfThreads(int32_t NumThreadsClause, + int32_t nt_strict = false, + int32_t nt_severity = severity_fatal, + const char *nt_message = nullptr) { uint32_t NThreadsICV = NumThreadsClause != -1 ? NumThreadsClause : icv::NThreads; uint32_t NumThreads = mapping::getMaxTeamThreads(); @@ -55,13 +72,17 @@ uint32_t determineNumberOfThreads(int32_t NumThreadsClause) { // SPMD mode allows any number of threads, for generic mode we round down to a // multiple of WARPSIZE since it is legal to do so in OpenMP. - if (mapping::isSPMDMode()) -return NumThreads; + if (!mapping::isSPMDMode()) { +if (NumThreads < mapping::getWarpSize()) + NumThreads = 1; +else + NumThreads = (NumThreads & ~((uint32_t)mapping::getWarpSize() - 1)); + } - if (NumThreads < mapping::getWarpSize()) -NumThreads = 1; - else -NumThreads = (NumThreads & ~((uint32_t)mapping::getWarpSize() - 1)); + if (NumThreadsClause != -1 && nt_strict && + NumThreads != static_cast(NumThreadsClause)) +num_threads_strict_error(nt_strict, nt_severity, nt_message, + NumThreadsClause, NumThreads); return NumThreads; } @@ -82,12 +103,14 @@ uint32_t determineNumberOfThreads(int32_t NumThreadsClause) { extern "C" { -[[clang::always_inline]] void __kmpc_parallel_spmd(IdentTy *ident, - int32_t num_threads, - void *fn, void **args, - const int64_t nargs) { +[[clang::always_inline]] void +__kmpc_parallel_spmd(IdentTy *ident, int32_t num_threads, void *fn, void **args, + const int64_t nargs, int32_t nt_strict = false, + int32_t nt_severity = severity_fatal, + const char *nt_message = nullptr) { uint32_t TId = mapping::getThreadIdInBlock(); - uint32_t NumThreads = 
determineNumberOfThreads(num_threads); + uint32_t NumThreads = + determineNumberOfThreads(num_threads, nt_strict, nt_severity, nt_message); uint32_t PTeamSize = NumThreads == mapping::getMaxTeamThreads() ? 0 : NumThreads; // Avoid the race between the read of the `icv::Level` above and the write @@ -140,10 +163,11 @@ extern "C" { return; } -[[clang::always_in
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)
https://github.com/ritter-x2a edited https://github.com/llvm/llvm-project/pull/146075
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
https://github.com/jthackray updated https://github.com/llvm/llvm-project/pull/146329 >From 69c97078a3e7ee1592e5e5c4b2f4eba6455dd96e Mon Sep 17 00:00:00 2001 From: Jonathan Thackray Date: Wed, 25 Jun 2025 21:22:43 +0100 Subject: [PATCH 1/2] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) This is a series of patches (2/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format. This patch: * removes .txt tests which have only one feature required * makes the .s tests have a roundabout run line to test both encoding and assembly * creates diagnostic tests when needed * fixes naming convention of tests Co-authored-by: Virginia Cangelosi --- llvm/test/MC/AArch64/armv9.2a-mec.s | 172 ++- llvm/test/MC/AArch64/armv9.4-lse128.s | 98 - llvm/test/MC/AArch64/armv9.4a-gcs.s | 198 +- .../MC/AArch64/armv9.4a-lse128-diagnostics.s | 17 ++ llvm/test/MC/AArch64/armv9.4a-lse128.s| 138 llvm/test/MC/AArch64/armv9.5a-cpa.s | 89 +--- .../MC/AArch64/armv9.6a-mpam-diagnostics.s| 5 + llvm/test/MC/AArch64/armv9.6a-mpam.s | 80 +-- .../MC/Disassembler/AArch64/armv9.4a-gcs.txt | 90 .../Disassembler/AArch64/armv9.4a-lse128.txt | 98 - .../MC/Disassembler/AArch64/armv9.5a-cpa.txt | 42 .../MC/Disassembler/AArch64/armv9.6a-mpam.txt | 50 - .../MC/Disassembler/AArch64/armv9a-mec.txt| 54 - 13 files changed, 541 insertions(+), 590 deletions(-) delete mode 100644 llvm/test/MC/AArch64/armv9.4-lse128.s create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128-diagnostics.s create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128.s create mode 100644 llvm/test/MC/AArch64/armv9.6a-mpam-diagnostics.s delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-gcs.txt delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-lse128.txt delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.5a-cpa.txt delete mode 
100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-mpam.txt delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9a-mec.txt diff --git a/llvm/test/MC/AArch64/armv9.2a-mec.s b/llvm/test/MC/AArch64/armv9.2a-mec.s index 42e4bf732086e..c747886f7ec3b 100644 --- a/llvm/test/MC/AArch64/armv9.2a-mec.s +++ b/llvm/test/MC/AArch64/armv9.2a-mec.s @@ -1,55 +1,117 @@ -// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s - - mrs x0, MECIDR_EL2 -// CHECK: mrs x0, MECIDR_EL2 // encoding: [0xe0,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P0_EL2 -// CHECK: mrs x0, MECID_P0_EL2 // encoding: [0x00,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A0_EL2 -// CHECK: mrs x0, MECID_A0_EL2 // encoding: [0x20,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P1_EL2 -// CHECK: mrs x0, MECID_P1_EL2 // encoding: [0x40,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A1_EL2 -// CHECK: mrs x0, MECID_A1_EL2 // encoding: [0x60,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_P_EL2 -// CHECK: mrs x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_A_EL2 -// CHECK: mrs x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_RL_A_EL3 -// CHECK: mrs x0, MECID_RL_A_EL3 // encoding: [0x20,0xaa,0x3e,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - msr MECID_P0_EL2,x0 -// CHECK: msr MECID_P0_EL2, x0 // encoding: [0x00,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: 
[[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_A0_EL2,x0 -// CHECK: msr MECID_A0_EL2, x0 // encoding: [0x20,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_P1_EL2,x0 -// CHECK: msr MECID_P1_EL2, x0 // encoding: [0x40,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_A1_EL2,x0 -// CHECK: msr MECID_A1_EL2, x0 // encoding: [0x60,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - m
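To read the `// encoding: [...]` comments in these tests, it can help to decode them by hand. The sketch below assembles the four bytes (printed lowest address first) into the little-endian instruction word and splits out the system-register fields of an MRS/MSR instruction. The helper names are hypothetical and not part of LLVM; the bit positions follow the A64 system-register move layout as I understand it (L at bit 21, op0 at bits 20:19, op1 at 18:16, CRn at 15:12, CRm at 11:8, op2 at 7:5, Rt at 4:0). The generic `S<op0>_<op1>_C<CRn>_C<CRm>_<op2>` spelling is the form that the CHECK-UNKNOWN lines in this series expect when the feature is disabled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Combine the bytes shown by -show-encoding (lowest address first) into the
// 32-bit little-endian instruction word.
uint32_t wordFromBytes(uint8_t B0, uint8_t B1, uint8_t B2, uint8_t B3) {
  return uint32_t(B0) | (uint32_t(B1) << 8) | (uint32_t(B2) << 16) |
         (uint32_t(B3) << 24);
}

// Fields of a system-register move (illustrative struct, not an LLVM type).
struct SysRegMove {
  bool IsRead; // L bit: true for MRS, false for MSR
  unsigned Op0, Op1, CRn, CRm, Op2, Rt;
};

// Extract the fields from an MRS/MSR (register) instruction word.
SysRegMove decodeSysRegMove(uint32_t Insn) {
  SysRegMove F;
  F.IsRead = ((Insn >> 21) & 0x1) != 0;
  F.Op0 = (Insn >> 19) & 0x3;
  F.Op1 = (Insn >> 16) & 0x7;
  F.CRn = (Insn >> 12) & 0xf;
  F.CRm = (Insn >> 8) & 0xf;
  F.Op2 = (Insn >> 5) & 0x7;
  F.Rt = Insn & 0x1f;
  return F;
}

// Spell the register in the generic S<op0>_<op1>_C<CRn>_C<CRm>_<op2> form.
std::string genericName(const SysRegMove &F) {
  return "S" + std::to_string(F.Op0) + "_" + std::to_string(F.Op1) + "_C" +
         std::to_string(F.CRn) + "_C" + std::to_string(F.CRm) + "_" +
         std::to_string(F.Op2);
}
```

For example, the bytes `[0xe0,0xa8,0x3c,0xd5]` listed for `mrs x0, MECIDR_EL2` form the word `0xd53ca8e0`, which this sketch decodes as a read of `S3_4_C10_C8_7` into register 0.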
[llvm-branch-commits] [clang-tools-extra] [clang-doc] serialize friends (PR #146165)
https://github.com/evelez7 updated https://github.com/llvm/llvm-project/pull/146165 >From 318f0c85b9f984ba22873ee76a0e610b07d443e9 Mon Sep 17 00:00:00 2001 From: Erick Velez Date: Thu, 26 Jun 2025 20:54:03 -0700 Subject: [PATCH] [clang-doc] serialize friends --- clang-tools-extra/clang-doc/BitcodeReader.cpp | 46 +++ clang-tools-extra/clang-doc/BitcodeWriter.cpp | 27 ++- clang-tools-extra/clang-doc/BitcodeWriter.h | 6 +- clang-tools-extra/clang-doc/HTMLGenerator.cpp | 3 + .../clang-doc/HTMLMustacheGenerator.cpp | 1 + clang-tools-extra/clang-doc/JSONGenerator.cpp | 23 +- clang-tools-extra/clang-doc/MDGenerator.cpp | 4 + .../clang-doc/Representation.cpp | 16 clang-tools-extra/clang-doc/Representation.h | 21 - clang-tools-extra/clang-doc/Serialize.cpp | 55 ++ clang-tools-extra/clang-doc/YAMLGenerator.cpp | 1 + .../test/clang-doc/json/class.cpp | 76 +-- .../unittests/clang-doc/BitcodeTest.cpp | 2 + 13 files changed, 236 insertions(+), 45 deletions(-) diff --git a/clang-tools-extra/clang-doc/BitcodeReader.cpp b/clang-tools-extra/clang-doc/BitcodeReader.cpp index fd6f40cff1a4e..2cbf8bf6b2879 100644 --- a/clang-tools-extra/clang-doc/BitcodeReader.cpp +++ b/clang-tools-extra/clang-doc/BitcodeReader.cpp @@ -94,6 +94,7 @@ static llvm::Error decodeRecord(const Record &R, InfoType &Field, case InfoType::IT_typedef: case InfoType::IT_concept: case InfoType::IT_variable: + case InfoType::IT_friend: Field = IT; return llvm::Error::success(); } @@ -111,6 +112,7 @@ static llvm::Error decodeRecord(const Record &R, FieldId &Field, case FieldId::F_child_namespace: case FieldId::F_child_record: case FieldId::F_concept: + case FieldId::F_friend: case FieldId::F_default: Field = F; return llvm::Error::success(); @@ -450,6 +452,15 @@ static llvm::Error parseRecord(const Record &R, unsigned ID, } } +static llvm::Error parseRecord(const Record &R, unsigned ID, StringRef Blob, + FriendInfo *F) { + if (ID == FRIEND_IS_CLASS) { +return decodeRecord(R, F->IsClass, Blob); + } + return 
llvm::createStringError(llvm::inconvertibleErrorCode(), + "invalid field for Friend"); +} + template static llvm::Expected getCommentInfo(T I) { return llvm::createStringError(llvm::inconvertibleErrorCode(), "invalid type cannot contain CommentInfo"); @@ -525,6 +536,18 @@ template <> llvm::Error addTypeInfo(FunctionInfo *I, FieldTypeInfo &&T) { return llvm::Error::success(); } +template <> llvm::Error addTypeInfo(FriendInfo *I, FieldTypeInfo &&T) { + if (!I->Params) +I->Params.emplace(); + I->Params->emplace_back(std::move(T)); + return llvm::Error::success(); +} + +template <> llvm::Error addTypeInfo(FriendInfo *I, TypeInfo &&T) { + I->ReturnType.emplace(std::move(T)); + return llvm::Error::success(); +} + template <> llvm::Error addTypeInfo(EnumInfo *I, TypeInfo &&T) { I->BaseType = std::move(T); return llvm::Error::success(); @@ -667,6 +690,16 @@ llvm::Error addReference(ConstraintInfo *I, Reference &&R, FieldId F) { "ConstraintInfo cannot contain this Reference"); } +template <> +llvm::Error addReference(FriendInfo *Friend, Reference &&R, FieldId F) { + if (F == FieldId::F_friend) { +Friend->Ref = std::move(R); +return llvm::Error::success(); + } + return llvm::createStringError(llvm::inconvertibleErrorCode(), + "Friend cannot contain this Reference"); +} + template static void addChild(T I, ChildInfoType &&R) { llvm::errs() << "invalid child type for info"; @@ -700,6 +733,9 @@ template <> void addChild(RecordInfo *I, EnumInfo &&R) { template <> void addChild(RecordInfo *I, TypedefInfo &&R) { I->Children.Typedefs.emplace_back(std::move(R)); } +template <> void addChild(RecordInfo *I, FriendInfo &&R) { + I->Friends.emplace_back(std::move(R)); +} // Other types of children: template <> void addChild(EnumInfo *I, EnumValueInfo &&R) { @@ -741,6 +777,9 @@ template <> void addTemplate(FunctionInfo *I, TemplateInfo &&P) { template <> void addTemplate(ConceptInfo *I, TemplateInfo &&P) { I->Template = std::move(P); } +template <> void addTemplate(FriendInfo *I, 
TemplateInfo &&P) { + I->Template.emplace(std::move(P)); +} // Template specializations go only into template records. template @@ -921,6 +960,10 @@ llvm::Error ClangDocBitcodeReader::readSubBlock(unsigned ID, T I) { case BI_VAR_BLOCK_ID: { return handleSubBlock(ID, I, CreateAddFunc(addChild)); } + case BI_FRIEND_BLOCK_ID: { +return handleSubBlock(ID, I, + CreateAddFunc(addChild)); + } default: return llvm::createStringError(llvm::inconvertibleErrorCode(), "invalid subblock type"); @@ -1032,6 +1075,8 @@ ClangDocBitcodeRe
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (PR #146076)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/146076 >From 2d8d232729769a3ca274789dee2fe542d0045ef2 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Fri, 27 Jun 2025 05:38:52 -0400 Subject: [PATCH] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default Also removes the command line option to control this feature. There seem to be mainly two kinds of test changes: - Some operands of addition instructions are swapped; that is to be expected since PTRADD is not commutative. - Improvements in code generation, probably because the legacy lowering enabled some transformations that were sometimes harmful. For SWDEV-516125. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 10 +- .../identical-subrange-spill-infloop.ll | 354 +++--- .../AMDGPU/infer-addrspace-flat-atomic.ll | 14 +- llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll | 8 +- .../AMDGPU/lower-module-lds-via-hybrid.ll | 4 +- .../AMDGPU/lower-module-lds-via-table.ll | 16 +- .../match-perm-extract-vector-elt-bug.ll | 22 +- llvm/test/CodeGen/AMDGPU/memmove-var-size.ll | 16 +- .../AMDGPU/preload-implicit-kernargs.ll | 6 +- .../AMDGPU/promote-constOffset-to-imm.ll | 8 +- llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll | 7 +- .../AMDGPU/ptradd-sdag-optimizations.ll | 94 ++--- .../AMDGPU/ptradd-sdag-undef-poison.ll| 6 +- llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll | 27 +- llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll | 29 +- 15 files changed, 311 insertions(+), 310 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 822bab88c8a09..79981007c13af 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -63,14 +63,6 @@ static cl::opt UseDivergentRegisterIndexing( cl::desc("Use indirect register addressing for divergent indexes"), cl::init(false)); -// TODO: This option should be removed once we switch to always using PTRADD in -// the SelectionDAG. 
-static cl::opt UseSelectionDAGPTRADD( -"amdgpu-use-sdag-ptradd", cl::Hidden, -cl::desc("Generate ISD::PTRADD nodes for 64-bit pointer arithmetic in the " - "SelectionDAG ISel"), -cl::init(false)); - static bool denormalModeIsFlushAllF32(const MachineFunction &MF) { const SIMachineFunctionInfo *Info = MF.getInfo(); return Info->getMode().FP32Denormals == DenormalMode::getPreserveSign(); @@ -10599,7 +10591,7 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, bool SITargetLowering::shouldPreservePtrArith(const Function &F, EVT PtrVT) const { - return UseSelectionDAGPTRADD && PtrVT == MVT::i64; + return PtrVT == MVT::i64; } bool SITargetLowering::canTransformPtrArithOutOfBounds(const Function &F, diff --git a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll index 56ceba258f471..f9fcf489bd389 100644 --- a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll +++ b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll @@ -6,97 +6,151 @@ define void @main(i1 %arg) #0 { ; CHECK: ; %bb.0: ; %bb ; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1 -; CHECK-NEXT:buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill -; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill +; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 ; 4-byte Folded Spill +; CHECK-NEXT:buffer_store_dword v7, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill ; CHECK-NEXT:s_mov_b64 exec, s[4:5] -; CHECK-NEXT:v_writelane_b32 v5, s30, 0 -; CHECK-NEXT:v_writelane_b32 v5, s31, 1 -; CHECK-NEXT:v_writelane_b32 v5, s36, 2 -; CHECK-NEXT:v_writelane_b32 v5, s37, 3 -; CHECK-NEXT:v_writelane_b32 v5, s38, 4 -; CHECK-NEXT:v_writelane_b32 v5, s39, 5 -; CHECK-NEXT:v_writelane_b32 v5, s48, 6 -; CHECK-NEXT:v_writelane_b32 v5, s49, 7 -; CHECK-NEXT:v_writelane_b32 v5, s50, 8 -; CHECK-NEXT:v_writelane_b32 v5, s51, 9 -; CHECK-NEXT:v_writelane_b32 v5, 
s52, 10 -; CHECK-NEXT:v_writelane_b32 v5, s53, 11 -; CHECK-NEXT:v_writelane_b32 v5, s54, 12 -; CHECK-NEXT:v_writelane_b32 v5, s55, 13 -; CHECK-NEXT:s_getpc_b64 s[24:25] -; CHECK-NEXT:v_writelane_b32 v5, s64, 14 -; CHECK-NEXT:s_movk_i32 s4, 0xf0 -; CHECK-NEXT:s_mov_b32 s5, s24 -; CHECK-NEXT:v_writelane_b32 v5, s65, 15 -; CHECK-NEXT:s_load_dwordx16 s[8:23], s[4:5], 0x0 -; CHECK-NEXT:s_mov_b64 s[4:5], 0 -; CHECK-NEXT:v_writelane_b32 v5, s66, 16 -; CHECK-NEXT:s_load_dwordx4 s[4:7], s[4:5], 0x0 -; CHECK-NEXT:v_writelane_b32 v5, s67, 17 -; CHECK-NEXT:s_waitcnt lgkmcnt(0) -; CHECK-NEXT:s_movk_i32 s6, 0x130 -; CHECK-NEXT:s_mov_b32 s7, s24 -; CHECK-NEXT:v_writela
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/146075

>From 452008111a34c815b38242272063654393261921 Mon Sep 17 00:00:00 2001
From: Fabian Ritter
Date: Fri, 27 Jun 2025 04:23:50 -0400
Subject: [PATCH 1/3] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR

If we can't fold a PTRADD's offset into its users, lowering them to disjoint ORs is preferable: Often, a 32-bit OR instruction suffices where we'd otherwise use a pair of 32-bit additions with carry.

This needs to be a DAGCombine (and not a selection rule) because its main purpose is to enable subsequent DAGCombines for bitwise operations. We don't want to just turn PTRADDs into disjoint ORs whenever that's sound because this transform loses the information that the operation implements pointer arithmetic, which we will soon need to fold offsets into FLAT instructions. Currently, disjoint ORs can still be used for offset folding, so that part of the logic can't be tested.

The PR contains a hacky workaround for a situation where an AssertAlign operand of a PTRADD is not DAGCombined before the PTRADD, causing the PTRADD to be turned into a disjoint OR although reassociating it with the operand of the AssertAlign would be better. This wouldn't be a problem if the DAGCombiner ensured that a node is only processed after all its operands have been processed.

For SWDEV-516125.
--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 35 .../AMDGPU/ptradd-sdag-optimizations.ll | 56 ++- 2 files changed, 90 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 822bab88c8a09..71230078edc69 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -15136,6 +15136,41 @@ SDValue SITargetLowering::performPtrAddCombine(SDNode *N, return Folded; } + // Transform (ptradd a, b) -> (or disjoint a, b) if it is equivalent and if + // that transformation can't block an offset folding at any use of the ptradd. + // This should be done late, after legalization, so that it doesn't block + // other ptradd combines that could enable more offset folding. + bool HasIntermediateAssertAlign = + N0->getOpcode() == ISD::AssertAlign && N0->getOperand(0)->isAnyAdd(); + // This is a hack to work around an ordering problem for DAGs like this: + // (ptradd (AssertAlign (ptradd p, c1), k), c2) + // If the outer ptradd is handled first by the DAGCombiner, it can be + // transformed into a disjoint or. Then, when the generic AssertAlign combine + // pushes the AssertAlign through the inner ptradd, it's too late for the + // ptradd reassociation to trigger. + if (!DCI.isBeforeLegalizeOps() && !HasIntermediateAssertAlign && + DAG.haveNoCommonBitsSet(N0, N1)) { +bool TransformCanBreakAddrMode = any_of(N->users(), [&](SDNode *User) { + if (auto *LoadStore = dyn_cast(User); + LoadStore && LoadStore->getBasePtr().getNode() == N) { +unsigned AS = LoadStore->getAddressSpace(); +// Currently, we only really need ptradds to fold offsets into flat +// memory instructions. 
+if (AS != AMDGPUAS::FLAT_ADDRESS) + return false; +TargetLoweringBase::AddrMode AM; +AM.HasBaseReg = true; +EVT VT = LoadStore->getMemoryVT(); +Type *AccessTy = VT.getTypeForEVT(*DAG.getContext()); +return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS); + } + return false; +}); + +if (!TransformCanBreakAddrMode) + return DAG.getNode(ISD::OR, DL, VT, N0, N1, SDNodeFlags::Disjoint); + } + if (N1.getOpcode() != ISD::ADD || !N1.hasOneUse()) return SDValue(); diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll index 893deb35fe822..64e041103a563 100644 --- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll +++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll @@ -100,7 +100,7 @@ define void @baseptr_null(i64 %offset, i8 %v) { ; Taken from implicit-kernarg-backend-usage.ll, tests the PTRADD handling in the ; assertalign DAG combine. -define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr) #0 { +define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr) { ; GFX942-LABEL: llvm_amdgcn_queue_ptr: ; GFX942: ; %bb.0: ; GFX942-NEXT:v_mov_b32_e32 v2, 0 @@ -416,6 +416,60 @@ entry: ret void } +; Check that ptradds can be lowered to disjoint ORs. +define ptr @gep_disjoint_or(ptr %base) { +; GFX942-LABEL: gep_disjoint_or: +; GFX942: ; %bb.0: +; GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX942-NEXT:v_and_or_b32 v0, v0, -16, 4 +; GFX942-NEXT:s_setpc_b64 s[30:31] + %p = call ptr @llvm.ptrmask(ptr %base, i64 s0xf0) + %gep = getelementptr nuw inbounds i8, ptr %p, i64 4 + ret ptr %gep +} + +; Check that AssertAlign no
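The arithmetic fact behind the PTRADD -> disjoint OR combine can be sketched on plain integers: when base and offset share no set bits, no carries occur, so `base + offset` equals `base | offset`, and an offset confined to the low half only touches the low 32 bits. This is an illustration under simplifying assumptions, not the combine itself. All function names are hypothetical, and `noCommonBits` checks concrete run-time values, whereas `DAG.haveNoCommonBitsSet` in the patch reasons about bits known at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the disjointness query: true when no bit is set in
// both operands (here checked on concrete values).
bool noCommonBits(uint64_t A, uint64_t B) { return (A & B) == 0; }

// 64-bit pointer addition modeled as a pair of 32-bit additions with carry.
uint64_t addViaHalves(uint64_t Base, uint64_t Off) {
  uint32_t Lo = uint32_t(Base) + uint32_t(Off);
  uint32_t Carry = Lo < uint32_t(Base) ? 1u : 0u; // unsigned wrap means carry
  uint32_t Hi = uint32_t(Base >> 32) + uint32_t(Off >> 32) + Carry;
  return (uint64_t(Hi) << 32) | Lo;
}

// Disjoint OR with an offset that fits in the low half: a single 32-bit OR,
// the high half is passed through unchanged.
uint64_t orLowHalf(uint64_t Base, uint32_t Off) {
  uint32_t Lo = uint32_t(Base) | Off;
  return (Base & 0xffffffff00000000ull) | Lo;
}
```

This mirrors the `gep_disjoint_or` test in the patch: masking a pointer with -16 clears its low four bits, so adding 4 cannot carry and a single OR on the low half gives the same address.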
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)
@@ -1,592 +1,697 @@ -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+the -mattr=+d128 < %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+v8.9a -mattr=+the -mattr=+d128 < %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+v9.4a -mattr=+the -mattr=+d128 < %s | FileCheck %s +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128 < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128,v8.9a < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128,v9.4a < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+the,+d128 < %s \ +// RUN:| llvm-objdump -d --mattr=+the,+d128 - | FileCheck %s --check-prefix=CHECK-INST +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+the,+d128 < %s \ +// RUN: | llvm-objdump -d --mattr=-the,-d128 - | FileCheck %s --check-prefix=CHECK-UNKNOWN +// Disassemble encoding and check the re-encoding (-show-encoding) matches. 
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+the,+d128 < %s \ +// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \ +// RUN:| llvm-mc -triple=aarch64 -mattr=+the,+d128 -disassemble -show-encoding \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST -// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.9a < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v9.4a < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-THE %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.9a -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v9.4a -mattr=+the < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-D128 %s +mrs x3, RCWMASK_EL1 +// CHECK-INST: mrs x3, RCWMASK_EL1 +// CHECK-ENCODING: encoding: [0xc3,0xd0,0x38,0xd5] +// CHECK-ERROR: error: expected readable system register +// CHECK-UNKNOWN: d538d0c3 mrs x3, S3_0_C13_C0_6 -// RUN: not llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+the -mattr=+d128 < %s 2>&1 | FileCheck --check-prefix=ERROR-NO-ZXR %s +msr RCWMASK_EL1, x1 +// CHECK-INST: msr RCWMASK_EL1, x1 +// CHECK-ENCODING: encoding: [0xc1,0xd0,0x18,0xd5] +// CHECK-ERROR: error: expected writable system register or pstate +// CHECK-UNKNOWN: d518d0c1 msr S3_0_C13_C0_6, x1 -mrs x3, RCWMASK_EL1 -// CHECK: mrs x3, RCWMASK_EL1 // encoding: [0xc3,0xd0,0x38,0xd5] -// ERROR-NO-THE: [[@LINE-2]]:21: error: expected readable system register -msr RCWMASK_EL1, x1 -// CHECK: msr RCWMASK_EL1, x1 // encoding: [0xc1,0xd0,0x18,0xd5] -// ERROR-NO-THE: [[@LINE-2]]:17: error: expected writable system register or pstate -mrs x3, RCWSMASK_EL1 -// CHECK: mrs x3, RCWSMASK_EL1 // encoding: 
[0x63,0xd0,0x38,0xd5] -// ERROR-NO-THE: [[@LINE-2]]:21: error: expected readable system register -msr RCWSMASK_EL1, x1 -// CHECK: msr RCWSMASK_EL1, x1 // encoding: [0x61,0xd0,0x18,0xd5] -// ERROR-NO-THE: [[@LINE-2]]:17: error: expected writable system register or pstate +mrs x3, RCWSMASK_EL1 +// CHECK-INST: mrs x3, RCWSMASK_EL1 +// CHECK-ENCODING: encoding: [0x63,0xd0,0x38,0xd5] +// CHECK-ERROR: error: expected readable system register +// CHECK-UNKNOWN: d538d063 mrs x3, S3_0_C13_C0_3 +msr RCWSMASK_EL1, x1 +// CHECK-INST: msr RCWSMASK_EL1, x1 +// CHECK-ENCODING: encoding: [0x61,0xd0,0x18,0xd5] +// CHECK-ERROR: error: expected writable system register or pstate +// CHECK-UNKNOWN: d518d061 msr S3_0_C13_C0_3, x1 -rcwcas x0, x1, [x4] -// CHECK: rcwcas x0, x1, [x4] // encoding: [0x81,0x08,0x20,0x19] -// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the -rcwcasa x0, x1, [x4] -// CHECK: rcwcasa x0, x1, [x4] // encoding: [0x81,0x08,0xa0,0x19] -// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the -rcwcasal x0, x1, [x4] -// CHECK: rcwcasal x0, x1, [x4] // encoding: [0x81,0x08,0xe0,0x19] -// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the -rcwcasl x0, x1, [x4] -// CHECK: rcwcasl x0, x1, [x4] // encoding: [0x81,0x08,0x60,0x19] -// ERROR-NO-THE: [[@LINE-2]]:13: error: instruction requires: the
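
The relationship between a CHECK-ENCODING byte list and the matching CHECK-UNKNOWN line in the new test format can be sketched as follows. This is an illustrative sketch only: wordFromEncoding and genericSysRegName are hypothetical helpers, and the field positions (op0 in bits 20:19, op1 in 18:16, CRn in 15:12, CRm in 11:8, op2 in 7:5) are an assumption taken from the A64 MRS/MSR encoding that agrees with the CHECK-UNKNOWN lines in this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assemble the four bytes printed by -show-encoding (lowest address first)
// into the 32-bit instruction word that llvm-objdump prints.
constexpr uint32_t wordFromEncoding(uint8_t B0, uint8_t B1, uint8_t B2,
                                    uint8_t B3) {
  return uint32_t{B0} | (uint32_t{B1} << 8) | (uint32_t{B2} << 16) |
         (uint32_t{B3} << 24);
}

// Hypothetical helper: build the generic S<op0>_<op1>_C<CRn>_C<CRm>_<op2>
// name that the disassembler falls back to when the feature is disabled.
std::string genericSysRegName(uint32_t Insn) {
  unsigned Op0 = (Insn >> 19) & 0x3;
  unsigned Op1 = (Insn >> 16) & 0x7;
  unsigned CRn = (Insn >> 12) & 0xf;
  unsigned CRm = (Insn >> 8) & 0xf;
  unsigned Op2 = (Insn >> 5) & 0x7;
  return "S" + std::to_string(Op0) + "_" + std::to_string(Op1) + "_C" +
         std::to_string(CRn) + "_C" + std::to_string(CRm) + "_" +
         std::to_string(Op2);
}
```

For the first test case, the bytes [0xc3,0xd0,0x38,0xd5] give the word d538d0c3, whose fields spell S3_0_C13_C0_6, matching the CHECK-UNKNOWN line for RCWMASK_EL1.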
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)
@@ -16,28 +16,41 @@
 // RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,-clrbhb < %s | FileCheck %s --check-prefix=HINT_22
 
 // Optional, off by default, manually enabled
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8.8a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
-// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,+clrbhb < %s | FileCheck %s --check-prefix=CLRBHB
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v8.8a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -show-encoding -triple aarch64-none-elf -mattr=+v9.3a,+clrbhb < %s | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST

jthackray wrote:

I'm not sure (Virginia converted this file), but I suspect it is because it has fairly extensive version-dependent tests at the top.

https://github.com/llvm/llvm-project/pull/146330
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC) (PR #146331)
https://github.com/jthackray updated https://github.com/llvm/llvm-project/pull/146331 >From 8c9eccdc95e465fdbfe833080afb1ad1099c224c Mon Sep 17 00:00:00 2001 From: Jonathan Thackray Date: Fri, 27 Jun 2025 20:16:06 +0100 Subject: [PATCH 1/2] [AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC) This is a series of patches (4/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format. This patch: * removes .txt tests whose .s tests have functions * makes the .s tests have a roundabout run line to test both encoding and assembly Co-authored-by: Virginia Cangelosi --- llvm/test/MC/AArch64/armv9.6a-lsui.s | 1073 +++-- llvm/test/MC/AArch64/armv9.6a-occmo.s | 54 +- llvm/test/MC/AArch64/armv9.6a-pcdphint.s | 37 +- llvm/test/MC/AArch64/armv9.6a-rme-gpc3.s | 46 +- .../MC/Disassembler/AArch64/armv9.6a-lsui.txt | 323 - .../Disassembler/AArch64/armv9.6a-occmo.txt | 11 - .../AArch64/armv9.6a-pcdphint.txt |8 - .../AArch64/armv9.6a-rme-gpc3.txt | 18 - 8 files changed, 805 insertions(+), 765 deletions(-) delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-lsui.txt delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-occmo.txt delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-pcdphint.txt delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-rme-gpc3.txt diff --git a/llvm/test/MC/AArch64/armv9.6a-lsui.s b/llvm/test/MC/AArch64/armv9.6a-lsui.s index d4a5e1f980560..264a869b6d286 100644 --- a/llvm/test/MC/AArch64/armv9.6a-lsui.s +++ b/llvm/test/MC/AArch64/armv9.6a-lsui.s @@ -1,408 +1,751 @@ -// RUN: llvm-mc -triple aarch64 -mattr=+lsui -show-encoding %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64 -show-encoding %s 2>&1 | FileCheck %s --check-prefix=ERROR +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \ +// RUN:| FileCheck %s 
--check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \ +// RUN: | llvm-objdump -d --mattr=+lsui --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \ +// RUN: | llvm-objdump -d --mattr=-lsui --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN +// Disassemble encoding and check the re-encoding (-show-encoding) matches. +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \ +// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \ +// RUN:| llvm-mc -triple=aarch64 -mattr=+lsui -disassemble -show-encoding \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST + + -_func: -// CHECK: _func: //-- // Unprivileged load/store operations //-- - ldtxr x9, [sp] -// CHECK: ldtxrx9, [sp]// encoding: [0xe9,0x7f,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - ldtxr x9, [sp, #0] -// CHECK: ldtxrx9, [sp]// encoding: [0xe9,0x7f,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - ldtxr x10, [x11] -// CHECK: ldtxrx10, [x11] // encoding: [0x6a,0x7d,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - ldtxr x10, [x11, #0] -// CHECK: ldtxrx10, [x11] // encoding: [0x6a,0x7d,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - - ldatxr x9, [sp] -// CHECK: ldatxr x9, [sp]// encoding: [0xe9,0xff,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - ldatxr x10, [x11] -// CHECK: ldatxr x10, [x11] // encoding: [0x6a,0xfd,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - - sttxr wzr, w4, [sp] -// CHECK: sttxrwzr, w4, [sp] // encoding: [0xe4,0x7f,0x1f,0x89] -// ERROR: error: instruction requires: lsui - sttxr wzr, w4, [sp, #0] -// CHECK: sttxrwzr, w4, [sp] // encoding: [0xe4,0x7f,0x1f,0x89] -// ERROR: error: instruction requires: lsui - sttxr w5, x6, [x7] -// CHECK: sttxrw5, x6, [x7]// 
encoding: [0xe6,0x7c,0x05,0xc9] -// ERROR: error: instruction requires: lsui - sttxr w5, x6, [x7, #0] -// CHECK: sttxrw5, x6, [x7]// encoding: [0xe6,0x7c,0x05,0xc9] -// ERROR: error: instruction requires: lsui - - stltxr w2, w4, [sp] -// CHECK: stltxr w2, w4, [sp]// encoding: [0xe4,0xff,0x02,0x89]
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
https://github.com/jthackray updated https://github.com/llvm/llvm-project/pull/146329 >From be8bcdead883ec9bac8bebf6b3382974fc988c28 Mon Sep 17 00:00:00 2001 From: Jonathan Thackray Date: Wed, 25 Jun 2025 21:22:43 +0100 Subject: [PATCH 1/2] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) This is a series of patches (2/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format. This patch: * removes .txt tests which have only one feature required * makes the .s tests have a roundabout run line to test both encoding and assembly * creates diagnostic tests when needed * fixes naming convention of tests Co-authored-by: Virginia Cangelosi --- llvm/test/MC/AArch64/armv9.2a-mec.s | 172 ++- llvm/test/MC/AArch64/armv9.4-lse128.s | 98 - llvm/test/MC/AArch64/armv9.4a-gcs.s | 198 +- .../MC/AArch64/armv9.4a-lse128-diagnostics.s | 17 ++ llvm/test/MC/AArch64/armv9.4a-lse128.s| 138 llvm/test/MC/AArch64/armv9.5a-cpa.s | 89 +--- .../MC/AArch64/armv9.6a-mpam-diagnostics.s| 5 + llvm/test/MC/AArch64/armv9.6a-mpam.s | 80 +-- .../MC/Disassembler/AArch64/armv9.4a-gcs.txt | 90 .../Disassembler/AArch64/armv9.4a-lse128.txt | 98 - .../MC/Disassembler/AArch64/armv9.5a-cpa.txt | 42 .../MC/Disassembler/AArch64/armv9.6a-mpam.txt | 50 - .../MC/Disassembler/AArch64/armv9a-mec.txt| 54 - 13 files changed, 541 insertions(+), 590 deletions(-) delete mode 100644 llvm/test/MC/AArch64/armv9.4-lse128.s create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128-diagnostics.s create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128.s create mode 100644 llvm/test/MC/AArch64/armv9.6a-mpam-diagnostics.s delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-gcs.txt delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-lse128.txt delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.5a-cpa.txt delete mode 
100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-mpam.txt delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9a-mec.txt diff --git a/llvm/test/MC/AArch64/armv9.2a-mec.s b/llvm/test/MC/AArch64/armv9.2a-mec.s index 42e4bf732086e..c747886f7ec3b 100644 --- a/llvm/test/MC/AArch64/armv9.2a-mec.s +++ b/llvm/test/MC/AArch64/armv9.2a-mec.s @@ -1,55 +1,117 @@ -// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s - - mrs x0, MECIDR_EL2 -// CHECK: mrs x0, MECIDR_EL2 // encoding: [0xe0,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P0_EL2 -// CHECK: mrs x0, MECID_P0_EL2 // encoding: [0x00,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A0_EL2 -// CHECK: mrs x0, MECID_A0_EL2 // encoding: [0x20,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P1_EL2 -// CHECK: mrs x0, MECID_P1_EL2 // encoding: [0x40,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A1_EL2 -// CHECK: mrs x0, MECID_A1_EL2 // encoding: [0x60,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_P_EL2 -// CHECK: mrs x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_A_EL2 -// CHECK: mrs x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_RL_A_EL3 -// CHECK: mrs x0, MECID_RL_A_EL3 // encoding: [0x20,0xaa,0x3e,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - msr MECID_P0_EL2,x0 -// CHECK: msr MECID_P0_EL2, x0 // encoding: [0x00,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: 
[[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_A0_EL2,x0 -// CHECK: msr MECID_A0_EL2, x0 // encoding: [0x20,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_P1_EL2,x0 -// CHECK: msr MECID_P1_EL2, x0 // encoding: [0x40,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_A1_EL2,x0 -// CHECK: msr MECID_A1_EL2, x0 // encoding: [0x60,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - m
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
@@ -1,55 +1,117 @@ -// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s - - mrs x0, MECIDR_EL2 -// CHECK: mrs x0, MECIDR_EL2 // encoding: [0xe0,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P0_EL2 -// CHECK: mrs x0, MECID_P0_EL2 // encoding: [0x00,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A0_EL2 -// CHECK: mrs x0, MECID_A0_EL2 // encoding: [0x20,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P1_EL2 -// CHECK: mrs x0, MECID_P1_EL2 // encoding: [0x40,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A1_EL2 -// CHECK: mrs x0, MECID_A1_EL2 // encoding: [0x60,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_P_EL2 -// CHECK: mrs x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_A_EL2 -// CHECK: mrs x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_RL_A_EL3 -// CHECK: mrs x0, MECID_RL_A_EL3 // encoding: [0x20,0xaa,0x3e,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - msr MECID_P0_EL2,x0 -// CHECK: msr MECID_P0_EL2, x0 // encoding: [0x00,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_A0_EL2,x0 -// CHECK: msr MECID_A0_EL2, x0 // encoding: [0x20,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_P1_EL2,x0 -// CHECK: msr MECID_P1_EL2, x0 // encoding: [0x40,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: 
[[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_A1_EL2,x0 -// CHECK: msr MECID_A1_EL2, x0 // encoding: [0x60,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr VMECID_P_EL2, x0 -// CHECK: msr VMECID_P_EL2, x0 // encoding: [0x00,0xa9,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr VMECID_A_EL2, x0 -// CHECK: msr VMECID_A_EL2, x0 // encoding: [0x20,0xa9,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_RL_A_EL3, x0 -// CHECK: msr MECID_RL_A_EL3, x0 // encoding: [0x20,0xaa,0x1e,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - - dc cigdpae, x0 -// CHECK: dc cigdpae, x0 // encoding: [0xe0,0x7e,0x0c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:14: error: DC CIGDPAE requires: mec - dc cipae, x0 -// CHECK: dc cipae, x0 // encoding: [0x00,0x7e,0x0c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:14: error: DC CIPAE requires: mec +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+mec < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+mec < %s \ +// RUN:| llvm-objdump -d --mattr=+mec --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+mec < %s \ +// RUN: | llvm-objdump -d --mattr=-mec --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN +// Disassemble encoding and check the re-encoding (-show-encoding) matches. 
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+mec < %s \ +// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \ +// RUN:| llvm-mc -triple=aarch64 -mattr=+mec -disassemble -show-encoding \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST + + + +mrs x0, MECIDR_EL2 +// CHECK-INST: mrs x0, MECIDR_EL2 +// CHECK-ENCODING: encoding: [0xe0,0xa8,0x3c,0xd5] +// CHECK-ERROR: error: expected readable system register +// CHECK-UNKNOWN: d53ca8e0 mrs x0, S3_4_C10_C8_7 + +mrs x0, MECID_P0_EL2 +// CHECK-INST: mrs x0, MECID_P0_EL2 +// CHECK-ENCODING: encoding: [0x00,0xa8,0x3c,0xd5] +// CHECK-ERROR: error: expected readable system register +// CHECK-UNKNOWN: d53ca800 mrs x0, S3_4_C10_C8_0 + +mrs x0, MECID_A0_EL2 +// CHECK-INST: mrs x0, MECID_A0_EL2 +// CHECK-ENCODING: encoding: [0x20,0xa8,0x3c,0xd5] +// CHECK-ERROR: error: expected readable system register +// CHECK-UNKNOWN: d53ca820 mrs
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
@@ -0,0 +1,138 @@
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lse128 < %s \
+// RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:        | FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lse128 < %s \
+// RUN:        | llvm-objdump -d --mattr=+lse128 - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lse128 < %s \
+// RUN:   | llvm-objdump -d --mattr=-lse128 - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lse128 < %s \
+// RUN:        | sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN:        | llvm-mc -triple=aarch64 -mattr=+lse128 -disassemble -show-encoding \
+// RUN:        | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+
+
+ldclrp x1, x2, [x11]
+// CHECK-INST: ldclrp x1, x2, [x11]
+// CHECK-ENCODING: encoding: [0x61,0x11,0x22,0x19]
+// CHECK-ERROR: :[[@LINE-3]]:1: error: instruction requires: lse128
+// CHECK-UNKNOWN: 19221161
+ldclrp x21, x22, [sp]

jthackray wrote:

Thanks, now fixed.

https://github.com/llvm/llvm-project/pull/146329
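
The round-trip RUN line in these tests pipes the assembler output through sed 's/.*encoding: //g' so that only the byte list reaches the disassembler. The effect of that one step on a single line can be sketched as below. This is a rough illustration, not part of the patch: stripToEncoding is a hypothetical helper, and the sample input assumes the "instruction // encoding: [...]" layout seen in the old CHECK lines of these tests.

```cpp
#include <cassert>
#include <string>

// Mimic sed 's/.*encoding: //g' on one line: drop everything up to and
// including the last "encoding: " marker. Lines without the marker are
// returned unchanged, as sed leaves non-matching lines alone.
std::string stripToEncoding(const std::string &Line) {
  const std::string Key = "encoding: ";
  // The greedy ".*" in the sed pattern matches up to the last occurrence.
  std::size_t Pos = Line.rfind(Key);
  if (Pos == std::string::npos)
    return Line;
  return Line.substr(Pos + Key.size());
}
```

The remaining text, such as [0x61,0x11,0x22,0x19], is what the second llvm-mc invocation receives with -disassemble, and its output is then checked against the same CHECK-ENCODING and CHECK-INST lines.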
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
https://github.com/jthackray edited https://github.com/llvm/llvm-project/pull/146329
[llvm-branch-commits] [clang-tools-extra] [clang-doc] serialize friends (PR #146165)
llvmbot wrote:

@llvm/pr-subscribers-clang-tools-extra

Author: Erick Velez (evelez7)

Changes

---

Patch is 24.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146165.diff

13 Files Affected:

- (modified) clang-tools-extra/clang-doc/BitcodeReader.cpp (+46)
- (modified) clang-tools-extra/clang-doc/BitcodeWriter.cpp (+24-3)
- (modified) clang-tools-extra/clang-doc/BitcodeWriter.h (+5-1)
- (modified) clang-tools-extra/clang-doc/HTMLGenerator.cpp (+3)
- (modified) clang-tools-extra/clang-doc/HTMLMustacheGenerator.cpp (+1)
- (modified) clang-tools-extra/clang-doc/JSONGenerator.cpp (+21-2)
- (modified) clang-tools-extra/clang-doc/MDGenerator.cpp (+4)
- (modified) clang-tools-extra/clang-doc/Representation.cpp (+16)
- (modified) clang-tools-extra/clang-doc/Representation.h (+20-1)
- (modified) clang-tools-extra/clang-doc/Serialize.cpp (+53)
- (modified) clang-tools-extra/clang-doc/YAMLGenerator.cpp (+1)
- (modified) clang-tools-extra/test/clang-doc/json/class.cpp (+38-38)
- (modified) clang-tools-extra/unittests/clang-doc/BitcodeTest.cpp (+2)

``diff
diff --git a/clang-tools-extra/clang-doc/BitcodeReader.cpp b/clang-tools-extra/clang-doc/BitcodeReader.cpp
index fd6f40cff1a4e..2cbf8bf6b2879 100644
--- a/clang-tools-extra/clang-doc/BitcodeReader.cpp
+++ b/clang-tools-extra/clang-doc/BitcodeReader.cpp
@@ -94,6 +94,7 @@ static llvm::Error decodeRecord(const Record &R, InfoType &Field,
   case InfoType::IT_typedef:
   case InfoType::IT_concept:
   case InfoType::IT_variable:
+  case InfoType::IT_friend:
     Field = IT;
     return llvm::Error::success();
   }
@@ -111,6 +112,7 @@ static llvm::Error decodeRecord(const Record &R, FieldId &Field,
   case FieldId::F_child_namespace:
   case FieldId::F_child_record:
   case FieldId::F_concept:
+  case FieldId::F_friend:
   case FieldId::F_default:
     Field = F;
     return llvm::Error::success();
@@ -450,6 +452,15 @@ static llvm::Error parseRecord(const Record &R, unsigned ID,
   }
 }
 
+static llvm::Error parseRecord(const Record &R,
unsigned ID, StringRef Blob, + FriendInfo *F) { + if (ID == FRIEND_IS_CLASS) { +return decodeRecord(R, F->IsClass, Blob); + } + return llvm::createStringError(llvm::inconvertibleErrorCode(), + "invalid field for Friend"); +} + template static llvm::Expected getCommentInfo(T I) { return llvm::createStringError(llvm::inconvertibleErrorCode(), "invalid type cannot contain CommentInfo"); @@ -525,6 +536,18 @@ template <> llvm::Error addTypeInfo(FunctionInfo *I, FieldTypeInfo &&T) { return llvm::Error::success(); } +template <> llvm::Error addTypeInfo(FriendInfo *I, FieldTypeInfo &&T) { + if (!I->Params) +I->Params.emplace(); + I->Params->emplace_back(std::move(T)); + return llvm::Error::success(); +} + +template <> llvm::Error addTypeInfo(FriendInfo *I, TypeInfo &&T) { + I->ReturnType.emplace(std::move(T)); + return llvm::Error::success(); +} + template <> llvm::Error addTypeInfo(EnumInfo *I, TypeInfo &&T) { I->BaseType = std::move(T); return llvm::Error::success(); @@ -667,6 +690,16 @@ llvm::Error addReference(ConstraintInfo *I, Reference &&R, FieldId F) { "ConstraintInfo cannot contain this Reference"); } +template <> +llvm::Error addReference(FriendInfo *Friend, Reference &&R, FieldId F) { + if (F == FieldId::F_friend) { +Friend->Ref = std::move(R); +return llvm::Error::success(); + } + return llvm::createStringError(llvm::inconvertibleErrorCode(), + "Friend cannot contain this Reference"); +} + template static void addChild(T I, ChildInfoType &&R) { llvm::errs() << "invalid child type for info"; @@ -700,6 +733,9 @@ template <> void addChild(RecordInfo *I, EnumInfo &&R) { template <> void addChild(RecordInfo *I, TypedefInfo &&R) { I->Children.Typedefs.emplace_back(std::move(R)); } +template <> void addChild(RecordInfo *I, FriendInfo &&R) { + I->Friends.emplace_back(std::move(R)); +} // Other types of children: template <> void addChild(EnumInfo *I, EnumValueInfo &&R) { @@ -741,6 +777,9 @@ template <> void addTemplate(FunctionInfo *I, TemplateInfo &&P) { template <> 
void addTemplate(ConceptInfo *I, TemplateInfo &&P) { I->Template = std::move(P); } +template <> void addTemplate(FriendInfo *I, TemplateInfo &&P) { + I->Template.emplace(std::move(P)); +} // Template specializations go only into template records. template @@ -921,6 +960,10 @@ llvm::Error ClangDocBitcodeReader::readSubBlock(unsigned ID, T I) { case BI_VAR_BLOCK_ID: { return handleSubBlock(ID, I, CreateAddFunc(addChild)); } + case BI_FRIEND_BLOCK_ID: { +return handleSubBlock(ID, I, + CreateAddFunc(addChild)); + } default: return llvm::createStringError(llvm::inconvertibleErrorCode(), "invalid su
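
For orientation, the kinds of friend declarations that the FriendInfo fields in this patch appear to cover (IsClass, ReturnType, Params, Template) can be shown with a small input file. This is a hypothetical example, not a test from the patch: Vault, Auditor, peek and convert are made-up names, and the mapping of each declaration to a FriendInfo field in the comments is an assumption based on the field names alone.

```cpp
#include <cassert>

// Hypothetical clang-doc input; none of these names come from the patch.
class Vault {
  // Friend class: presumably the case that the FRIEND_IS_CLASS record marks.
  friend class Auditor;
  // Friend function: has a return type and parameters, which would fit the
  // optional ReturnType and Params members of FriendInfo.
  friend int peek(const Vault &V);
  // Friend function template: additionally carries template parameters.
  template <typename T> friend T convert(const Vault &V);

  int Secret = 42;
};

// Definitions of the befriended functions; both may read the private member.
int peek(const Vault &V) { return V.Secret; }

template <typename T> T convert(const Vault &V) {
  return static_cast<T>(V.Secret);
}
```

Each of the three declarations grants access to Secret, but they differ in what a documentation tool has to record, which is why a single friend record needs optional return type, parameter and template information.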
[llvm-branch-commits] [clang-tools-extra] [clang-doc] serialize friends (PR #146165)
https://github.com/evelez7 ready_for_review https://github.com/llvm/llvm-project/pull/146165
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
@@ -1,115 +1,203 @@ -// RUN: llvm-mc -triple aarch64 -mattr +gcs -show-encoding %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64 -show-encoding %s 2>%t | FileCheck %s --check-prefix=NO-GCS -// RUN: FileCheck --check-prefix=ERROR-NO-GCS %s < %t +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+gcs < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+gcs < %s \ +// RUN:| llvm-objdump -d --mattr=+gcs --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+gcs < %s \ +// RUN: | llvm-objdump -d --mattr=-gcs --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN +// Disassemble encoding and check the re-encoding (-show-encoding) matches. +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+gcs < %s \ +// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \ +// RUN:| llvm-mc -triple=aarch64 -mattr=+gcs -disassemble -show-encoding \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST + + msr GCSCR_EL1, x0 +// CHECK-INST: msr GCSCR_EL1, x0 +// CHECK-ENCODING: encoding: [0x00,0x25,0x18,0xd5] +// CHECK-UNKNOWN: d5182500 msr GCSCR_EL1, x0 + mrs x1, GCSCR_EL1 -// CHECK: msr GCSCR_EL1, x0 // encoding: [0x00,0x25,0x18,0xd5] -// CHECK: mrs x1, GCSCR_EL1 // encoding: [0x01,0x25,0x38,0xd5] +// CHECK-INST: mrs x1, GCSCR_EL1 +// CHECK-ENCODING: encoding: [0x01,0x25,0x38,0xd5] +// CHECK-UNKNOWN: d5382501 mrs x1, GCSCR_EL1 msr GCSPR_EL1, x2 +// CHECK-INST: msr GCSPR_EL1, x2 +// CHECK-ENCODING: encoding: [0x22,0x25,0x18,0xd5] +// CHECK-UNKNOWN: d5182522 msr GCSPR_EL1, x2 + mrs x3, GCSPR_EL1 -// CHECK: msr GCSPR_EL1, x2 // encoding: [0x22,0x25,0x18,0xd5] -// CHECK: mrs x3, GCSPR_EL1 // encoding: [0x23,0x25,0x38,0xd5] +// CHECK-INST: mrs x3, GCSPR_EL1 +// CHECK-ENCODING: encoding: [0x23,0x25,0x38,0xd5] +// 
CHECK-UNKNOWN: d5382523 mrs x3, GCSPR_EL1 msr GCSCRE0_EL1, x4 +// CHECK-INST: msr GCSCRE0_EL1, x4 +// CHECK-ENCODING: encoding: [0x44,0x25,0x18,0xd5] +// CHECK-UNKNOWN: d5182544 msr GCSCRE0_EL1, x4 + mrs x5, GCSCRE0_EL1 -// CHECK: msr GCSCRE0_EL1, x4 // encoding: [0x44,0x25,0x18,0xd5] -// CHECK: mrs x5, GCSCRE0_EL1 // encoding: [0x45,0x25,0x38,0xd5] +// CHECK-INST: mrs x5, GCSCRE0_EL1 +// CHECK-ENCODING: encoding: [0x45,0x25,0x38,0xd5] +// CHECK-UNKNOWN: d5382545 mrs x5, GCSCRE0_EL1 msr GCSPR_EL0, x6 +// CHECK-INST: msr GCSPR_EL0, x6 +// CHECK-ENCODING: encoding: [0x26,0x25,0x1b,0xd5] +// CHECK-UNKNOWN: d51b2526 msr GCSPR_EL0, x6 + mrs x7, GCSPR_EL0 -// CHECK: msr GCSPR_EL0, x6 // encoding: [0x26,0x25,0x1b,0xd5] -// CHECK: mrs x7, GCSPR_EL0 // encoding: [0x27,0x25,0x3b,0xd5] +// CHECK-INST: mrs x7, GCSPR_EL0 +// CHECK-ENCODING: encoding: [0x27,0x25,0x3b,0xd5] +// CHECK-UNKNOWN: d53b2527 mrs x7, GCSPR_EL0 msr GCSCR_EL2, x10 +// CHECK-INST: msr GCSCR_EL2, x10 +// CHECK-ENCODING: encoding: [0x0a,0x25,0x1c,0xd5] +// CHECK-UNKNOWN: d51c250a msr GCSCR_EL2, x10 + mrs x11, GCSCR_EL2 -// CHECK: msr GCSCR_EL2, x10 // encoding: [0x0a,0x25,0x1c,0xd5] -// CHECK: mrs x11, GCSCR_EL2 // encoding: [0x0b,0x25,0x3c,0xd5] +// CHECK-INST: mrs x11, GCSCR_EL2 +// CHECK-ENCODING: encoding: [0x0b,0x25,0x3c,0xd5] +// CHECK-UNKNOWN: d53c250b mrs x11, GCSCR_EL2 msr GCSPR_EL2, x12 +// CHECK-INST: msr GCSPR_EL2, x12 +// CHECK-ENCODING: encoding: [0x2c,0x25,0x1c,0xd5] +// CHECK-UNKNOWN: d51c252c msr GCSPR_EL2, x12 + mrs x13, GCSPR_EL2 -// CHECK: msr GCSPR_EL2, x12 // encoding: [0x2c,0x25,0x1c,0xd5] -// CHECK: mrs x13, GCSPR_EL2 // encoding: [0x2d,0x25,0x3c,0xd5] +// CHECK-INST: mrs x13, GCSPR_EL2 +// CHECK-ENCODING: encoding: [0x2d,0x25,0x3c,0xd5] +// CHECK-UNKNOWN: d53c252d mrs x13, GCSPR_EL2 msr GCSCR_EL12, x14 +// CHECK-INST: msr GCSCR_EL12, x14 +// CHECK-ENCODING: encoding: [0x0e,0x25,0x1d,0xd5] +// CHECK-UNKNOWN: d51d250e msr GCSCR_EL12, x14 + mrs x15, GCSCR_EL12 -// CHECK: msr GCSCR_EL12, 
x14 // encoding: [0x0e,0x25,0x1d,0xd5] -// CHECK: mrs x15, GCSCR_EL12 // encoding: [0x0f,0x25,0x3d,0xd5] +// CHECK-INST: mrs x15, GCSCR_EL12 +// CHECK-ENCODING: encoding: [0x0f,0x25,0x3d,0xd5] +// CHECK-UNKNOWN: d53d250f mrs x15, GCSCR_EL12 msr GCSPR_EL12, x16 +// CHECK-INST: msr GCSPR_EL12, x16 +// CHECK-ENCODING: encoding: [0x30,0x25,0x1d,0xd5] +// CHECK-UNKNOWN: d51d2530 msr GCSPR_EL12, x16 + mrs x17, GCSPR_EL12 -// CHECK: msr GCSPR_EL12, x16 // encoding: [0x30,0x25,0x1d,0xd5] -// CHECK: mrs x17, GCSPR_EL12 // encoding: [0
[llvm-branch-commits] [clang] [llvm] [OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (PR #146405)
https://github.com/shiltian commented: There doesn't seem to be any test case for the newly added `__kmpc_parallel_60`. If it is orthogonal to the `__kmpc_push_num_threads_strict` change, I'd prefer to make it a separate PR and have tests there. https://github.com/llvm/llvm-project/pull/146405 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [openmp] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (PR #146404)
@@ -45,7 +45,24 @@ using namespace ompx;
 namespace {
-uint32_t determineNumberOfThreads(int32_t NumThreadsClause) {
+void num_threads_strict_error(int32_t nt_strict, int32_t nt_severity,

ro-i wrote:

sorry, done

https://github.com/llvm/llvm-project/pull/146404
[llvm-branch-commits] [llvm] [openmp] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (PR #146404)
@@ -45,7 +45,24 @@ using namespace ompx;
 namespace {
-uint32_t determineNumberOfThreads(int32_t NumThreadsClause) {
+void num_threads_strict_error(int32_t nt_strict, int32_t nt_severity,

shiltian wrote:

Please use LLVM code style for device runtime.

https://github.com/llvm/llvm-project/pull/146404
[llvm-branch-commits] [llvm] [openmp] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (PR #146404)
https://github.com/ro-i updated https://github.com/llvm/llvm-project/pull/146404 >From cf566c60db9eef81c39a45082645c9d44992bec5 Mon Sep 17 00:00:00 2001 From: Robert Imschweiler Date: Fri, 27 Jun 2025 07:54:07 -0500 Subject: [PATCH 1/2] [OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the num_threads clause on parallel directives, along with the message and severity clauses. This commit implements necessary device runtime changes. --- offload/DeviceRTL/include/DeviceTypes.h | 6 ++ offload/DeviceRTL/src/Parallelism.cpp | 78 +++-- openmp/runtime/src/kmp.h| 1 + 3 files changed, 67 insertions(+), 18 deletions(-) diff --git a/offload/DeviceRTL/include/DeviceTypes.h b/offload/DeviceRTL/include/DeviceTypes.h index 2e5d92380f040..43a5578f1 100644 --- a/offload/DeviceRTL/include/DeviceTypes.h +++ b/offload/DeviceRTL/include/DeviceTypes.h @@ -136,6 +136,12 @@ struct omp_lock_t { void *Lock; }; +// see definition in openmp/runtime kmp.h +typedef enum omp_severity_t { + severity_warning = 1, + severity_fatal = 2 +} omp_severity_t; + using InterWarpCopyFnTy = void (*)(void *src, int32_t warp_num); using ShuffleReductFnTy = void (*)(void *rhsData, int16_t lane_id, int16_t lane_offset, int16_t shortCircuit); diff --git a/offload/DeviceRTL/src/Parallelism.cpp b/offload/DeviceRTL/src/Parallelism.cpp index 08ce616aee1c4..78438a60454b8 100644 --- a/offload/DeviceRTL/src/Parallelism.cpp +++ b/offload/DeviceRTL/src/Parallelism.cpp @@ -45,7 +45,24 @@ using namespace ompx; namespace { -uint32_t determineNumberOfThreads(int32_t NumThreadsClause) { +void num_threads_strict_error(int32_t nt_strict, int32_t nt_severity, + const char *nt_message, int32_t requested, + int32_t actual) { + if (nt_message) +printf("%s\n", nt_message); + else +printf("The computed number of threads (%u) does not match the requested " + "number of threads (%d). 
Consider that it might not be supported " + "to select exactly %d threads on this target device.\n", + actual, requested, requested); + if (nt_severity == severity_fatal) +__builtin_trap(); +} + +uint32_t determineNumberOfThreads(int32_t NumThreadsClause, + int32_t nt_strict = false, + int32_t nt_severity = severity_fatal, + const char *nt_message = nullptr) { uint32_t NThreadsICV = NumThreadsClause != -1 ? NumThreadsClause : icv::NThreads; uint32_t NumThreads = mapping::getMaxTeamThreads(); @@ -55,13 +72,17 @@ uint32_t determineNumberOfThreads(int32_t NumThreadsClause) { // SPMD mode allows any number of threads, for generic mode we round down to a // multiple of WARPSIZE since it is legal to do so in OpenMP. - if (mapping::isSPMDMode()) -return NumThreads; + if (!mapping::isSPMDMode()) { +if (NumThreads < mapping::getWarpSize()) + NumThreads = 1; +else + NumThreads = (NumThreads & ~((uint32_t)mapping::getWarpSize() - 1)); + } - if (NumThreads < mapping::getWarpSize()) -NumThreads = 1; - else -NumThreads = (NumThreads & ~((uint32_t)mapping::getWarpSize() - 1)); + if (NumThreadsClause != -1 && nt_strict && + NumThreads != static_cast<uint32_t>(NumThreadsClause)) +num_threads_strict_error(nt_strict, nt_severity, nt_message, + NumThreadsClause, NumThreads); return NumThreads; } @@ -82,12 +103,14 @@ uint32_t determineNumberOfThreads(int32_t NumThreadsClause) { extern "C" { -[[clang::always_inline]] void __kmpc_parallel_spmd(IdentTy *ident, - int32_t num_threads, - void *fn, void **args, - const int64_t nargs) { +[[clang::always_inline]] void +__kmpc_parallel_spmd(IdentTy *ident, int32_t num_threads, void *fn, void **args, + const int64_t nargs, int32_t nt_strict = false, + int32_t nt_severity = severity_fatal, + const char *nt_message = nullptr) { uint32_t TId = mapping::getThreadIdInBlock(); - uint32_t NumThreads = determineNumberOfThreads(num_threads); + uint32_t NumThreads = + determineNumberOfThreads(num_threads, nt_strict, nt_severity, nt_message); uint32_t PTeamSize = 
NumThreads == mapping::getMaxTeamThreads() ? 0 : NumThreads; // Avoid the race between the read of the `icv::Level` above and the write @@ -140,10 +163,11 @@ extern "C" { return; } -[[clang::always_inline]] void -__kmpc_parallel_51(IdentTy *ident, int32_t, int32_t if_expr, - int32_t num_threads, int proc_bind, void *fn, - void *wrapper_fn, void **args, int64_t nargs) { +[[clang
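The rounding and strict check in the hunk above can be illustrated outside the device runtime. The following is only a sketch: `roundThreadsForMode` and `strictRequestViolated` are hypothetical names, the warp size and SPMD flag are passed in as plain parameters instead of being queried through `mapping::`, and the earlier clamping against the ICV and the team maximum is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the rounding step: SPMD mode keeps the count,
// generic mode rounds down to a multiple of the warp size (or to 1 when the
// count is smaller than one warp). WarpSize is assumed to be a power of two.
uint32_t roundThreadsForMode(uint32_t NumThreads, uint32_t WarpSize,
                             bool IsSPMD) {
  assert(WarpSize != 0 && (WarpSize & (WarpSize - 1)) == 0);
  if (IsSPMD)
    return NumThreads;
  if (NumThreads < WarpSize)
    return 1;
  return NumThreads & ~(WarpSize - 1);
}

// Hypothetical predicate mirroring the condition that guards the error call:
// a num_threads clause is present (not -1), the strict modifier is set, and
// the computed count differs from the requested one.
bool strictRequestViolated(int32_t Requested, uint32_t Actual, bool Strict) {
  return Requested != -1 && Strict &&
         Actual != static_cast<uint32_t>(Requested);
}
```

With a warp size of 32, a request for 100 threads in generic mode is rounded to 96, so a strict request for exactly 100 would be reported; in SPMD mode the count stays at 100 and the check passes.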
[llvm-branch-commits] [clang] [llvm] [OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (PR #146405)
shiltian wrote: Even after I expanded all folded files, when I search for `__kmpc_parallel_60`, my browser only shows three matches. Did I miss anything here? https://github.com/llvm/llvm-project/pull/146405
[llvm-branch-commits] [llvm] [AMDGPU] Add tests for workgroup/workitem intrinsic optimizations (PR #146053)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/146053 >From 3f62ab3beb30abbf8c8c32dd79c0133f7ca122e0 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 26 Jun 2025 13:08:31 +0200 Subject: [PATCH 1/2] [AMDGPU] Add tests for workgroup/workitem intrinsic optimizations --- .../AMDGPU/workitems-intrinsics-opts.ll | 553 ++ 1 file changed, 553 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll diff --git a/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll b/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll new file mode 100644 index 0..14120680216fc --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll @@ -0,0 +1,553 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -O3 -mtriple=amdgcn -mcpu=fiji %s -o - | FileCheck %s --check-prefixes=GFX8,DAGISEL-GFX9 +; RUN: llc -O3 -mtriple=amdgcn -mcpu=gfx942 %s -o - | FileCheck %s --check-prefixes=GFX942,DAGISEL-GFX942 +; RUN: llc -O3 -mtriple=amdgcn -mcpu=gfx1200 %s -o - | FileCheck %s --check-prefixes=GFX12,DAGISEL-GFX12 + +; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=fiji %s -o - | FileCheck %s --check-prefixes=GFX8,GISEL-GFX8 +; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=gfx942 %s -o - | FileCheck %s --check-prefixes=GFX942,GISEL-GFX942 +; RUN: llc -O3 -global-isel -mtriple=amdgcn -mcpu=gfx1200 %s -o - | FileCheck %s --check-prefixes=GFX12,GISEL-GFX12 + +; (workitem_id_x | workitem_id_y | workitem_id_z) == 0 +define i1 @workitem_zero() { +; DAGISEL-GFX9-LABEL: workitem_zero: +; DAGISEL-GFX9: ; %bb.0: ; %entry +; DAGISEL-GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; DAGISEL-GFX9-NEXT:v_lshrrev_b32_e32 v1, 10, v31 +; DAGISEL-GFX9-NEXT:v_lshrrev_b32_e32 v0, 20, v31 +; DAGISEL-GFX9-NEXT:v_or_b32_e32 v1, v31, v1 +; DAGISEL-GFX9-NEXT:v_or_b32_e32 v0, v1, v0 +; DAGISEL-GFX9-NEXT:v_and_b32_e32 v0, 0x3ff, v0 +; DAGISEL-GFX9-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0 +; 
DAGISEL-GFX9-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc +; DAGISEL-GFX9-NEXT:s_setpc_b64 s[30:31] +; +; DAGISEL-GFX942-LABEL: workitem_zero: +; DAGISEL-GFX942: ; %bb.0: ; %entry +; DAGISEL-GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; DAGISEL-GFX942-NEXT:v_lshrrev_b32_e32 v0, 20, v31 +; DAGISEL-GFX942-NEXT:v_lshrrev_b32_e32 v1, 10, v31 +; DAGISEL-GFX942-NEXT:v_or3_b32 v0, v31, v1, v0 +; DAGISEL-GFX942-NEXT:v_and_b32_e32 v0, 0x3ff, v0 +; DAGISEL-GFX942-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0 +; DAGISEL-GFX942-NEXT:s_nop 1 +; DAGISEL-GFX942-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc +; DAGISEL-GFX942-NEXT:s_setpc_b64 s[30:31] +; +; DAGISEL-GFX12-LABEL: workitem_zero: +; DAGISEL-GFX12: ; %bb.0: ; %entry +; DAGISEL-GFX12-NEXT:s_wait_loadcnt_dscnt 0x0 +; DAGISEL-GFX12-NEXT:s_wait_expcnt 0x0 +; DAGISEL-GFX12-NEXT:s_wait_samplecnt 0x0 +; DAGISEL-GFX12-NEXT:s_wait_bvhcnt 0x0 +; DAGISEL-GFX12-NEXT:s_wait_kmcnt 0x0 +; DAGISEL-GFX12-NEXT:v_lshrrev_b32_e32 v0, 20, v31 +; DAGISEL-GFX12-NEXT:v_lshrrev_b32_e32 v1, 10, v31 +; DAGISEL-GFX12-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1) +; DAGISEL-GFX12-NEXT:v_or3_b32 v0, v31, v1, v0 +; DAGISEL-GFX12-NEXT:v_and_b32_e32 v0, 0x3ff, v0 +; DAGISEL-GFX12-NEXT:s_delay_alu instid0(VALU_DEP_1) +; DAGISEL-GFX12-NEXT:v_cmp_eq_u32_e32 vcc_lo, 0, v0 +; DAGISEL-GFX12-NEXT:s_wait_alu 0xfffd +; DAGISEL-GFX12-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc_lo +; DAGISEL-GFX12-NEXT:s_setpc_b64 s[30:31] +; +; GISEL-GFX8-LABEL: workitem_zero: +; GISEL-GFX8: ; %bb.0: ; %entry +; GISEL-GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3ff, v31 +; GISEL-GFX8-NEXT:v_bfe_u32 v1, v31, 10, 10 +; GISEL-GFX8-NEXT:v_or_b32_e32 v0, v0, v1 +; GISEL-GFX8-NEXT:v_bfe_u32 v1, v31, 20, 10 +; GISEL-GFX8-NEXT:v_or_b32_e32 v0, v0, v1 +; GISEL-GFX8-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0 +; GISEL-GFX8-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc +; GISEL-GFX8-NEXT:s_setpc_b64 s[30:31] +; +; GISEL-GFX942-LABEL: workitem_zero: +; 
GISEL-GFX942: ; %bb.0: ; %entry +; GISEL-GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GISEL-GFX942-NEXT:v_and_b32_e32 v0, 0x3ff, v31 +; GISEL-GFX942-NEXT:v_bfe_u32 v1, v31, 10, 10 +; GISEL-GFX942-NEXT:v_bfe_u32 v2, v31, 20, 10 +; GISEL-GFX942-NEXT:v_or3_b32 v0, v0, v1, v2 +; GISEL-GFX942-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0 +; GISEL-GFX942-NEXT:s_nop 1 +; GISEL-GFX942-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc +; GISEL-GFX942-NEXT:s_setpc_b64 s[30:31] +; +; GISEL-GFX12-LABEL: workitem_zero: +; GISEL-GFX12: ; %bb.0: ; %entry +; GISEL-GFX12-NEXT:s_wait_loadcnt_dscnt 0x0 +; GISEL-GFX12-NEXT:s_wait_expcnt 0x0 +; GISEL-GFX12-NEXT:s_wait_samplecnt 0x0 +; GISEL-GFX12-NEXT:s_wait_bvhcnt 0x0 +; GISEL-GFX1
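The check lines above read three 10-bit fields out of `v31` (a mask with `0x3ff`, then `v_bfe_u32` at offsets 10 and 20). As a rough host-side model of what `workitem_zero` computes, assuming that packed layout (x in bits 0-9, y in bits 10-19, z in bits 20-29), one could write the following; `WorkitemId`, `unpackWorkitemId` and `isWorkitemZero` are illustrative names, not part of the test.

```cpp
#include <cassert> // only needed by callers that assert on the results
#include <cstdint>

struct WorkitemId {
  uint32_t X, Y, Z;
};

// Split a packed work-item ID into its three 10-bit components. The layout
// is an assumption inferred from the generated code shown in the test.
WorkitemId unpackWorkitemId(uint32_t Packed) {
  return {Packed & 0x3ffu, (Packed >> 10) & 0x3ffu, (Packed >> 20) & 0x3ffu};
}

// Model of the IR under test: (workitem_id_x | workitem_id_y | workitem_id_z) == 0.
bool isWorkitemZero(uint32_t Packed) {
  const WorkitemId Id = unpackWorkitemId(Packed);
  return (Id.X | Id.Y | Id.Z) == 0;
}
```

Bits 30 and 31 of the packed value do not belong to any field in this model, so they do not affect the result.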
[llvm-branch-commits] [llvm] [GISel] Combine compare of bitfield extracts or'd together. (PR #146055)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/146055 >From da05cc2d920917f0cb6f171b0d9e2e535836ca3c Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 27 Jun 2025 12:04:53 +0200 Subject: [PATCH] [GISel] Combine compare of bitfield extracts or'd together. Equivalent of the previous DAG patch for GISel. The shifts are BFXs in GISel, so the canonical form of the entire expression is different than in the DAG. The mask is not at the root of the expression, it remains on the leaves instead. See #136727 --- .../llvm/CodeGen/GlobalISel/CombinerHelper.h | 2 + .../include/llvm/Target/GlobalISel/Combine.td | 11 +- .../GlobalISel/CombinerHelperCompares.cpp | 89 + .../GlobalISel/combine-cmp-merged-bfx.mir | 326 ++ .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 194 +++ 5 files changed, 483 insertions(+), 139 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/combine-cmp-merged-bfx.mir diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h index c15263e0b06f8..5ec82c30f268f 100644 --- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h +++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h @@ -641,6 +641,8 @@ class CombinerHelper { /// KnownBits information. bool matchICmpToLHSKnownBits(MachineInstr &MI, BuildFnTy &MatchInfo) const; + bool combineMergedBFXCompare(MachineInstr &MI) const; + /// \returns true if (and (or x, c1), c2) can be replaced with (and x, c2) bool matchAndOrDisjointMask(MachineInstr &MI, BuildFnTy &MatchInfo) const; diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td index 4a92dc16c1bf4..cba46a5edf9ec 100644 --- a/llvm/include/llvm/Target/GlobalISel/Combine.td +++ b/llvm/include/llvm/Target/GlobalISel/Combine.td @@ -1085,6 +1085,14 @@ def double_icmp_zero_or_combine: GICombineRule< (G_ICMP $root, $p, $ordst, 0)) >; +// Transform ((X | (G_UBFX X, ...) | ...) 
== 0) (or != 0) +// into a compare of a extract/mask of X +def icmp_merged_bfx_combine: GICombineRule< + (defs root:$root), + (combine (G_ICMP $dst, $p, $src, 0):$root, + [{ return Helper.combineMergedBFXCompare(*${root}); }]) +>; + def and_or_disjoint_mask : GICombineRule< (defs root:$root, build_fn_matchinfo:$info), (match (wip_match_opcode G_AND):$root, @@ -2052,7 +2060,8 @@ def all_combines : GICombineGroup<[integer_reassoc_combines, trivial_combines, fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors, simplify_neg_minmax, combine_concat_vector, sext_trunc, zext_trunc, prefer_sign_combines, shuffle_combines, -combine_use_vector_truncate, merge_combines, overflow_combines]>; +combine_use_vector_truncate, merge_combines, overflow_combines, +icmp_merged_bfx_combine]>; // A combine group used to for prelegalizer combiners at -O0. The combines in // this group have been selected based on experiments to balance code size and diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp index fc40533cf3dc9..e1d43f37bac13 100644 --- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp +++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCompares.cpp @@ -140,3 +140,92 @@ bool CombinerHelper::matchCanonicalizeFCmp(const MachineInstr &MI, return false; } + +bool CombinerHelper::combineMergedBFXCompare(MachineInstr &MI) const { + const GICmp *Cmp = cast<GICmp>(&MI); + + ICmpInst::Predicate CC = Cmp->getCond(); + if (CC != CmpInst::ICMP_EQ && CC != CmpInst::ICMP_NE) +return false; + + Register CmpLHS = Cmp->getLHSReg(); + Register CmpRHS = Cmp->getRHSReg(); + + LLT OpTy = MRI.getType(CmpLHS); + if (!OpTy.isScalar() || OpTy.isPointer()) +return false; + + assert(isZeroOrZeroSplat(CmpRHS, /*AllowUndefs=*/false)); + + Register Src; + const auto IsSrc = [&](Register R) { +if (!Src) { + Src = R; + return true; +} + +return Src == R; + }; + + MachineInstr *CmpLHSDef = MRI.getVRegDef(CmpLHS); + if 
(CmpLHSDef->getOpcode() != TargetOpcode::G_OR) +return false; + + APInt PartsMask(OpTy.getSizeInBits(), 0); + SmallVector Worklist = {CmpLHSDef}; + while (!Worklist.empty()) { +MachineInstr *Cur = Worklist.pop_back_val(); + +Register Dst = Cur->getOperand(0).getReg(); +if (!MRI.hasOneUse(Dst) && Dst != Src) + return false; + +if (Cur->getOpcode() == TargetOpcode::G_OR) { + Worklist.push_back(MRI.getVRegDef(Cur->getOperand(1).getReg())); + Worklist.push_back(MRI.getVRegDef(Cur->getOperand(2).getReg())); + continue; +} + +if (Cur->getOpcode() == TargetOpcode::G_UBFX) { + Register Op = Cur->getOperand(1).getReg(); + Register Width = Cur->getOperand(2).getReg(); + Register Off = Cur->getOperand(3).getReg(); + + auto WidthCst = getIConstantVRegVal(Width, MRI); + auto
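The patch text is cut off before the mask is built, so the following is only a sketch of the idea stated in the commit message (the masks sit on the bitfield-extract leaves rather than at the root). It models each leaf as an unsigned extract of `Width` bits at `Offset` on a 32-bit value and accumulates one combined mask; `BitfieldExtract`, `lowBitsMask`, `ubfx`, `mergedFieldMask` and `anyFieldNonZero` are hypothetical helpers, not names from CombinerHelper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One leaf of the OR tree: extract Width bits of X starting at bit Offset.
struct BitfieldExtract {
  unsigned Offset;
  unsigned Width;
};

// Mask with the low Width bits set (Width in [1, 32]).
uint32_t lowBitsMask(unsigned Width) {
  assert(Width >= 1 && Width <= 32);
  return Width == 32 ? ~0u : ((1u << Width) - 1u);
}

// Scalar model of an unsigned bitfield extract.
uint32_t ubfx(uint32_t X, BitfieldExtract F) {
  assert(F.Offset < 32);
  return (X >> F.Offset) & lowBitsMask(F.Width);
}

// Union of the bit ranges that the leaves read from X.
uint32_t mergedFieldMask(const std::vector<BitfieldExtract> &Fields) {
  uint32_t Mask = 0;
  for (const BitfieldExtract &F : Fields) {
    assert(F.Offset < 32);
    Mask |= lowBitsMask(F.Width) << F.Offset;
  }
  return Mask;
}

// The original form: OR all extracted fields together and test the result.
bool anyFieldNonZero(uint32_t X, const std::vector<BitfieldExtract> &Fields) {
  uint32_t Merged = 0;
  for (const BitfieldExtract &F : Fields)
    Merged |= ubfx(X, F);
  return Merged != 0;
}
```

For three 10-bit fields at offsets 0, 10 and 20, `mergedFieldMask` yields `0x3fffffff`, and `anyFieldNonZero(X, Fields)` agrees with `(X & 0x3fffffff) != 0`. Treating a plain low-bit mask as a field at offset 0 is a simplification made here for illustration.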
[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/146054 >From 17ac90ad1ee167f35321e01625a207f2b94ff523 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 26 Jun 2025 13:31:37 +0200 Subject: [PATCH 1/2] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences Fold sequences where we extract a bunch of contiguous bits from a value, merge them into the low bit and then check if the low bits are zero or not. It seems like a strange sequence at first but it's an idiom used by device libs to check workitem IDs for AMDGPU. The reason I put this in DAGCombiner instead of the target combiner is because this is a generic, valid transform that's also fairly niche, so there isn't much risk of a combine loop I think. See #136727 --- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 86 ++- .../CodeGen/AMDGPU/workitem-intrinsic-opts.ll | 34 ++-- 2 files changed, 91 insertions(+), 29 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 08dab7c697b99..a189208d3a62e 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1, return SDValue(); } +static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG, + const TargetLowering &TLI) { + // Match a pattern such as: + // (X | (X >> C0) | (X >> C1) | ...) & Mask + // This extracts contiguous parts of X and ORs them together before comparing. + // We can optimize this so that we directly check (X & SomeMask) instead, + // eliminating the shifts. 
+ + EVT VT = Root.getValueType(); + + if (Root.getOpcode() != ISD::AND) +return SDValue(); + + SDValue N0 = Root.getOperand(0); + SDValue N1 = Root.getOperand(1); + + if (N0.getOpcode() != ISD::OR || !isa<ConstantSDNode>(N1)) +return SDValue(); + + APInt RootMask = cast<ConstantSDNode>(N1)->getAsAPIntVal(); + if (!RootMask.isMask()) +return SDValue(); + + SDValue Src; + const auto IsSrc = [&](SDValue V) { +if (!Src) { + Src = V; + return true; +} + +return Src == V; + }; + + SmallVector Worklist = {N0}; + APInt PartsMask(VT.getSizeInBits(), 0); + while (!Worklist.empty()) { +SDValue V = Worklist.pop_back_val(); +if (!V.hasOneUse() && Src != V) + return SDValue(); + +if (V.getOpcode() == ISD::OR) { + Worklist.push_back(V.getOperand(0)); + Worklist.push_back(V.getOperand(1)); + continue; +} + +if (V.getOpcode() == ISD::SRL) { + SDValue ShiftSrc = V.getOperand(0); + SDValue ShiftAmt = V.getOperand(1); + + if (!IsSrc(ShiftSrc) || !isa<ConstantSDNode>(ShiftAmt)) +return SDValue(); + + PartsMask |= (RootMask << cast<ConstantSDNode>(ShiftAmt)->getAsZExtVal()); + continue; +} + +if (IsSrc(V)) { + PartsMask |= RootMask; + continue; +} + +return SDValue(); + } + + if (!RootMask.isMask() || !Src) +return SDValue(); + + SDLoc DL(Root); + return DAG.getNode(ISD::AND, DL, VT, + {Src, DAG.getConstant(PartsMask, DL, VT)}); +} + /// This is a stub for TargetLowering::SimplifySetCC. 
SDValue DAGCombiner::SimplifySetCC(EVT VT, SDValue N0, SDValue N1, ISD::CondCode Cond, const SDLoc &DL, bool foldBooleans) { TargetLowering::DAGCombinerInfo DagCombineInfo(DAG, Level, false, this); - return TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL); + if (SDValue C = + TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL)) +return C; + + if ((Cond == ISD::SETNE || Cond == ISD::SETEQ) && + N0.getOpcode() == ISD::AND && isNullConstant(N1)) { + +if (SDValue Res = matchMergedBFX(N0, DAG, TLI)) + return DAG.getSetCC(DL, VT, Res, N1, Cond); + } + + return SDValue(); } /// Given an ISD::SDIV node expressing a divide by constant, return diff --git a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll index 07c4aeb1ac7df..64d055bc40e98 100644 --- a/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll +++ b/llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll @@ -12,11 +12,7 @@ define i1 @workitem_zero() { ; DAGISEL-GFX8-LABEL: workitem_zero: ; DAGISEL-GFX8: ; %bb.0: ; %entry ; DAGISEL-GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v1, 10, v31 -; DAGISEL-GFX8-NEXT:v_lshrrev_b32_e32 v0, 20, v31 -; DAGISEL-GFX8-NEXT:v_or_b32_e32 v1, v31, v1 -; DAGISEL-GFX8-NEXT:v_or_b32_e32 v0, v1, v0 -; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3ff, v0 +; DAGISEL-GFX8-NEXT:v_and_b32_e32 v0, 0x3fff, v31 ; DAGISEL-GFX8-NEXT:v_cmp_eq_u32_e32 vcc, 0, v0 ; DAGISEL-GFX8-NEXT:v_cndmask_b32_e64 v0, 0, 1, vcc ; DAGISEL-GFX8-NEXT:s_setpc_b64 s[30:31] @@ -
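The fold in `matchMergedBFX` rests on a simple identity: `((X | X >> C0 | X >> C1 | ...) & Mask) == 0` holds exactly when `(X & (Mask | Mask << C0 | Mask << C1 | ...)) == 0`. A small 32-bit model of both sides is sketched below; `mergedMask`, `originalIsZero` and `foldedIsZero` are hypothetical names, and the sketch assumes the unshifted `X` term is always present and every shift amount is below 32.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the PartsMask accumulation in the patch for a 32-bit value:
// RootMask for the unshifted X term, plus RootMask << C for each (X >> C).
uint32_t mergedMask(uint32_t RootMask, const std::vector<unsigned> &ShiftAmts) {
  uint32_t Parts = RootMask;
  for (unsigned S : ShiftAmts) {
    assert(S < 32);
    Parts |= RootMask << S;
  }
  return Parts;
}

// The sequence before the fold: OR the shifted copies, mask, compare with 0.
bool originalIsZero(uint32_t X, uint32_t RootMask,
                    const std::vector<unsigned> &ShiftAmts) {
  uint32_t Merged = X;
  for (unsigned S : ShiftAmts) {
    assert(S < 32);
    Merged |= X >> S;
  }
  return (Merged & RootMask) == 0;
}

// The sequence after the fold: a single AND with the merged mask.
bool foldedIsZero(uint32_t X, uint32_t RootMask,
                  const std::vector<unsigned> &ShiftAmts) {
  return (X & mergedMask(RootMask, ShiftAmts)) == 0;
}
```

For the work-item idiom (`RootMask = 0x3ff`, shifts 10 and 20) the merged mask works out to `0x3fffffff`, and both predicates give the same answer for any input.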
[llvm-branch-commits] [mlir] [mlir][tblgen] Fix test definition names to reflect expected valid results (NFC) (PR #146243)
https://github.com/zero9178 approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/146243
[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)
@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //   (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();
+
+  if (Root.getOpcode() != ISD::AND)
+    return SDValue();
+
+  SDValue N0 = Root.getOperand(0);
+  SDValue N1 = Root.getOperand(1);
+
+  if (N0.getOpcode() != ISD::OR || !isa<ConstantSDNode>(N1))
+    return SDValue();

Pierre-vh wrote:

I don't think so, except maybe for the shift part, but even then it doesn't make the code much shorter. I don't check for a tree of nodes, just one node at a time.

https://github.com/llvm/llvm-project/pull/146054
[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)
@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //   (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();

Pierre-vh wrote:

I'll update it. Should I bother supporting vector types here? I think nothing's stopping it except testing coverage. On AMDGPU we scalarize the vector compares.

https://github.com/llvm/llvm-project/pull/146054
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)
llvmbot wrote: @llvm/pr-subscribers-mc Author: Jonathan Thackray (jthackray) Changes This is a series of patches (3/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format. This patch: * removes .txt tests which have multiple feature dependencies * makes the .s tests have a roundabout run line to test both encoding and assembly * creates diagnostic tests when needed --- Patch is 461.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146330.diff 29 Files Affected: - (modified) llvm/test/MC/AArch64/armv8.6a-fgt.s (+105-47) - (added) llvm/test/MC/AArch64/armv8.8a-mops-diagnostics.s (+227) - (modified) llvm/test/MC/AArch64/armv8.8a-mops.s (+569-488) - (modified) llvm/test/MC/AArch64/armv8.9a-clrbhb.s (+29-16) - (modified) llvm/test/MC/AArch64/armv8.9a-debug-pmu.s (+1560-467) - (modified) llvm/test/MC/AArch64/armv8.9a-lrcpc3.s (+237-138) - (modified) llvm/test/MC/AArch64/armv8.9a-specres2.s (+27-8) - (added) llvm/test/MC/AArch64/armv8.9a-the-diagnostics.s (+103) - (modified) llvm/test/MC/AArch64/armv8.9a-the.s (+677-572) - (added) llvm/test/MC/AArch64/armv9-mrrs-diagnostics.s (+30) - (modified) llvm/test/MC/AArch64/armv9-mrrs.s (+235-92) - (added) llvm/test/MC/AArch64/armv9-msrr-diagnostics.s (+30) - (modified) llvm/test/MC/AArch64/armv9-msrr.s (+125-95) - (added) llvm/test/MC/AArch64/armv9-sysp-diagnostics.s (+35) - (removed) llvm/test/MC/AArch64/armv9-sysp.s (-538) - (modified) llvm/test/MC/AArch64/armv9.4a-chk.s (+26-9) - (modified) llvm/test/MC/AArch64/armv9.5a-tlbiw.s (+38-15) - (added) llvm/test/MC/AArch64/armv9a-sysp.s (+834) - (removed) llvm/test/MC/Disassembler/AArch64/armv8.6a-fgt.txt (-75) - (removed) llvm/test/MC/Disassembler/AArch64/armv8.8a-mops.txt (-434) - (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-clrbhb.txt (-16) - (removed) 
llvm/test/MC/Disassembler/AArch64/armv8.9a-debug-pmu.txt (-730) - (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-lrcpc3.txt (-113) - (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-specres2.txt (-16) - (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-the.txt (-482) - (removed) llvm/test/MC/Disassembler/AArch64/armv9-sysp.txt (-562) - (removed) llvm/test/MC/Disassembler/AArch64/armv9-sysreg128.txt (-147) - (removed) llvm/test/MC/Disassembler/AArch64/armv9.4a-chk.txt (-8) - (removed) llvm/test/MC/Disassembler/AArch64/armv9.5a-tlbiw.txt (-27) ``diff diff --git a/llvm/test/MC/AArch64/armv8.6a-fgt.s b/llvm/test/MC/AArch64/armv8.6a-fgt.s index 11002aca5e1a0..4b825ea191a68 100644 --- a/llvm/test/MC/AArch64/armv8.6a-fgt.s +++ b/llvm/test/MC/AArch64/armv8.6a-fgt.s @@ -1,75 +1,133 @@ -// RUN: llvm-mc -triple aarch64 -show-encoding -mattr=+fgt < %s | FileCheck %s -// RUN: llvm-mc -triple aarch64 -show-encoding -mattr=+v8.6a < %s | FileCheck %s -// RUN: not llvm-mc -triple aarch64 -show-encoding < %s 2>&1 | FileCheck %s --check-prefix=NOFGT +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+v8.6a < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+fgt < %s \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+fgt < %s \ +// RUN:| llvm-objdump -d --mattr=+fgt - | FileCheck %s --check-prefix=CHECK-INST +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+fgt < %s \ +// RUN: | llvm-objdump -d --mattr=-fgt - | FileCheck %s --check-prefix=CHECK-UNKNOWN +// Disassemble encoding and check the re-encoding (-show-encoding) matches. 
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+fgt < %s \ +// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \ +// RUN:| llvm-mc -triple=aarch64 -mattr=+fgt -disassemble -show-encoding \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST + + msr HFGRTR_EL2, x0 +// CHECK-INST: msr HFGRTR_EL2, x0 +// CHECK-ENCODING: encoding: [0x80,0x11,0x1c,0xd5] +// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate +// CHECK-UNKNOWN: d51c1180 msr S3_4_C1_C1_4, x0 msr HFGWTR_EL2, x5 +// CHECK-INST: msr HFGWTR_EL2, x5 +// CHECK-ENCODING: encoding: [0xa5,0x11,0x1c,0xd5] +// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate +// CHECK-UNKNOWN: d51c11a5 msr S3_4_C1_C1_5, x5 msr HFGITR_EL2, x10 +// CHECK-INST: msr HFGITR_EL2, x10 +// CHECK-ENCODING: encoding: [0xca,0x11,0x1c,0xd5] +// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate +// CHECK-UNKNOWN: d51c11ca msr S3_4_C1_C1_6, x10 msr HDFGRTR_EL2, x15 +/
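The CHECK-ENCODING and CHECK-UNKNOWN lines above describe the same instruction in two ways: as four bytes in memory order and as one 32-bit word, with the unknown register spelled generically as `S<op0>_<op1>_C<CRn>_C<CRm>_<op2>`. A minimal sketch of that relationship follows, using the standard A64 system-instruction field positions; `wordFromEncoding`, `SysRegFields` and `decodeSysRegFields` are names made up for this illustration.

```cpp
#include <cassert> // only needed by callers that assert on the results
#include <cstdint>

// Combine the bytes listed after "encoding:" (lowest address first) into the
// little-endian instruction word that llvm-objdump prints.
uint32_t wordFromEncoding(uint8_t B0, uint8_t B1, uint8_t B2, uint8_t B3) {
  return static_cast<uint32_t>(B0) | (static_cast<uint32_t>(B1) << 8) |
         (static_cast<uint32_t>(B2) << 16) | (static_cast<uint32_t>(B3) << 24);
}

struct SysRegFields {
  uint32_t Op0, Op1, CRn, CRm, Op2, Rt;
};

// Extract the system-register operand fields of an MSR/MRS (register) word:
// op0 = bits 20:19, op1 = 18:16, CRn = 15:12, CRm = 11:8, op2 = 7:5, Rt = 4:0.
SysRegFields decodeSysRegFields(uint32_t Insn) {
  return {(Insn >> 19) & 0x3u, (Insn >> 16) & 0x7u, (Insn >> 12) & 0xfu,
          (Insn >> 8) & 0xfu,  (Insn >> 5) & 0x7u,  Insn & 0x1fu};
}
```

For `msr HFGRTR_EL2, x0`, the bytes `[0x80,0x11,0x1c,0xd5]` combine to `d51c1180`, and the fields decode to op0=3, op1=4, CRn=1, CRm=1, op2=4, Rt=0, which is the `S3_4_C1_C1_4, x0` spelling in the CHECK-UNKNOWN line.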
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC) (PR #146331)
llvmbot wrote:

@llvm/pr-subscribers-mc
@llvm/pr-subscribers-backend-aarch64

Author: Jonathan Thackray (jthackray)

Changes

This is a series of patches (4/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format.

This patch:
* removes .txt tests whose .s tests have functions
* makes the .s tests have a roundabout run line to test both encoding and assembly

---
Patch is 67.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146331.diff

8 Files Affected:

- (modified) llvm/test/MC/AArch64/armv9.6a-lsui.s (+708-365)
- (modified) llvm/test/MC/AArch64/armv9.6a-occmo.s (+38-16)
- (modified) llvm/test/MC/AArch64/armv9.6a-pcdphint.s (+25-12)
- (modified) llvm/test/MC/AArch64/armv9.6a-rme-gpc3.s (+34-12)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.6a-lsui.txt (-323)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.6a-occmo.txt (-11)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.6a-pcdphint.txt (-8)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.6a-rme-gpc3.txt (-18)

``diff
diff --git a/llvm/test/MC/AArch64/armv9.6a-lsui.s b/llvm/test/MC/AArch64/armv9.6a-lsui.s
index d4a5e1f980560..264a869b6d286 100644
--- a/llvm/test/MC/AArch64/armv9.6a-lsui.s
+++ b/llvm/test/MC/AArch64/armv9.6a-lsui.s
@@ -1,408 +1,751 @@
-// RUN: llvm-mc -triple aarch64 -mattr=+lsui -show-encoding %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64 -show-encoding %s 2>&1 | FileCheck %s --check-prefix=ERROR
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \
+// RUN: | llvm-objdump -d --mattr=+lsui
--no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-INST +// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+lsui < %s \ +// RUN: | llvm-objdump -d --mattr=-lsui --no-print-imm-hex - | FileCheck %s --check-prefix=CHECK-UNKNOWN +// Disassemble encoding and check the re-encoding (-show-encoding) matches. +// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+lsui < %s \ +// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \ +// RUN:| llvm-mc -triple=aarch64 -mattr=+lsui -disassemble -show-encoding \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST + + -_func: -// CHECK: _func: //-- // Unprivileged load/store operations //-- - ldtxr x9, [sp] -// CHECK: ldtxrx9, [sp]// encoding: [0xe9,0x7f,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - ldtxr x9, [sp, #0] -// CHECK: ldtxrx9, [sp]// encoding: [0xe9,0x7f,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - ldtxr x10, [x11] -// CHECK: ldtxrx10, [x11] // encoding: [0x6a,0x7d,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - ldtxr x10, [x11, #0] -// CHECK: ldtxrx10, [x11] // encoding: [0x6a,0x7d,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - - ldatxr x9, [sp] -// CHECK: ldatxr x9, [sp]// encoding: [0xe9,0xff,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - ldatxr x10, [x11] -// CHECK: ldatxr x10, [x11] // encoding: [0x6a,0xfd,0x5f,0xc9] -// ERROR: error: instruction requires: lsui - - sttxr wzr, w4, [sp] -// CHECK: sttxrwzr, w4, [sp] // encoding: [0xe4,0x7f,0x1f,0x89] -// ERROR: error: instruction requires: lsui - sttxr wzr, w4, [sp, #0] -// CHECK: sttxrwzr, w4, [sp] // encoding: [0xe4,0x7f,0x1f,0x89] -// ERROR: error: instruction requires: lsui - sttxr w5, x6, [x7] -// CHECK: sttxrw5, x6, [x7]// encoding: [0xe6,0x7c,0x05,0xc9] -// ERROR: error: instruction requires: lsui - sttxr w5, x6, [x7, #0] -// CHECK: sttxrw5, x6, [x7]// encoding: [0xe6,0x7c,0x05,0xc9] -// ERROR: error: instruction requires: lsui - - stltxr w2, w4, [sp] -// CHECK: stltxr w2, w4, [sp]// 
encoding: [0xe4,0xff,0x02,0x89] -// ERROR: error: instruction requires: lsui - stltxr w5, x6, [x7] -// CHECK: stltxr w5, x6, [x7]// encoding: [0xe6,0xfc,0x05,0xc9] -// ERROR: error: instruction requires: lsui +ldtxr x9, [sp] +// CHECK-INST: ldtxr x9, [sp] +// CHECK-ENCODING: encoding: [0xe9,0x7f,0x5f,0xc9] +// CHECK-ERROR: error: ins
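Note on the round-trip RUN line in the diff above: the two `sed` stages appear to delete the `.text` directive line and then strip everything up to `encoding: `, so that only the byte lists reach `llvm-mc -disassemble`. A minimal Python sketch of those two text transformations follows. It emulates `sed` with the `re` module and does not invoke `llvm-mc`; the function name `strip_to_encodings` and the sample input are hypothetical, with the sample written to resemble `-show-encoding` output.

```python
import re
from typing import List


def strip_to_encodings(asm_output: str) -> List[str]:
    """Emulate: sed '/.text/d' | sed 's/.*encoding: //g' (illustrative only)."""
    kept = []
    for line in asm_output.splitlines():
        # sed '/.text/d': drop any line matching the regex ".text".
        # The dot is unescaped, so it matches any character before "text".
        if re.search(r".text", line):
            continue
        # sed 's/.*encoding: //g': remove everything up to and including
        # "encoding: " (".*" is greedy). Lines without it pass through as-is.
        kept.append(re.sub(r".*encoding: ", "", line))
    return kept


# Hypothetical sample resembling llvm-mc -show-encoding output.
sample = "\t.text\n\tldtxr\tx9, [sp]    // encoding: [0xe9,0x7f,0x5f,0xc9]\n"
print(strip_to_encodings(sample))
```

Because the first pattern is an unanchored regex, any instruction line that happened to contain a character followed by `text` would also be dropped; this sketch reproduces that behaviour rather than correcting it.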
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)
llvmbot wrote:

@llvm/pr-subscribers-backend-aarch64

Author: Jonathan Thackray (jthackray)

Changes

This is a series of patches (3/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format.

This patch:
* removes .txt tests which have multiple feature dependencies
* makes the .s tests have a roundabout run line to test both encoding and assembly
* creates diagnostic tests when needed

---
Patch is 461.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146330.diff

29 Files Affected:

- (modified) llvm/test/MC/AArch64/armv8.6a-fgt.s (+105-47)
- (added) llvm/test/MC/AArch64/armv8.8a-mops-diagnostics.s (+227)
- (modified) llvm/test/MC/AArch64/armv8.8a-mops.s (+569-488)
- (modified) llvm/test/MC/AArch64/armv8.9a-clrbhb.s (+29-16)
- (modified) llvm/test/MC/AArch64/armv8.9a-debug-pmu.s (+1560-467)
- (modified) llvm/test/MC/AArch64/armv8.9a-lrcpc3.s (+237-138)
- (modified) llvm/test/MC/AArch64/armv8.9a-specres2.s (+27-8)
- (added) llvm/test/MC/AArch64/armv8.9a-the-diagnostics.s (+103)
- (modified) llvm/test/MC/AArch64/armv8.9a-the.s (+677-572)
- (added) llvm/test/MC/AArch64/armv9-mrrs-diagnostics.s (+30)
- (modified) llvm/test/MC/AArch64/armv9-mrrs.s (+235-92)
- (added) llvm/test/MC/AArch64/armv9-msrr-diagnostics.s (+30)
- (modified) llvm/test/MC/AArch64/armv9-msrr.s (+125-95)
- (added) llvm/test/MC/AArch64/armv9-sysp-diagnostics.s (+35)
- (removed) llvm/test/MC/AArch64/armv9-sysp.s (-538)
- (modified) llvm/test/MC/AArch64/armv9.4a-chk.s (+26-9)
- (modified) llvm/test/MC/AArch64/armv9.5a-tlbiw.s (+38-15)
- (added) llvm/test/MC/AArch64/armv9a-sysp.s (+834)
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.6a-fgt.txt (-75)
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.8a-mops.txt (-434)
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-clrbhb.txt (-16)
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-debug-pmu.txt (-730)
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-lrcpc3.txt (-113)
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-specres2.txt (-16)
- (removed) llvm/test/MC/Disassembler/AArch64/armv8.9a-the.txt (-482)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9-sysp.txt (-562)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9-sysreg128.txt (-147)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.4a-chk.txt (-8)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.5a-tlbiw.txt (-27)

``diff
diff --git a/llvm/test/MC/AArch64/armv8.6a-fgt.s b/llvm/test/MC/AArch64/armv8.6a-fgt.s
index 11002aca5e1a0..4b825ea191a68 100644
--- a/llvm/test/MC/AArch64/armv8.6a-fgt.s
+++ b/llvm/test/MC/AArch64/armv8.6a-fgt.s
@@ -1,75 +1,133 @@
-// RUN: llvm-mc -triple aarch64 -show-encoding -mattr=+fgt < %s | FileCheck %s
-// RUN: llvm-mc -triple aarch64 -show-encoding -mattr=+v8.6a < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64 -show-encoding < %s 2>&1 | FileCheck %s --check-prefix=NOFGT
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+v8.6a < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+fgt < %s \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: not llvm-mc -triple=aarch64 -show-encoding < %s 2>&1 \
+// RUN:| FileCheck %s --check-prefixes=CHECK-ERROR
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+fgt < %s \
+// RUN:| llvm-objdump -d --mattr=+fgt - | FileCheck %s --check-prefix=CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+fgt < %s \
+// RUN: | llvm-objdump -d --mattr=-fgt - | FileCheck %s --check-prefix=CHECK-UNKNOWN
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+fgt < %s \ +// RUN:| sed '/.text/d' | sed 's/.*encoding: //g' \ +// RUN:| llvm-mc -triple=aarch64 -mattr=+fgt -disassemble -show-encoding \ +// RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST + + msr HFGRTR_EL2, x0 +// CHECK-INST: msr HFGRTR_EL2, x0 +// CHECK-ENCODING: encoding: [0x80,0x11,0x1c,0xd5] +// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate +// CHECK-UNKNOWN: d51c1180 msr S3_4_C1_C1_4, x0 msr HFGWTR_EL2, x5 +// CHECK-INST: msr HFGWTR_EL2, x5 +// CHECK-ENCODING: encoding: [0xa5,0x11,0x1c,0xd5] +// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate +// CHECK-UNKNOWN: d51c11a5 msr S3_4_C1_C1_5, x5 msr HFGITR_EL2, x10 +// CHECK-INST: msr HFGITR_EL2, x10 +// CHECK-ENCODING: encoding: [0xca,0x11,0x1c,0xd5] +// CHECK-ERROR: :[[@LINE-3]]:5: error: expected writable system register or pstate +// CHECK-UNKNOWN: d51c11ca msr S3_4_C1_C1_6, x10 msr HDFGRT
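Note on the CHECK-ENCODING and CHECK-UNKNOWN lines in the diff above: the byte list printed by `-show-encoding` is little-endian, so `[0x80,0x11,0x1c,0xd5]` is the word `d51c1180` that `llvm-objdump` prints, and the generic name `S3_4_C1_C1_4` shown when `fgt` is disabled can be read straight out of that word's op0/op1/CRn/CRm/op2 fields. The sketch below illustrates the arithmetic only, assuming the standard A64 system-instruction field layout (op0 in bits 20:19, op1 in 18:16, CRn in 15:12, CRm in 11:8, op2 in 7:5, Rt in 4:0). The helper names are hypothetical and this is not how LLVM implements decoding.

```python
def encoding_to_word(encoding):
    """Combine the little-endian byte list from -show-encoding into a 32-bit word."""
    return int.from_bytes(bytes(encoding), "little")


def generic_sysreg_name(word):
    """Format the system register fields of an MSR/MRS (register) word
    as the generic S<op0>_<op1>_C<CRn>_C<CRm>_<op2> name."""
    op0 = (word >> 19) & 0x3
    op1 = (word >> 16) & 0x7
    crn = (word >> 12) & 0xF
    crm = (word >> 8) & 0xF
    op2 = (word >> 5) & 0x7
    return f"S{op0}_{op1}_C{crn}_C{crm}_{op2}"


def transfer_register(word):
    """Return the Rt field (bits 4:0), the general-purpose register number."""
    return word & 0x1F


# Encoding taken from the "msr HFGRTR_EL2, x0" check lines in the diff.
word = encoding_to_word([0x80, 0x11, 0x1C, 0xD5])
print(f"{word:08x} msr {generic_sysreg_name(word)}, x{transfer_register(word)}")
```

The same helpers applied to `[0xca,0x11,0x1c,0xd5]` give `S3_4_C1_C1_6` with Rt = 10, which agrees with the CHECK-UNKNOWN line for `msr HFGITR_EL2, x10`.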
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
https://github.com/jthackray created https://github.com/llvm/llvm-project/pull/146329

This is a series of patches (2/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format.

This patch:
* removes .txt tests which have only one feature required
* makes the .s tests have a roundabout run line to test both encoding and assembly
* creates diagnostic tests when needed
* fixes naming convention of tests

>From be8bcdead883ec9bac8bebf6b3382974fc988c28 Mon Sep 17 00:00:00 2001
From: Jonathan Thackray
Date: Wed, 25 Jun 2025 21:22:43 +0100
Subject: [PATCH] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC)

This is a series of patches (2/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format.

This patch:
* removes .txt tests which have only one feature required
* makes the .s tests have a roundabout run line to test both encoding and assembly
* creates diagnostic tests when needed
* fixes naming convention of tests

Co-authored-by: Virginia Cangelosi
---
 llvm/test/MC/AArch64/armv9.2a-mec.s | 172 ++-
 llvm/test/MC/AArch64/armv9.4-lse128.s | 98 -
 llvm/test/MC/AArch64/armv9.4a-gcs.s | 198 +-
 .../MC/AArch64/armv9.4a-lse128-diagnostics.s | 17 ++
 llvm/test/MC/AArch64/armv9.4a-lse128.s | 138
 llvm/test/MC/AArch64/armv9.5a-cpa.s | 89 +---
 .../MC/AArch64/armv9.6a-mpam-diagnostics.s | 5 +
 llvm/test/MC/AArch64/armv9.6a-mpam.s | 80 +--
 .../MC/Disassembler/AArch64/armv9.4a-gcs.txt | 90
 .../Disassembler/AArch64/armv9.4a-lse128.txt | 98 -
 .../MC/Disassembler/AArch64/armv9.5a-cpa.txt | 42
 .../MC/Disassembler/AArch64/armv9.6a-mpam.txt | 50 -
 .../MC/Disassembler/AArch64/armv9a-mec.txt | 54 -
 13 files changed, 541 insertions(+), 590 deletions(-)
 delete mode 100644 llvm/test/MC/AArch64/armv9.4-lse128.s
 create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128-diagnostics.s
 create mode 100644 llvm/test/MC/AArch64/armv9.4a-lse128.s
 create mode 100644 llvm/test/MC/AArch64/armv9.6a-mpam-diagnostics.s
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-gcs.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.4a-lse128.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.5a-cpa.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9.6a-mpam.txt
 delete mode 100644 llvm/test/MC/Disassembler/AArch64/armv9a-mec.txt

diff --git a/llvm/test/MC/AArch64/armv9.2a-mec.s b/llvm/test/MC/AArch64/armv9.2a-mec.s
index 42e4bf732086e..c747886f7ec3b 100644
--- a/llvm/test/MC/AArch64/armv9.2a-mec.s
+++ b/llvm/test/MC/AArch64/armv9.2a-mec.s
@@ -1,55 +1,117 @@
-// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s
-
- mrs x0,
MECIDR_EL2 -// CHECK: mrs x0, MECIDR_EL2 // encoding: [0xe0,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P0_EL2 -// CHECK: mrs x0, MECID_P0_EL2 // encoding: [0x00,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A0_EL2 -// CHECK: mrs x0, MECID_A0_EL2 // encoding: [0x20,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P1_EL2 -// CHECK: mrs x0, MECID_P1_EL2 // encoding: [0x40,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A1_EL2 -// CHECK: mrs x0, MECID_A1_EL2 // encoding: [0x60,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_P_EL2 -// CHECK: mrs x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_A_EL2 -// CHECK: mrs x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_RL_A_EL3 -// CHECK: mrs x0, MECID_RL_A_EL3 // encoding: [0x20,0xaa,0x3e,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - msr MECID_P0_EL2,x0 -// CHECK: msr MECID_P0_EL2, x0 // encoding: [0x00,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_A0_EL2,x0 -// CHECK: msr MECID_A0_EL2, x0 // encoding: [0x20,0xa8,0x1
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
llvmbot wrote:

@llvm/pr-subscribers-backend-aarch64

Author: Jonathan Thackray (jthackray)

Changes

This is a series of patches (2/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format.

This patch:
* removes .txt tests which have only one feature required
* makes the .s tests have a roundabout run line to test both encoding and assembly
* creates diagnostic tests when needed
* fixes naming convention of tests

---
Patch is 52.94 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146329.diff

13 Files Affected:

- (modified) llvm/test/MC/AArch64/armv9.2a-mec.s (+117-55)
- (removed) llvm/test/MC/AArch64/armv9.4-lse128.s (-98)
- (modified) llvm/test/MC/AArch64/armv9.4a-gcs.s (+143-55)
- (added) llvm/test/MC/AArch64/armv9.4a-lse128-diagnostics.s (+17)
- (added) llvm/test/MC/AArch64/armv9.4a-lse128.s (+138)
- (modified) llvm/test/MC/AArch64/armv9.5a-cpa.s (+63-26)
- (added) llvm/test/MC/AArch64/armv9.6a-mpam-diagnostics.s (+5)
- (modified) llvm/test/MC/AArch64/armv9.6a-mpam.s (+58-22)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.4a-gcs.txt (-90)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.4a-lse128.txt (-98)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.5a-cpa.txt (-42)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9.6a-mpam.txt (-50)
- (removed) llvm/test/MC/Disassembler/AArch64/armv9a-mec.txt (-54)

``diff
diff --git a/llvm/test/MC/AArch64/armv9.2a-mec.s b/llvm/test/MC/AArch64/armv9.2a-mec.s
index 42e4bf732086e..c747886f7ec3b 100644
--- a/llvm/test/MC/AArch64/armv9.2a-mec.s
+++ b/llvm/test/MC/AArch64/armv9.2a-mec.s
@@ -1,55 +1,117 @@
-// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+mec < %s | FileCheck %s
-// RUN: not llvm-mc -triple aarch64-none-linux-gnu < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-MEC %s
-
-
mrs x0, MECIDR_EL2 -// CHECK: mrs x0, MECIDR_EL2 // encoding: [0xe0,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P0_EL2 -// CHECK: mrs x0, MECID_P0_EL2 // encoding: [0x00,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A0_EL2 -// CHECK: mrs x0, MECID_A0_EL2 // encoding: [0x20,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_P1_EL2 -// CHECK: mrs x0, MECID_P1_EL2 // encoding: [0x40,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_A1_EL2 -// CHECK: mrs x0, MECID_A1_EL2 // encoding: [0x60,0xa8,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_P_EL2 -// CHECK: mrs x0, VMECID_P_EL2 // encoding: [0x00,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, VMECID_A_EL2 -// CHECK: mrs x0, VMECID_A_EL2 // encoding: [0x20,0xa9,0x3c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - mrs x0, MECID_RL_A_EL3 -// CHECK: mrs x0, MECID_RL_A_EL3 // encoding: [0x20,0xaa,0x3e,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:19: error: expected readable system register - msr MECID_P0_EL2,x0 -// CHECK: msr MECID_P0_EL2, x0 // encoding: [0x00,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_A0_EL2,x0 -// CHECK: msr MECID_A0_EL2, x0 // encoding: [0x20,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_P1_EL2,x0 -// CHECK: msr MECID_P1_EL2, x0 // encoding: [0x40,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_A1_EL2,x0 -// CHECK: msr MECID_A1_EL2, x0 // encoding: [0x60,0xa8,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system 
register or pstate - msr VMECID_P_EL2, x0 -// CHECK: msr VMECID_P_EL2, x0 // encoding: [0x00,0xa9,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr VMECID_A_EL2, x0 -// CHECK: msr VMECID_A_EL2, x0 // encoding: [0x20,0xa9,0x1c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - msr MECID_RL_A_EL3, x0 -// CHECK: msr MECID_RL_A_EL3, x0 // encoding: [0x20,0xaa,0x1e,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:15: error: expected writable system register or pstate - - dc cigdpae, x0 -// CHECK: dc cigdpae, x0 // encoding: [0xe0,0x7e,0x0c,0xd5] -// CHECK-NO-MEC: [[@LINE-2]]:14: error: DC CIGDPA
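Note on the `[[@LINE-2]]` expressions in the check lines above: FileCheck replaces `[[@LINE]]`, `[[@LINE+N]]` and `[[@LINE-N]]` with the number of the line the check itself sits on, adjusted by the offset, so the diagnostic's reported position can be pinned to the instruction a fixed number of lines earlier. The Python sketch below is a simplified emulation of that substitution for illustration; it is not FileCheck's implementation, and the function name `expand_line_refs` is hypothetical.

```python
import re

# Matches [[@LINE]], [[@LINE+N]] and [[@LINE-N]].
_LINE_EXPR = re.compile(r"\[\[@LINE(?:([+-])(\d+))?\]\]")


def expand_line_refs(pattern, line_number):
    """Expand @LINE expressions in a check pattern located on line_number."""

    def repl(match):
        sign, offset = match.group(1), match.group(2)
        delta = int(offset) if offset else 0
        if sign == "-":
            return str(line_number - delta)
        return str(line_number + delta)

    return _LINE_EXPR.sub(repl, pattern)


# A check on line 6 that refers to the instruction two lines above it.
print(expand_line_refs("[[@LINE-2]]:19: error: expected readable system register", 6))
```

This also suggests why the unified tests in this series use `[[@LINE-3]]` where the older tests used `[[@LINE-2]]`: the error check now follows separate CHECK-INST and CHECK-ENCODING lines, so it sits three lines below the instruction rather than two.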
[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)
Pierre-vh wrote: ping https://github.com/llvm/llvm-project/pull/141589 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (PR #146329)
https://github.com/jthackray edited https://github.com/llvm/llvm-project/pull/146329
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC) (PR #146331)
https://github.com/jthackray edited https://github.com/llvm/llvm-project/pull/146331
[llvm-branch-commits] [llvm] [AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (PR #146330)
https://github.com/jthackray edited https://github.com/llvm/llvm-project/pull/146330