[llvm-branch-commits] [llvm] [SPARC][IAS] Add definitions for UA 2005 instructions (PR #138400)
@@ -0,0 +1,28 @@
+//=== SparcInstrUAOSA.td - UltraSPARC/Oracle SPARC Architecture extensions ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This file contains instruction formats, definitions and patterns needed for
+// UA 2005 instructions on SPARC.
+//===--===//
+
+class UA2005RegWin<string asmstr, bits<5> fcn>
+: F3_1<2, 0b110001, (outs), (ins), asmstr, []> {
+let rd = fcn;
+let rs1 = 0;
+let rs2 = 0;

s-barannikov wrote:

The body is usually indented by 2 (the colon should still be indented by 4):
```suggestion
  let rd = fcn;
  let rs1 = 0;
  let rs2 = 0;
```

https://github.com/llvm/llvm-project/pull/138400
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SPARC][IAS] Add definitions for UA 2005 instructions (PR #138400)
https://github.com/s-barannikov approved this pull request.

https://github.com/llvm/llvm-project/pull/138400
[llvm-branch-commits] [libcxx] [llvm] release/20.x: [libcxx] Provide locale conversions to tests through lit substitution (#105651) (PR #139468)
https://github.com/mstorsjo updated https://github.com/llvm/llvm-project/pull/139468

From 79e10b190029b749e042d1aaec3ee697a2f5d41a Mon Sep 17 00:00:00 2001
From: Martin Storsjö
Date: Fri, 28 Feb 2025 20:43:46 -0100
Subject: [PATCH 1/4] [libcxx] Provide locale conversions to tests through lit substitution (#105651)

There are 2 problems today that this PR resolves:

libcxx tests assume the thousands separator for the fr_FR locale is 0x00A0 on Windows. This currently fails when run on newer versions of Windows (it seems to have been updated to the new correct value of 0x202F around Windows 11; the exact Windows version where it changed doesn't seem to be documented anywhere). Depending on the OS version, you need different values.

There are several ifdefs to determine the environment/platform-specific locale conversion values, and they add maintenance work as things change over time.

This PR includes the following changes:
- Provide the environment's locale conversion values through a substitution. The test can opt in by placing the substitution value in a define flag.
- Remove the platform ifdefs (the swapping of values between Windows, Linux, Apple, AIX).

This is accomplished through a lit feature action that fetches the environment's locale conversions (lconv) for members like 'thousands_sep' that we need to provide. This should ensure that we don't lose the effectiveness of the test itself.

In addition, as a result of the above, this PR:
- Fixes a handful of locale tests which unexpectedly fail on newer Windows versions.
- Resolves 3 XFAIL FIX-MEs.

Originally submitted in https://github.com/llvm/llvm-project/pull/86649.

Co-authored-by: Rodrigo Salazar <4rodrigosala...@gmail.com>
(cherry picked from commit f909b2229ac16ae3898d8b158bee85c384173dfa)
---
 .../get_long_double_fr_FR.pass.cpp            |  5 +-
 .../get_long_double_ru_RU.pass.cpp            |  5 +-
 .../put_long_double_fr_FR.pass.cpp            |  5 +-
 .../put_long_double_ru_RU.pass.cpp            |  5 +-
 .../thousands_sep.pass.cpp                    | 34 ++-
 .../thousands_sep.pass.cpp                    | 20 ++--
 .../time.duration.nonmember/ostream.pass.cpp  | 24 ++---
 libcxx/test/support/locale_helpers.h          | 37 ++--
 libcxx/utils/libcxx/test/features.py          | 91 ++-
 9 files changed, 138 insertions(+), 88 deletions(-)

diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
index bbb67d694970a..f02241ad36a5b 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
@@ -13,6 +13,8 @@
 // REQUIRES: locale.fr_FR.UTF-8
+// ADDITIONAL_COMPILE_FLAGS: -DFR_MON_THOU_SEP=%{LOCALE_CONV_FR_FR_UTF_8_MON_THOUSANDS_SEP}
+
 //
 // class money_get
@@ -59,7 +61,8 @@ class my_facetw
 };
 static std::wstring convert_thousands_sep(std::wstring const& in) {
-  return LocaleHelpers::convert_thousands_sep_fr_FR(in);
+  const wchar_t fr_sep = LocaleHelpers::mon_thousands_sep_or_default(FR_MON_THOU_SEP);
+  return LocaleHelpers::convert_thousands_sep(in, fr_sep);
 }
 #endif // TEST_HAS_NO_WIDE_CHARACTERS
diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
index e680f2ea8816a..371cf0e90c8d3 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
@@ -11,6 +11,8 @@
 // REQUIRES: locale.ru_RU.UTF-8
+// ADDITIONAL_COMPILE_FLAGS: -DRU_MON_THOU_SEP=%{LOCALE_CONV_RU_RU_UTF_8_MON_THOUSANDS_SEP}
+
 // XFAIL: glibc-old-ru_RU-decimal-point
 //
@@ -52,7 +54,8 @@ class my_facetw
 };
 static std::wstring convert_thousands_sep(std::wstring const& in) {
-  return LocaleHelpers::convert_thousands_sep_ru_RU(in);
+  const wchar_t ru_sep = LocaleHelpers::mon_thousands_sep_or_default(RU_MON_THOU_SEP);
+  return LocaleHelpers::convert_thousands_sep(in, ru_sep);
 }
 #endif // TEST_HAS_NO_WIDE_CHARACTERS
diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.m
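
For readers unfamiliar with `lconv`, the kind of value the lit feature fetches can be queried roughly as follows. This is a minimal sketch, not the code in `features.py` or `locale_helpers.h`: the helper name `queryMonThousandsSep` is hypothetical, and only standard `<clocale>` calls are used.

```cpp
#include <cassert>
#include <clocale>
#include <optional>
#include <string>

// Hypothetical helper (not part of the patch): return the monetary thousands
// separator reported by the named locale, or std::nullopt if that locale is
// not available on this system.
std::optional<std::string> queryMonThousandsSep(const char *LocaleName) {
  assert(LocaleName && "locale name must not be null");
  // Copy the current locale name first: the pointer returned by setlocale
  // may be invalidated by the next setlocale call.
  const char *Current = std::setlocale(LC_ALL, nullptr);
  std::string Saved = Current ? Current : "C";
  if (!std::setlocale(LC_ALL, LocaleName))
    return std::nullopt;
  std::string Sep = std::localeconv()->mon_thousands_sep;
  // Restore the previous locale so callers are not affected.
  std::setlocale(LC_ALL, Saved.c_str());
  return Sep;
}
```

On a machine where `fr_FR.UTF-8` is installed, `queryMonThousandsSep("fr_FR.UTF-8")` would return whatever bytes that platform uses, which is the motivation for passing the value in through a substitution instead of hard-coding it per platform.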
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120640

From 4fccbd69f8ee5b6f16b08da38cb65d989450c8aa Mon Sep 17 00:00:00 2001
From: jofrn
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG] Split vector types for atomic load

Vector types that aren't widened are split so that a single ATOMIC_LOAD is issued for the entire vector at once. This change utilizes the load vectorization infrastructure in SelectionDAG in order to group the vectors. This enables SelectionDAG to translate vectors with type bfloat, half.

commit-id:3a045357
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp      |  37
 llvm/test/CodeGen/X86/atomic-load-store.ll    | 171 ++
 3 files changed, 209 insertions(+)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index bdfa5f7741ad3..d8f402f529632 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f88b4d5693979..a3b30943c8e7d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
     SplitVecRes_STEP_VECTOR(N, Lo, Hi);
     break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+    SplitVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N), Lo, Hi);
+    break;
   case ISD::LOAD:
     SplitVecRes_LOAD(cast<LoadSDNode>(N), Lo, Hi);
     break;
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
   SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo,
+                                               SDValue &Hi) {
+  assert(LD->getExtensionType() == ISD::NON_EXTLOAD &&
+         "Extended load during type legalization!");
+  SDLoc dl(LD);
+  EVT VT = LD->getValueType(0);
+  EVT LoVT, HiVT;
+  std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(VT);
+
+  SDValue Ch = LD->getChain();
+  SDValue Ptr = LD->getBasePtr();
+
+  EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), VT.getSizeInBits());
+  EVT MemIntVT =
+      EVT::getIntegerVT(*DAG.getContext(), LD->getMemoryVT().getSizeInBits());
+  SDValue ALD = DAG.getAtomicLoad(ISD::NON_EXTLOAD, dl, MemIntVT, IntVT, Ch,
+                                  Ptr, LD->getMemOperand());
+
+  EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits());
+  EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits());
+  SDValue ExtractLo = DAG.getNode(ISD::TRUNCATE, dl, LoIntVT, ALD);
+  SDValue ExtractHi =
+      DAG.getNode(ISD::SRL, dl, IntVT, ALD,
+                  DAG.getIntPtrConstant(VT.getSizeInBits() / 2, dl));
+  ExtractHi = DAG.getNode(ISD::TRUNCATE, dl, HiIntVT, ExtractHi);
+
+  Lo = DAG.getBitcast(LoVT, ExtractLo);
+  Hi = DAG.getBitcast(HiVT, ExtractHi);
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(LD, 1), ALD.getValue(1));
+}
+
 void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT,
                                         MachinePointerInfo &MPI, SDValue &Ptr,
                                         uint64_t *ScaledOffset) {
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 3cf9e3c1a8dfa..6e2e9d4b21891 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -205,6 +205,68 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) {
   ret <2 x float> %ret
 }
 
+define <2 x half> @atomic_vec2_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec2_half:
+; CHECK3:       ## %bb.0:
+; CHECK3-NEXT:    movl (%rdi), %eax
+; CHECK3-NEXT:    pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:    shrl $16, %eax
+; CHECK3-NEXT:    pinsrw $0, %eax, %xmm1
+; CHECK3-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
+; CHECK3-NEXT:    retq
+;
+; CHECK0-LABEL: atomic_vec2_half:
+; CHECK0:       ## %bb.0:
+; CHECK0-NEXT:    movl (%rdi), %eax
+; CHECK0-NEXT:    movl %eax, %ecx
+; CHECK0-NEXT:    shrl
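
The split performed by `SplitVecRes_ATOMIC_LOAD` (one wide integer load, then TRUNCATE for the low half and SRL plus TRUNCATE for the high half) can be illustrated outside of SelectionDAG. The following is a minimal sketch in plain C++, assuming a little-endian layout where the low bits hold element 0; `Halves`, `splitBits` and `loadAndSplit` are hypothetical names, not LLVM APIs.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct Halves {
  std::uint64_t Lo;
  std::uint64_t Hi;
};

// Split an integer carrying SizeInBits payload bits into two equal halves,
// mirroring the TRUNCATE and SRL + TRUNCATE nodes built in the patch.
Halves splitBits(std::uint64_t Whole, unsigned SizeInBits) {
  assert(SizeInBits >= 2 && SizeInBits <= 64 && SizeInBits % 2 == 0);
  const unsigned Half = SizeInBits / 2; // at most 32, so the shift is safe
  const std::uint64_t Mask = (std::uint64_t{1} << Half) - 1;
  return {Whole & Mask, (Whole >> Half) & Mask};
}

// Hypothetical stand-in for an atomic <2 x half> load: a single 32-bit
// atomic load of the whole vector, followed by the split.
Halves loadAndSplit(const std::atomic<std::uint32_t> &Mem) {
  return splitBits(Mem.load(std::memory_order_acquire), 32);
}
```

The point of the design is that memory is touched exactly once; everything after the load is ordinary register arithmetic, which matches the `movl` / `shrl $16` sequence in the CHECK lines above.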
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716

From 717ea645df30178ab0873da4191d41bc7ba4b761 Mon Sep 17 00:00:00 2001
From: jofrn
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic <n x T>` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered.

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp         | 15 -
 llvm/test/CodeGen/ARM/atomic-load-store.ll    | 51 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll    | 30 +
 .../X86/expand-atomic-non-integer.ll          | 65 +++
 4 files changed, 158 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index c376de877ac7d..70f59eafc6ecb 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2066,9 +2066,18 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
     I->replaceAllUsesWith(V);
   } else if (HasResult) {
     Value *V;
-    if (UseSizedLibcall)
-      V = Builder.CreateBitOrPointerCast(Result, I->getType());
-    else {
+    if (UseSizedLibcall) {
+      // Add bitcasts from Result's scalar type to I's vector type
+      auto *PtrTy = dyn_cast<PointerType>(I->getType()->getScalarType());
+      auto *VTy = dyn_cast<VectorType>(I->getType());
+      if (VTy && PtrTy && !Result->getType()->isVectorTy()) {
+        unsigned AS = PtrTy->getAddressSpace();
+        Value *BC = Builder.CreateBitCast(
+            Result, VTy->getWithNewType(DL.getIntPtrType(Ctx, AS)));
+        V = Builder.CreateIntToPtr(BC, I->getType());
+      } else
+        V = Builder.CreateBitOrPointerCast(Result, I->getType());
+    } else {
       V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
                                     AllocaAlignment);
       Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29..eaa2ffd9b2731 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:       @ %bb.0:
+; ARM-NEXT:    ldr r0, [r0]
+; ARM-NEXT:    dmb ish
+; ARM-NEXT:    bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:       @ %bb.0:
+; ARMOPTNONE-NEXT:    ldr r0, [r0]
+; ARMOPTNONE-NEXT:    dmb ish
+; ARMOPTNONE-NEXT:    bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:       @ %bb.0:
+; THUMBTWO-NEXT:    ldr r0, [r0]
+; THUMBTWO-NEXT:    dmb ish
+; THUMBTWO-NEXT:    bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:       @ %bb.0:
+; THUMBONE-NEXT:    push {r7, lr}
+; THUMBONE-NEXT:    movs r1, #0
+; THUMBONE-NEXT:    mov r2, r1
+; THUMBONE-NEXT:    bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:    pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:       @ %bb.0:
+; ARMV4-NEXT:    push {r11, lr}
+; ARMV4-NEXT:    mov r1, #2
+; ARMV4-NEXT:    bl __atomic_load_4
+; ARMV4-NEXT:    pop {r11, lr}
+; ARMV4-NEXT:    mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:       @ %bb.0:
+; ARMV6-NEXT:    ldr r0, [r0]
+; ARMV6-NEXT:    mov r1, #0
+; ARMV6-NEXT:    mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:    bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:       @ %bb.0:
+; THUMBM-NEXT:    ldr r0, [r0]
+; THUMBM-NEXT:    dmb sy
+; THUMBM-NEXT:    bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index f72970d12b6eb..d3027e799 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -382,6 +382,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    pushq %rax
+; CHECK-NEXT:    movl $2, %esi
+; CHECK-NEXT:    callq ___atomic_load_16
+; CHECK-NEXT:    movq %rdx, %xmm1
+; CHECK-NEXT:    movq %rax, %xmm0
+; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    popq %rax
+; CHECK-NEXT:    retq
+  %ret = load atomic <2 x ptr>, ptr %x acquire, align 16
+  ret <2 x ptr> %ret
+}
+
 define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
 ; CHECK3-LABEL: atomic_vec4_i8:
 ; CHECK3:       ## %bb.0:
@@ -405,6 +420,21 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind {
   ret <4 x i16> %ret
 }
 
+define <4 x ptr addrspace(270)> @atomic_vec4_ptr270(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_ptr270:
+; CHECK:       ## %b
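
The two-step conversion added to `expandAtomicOpToLibcall` (bitcast the scalar libcall result to a vector of pointer-sized integers, then `inttoptr` each lane) has a rough analogue in plain C++. The sketch below assumes a two-element vector on the host and uses hypothetical names (`RawResult`, `toPointerVector`); it is an illustration of the idea, not the pass's code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the scalar value a sized __atomic_load_N call
// returns: an opaque blob as wide as two pointers.
using RawResult = std::array<unsigned char, 2 * sizeof(std::uintptr_t)>;

std::array<void *, 2> toPointerVector(const RawResult &Raw) {
  std::array<std::uintptr_t, 2> Lanes;
  assert(sizeof(Lanes) == Raw.size() && "lanes must cover the whole result");
  // Step 1, the "bitcast": reinterpret the bytes as pointer-sized integers.
  std::memcpy(Lanes.data(), Raw.data(), sizeof(Lanes));
  // Step 2, the "inttoptr": turn each integer lane into a pointer.
  return {reinterpret_cast<void *>(Lanes[0]),
          reinterpret_cast<void *>(Lanes[1])};
}
```

A direct integer-to-vector-of-pointers cast does not exist in LLVM IR, which is why the patch goes through an intermediate integer vector; the sketch follows the same order of operations.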
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/125432

From 684a54284458cae0b700737126715384b9fddab1 Mon Sep 17 00:00:00 2001
From: jofrn
Date: Fri, 31 Jan 2025 13:12:56 -0500
Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic vector.

After splitting, all elements are created. The two components must be found by looking at the upper and lower half of EXTRACT_ELEMENT. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed.

commit-id:b83937a8
---
 llvm/include/llvm/CodeGen/SelectionDAG.h      |   2 +-
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp |   2 +-
 llvm/lib/Target/X86/X86ISelLowering.cpp       |  65 ++--
 llvm/test/CodeGen/X86/atomic-load-store.ll    | 149 ++
 4 files changed, 65 insertions(+), 153 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index 87b6914f8a0ee..40550d96a5b3d 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1873,7 +1873,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index bbf1b0fd590ef..d6e5cd1078776 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -12215,7 +12215,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain,
   return TokenFactor;
 }
 
-SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad,
+SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad,
                                                    SDValue NewMemOp) {
   assert(isa<MemSDNode>(NewMemOp.getNode()) && "Expected a memop node");
   SDValue OldChain = SDValue(OldLoad, 1);
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 3ab548f64d04c..409a8c7e73c0e 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7193,15 +7193,19 @@ static SDValue LowerAsSplatVectorLoad(SDValue SrcOp, MVT VT, const SDLoc &dl,
 }
 
 // Recurse to find a LoadSDNode source and the accumulated ByteOffest.
-static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) {
-  if (ISD::isNON_EXTLoad(Elt.getNode())) {
-    auto *BaseLd = cast<LoadSDNode>(Elt);
-    if (!BaseLd->isSimple())
-      return false;
+static bool findEltLoadSrc(SDValue Elt, MemSDNode *&Ld, int64_t &ByteOffset) {
+  if (auto *BaseLd = dyn_cast<AtomicSDNode>(Elt)) {
     Ld = BaseLd;
     ByteOffset = 0;
     return true;
-  }
+  } else if (auto *BaseLd = dyn_cast<LoadSDNode>(Elt))
+    if (ISD::isNON_EXTLoad(Elt.getNode())) {
+      if (!BaseLd->isSimple())
+        return false;
+      Ld = BaseLd;
+      ByteOffset = 0;
+      return true;
+    }
 
   switch (Elt.getOpcode()) {
   case ISD::BITCAST:
@@ -7254,7 +7258,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
   APInt ZeroMask = APInt::getZero(NumElems);
   APInt UndefMask = APInt::getZero(NumElems);
 
-  SmallVector<LoadSDNode*, 8> Loads(NumElems, nullptr);
+  SmallVector<MemSDNode*, 8> Loads(NumElems, nullptr);
   SmallVector<int64_t, 8> ByteOffsets(NumElems, 0);
 
   // For each element in the initializer, see if we've found a load, zero or an
@@ -7304,7 +7308,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
   EVT EltBaseVT = EltBase.getValueType();
   assert(EltBaseVT.getSizeInBits() == EltBaseVT.getStoreSizeInBits() &&
          "Register/Memory size mismatch");
-  LoadSDNode *LDBase = Loads[FirstLoadedElt];
+  MemSDNode *LDBase = Loads[FirstLoadedElt];
   assert(LDBase && "Did not find base load for merging consecutive loads");
   unsigned BaseSizeInBits = EltBaseVT.getStoreSizeInBits();
   unsigned BaseSizeInBytes = BaseSizeInBits / 8;
@@ -7318,16 +7322,18 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
 
   // Check to see if the element's load is consecutive to the base load
   // or offset from a previous (already checked) load.
-  auto CheckConsecutiveLoad = [&](LoadSDNode *Base, int EltIdx) {
-    LoadSDNode *Ld = Loads[EltIdx];
+  auto CheckConsecutiveLoad = [&](MemSDNode *Base, int EltIdx) {
+    MemSDNode *Ld = Loads[EltIdx];
     int64_t ByteOffset = ByteOffsets[EltIdx];
     if (ByteOffset && (ByteOffset % BaseSizeInBytes) == 0) {
       int64_t BaseIdx = EltIdx - (ByteOffset / BaseSizeInBytes);
       return (0 <= BaseIdx && BaseIdx < (int)NumElems && LoadMask[BaseIdx] &&
               Loads[Bas
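
The idea behind the consecutive-load check can be shown with a much simpler model than the DAG code: every element that was actually loaded must sit at `base + index * element size`, while unused elements are ignored. The following is a minimal sketch under that simplification; `areConsecutive` and its address-based representation are hypothetical and do not mirror the real `CheckConsecutiveLoad` lambda, which works on `MemSDNode` pointers and byte offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Each entry is the byte address an element was loaded from, or std::nullopt
// when the element is unused (undef). Returns true if all loaded elements are
// consistent with one contiguous run of EltBytes-sized slots.
bool areConsecutive(const std::vector<std::optional<std::int64_t>> &Addrs,
                    std::int64_t EltBytes) {
  assert(EltBytes > 0 && "element size must be positive");
  std::optional<std::int64_t> Base; // address that element 0 would occupy
  for (std::size_t I = 0; I < Addrs.size(); ++I) {
    if (!Addrs[I])
      continue; // unused element: imposes no constraint
    const std::int64_t Implied =
        *Addrs[I] - static_cast<std::int64_t>(I) * EltBytes;
    if (!Base)
      Base = Implied;
    else if (*Base != Implied)
      return false; // this element does not line up with the others
  }
  return Base.has_value(); // at least one real load is required
}
```

Skipping the unused entries is what lets a wider build vector collapse back to a single narrower load, which is the effect the commit message describes as removing unused elements.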
[llvm-branch-commits] [llvm] [SelectionDAG] Widen <2 x T> vector types for atomic load (PR #120598)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120598 >From 730b40b39dfa3ed5d802bbb1270d49273a5de7fb Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 11:19:39 -0500 Subject: [PATCH] [SelectionDAG] Widen <2 x T> vector types for atomic load Vector types of 2 elements must be widened. This change does this for vector types of atomic load in SelectionDAG so that it can translate aligned vectors of >1 size. commit-id:2894ccd1 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 97 ++- llvm/test/CodeGen/X86/atomic-load-store.ll| 78 +++ 3 files changed, 153 insertions(+), 23 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 89ea7ef4dbe89..bdfa5f7741ad3 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N); SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N); SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N); + SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue WidenVecRes_LOAD(SDNode* N); SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N); SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 8eee7a4c61fe6..f88b4d5693979 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -4625,6 +4625,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) { break; case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break; case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +Res = WidenVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: Res = WidenVecRes_LOAD(N); 
break; case ISD::STEP_VECTOR: case ISD::SPLAT_VECTOR: @@ -6014,6 +6017,74 @@ SDValue DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) { N->getOperand(1), N->getOperand(2)); } +/// Either return the same load or provide appropriate casts +/// from the load and return that. +static SDValue coerceLoadedValue(SDValue LdOp, EVT FirstVT, EVT WidenVT, + TypeSize LdWidth, TypeSize FirstVTWidth, + SDLoc dl, SelectionDAG &DAG) { + assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth)); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + if (!FirstVT.isVector()) { +unsigned NumElts = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts); +SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp); +return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp); + } + assert(FirstVT == WidenVT); + return LdOp; +} + +static std::optional findMemType(SelectionDAG &DAG, + const TargetLowering &TLI, unsigned Width, + EVT WidenVT, unsigned Align, + unsigned WidenEx); + +SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) { + EVT WidenVT = + TLI.getTypeToTransformTo(*DAG.getContext(), LD->getValueType(0)); + EVT LdVT = LD->getMemoryVT(); + SDLoc dl(LD); + assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors"); + assert(LdVT.isScalableVector() == WidenVT.isScalableVector() && + "Must be scalable"); + assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() && + "Expected equivalent element types"); + + // Load information + SDValue Chain = LD->getChain(); + SDValue BasePtr = LD->getBasePtr(); + MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags(); + AAMDNodes AAInfo = LD->getAAInfo(); + + TypeSize LdWidth = LdVT.getSizeInBits(); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + TypeSize WidthDiff = WidenWidth - LdWidth; + + // Find the vector type that can load from. 
+ std::optional FirstVT = + findMemType(DAG, TLI, LdWidth.getKnownMinValue(), WidenVT, /*LdAlign=*/0, + WidthDiff.getKnownMinValue()); + + if (!FirstVT) +return SDValue(); + + SmallVector MemVTs; + TypeSize FirstVTWidth = FirstVT->getSizeInBits(); + + SDValue LdOp = DAG.getAtomicLoad(ISD::NON_EXTLOAD, dl, *FirstVT, *FirstVT, + Chain, BasePtr, LD->getMemOperand()); + + // Load the element with one instruction. + SDValue Result = coerceLoadedValue(LdOp, *FirstVT, WidenVT, LdWidth, + FirstVTWidth, dl, DAG); + + // Modified the chain - switch anything that used the old chain to use + // the new
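For readers unfamiliar with the widening step, the idea behind coerceLoadedValue in the patch above (load the whole vector as one scalar, place it in lane 0 via SCALAR_TO_VECTOR, then BITCAST to the widened type) can be sketched in plain C++. This is only an illustration under stated assumptions: `widenLoaded2xI16` is a hypothetical name, the types model a <2 x i16> load widened to 8 x i16, and the upper lanes, which are undefined in the DAG, are zero-filled here for determinism.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration, not LLVM code: models widening a <2 x i16>
// atomic load to an 8 x i16 vector.
inline std::array<std::uint16_t, 8>
widenLoaded2xI16(const std::atomic<std::uint32_t> &Mem) {
  // A single 32-bit atomic load covers both 16-bit elements.
  std::uint32_t Scalar = Mem.load(std::memory_order_acquire);

  // SCALAR_TO_VECTOR: the scalar becomes lane 0 of a 4 x i32 vector.
  // The remaining lanes are undefined in the DAG; this sketch zeroes them.
  std::array<std::uint32_t, 4> Vec{};
  Vec[0] = Scalar;

  // BITCAST: reinterpret the same 128 bits as 8 x i16.
  std::array<std::uint16_t, 8> Widened;
  static_assert(sizeof(Widened) == sizeof(Vec), "bitcast requires equal sizes");
  std::memcpy(Widened.data(), Vec.data(), sizeof(Widened));
  return Widened;
}
```

The two loaded elements end up in lanes 0 and 1 of the result; which of them lands in lane 0 depends on the host byte order in this sketch.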
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From 717ea645df30178ab0873da4191d41bc7ba4b761 Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 15 - llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 158 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..70f59eafc6ecb 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,18 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + auto *PtrTy = dyn_cast(I->getType()->getScalarType()); + auto *VTy = dyn_cast(I->getType()); + if (VTy && PtrTy && !Result->getType()->isVectorTy()) { +unsigned AS = PtrTy->getAddressSpace(); +Value *BC = Builder.CreateBitCast( +Result, VTy->getWithNewType(DL.getIntPtrType(Ctx, AS))); +V = Builder.CreateIntToPtr(BC, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..eaa2ffd9b2731 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index f72970d12b6eb..d3027e799 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -382,6 +382,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; 
CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LABEL: atomic_vec4_i8: ; CHECK3: ## %bb.0: @@ -405,6 +420,21 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind { ret <4 x i16> %ret } +define <4 x ptr addrspace(270)> @atomic_vec4_ptr270(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec4_ptr270: +; CHECK: ## %b
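The two-step conversion that the AtomicExpand change above performs on a sized libcall result (bitcast the scalar to a vector of pointer-sized integers, then inttoptr each lane) can be sketched in plain C++. This is a rough illustration only: `WideInt` and `toPointerVector` are hypothetical names, and a struct of two `uintptr_t` words stands in for the wide integer the libcall returns.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the wide integer returned by a sized libcall.
struct WideInt {
  std::uintptr_t Words[2];
};

// Hypothetical illustration, not LLVM code: scalar result -> <2 x ptr>.
inline std::array<void *, 2> toPointerVector(const WideInt &Result) {
  // Step 1 (bitcast): view the scalar as a vector of pointer-sized integers.
  std::array<std::uintptr_t, 2> AsInts;
  static_assert(sizeof(AsInts) == sizeof(Result), "bitcast requires equal sizes");
  std::memcpy(AsInts.data(), &Result, sizeof(Result));

  // Step 2 (inttoptr): convert each integer lane to a pointer.
  std::array<void *, 2> Ptrs;
  for (std::size_t I = 0; I != Ptrs.size(); ++I)
    Ptrs[I] = reinterpret_cast<void *>(AsInts[I]);
  return Ptrs;
}
```

Going through an integer vector first matters because a scalar integer cannot be converted to a vector of pointers in one cast; only the lane-wise inttoptr is legal.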
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120640 >From 4fccbd69f8ee5b6f16b08da38cb65d989450c8aa Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 16:25:55 -0500 Subject: [PATCH] [SelectionDAG] Split vector types for atomic load Vector types that aren't widened are split so that a single ATOMIC_LOAD is issued for the entire vector at once. This change utilizes the load vectorization infrastructure in SelectionDAG in order to group the vectors. This enables SelectionDAG to translate vectors with type bfloat,half. commit-id:3a045357 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 37 llvm/test/CodeGen/X86/atomic-load-store.ll| 171 ++ 3 files changed, 209 insertions(+) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index bdfa5f7741ad3..d8f402f529632 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi); void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi); + void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi); void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo, diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index f88b4d5693979..a3b30943c8e7d 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SplitVecRes_STEP_VECTOR(N, Lo, Hi); 
break; case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break; + case ISD::ATOMIC_LOAD: +SplitVecRes_ATOMIC_LOAD(cast(N), Lo, Hi); +break; case ISD::LOAD: SplitVecRes_LOAD(cast(N), Lo, Hi); break; @@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) { SetSplitVector(SDValue(N, ResNo), Lo, Hi); } +void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, + SDValue &Hi) { + assert(LD->getExtensionType() == ISD::NON_EXTLOAD && + "Extended load during type legalization!"); + SDLoc dl(LD); + EVT VT = LD->getValueType(0); + EVT LoVT, HiVT; + std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(VT); + + SDValue Ch = LD->getChain(); + SDValue Ptr = LD->getBasePtr(); + + EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), VT.getSizeInBits()); + EVT MemIntVT = + EVT::getIntegerVT(*DAG.getContext(), LD->getMemoryVT().getSizeInBits()); + SDValue ALD = DAG.getAtomicLoad(ISD::NON_EXTLOAD, dl, MemIntVT, IntVT, Ch, + Ptr, LD->getMemOperand()); + + EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits()); + EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits()); + SDValue ExtractLo = DAG.getNode(ISD::TRUNCATE, dl, LoIntVT, ALD); + SDValue ExtractHi = + DAG.getNode(ISD::SRL, dl, IntVT, ALD, + DAG.getIntPtrConstant(VT.getSizeInBits() / 2, dl)); + ExtractHi = DAG.getNode(ISD::TRUNCATE, dl, HiIntVT, ExtractHi); + + Lo = DAG.getBitcast(LoVT, ExtractLo); + Hi = DAG.getBitcast(HiVT, ExtractHi); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ ReplaceValueWith(SDValue(LD, 1), ALD.getValue(1)); +} + void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT, MachinePointerInfo &MPI, SDValue &Ptr, uint64_t *ScaledOffset) { diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 3cf9e3c1a8dfa..6e2e9d4b21891 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -205,6 +205,68 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) { ret <2 x float> %ret } +define <2 x half> @atomic_vec2_half(ptr %x) { +; CHECK3-LABEL: atomic_vec2_half: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movl (%rdi), %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm0 +; CHECK3-NEXT:shrl $16, %eax +; CHECK3-NEXT:pinsrw $0, %eax, %xmm1 +; CHECK3-NEXT:punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_half: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movl (%rdi), %eax +; CHECK0-NEXT:movl %eax, %ecx +; CHECK0-NEXT:shrl
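The splitting strategy in the patch above (one integer ATOMIC_LOAD for the whole vector, TRUNCATE for the low half, SRL by half the width followed by TRUNCATE for the high half) can be sketched outside SelectionDAG in plain C++. This is a minimal illustration, assuming a 32-bit vector of two 16-bit elements; `Halves` and `splitLoaded32` are hypothetical names that do not appear in the patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical illustration, not LLVM code: the two halves produced by
// splitting a 32-bit value that was loaded atomically in one piece.
struct Halves {
  std::uint16_t Lo; // low 16 bits
  std::uint16_t Hi; // high 16 bits
};

inline Halves splitLoaded32(const std::atomic<std::uint32_t> &Mem) {
  // One atomic load covers the whole vector, mirroring the single ATOMIC_LOAD.
  std::uint32_t Whole = Mem.load(std::memory_order_acquire);

  Halves H;
  H.Lo = static_cast<std::uint16_t>(Whole);       // TRUNCATE
  H.Hi = static_cast<std::uint16_t>(Whole >> 16); // SRL by half width, then TRUNCATE
  return H;
}
```

Loading once and splitting afterwards keeps the access atomic; issuing two separate 16-bit loads would not give that guarantee.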
[llvm-branch-commits] [llvm] Add a GUIDLIST table to bitcode (PR #139497)
https://github.com/orodley edited https://github.com/llvm/llvm-project/pull/139497
[llvm-branch-commits] [BOLT] Print heatmap from perf2bolt (PR #139194)
https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/139194
[llvm-branch-commits] [BOLT] Print heatmap section scores in perf2bolt (PR #139194)
https://github.com/aaupov updated https://github.com/llvm/llvm-project/pull/139194
[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120385 >From 192b17cf42a818acb1f10c2a81481e58b25ff238 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:37:17 -0500 Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load `load atomic <1 x T>` is not valid. This change legalizes vector types of atomic load via scalarization in SelectionDAG so that it can, for example, translate from `v1i32` to `i32`. commit-id:5c36cc8c --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 15 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +- 3 files changed, 135 insertions(+), 2 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 720393158aa5e..89ea7ef4dbe89 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N); SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N); SDValue ScalarizeVecRes_LOAD(LoadSDNode *N); + SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N); SDValue ScalarizeVecRes_VSELECT(SDNode *N); SDValue ScalarizeVecRes_SELECT(SDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index d0b69b88748a9..8eee7a4c61fe6 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) { R = ScalarizeVecRes_UnaryOpWithExtraInput(N); break; case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +R = ScalarizeVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: R = 
ScalarizeVecRes_LOAD(cast(N));break; case ISD::SCALAR_TO_VECTOR: R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break; case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break; @@ -458,6 +461,18 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) { return Op; } +SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) { + SDValue Result = DAG.getAtomicLoad( + ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(), + N->getValueType(0).getVectorElementType(), N->getChain(), N->getBasePtr(), + N->getMemOperand()); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); + return Result; +} + SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) { assert(N->isUnindexed() && "Indexed vector load?"); diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 5bce4401f7bdb..d23cfb89f9fc8 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -1,6 +1,6 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s -; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,CHECK3 +; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s --check-prefixes=CHECK,CHECK0 define void @test1(ptr %ptr, i32 %val1) { ; CHECK-LABEL: test1: @@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) { %val = load atomic i32, ptr %ptr seq_cst, align 4 ret i32 %val } + +define <1 x i32> @atomic_vec1_i32(ptr %x) { +; CHECK-LABEL: atomic_vec1_i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:movl (%rdi), %eax +; CHECK-NEXT:retq + %ret = load atomic <1 x i32>, ptr %x 
acquire, align 4 + ret <1 x i32> %ret +} + +define <1 x i8> @atomic_vec1_i8(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzbl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i8: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movb (%rdi), %al +; CHECK0-NEXT:retq + %ret = load atomic <1 x i8>, ptr %x acquire, align 1 + ret <1 x i8> %ret +} + +define <1 x i16> @atomic_vec1_i16(ptr %x) { +; CHECK3-LABEL: atomic_vec1_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movzwl (%rdi), %eax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movw (%rdi), %ax +; CHECK0-NEXT:retq + %ret = load atomic <1 x i16>, ptr %x acquire, align 2 + ret <1 x i16> %ret +} + +define <1 x i32> @atomic_vec1_i8_zext(ptr %x) { +; CHECK3-LABEL: atomic_ve
[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120387 >From d212710191a62be5ad7257f8825b71230d715041 Mon Sep 17 00:00:00 2001 From: jofrn Date: Wed, 18 Dec 2024 03:40:32 -0500 Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes. Unaligned atomic vectors with size >1 are lowered to calls. Adding their tests separately here. commit-id:a06a5cc6 --- llvm/test/CodeGen/X86/atomic-load-store.ll | 253 + 1 file changed, 253 insertions(+) diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 6efcbb80c0ce6..39e9fdfa5e62b 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind { ret <1 x i64> %ret } +define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_ptr: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_ptr: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} + define <1 x half> @atomic_vec1_half(ptr %x) { ; CHECK3-LABEL: atomic_vec1_half: ; CHECK3: ## %bb.0: @@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind { %ret = load atomic <1 x double>, ptr %x acquire, align 8 ret <1 x double> %ret } + +define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_i64: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; 
CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movq (%rsp), %rax +; CHECK3-NEXT:popq %rcx +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_i64: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq (%rsp), %rax +; CHECK0-NEXT:popq %rcx +; CHECK0-NEXT:retq + %ret = load atomic <1 x i64>, ptr %x acquire, align 4 + ret <1 x i64> %ret +} + +define <1 x double> @atomic_vec1_double(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec1_double: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec1_double: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq %rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <1 x double>, ptr %x acquire, align 4 + ret <1 x double> %ret +} + +define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { +; CHECK3-LABEL: atomic_vec2_i32: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:pushq %rax +; CHECK3-NEXT:movq %rdi, %rsi +; CHECK3-NEXT:movq %rsp, %rdx +; CHECK3-NEXT:movl $8, %edi +; CHECK3-NEXT:movl $2, %ecx +; CHECK3-NEXT:callq ___atomic_load +; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK3-NEXT:popq %rax +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i32: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:pushq %rax +; CHECK0-NEXT:movq %rdi, %rsi +; CHECK0-NEXT:movl $8, %edi +; CHECK0-NEXT:movq 
%rsp, %rdx +; CHECK0-NEXT:movl $2, %ecx +; CHECK0-NEXT:callq ___atomic_load +; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero +; CHECK0-NEXT:popq %rax +; CHECK0-NEXT:retq + %ret = load atomic <2 x i32>, ptr %x acquire, align 4 + ret <2 x i32> %ret +} + +define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec4_float_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[
[llvm-branch-commits] [llvm] [X86] Remove extra MOV after widening atomic load (PR #138635)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/138635 >From 6312f8c4dbc5272b5f2c741a46fe7623ace49bf8 Mon Sep 17 00:00:00 2001 From: jofernau_amdeng Date: Tue, 6 May 2025 01:48:11 -0400 Subject: [PATCH] [X86] Remove extra MOV after widening atomic load This change adds patterns to optimize out an extra MOV present after widening the atomic load. commit-id:45989503 --- llvm/lib/Target/X86/X86InstrCompiler.td| 7 llvm/test/CodeGen/X86/atomic-load-store.ll | 40 -- 2 files changed, 29 insertions(+), 18 deletions(-) diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td b/llvm/lib/Target/X86/X86InstrCompiler.td index efa1e8bd7f3e3..786d0567280f9 100644 --- a/llvm/lib/Target/X86/X86InstrCompiler.td +++ b/llvm/lib/Target/X86/X86InstrCompiler.td @@ -1204,6 +1204,13 @@ def : Pat<(i16 (atomic_load_nonext_16 addr:$src)), (MOV16rm addr:$src)>; def : Pat<(i32 (atomic_load_nonext_32 addr:$src)), (MOV32rm addr:$src)>; def : Pat<(i64 (atomic_load_nonext_64 addr:$src)), (MOV64rm addr:$src)>; +def : Pat<(v4i32 (scalar_to_vector (i32 (zext (i16 (atomic_load_16 addr:$src)), + (MOVDI2PDIrm addr:$src)>; // load atomic <2 x i8> +def : Pat<(v4i32 (scalar_to_vector (i32 (atomic_load_32 addr:$src, + (MOVDI2PDIrm addr:$src)>; // load atomic <2 x i16> +def : Pat<(v2i64 (scalar_to_vector (i64 (atomic_load_64 addr:$src, + (MOV64toPQIrm addr:$src)>; // load atomic <2 x i32,float> + // Floating point loads/stores. 
def : Pat<(atomic_store_32 (i32 (bitconvert (f32 FR32:$src))), addr:$dst), (MOVSSmr addr:$dst, FR32:$src)>, Requires<[UseSSE1]>; diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index 9ee8b4fc5ac7f..3cf9e3c1a8dfa 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -165,11 +165,15 @@ define <2 x i8> @atomic_vec2_i8(ptr %x) { } define <2 x i16> @atomic_vec2_i16(ptr %x) { -; CHECK-LABEL: atomic_vec2_i16: -; CHECK: ## %bb.0: -; CHECK-NEXT:movl (%rdi), %eax -; CHECK-NEXT:movd %eax, %xmm0 -; CHECK-NEXT:retq +; CHECK3-LABEL: atomic_vec2_i16: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec2_i16: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movd {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK0-NEXT:retq %ret = load atomic <2 x i16>, ptr %x acquire, align 4 ret <2 x i16> %ret } @@ -177,8 +181,7 @@ define <2 x i16> @atomic_vec2_i16(ptr %x) { define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) { ; CHECK-LABEL: atomic_vec2_ptr270: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <2 x ptr addrspace(270)>, ptr %x acquire, align 8 ret <2 x ptr addrspace(270)> %ret @@ -187,8 +190,7 @@ define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) { define <2 x i32> @atomic_vec2_i32_align(ptr %x) { ; CHECK-LABEL: atomic_vec2_i32_align: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <2 x i32>, ptr %x acquire, align 8 ret <2 x i32> %ret @@ -197,8 +199,7 @@ define <2 x i32> @atomic_vec2_i32_align(ptr %x) { define <2 x float> @atomic_vec2_float_align(ptr %x) { ; CHECK-LABEL: atomic_vec2_float_align: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq 
(%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <2 x float>, ptr %x acquire, align 8 ret <2 x float> %ret @@ -354,11 +355,15 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { } define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { -; CHECK-LABEL: atomic_vec4_i8: -; CHECK: ## %bb.0: -; CHECK-NEXT:movl (%rdi), %eax -; CHECK-NEXT:movd %eax, %xmm0 -; CHECK-NEXT:retq +; CHECK3-LABEL: atomic_vec4_i8: +; CHECK3: ## %bb.0: +; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK3-NEXT:retq +; +; CHECK0-LABEL: atomic_vec4_i8: +; CHECK0: ## %bb.0: +; CHECK0-NEXT:movd {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK0-NEXT:retq %ret = load atomic <4 x i8>, ptr %x acquire, align 4 ret <4 x i8> %ret } @@ -366,8 +371,7 @@ define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind { ; CHECK-LABEL: atomic_vec4_i16: ; CHECK: ## %bb.0: -; CHECK-NEXT:movq (%rdi), %rax -; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:movq (%rdi), %xmm0 ; CHECK-NEXT:retq %ret = load atomic <4 x i16>, ptr %x acquire, align 8 ret <4 x i16> %ret ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bi
[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120386

From ce52d5295249681faf782a15ebe56599152e8491 Mon Sep 17 00:00:00 2001
From: jofrn
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail
since this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.

commit-id:f9d761c5
---
 llvm/lib/Target/X86/X86ISelLowering.cpp    |  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index ac4fb157a6026..3ab548f64d04c 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2653,6 +2653,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
     setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
                        ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc8..6efcbb80c0ce6 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:       ## %bb.0:
+; CHECK3-NEXT:    movzwl (%rdi), %eax
+; CHECK3-NEXT:    pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:    retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:       ## %bb.0:
+; CHECK0-NEXT:    movw (%rdi), %cx
+; CHECK0-NEXT:    ## implicit-def: $eax
+; CHECK0-NEXT:    movw %cx, %ax
+; CHECK0-NEXT:    ## implicit-def: $xmm0
+; CHECK0-NEXT:    pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:    retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:    retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:    retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}
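The promotion described in this patch (treat the atomic load of a float as an atomic load of a same-sized integer, then reinterpret the bits) can be illustrated outside the compiler. The following is only a source-level analogy in standard C++, not the SelectionDAG code; the helper names `floatToBits`, `bitsToFloat`, and `atomicLoadFloatViaInt` are hypothetical and do not appear in the patch.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Reinterpret the bits of a float as a 32-bit integer (no value conversion).
std::uint32_t floatToBits(float Value) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &Value, sizeof(Bits));
  return Bits;
}

// Reinterpret a 32-bit integer as a float (the inverse of floatToBits).
float bitsToFloat(std::uint32_t Bits) {
  float Value;
  std::memcpy(&Value, &Bits, sizeof(Value));
  return Value;
}

// Hypothetical helper: the atomic operation is performed on the integer
// type, and the float is recovered afterwards with a pure bit cast.
float atomicLoadFloatViaInt(const std::atomic<std::uint32_t> &Storage) {
  return bitsToFloat(Storage.load(std::memory_order_acquire));
}
```

The point of the analogy is that the memory access itself never needs a floating-point atomic; only an integer atomic of the same width plus a bit cast, which is what promoting `ISD::ATOMIC_LOAD` from `f32` to `i32` expresses in the DAG.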
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120716 >From 717ea645df30178ab0873da4191d41bc7ba4b761 Mon Sep 17 00:00:00 2001 From: jofrn Date: Fri, 20 Dec 2024 06:14:28 -0500 Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector AtomicExpand fails for aligned `load atomic ` because it does not find a compatible library call. This change adds appropriate bitcasts so that the call can be lowered. commit-id:f430c1af --- llvm/lib/CodeGen/AtomicExpandPass.cpp | 15 - llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++ llvm/test/CodeGen/X86/atomic-load-store.ll| 30 + .../X86/expand-atomic-non-integer.ll | 65 +++ 4 files changed, 158 insertions(+), 3 deletions(-) diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index c376de877ac7d..70f59eafc6ecb 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -2066,9 +2066,18 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall( I->replaceAllUsesWith(V); } else if (HasResult) { Value *V; -if (UseSizedLibcall) - V = Builder.CreateBitOrPointerCast(Result, I->getType()); -else { +if (UseSizedLibcall) { + // Add bitcasts from Result's scalar type to I's vector type + auto *PtrTy = dyn_cast(I->getType()->getScalarType()); + auto *VTy = dyn_cast(I->getType()); + if (VTy && PtrTy && !Result->getType()->isVectorTy()) { +unsigned AS = PtrTy->getAddressSpace(); +Value *BC = Builder.CreateBitCast( +Result, VTy->getWithNewType(DL.getIntPtrType(Ctx, AS))); +V = Builder.CreateIntToPtr(BC, I->getType()); + } else +V = Builder.CreateBitOrPointerCast(Result, I->getType()); +} else { V = Builder.CreateAlignedLoad(I->getType(), AllocaResult, AllocaAlignment); Builder.CreateLifetimeEnd(AllocaResult, SizeVal64); diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll index 560dfde356c29..eaa2ffd9b2731 100644 --- a/llvm/test/CodeGen/ARM/atomic-load-store.ll +++ 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll @@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) { store atomic double %val1, ptr %ptr seq_cst, align 8 ret void } + +define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 { +; ARM-LABEL: atomic_vec1_ptr: +; ARM: @ %bb.0: +; ARM-NEXT:ldr r0, [r0] +; ARM-NEXT:dmb ish +; ARM-NEXT:bx lr +; +; ARMOPTNONE-LABEL: atomic_vec1_ptr: +; ARMOPTNONE: @ %bb.0: +; ARMOPTNONE-NEXT:ldr r0, [r0] +; ARMOPTNONE-NEXT:dmb ish +; ARMOPTNONE-NEXT:bx lr +; +; THUMBTWO-LABEL: atomic_vec1_ptr: +; THUMBTWO: @ %bb.0: +; THUMBTWO-NEXT:ldr r0, [r0] +; THUMBTWO-NEXT:dmb ish +; THUMBTWO-NEXT:bx lr +; +; THUMBONE-LABEL: atomic_vec1_ptr: +; THUMBONE: @ %bb.0: +; THUMBONE-NEXT:push {r7, lr} +; THUMBONE-NEXT:movs r1, #0 +; THUMBONE-NEXT:mov r2, r1 +; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4 +; THUMBONE-NEXT:pop {r7, pc} +; +; ARMV4-LABEL: atomic_vec1_ptr: +; ARMV4: @ %bb.0: +; ARMV4-NEXT:push {r11, lr} +; ARMV4-NEXT:mov r1, #2 +; ARMV4-NEXT:bl __atomic_load_4 +; ARMV4-NEXT:pop {r11, lr} +; ARMV4-NEXT:mov pc, lr +; +; ARMV6-LABEL: atomic_vec1_ptr: +; ARMV6: @ %bb.0: +; ARMV6-NEXT:ldr r0, [r0] +; ARMV6-NEXT:mov r1, #0 +; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5 +; ARMV6-NEXT:bx lr +; +; THUMBM-LABEL: atomic_vec1_ptr: +; THUMBM: @ %bb.0: +; THUMBM-NEXT:ldr r0, [r0] +; THUMBM-NEXT:dmb sy +; THUMBM-NEXT:bx lr + %ret = load atomic <1 x ptr>, ptr %x acquire, align 4 + ret <1 x ptr> %ret +} diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll index f72970d12b6eb..d3027e799 100644 --- a/llvm/test/CodeGen/X86/atomic-load-store.ll +++ b/llvm/test/CodeGen/X86/atomic-load-store.ll @@ -382,6 +382,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind { ret <2 x i32> %ret } +define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec2_ptr_align: +; CHECK: ## %bb.0: +; CHECK-NEXT:pushq %rax +; CHECK-NEXT:movl $2, %esi +; CHECK-NEXT:callq ___atomic_load_16 +; 
CHECK-NEXT:movq %rdx, %xmm1 +; CHECK-NEXT:movq %rax, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:popq %rax +; CHECK-NEXT:retq + %ret = load atomic <2 x ptr>, ptr %x acquire, align 16 + ret <2 x ptr> %ret +} + define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind { ; CHECK3-LABEL: atomic_vec4_i8: ; CHECK3: ## %bb.0: @@ -405,6 +420,21 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind { ret <4 x i16> %ret } +define <4 x ptr addrspace(270)> @atomic_vec4_ptr270(ptr %x) nounwind { +; CHECK-LABEL: atomic_vec4_ptr270: +; CHECK: ## %b
[llvm-branch-commits] [llvm] [RISCV][Scheduler] Add scheduler definitions for the Q extension (PR #139495)
llvmbot wrote: @llvm/pr-subscribers-backend-risc-v Author: Iris Shi (el-ev) Changes --- Patch is 24.97 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139495.diff 14 Files Affected: - (modified) llvm/lib/Target/RISCV/RISCVInstrInfoQ.td (+61-39) - (modified) llvm/lib/Target/RISCV/RISCVSchedGenericOOO.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedMIPSP8700.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedRocket.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedSiFive7.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedSiFiveP400.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedSiFiveP500.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedSiFiveP600.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR345.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR7.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedTTAscalonD8.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedXiangShanNanHu.td (+1) - (modified) llvm/lib/Target/RISCV/RISCVSchedule.td (+85-3) ``diff diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td b/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td index 7d216b5dd87c0..8cc965ccc515d 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td +++ b/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td @@ -25,97 +25,119 @@ defvar QExtsRV64 = [QExt]; //===--===// let Predicates = [HasStdExtQ] in { - // def FLQ : FPLoad_r<0b100, "flq", FPR128, WriteFLD128>; - let hasSideEffects = 0, mayLoad = 1, mayStore = 0 in - def FLQ : RVInstI<0b100, OPC_LOAD_FP, (outs FPR128:$rd), - (ins GPRMem:$rs1, simm12:$imm12), - "flq", "$rd, ${imm12}(${rs1})">; + def FLQ : FPLoad_r<0b100, "flq", FPR128, WriteFLD128>; + // Operands for stores are in the order srcreg, base, offset rather than // reflecting the order these fields are specified in the instruction // encoding. 
- // def FSQ : FPStore_r<0b100, "fsq", FPR128, WriteFST128>; - let hasSideEffects = 0, mayLoad = 0, mayStore = 1 in - def FSQ : RVInstS<0b100, OPC_STORE_FP, (outs), - (ins FPR128:$rs2, GPRMem:$rs1, simm12:$imm12), - "fsq", "$rs2, ${imm12}(${rs1})">; + def FSQ : FPStore_r<0b100, "fsq", FPR128, WriteFST128>; } // Predicates = [HasStdExtQ] foreach Ext = QExts in { - defm FMADD_Q : FPFMA_rrr_frm_m; - defm FMSUB_Q : FPFMA_rrr_frm_m; - defm FNMSUB_Q : FPFMA_rrr_frm_m; - defm FNMADD_Q : FPFMA_rrr_frm_m; + let SchedRW = [WriteFMA128, ReadFMA128, ReadFMA128, ReadFMA128Addend] in { +defm FMADD_Q : FPFMA_rrr_frm_m; +defm FMSUB_Q : FPFMA_rrr_frm_m; +defm FNMSUB_Q : FPFMA_rrr_frm_m; +defm FNMADD_Q : FPFMA_rrr_frm_m; + } - defm FADD_Q : FPALU_rr_frm_m<0b011, "fadd.q", Ext>; - defm FSUB_Q : FPALU_rr_frm_m<0b111, "fsub.q", Ext>; + let SchedRW = [WriteFAdd128, ReadFAdd128, ReadFAdd128] in { +defm FADD_Q : FPALU_rr_frm_m<0b011, "fadd.q", Ext>; +defm FSUB_Q : FPALU_rr_frm_m<0b111, "fsub.q", Ext>; + } + let SchedRW = [WriteFMul128, ReadFMul128, ReadFMul128] in defm FMUL_Q : FPALU_rr_frm_m<0b0001011, "fmul.q", Ext>; + let SchedRW = [WriteFDiv128, ReadFDiv128, ReadFDiv128] in defm FDIV_Q : FPALU_rr_frm_m<0b000, "fdiv.q", Ext>; defm FSQRT_Q : FPUnaryOp_r_frm_m<0b010, 0b0, Ext, Ext.PrimaryTy, - Ext.PrimaryTy, "fsqrt.q">; + Ext.PrimaryTy, "fsqrt.q">, + Sched<[WriteFSqrt128, ReadFSqrt128]>; - let mayRaiseFPException = 0 in { + let SchedRW = [WriteFSGNJ128, ReadFSGNJ128, ReadFSGNJ128], + mayRaiseFPException = 0 in { defm FSGNJ_Q : FPALU_rr_m<0b0010011, 0b000, "fsgnj.q", Ext>; defm FSGNJN_Q : FPALU_rr_m<0b0010011, 0b001, "fsgnjn.q", Ext>; defm FSGNJX_Q : FPALU_rr_m<0b0010011, 0b010, "fsgnjx.q", Ext>; } - defm FMIN_Q : FPALU_rr_m<0b0010111, 0b000, "fmin.q", Ext, Commutable = 1>; - defm FMAX_Q : FPALU_rr_m<0b0010111, 0b001, "fmax.q", Ext, Commutable = 1>; + let SchedRW = [WriteFMinMax128, ReadFMinMax128, ReadFMinMax128] in { +defm FMIN_Q : FPALU_rr_m<0b0010111, 0b000, "fmin.q", Ext, Commutable 
= 1>; +defm FMAX_Q : FPALU_rr_m<0b0010111, 0b001, "fmax.q", Ext, Commutable = 1>; + } defm FCVT_S_Q : FPUnaryOp_r_frm_m<0b010, 0b00011, Ext, Ext.F32Ty, -Ext.PrimaryTy, "fcvt.s.q">; +Ext.PrimaryTy, "fcvt.s.q">, + Sched<[WriteFCvtF128ToF32, ReadFCvtF128ToF32]>; defm FCVT_Q_S : FPUnaryOp_r_frmlegacy_m<0b0100011, 0b0, Ext, - Ext.PrimaryTy, Ext.F32Ty, "fcvt.q.s">; + Ext.PrimaryTy, Ext.F32Ty, + "fcvt.q.s">, + Sched<[WriteFCvtF32ToF128, Read
[llvm-branch-commits] [llvm] [SelectionDAG] Widen <2 x T> vector types for atomic load (PR #120598)
https://github.com/jofrn updated https://github.com/llvm/llvm-project/pull/120598 >From 730b40b39dfa3ed5d802bbb1270d49273a5de7fb Mon Sep 17 00:00:00 2001 From: jofrn Date: Thu, 19 Dec 2024 11:19:39 -0500 Subject: [PATCH] [SelectionDAG] Widen <2 x T> vector types for atomic load Vector types of 2 elements must be widened. This change does this for vector types of atomic load in SelectionDAG so that it can translate aligned vectors of >1 size. commit-id:2894ccd1 --- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 1 + .../SelectionDAG/LegalizeVectorTypes.cpp | 97 ++- llvm/test/CodeGen/X86/atomic-load-store.ll| 78 +++ 3 files changed, 153 insertions(+), 23 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h index 89ea7ef4dbe89..bdfa5f7741ad3 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer { SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N); SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N); SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N); + SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N); SDValue WidenVecRes_LOAD(SDNode* N); SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N); SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N); diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp index 8eee7a4c61fe6..f88b4d5693979 100644 --- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -4625,6 +4625,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) { break; case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break; case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break; + case ISD::ATOMIC_LOAD: +Res = WidenVecRes_ATOMIC_LOAD(cast(N)); +break; case ISD::LOAD: Res = WidenVecRes_LOAD(N); 
break; case ISD::STEP_VECTOR: case ISD::SPLAT_VECTOR: @@ -6014,6 +6017,74 @@ SDValue DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) { N->getOperand(1), N->getOperand(2)); } +/// Either return the same load or provide appropriate casts +/// from the load and return that. +static SDValue coerceLoadedValue(SDValue LdOp, EVT FirstVT, EVT WidenVT, + TypeSize LdWidth, TypeSize FirstVTWidth, + SDLoc dl, SelectionDAG &DAG) { + assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth)); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + if (!FirstVT.isVector()) { +unsigned NumElts = +WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue(); +EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts); +SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp); +return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp); + } + assert(FirstVT == WidenVT); + return LdOp; +} + +static std::optional findMemType(SelectionDAG &DAG, + const TargetLowering &TLI, unsigned Width, + EVT WidenVT, unsigned Align, + unsigned WidenEx); + +SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) { + EVT WidenVT = + TLI.getTypeToTransformTo(*DAG.getContext(), LD->getValueType(0)); + EVT LdVT = LD->getMemoryVT(); + SDLoc dl(LD); + assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors"); + assert(LdVT.isScalableVector() == WidenVT.isScalableVector() && + "Must be scalable"); + assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() && + "Expected equivalent element types"); + + // Load information + SDValue Chain = LD->getChain(); + SDValue BasePtr = LD->getBasePtr(); + MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags(); + AAMDNodes AAInfo = LD->getAAInfo(); + + TypeSize LdWidth = LdVT.getSizeInBits(); + TypeSize WidenWidth = WidenVT.getSizeInBits(); + TypeSize WidthDiff = WidenWidth - LdWidth; + + // Find the vector type that can load from. 
+ std::optional FirstVT = + findMemType(DAG, TLI, LdWidth.getKnownMinValue(), WidenVT, /*LdAlign=*/0, + WidthDiff.getKnownMinValue()); + + if (!FirstVT) +return SDValue(); + + SmallVector MemVTs; + TypeSize FirstVTWidth = FirstVT->getSizeInBits(); + + SDValue LdOp = DAG.getAtomicLoad(ISD::NON_EXTLOAD, dl, *FirstVT, *FirstVT, + Chain, BasePtr, LD->getMemOperand()); + + // Load the element with one instruction. + SDValue Result = coerceLoadedValue(LdOp, *FirstVT, WidenVT, LdWidth, + FirstVTWidth, dl, DAG); + + // Modified the chain - switch anything that used the old chain to use + // the new
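As a rough illustration of the widening idea in this patch (perform one atomic load of the narrow vector's total width, then place the loaded bits in the low lanes of the wider vector), here is a standalone C++ analogy. It is a sketch under stated assumptions, not the legalizer code: `Vec2`, `Vec4`, `packLanes`, and `atomicLoadWidened` are hypothetical names, and the padding lanes are zeroed here only to make the result deterministic.

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <cstring>

using Vec2 = std::array<std::uint16_t, 2>; // stands in for <2 x i16>
using Vec4 = std::array<std::uint16_t, 4>; // stands in for the widened <4 x i16>

static_assert(sizeof(Vec2) == sizeof(std::uint32_t),
              "this sketch assumes two 16-bit lanes fill one 32-bit word");

// Pack two lanes into one 32-bit word, preserving their memory layout.
std::uint32_t packLanes(const Vec2 &Lanes) {
  std::uint32_t Bits;
  std::memcpy(&Bits, Lanes.data(), sizeof(Bits));
  return Bits;
}

// One atomic 32-bit load covers both original lanes; the result is then
// reinterpreted as the low lanes of the wider vector. Lanes 2 and 3 are
// padding (zeroed in this sketch).
Vec4 atomicLoadWidened(const std::atomic<std::uint32_t> &Storage) {
  std::uint32_t Bits = Storage.load(std::memory_order_acquire);
  Vec4 Wide{};
  std::memcpy(Wide.data(), &Bits, sizeof(Bits));
  return Wide;
}
```

Because both directions use `memcpy`, lane order follows the in-memory layout and the sketch does not depend on endianness.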
[llvm-branch-commits] [llvm] [RISCV][Scheduler] Add scheduler definitions for the Q extension (PR #139495)
https://github.com/el-ev created https://github.com/llvm/llvm-project/pull/139495 None >From 55a551de62d325a8e5e23c503f81abe89aead549 Mon Sep 17 00:00:00 2001 From: Iris Shi <0...@owo.li> Date: Mon, 12 May 2025 13:32:41 +0800 Subject: [PATCH] [RISCV][Scheduler] Add scheduler definitions for the Q extension --- llvm/lib/Target/RISCV/RISCVInstrInfoQ.td | 100 +++--- llvm/lib/Target/RISCV/RISCVSchedGenericOOO.td | 1 + llvm/lib/Target/RISCV/RISCVSchedMIPSP8700.td | 1 + llvm/lib/Target/RISCV/RISCVSchedRocket.td | 1 + llvm/lib/Target/RISCV/RISCVSchedSiFive7.td| 1 + llvm/lib/Target/RISCV/RISCVSchedSiFiveP400.td | 1 + llvm/lib/Target/RISCV/RISCVSchedSiFiveP500.td | 1 + llvm/lib/Target/RISCV/RISCVSchedSiFiveP600.td | 1 + .../lib/Target/RISCV/RISCVSchedSpacemitX60.td | 1 + .../Target/RISCV/RISCVSchedSyntacoreSCR345.td | 1 + .../Target/RISCV/RISCVSchedSyntacoreSCR7.td | 1 + .../lib/Target/RISCV/RISCVSchedTTAscalonD8.td | 1 + .../Target/RISCV/RISCVSchedXiangShanNanHu.td | 1 + llvm/lib/Target/RISCV/RISCVSchedule.td| 88 ++- 14 files changed, 158 insertions(+), 42 deletions(-) diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td b/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td index 7d216b5dd87c0..8cc965ccc515d 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td +++ b/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td @@ -25,97 +25,119 @@ defvar QExtsRV64 = [QExt]; //===--===// let Predicates = [HasStdExtQ] in { - // def FLQ : FPLoad_r<0b100, "flq", FPR128, WriteFLD128>; - let hasSideEffects = 0, mayLoad = 1, mayStore = 0 in - def FLQ : RVInstI<0b100, OPC_LOAD_FP, (outs FPR128:$rd), - (ins GPRMem:$rs1, simm12:$imm12), - "flq", "$rd, ${imm12}(${rs1})">; + def FLQ : FPLoad_r<0b100, "flq", FPR128, WriteFLD128>; + // Operands for stores are in the order srcreg, base, offset rather than // reflecting the order these fields are specified in the instruction // encoding. 
- // def FSQ : FPStore_r<0b100, "fsq", FPR128, WriteFST128>; - let hasSideEffects = 0, mayLoad = 0, mayStore = 1 in - def FSQ : RVInstS<0b100, OPC_STORE_FP, (outs), - (ins FPR128:$rs2, GPRMem:$rs1, simm12:$imm12), - "fsq", "$rs2, ${imm12}(${rs1})">; + def FSQ : FPStore_r<0b100, "fsq", FPR128, WriteFST128>; } // Predicates = [HasStdExtQ] foreach Ext = QExts in { - defm FMADD_Q : FPFMA_rrr_frm_m; - defm FMSUB_Q : FPFMA_rrr_frm_m; - defm FNMSUB_Q : FPFMA_rrr_frm_m; - defm FNMADD_Q : FPFMA_rrr_frm_m; + let SchedRW = [WriteFMA128, ReadFMA128, ReadFMA128, ReadFMA128Addend] in { +defm FMADD_Q : FPFMA_rrr_frm_m; +defm FMSUB_Q : FPFMA_rrr_frm_m; +defm FNMSUB_Q : FPFMA_rrr_frm_m; +defm FNMADD_Q : FPFMA_rrr_frm_m; + } - defm FADD_Q : FPALU_rr_frm_m<0b011, "fadd.q", Ext>; - defm FSUB_Q : FPALU_rr_frm_m<0b111, "fsub.q", Ext>; + let SchedRW = [WriteFAdd128, ReadFAdd128, ReadFAdd128] in { +defm FADD_Q : FPALU_rr_frm_m<0b011, "fadd.q", Ext>; +defm FSUB_Q : FPALU_rr_frm_m<0b111, "fsub.q", Ext>; + } + let SchedRW = [WriteFMul128, ReadFMul128, ReadFMul128] in defm FMUL_Q : FPALU_rr_frm_m<0b0001011, "fmul.q", Ext>; + let SchedRW = [WriteFDiv128, ReadFDiv128, ReadFDiv128] in defm FDIV_Q : FPALU_rr_frm_m<0b000, "fdiv.q", Ext>; defm FSQRT_Q : FPUnaryOp_r_frm_m<0b010, 0b0, Ext, Ext.PrimaryTy, - Ext.PrimaryTy, "fsqrt.q">; + Ext.PrimaryTy, "fsqrt.q">, + Sched<[WriteFSqrt128, ReadFSqrt128]>; - let mayRaiseFPException = 0 in { + let SchedRW = [WriteFSGNJ128, ReadFSGNJ128, ReadFSGNJ128], + mayRaiseFPException = 0 in { defm FSGNJ_Q : FPALU_rr_m<0b0010011, 0b000, "fsgnj.q", Ext>; defm FSGNJN_Q : FPALU_rr_m<0b0010011, 0b001, "fsgnjn.q", Ext>; defm FSGNJX_Q : FPALU_rr_m<0b0010011, 0b010, "fsgnjx.q", Ext>; } - defm FMIN_Q : FPALU_rr_m<0b0010111, 0b000, "fmin.q", Ext, Commutable = 1>; - defm FMAX_Q : FPALU_rr_m<0b0010111, 0b001, "fmax.q", Ext, Commutable = 1>; + let SchedRW = [WriteFMinMax128, ReadFMinMax128, ReadFMinMax128] in { +defm FMIN_Q : FPALU_rr_m<0b0010111, 0b000, "fmin.q", Ext, Commutable 
= 1>; +defm FMAX_Q : FPALU_rr_m<0b0010111, 0b001, "fmax.q", Ext, Commutable = 1>; + } defm FCVT_S_Q : FPUnaryOp_r_frm_m<0b010, 0b00011, Ext, Ext.F32Ty, -Ext.PrimaryTy, "fcvt.s.q">; +Ext.PrimaryTy, "fcvt.s.q">, + Sched<[WriteFCvtF128ToF32, ReadFCvtF128ToF32]>; defm FCVT_Q_S : FPUnaryOp_r_frmlegacy_m<0b0100011, 0b0, Ext, - Ext.PrimaryTy, Ext.F32Ty, "fcvt.q.s">; + Ext.PrimaryTy, Ext.F32Ty, + "fcvt.q.s">, + Sched<[WriteFCvtF32ToF128,
[llvm-branch-commits] [llvm] [RISCV][Scheduler] Add scheduler definitions for the Q extension (PR #139495)
el-ev wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

* **#139495** 👈
* **#139369**
* `main`

This stack of pull requests is managed by Graphite.

https://github.com/llvm/llvm-project/pull/139495
[llvm-branch-commits] [llvm] [RISCV][Scheduler] Add scheduler definitions for the Q extension (PR #139495)
https://github.com/el-ev ready_for_review https://github.com/llvm/llvm-project/pull/139495
[llvm-branch-commits] [llvm] Add a GUIDLIST table to bitcode (PR #139497)
https://github.com/orodley created https://github.com/llvm/llvm-project/pull/139497 None >From bfb6cb21243f043ea1edf6f00cf27d08549066dc Mon Sep 17 00:00:00 2001 From: Owen Rodley Date: Mon, 12 May 2025 15:50:22 +1000 Subject: [PATCH] Add a GUIDLIST table to bitcode --- llvm/include/llvm/Bitcode/LLVMBitCodes.h | 3 +++ llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 11 +++--- llvm/lib/Bitcode/Writer/BitcodeWriter.cpp | 25 +++ 3 files changed, 36 insertions(+), 3 deletions(-) diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h index 92b6e68d9d0a7..8acba6477c4a1 100644 --- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h +++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h @@ -120,6 +120,9 @@ enum ModuleCodes { // IFUNC: [ifunc value type, addrspace, resolver val#, linkage, visibility] MODULE_CODE_IFUNC = 18, + + // GUIDLIST: [n x i64] + MODULE_CODE_GUIDLIST = 19, }; /// PARAMATTR blocks have code for defining a parameter attribute set. diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp index 1d7aa189026a5..6d36b007956a0 100644 --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp @@ -980,6 +980,9 @@ class ModuleSummaryIndexBitcodeReader : public BitcodeReaderBase { /// the CallStackRadixTreeBuilder class in ProfileData/MemProf.h for format. std::vector RadixArray; + // A table which maps ValueID to the GUID for that value. 
+ std::vector DefinedGUIDs; + public: ModuleSummaryIndexBitcodeReader( BitstreamCursor Stream, StringRef Strtab, ModuleSummaryIndex &TheIndex, @@ -7164,9 +7167,7 @@ ModuleSummaryIndexBitcodeReader::getValueInfoFromValueId(unsigned ValueId) { void ModuleSummaryIndexBitcodeReader::setValueGUID( uint64_t ValueID, StringRef ValueName, GlobalValue::LinkageTypes Linkage, StringRef SourceFileName) { - std::string GlobalId = - GlobalValue::getGlobalIdentifier(ValueName, Linkage, SourceFileName); - auto ValueGUID = GlobalValue::getGUIDAssumingExternalLinkage(GlobalId); + auto ValueGUID = DefinedGUIDs[ValueID]; auto OriginalNameID = ValueGUID; if (GlobalValue::isLocalLinkage(Linkage)) OriginalNameID = GlobalValue::getGUIDAssumingExternalLinkage(ValueName); @@ -7389,6 +7390,10 @@ Error ModuleSummaryIndexBitcodeReader::parseModule() { // was historically always the start of the regular bitcode header. VSTOffset = Record[0] - 1; break; +// MODULE_CODE_GUIDLIST: [i64 x N] +case bitc::MODULE_CODE_GUIDLIST: + llvm::append_range(DefinedGUIDs, Record); + break; // v1 GLOBALVAR: [pointer type, isconst, initid, linkage, ...] // v1 FUNCTION: [type, callingconv, isproto, linkage, ...] // v1 ALIAS: [alias type, addrspace, aliasee val#, linkage, ...] diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp index 73bed85c65b3d..3e19220d1bde7 100644 --- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp +++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp @@ -227,6 +227,7 @@ class ModuleBitcodeWriterBase : public BitcodeWriterBase { protected: void writePerModuleGlobalValueSummary(); + void writeGUIDList(); private: void writePerModuleFunctionSummaryRecord( @@ -1560,6 +1561,8 @@ void ModuleBitcodeWriter::writeModuleInfo() { Vals.clear(); } + writeGUIDList(); + // Emit the global variable information. 
for (const GlobalVariable &GV : M.globals()) { unsigned AbbrevToUse = 0; @@ -4755,6 +4758,26 @@ void ModuleBitcodeWriterBase::writePerModuleGlobalValueSummary() { Stream.ExitBlock(); } +void ModuleBitcodeWriterBase::writeGUIDList() { + std::vector GUIDs; + GUIDs.reserve(M.global_size() + M.size() + M.alias_size()); + + for (const GlobalValue &GV : M.global_objects()) { +if (GV.isDeclaration()) { + GUIDs.push_back( + GlobalValue::getGUIDAssumingExternalLinkage(GV.getName())); +} else { + GUIDs.push_back(GV.getGUID()); +} + } + for (const GlobalAlias &GA : M.aliases()) { +// Equivalent to the above loop, as GlobalAliases are always definitions. +GUIDs.push_back(GA.getGUID()); + } + + Stream.EmitRecord(bitc::MODULE_CODE_GUIDLIST, GUIDs); +} + /// Emit the combined summary section into the combined index file. void IndexBitcodeWriter::writeCombinedGlobalValueSummary() { Stream.EnterSubblock(bitc::GLOBALVAL_SUMMARY_BLOCK_ID, 4); @@ -5538,6 +5561,8 @@ void ThinLinkBitcodeWriter::writeSimplifiedModuleInfo() { Vals.clear(); } + writeGUIDList(); + // Emit the global variable information. for (const GlobalVariable &GV : M.globals()) { // GLOBALVAR: [strtab offset, strtab size, 0, 0, 0, linkage] ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin
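The scheme in this patch can be modelled in a few lines: the writer emits one GUID per global in a fixed order, and the reader later indexes that table by value ID instead of recomputing the GUID from the name. The sketch below is a toy model, not the bitcode reader or writer. `ToyGlobal`, `toyNameHash`, `buildGuidList`, and `lookupGuid` are hypothetical names, the FNV-1a hash is only a stand-in for LLVM's real GUID hash, and the bounds check in `lookupGuid` is an addition of this sketch (the patch indexes `DefinedGUIDs[ValueID]` directly). It also assumes value IDs are assigned in the same order the table is written, and it needs C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for a global value; not an LLVM type.
struct ToyGlobal {
  std::string Name;
  bool IsDeclaration;
  std::uint64_t DefinitionGUID; // only meaningful for definitions
};

// Stand-in hash (64-bit FNV-1a); LLVM's actual GUID hash is different.
std::uint64_t toyNameHash(const std::string &Name) {
  std::uint64_t Hash = 14695981039346656037ULL;
  for (unsigned char C : Name) {
    Hash ^= C;
    Hash *= 1099511628211ULL;
  }
  return Hash;
}

// Writer side: one entry per global, in value-ID order. Declarations get a
// GUID derived from the name; definitions carry their own GUID.
std::vector<std::uint64_t> buildGuidList(const std::vector<ToyGlobal> &Globals) {
  std::vector<std::uint64_t> GUIDs;
  GUIDs.reserve(Globals.size());
  for (const ToyGlobal &G : Globals)
    GUIDs.push_back(G.IsDeclaration ? toyNameHash(G.Name) : G.DefinitionGUID);
  return GUIDs;
}

// Reader side: look the GUID up by value ID rather than recomputing it.
std::optional<std::uint64_t> lookupGuid(const std::vector<std::uint64_t> &Table,
                                        std::size_t ValueID) {
  if (ValueID >= Table.size())
    return std::nullopt;
  return Table[ValueID];
}
```

A table lookup like this decouples the reader from how the GUID was computed, which is presumably why the patch replaces the `getGlobalIdentifier`-based computation in `setValueGUID`.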
[llvm-branch-commits] [llvm] Add a GUIDLIST table to bitcode (PR #139497)
orodley wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

* **#139497** 👈
* **#133682**
* **#129644**
* `main`

This stack of pull requests is managed by Graphite.

https://github.com/llvm/llvm-project/pull/139497
[llvm-branch-commits] [llvm] Add a GUIDLIST table to bitcode (PR #139497)
https://github.com/orodley edited https://github.com/llvm/llvm-project/pull/139497
[llvm-branch-commits] [BOLT] Print heatmap from perf2bolt (PR #139194)
https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/139194
[llvm-branch-commits] [BOLT] Drop perf2bolt cold samples diagnostic (PR #139337)
https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/139337
[llvm-branch-commits] [SPARC][IAS] Add definitions for UA 2007 instructions (PR #138401)
https://github.com/koachan updated https://github.com/llvm/llvm-project/pull/138401 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [SPARC][IAS][NFC] Rename CBCOND -> CPBCOND (PR #138402)
https://github.com/koachan updated https://github.com/llvm/llvm-project/pull/138402 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [SPARC][IAS] Add definitions for OSA 2011 instructions (PR #138403)
https://github.com/koachan updated https://github.com/llvm/llvm-project/pull/138403 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for ASHR (PR #139503)
https://github.com/davemgreen created https://github.com/llvm/llvm-project/pull/139503 None ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [SPARC][IAS] Add definitions for cryptographic instructions (PR #139451)
https://github.com/koachan updated https://github.com/llvm/llvm-project/pull/139451 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for ASHR (PR #139503)
llvmbot wrote:

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

---
Full diff: https://github.com/llvm/llvm-project/pull/139503.diff

4 Files Affected:

- (modified) llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp (+10)
- (modified) llvm/test/CodeGen/AArch64/aarch64-smull.ll (+12-55)
- (modified) llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-abs-rv64.mir (+1-2)
- (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll (+1-2)

```diff
diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index 21990be21bbf7..41e36e1e6640b 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -864,6 +864,16 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
       return TyBits - 1; // Every always-zero bit is a sign bit.
     break;
   }
+  case TargetOpcode::G_ASHR: {
+    Register Src1 = MI.getOperand(1).getReg();
+    Register Src2 = MI.getOperand(2).getReg();
+    LLT SrcTy = MRI.getType(Src1);
+    FirstAnswer = computeNumSignBits(Src1, DemandedElts, Depth + 1);
+    if (auto C = getIConstantSplatVal(Src2, MRI))
+      FirstAnswer = std::max(FirstAnswer + C->getZExtValue(),
+                             SrcTy.getScalarSizeInBits());
+    break;
+  }
   case TargetOpcode::G_INTRINSIC:
   case TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS:
   case TargetOpcode::G_INTRINSIC_CONVERGENT:
diff --git a/llvm/test/CodeGen/AArch64/aarch64-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-smull.ll
index 951001c84aed0..591bc65bf3226 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-smull.ll
@@ -2265,33 +2265,12 @@ define <2 x i64> @lsr_const(<2 x i64> %a, <2 x i64> %b) {
 }
 
 define <2 x i64> @asr(<2 x i64> %a, <2 x i64> %b) {
-; CHECK-NEON-LABEL: asr:
-; CHECK-NEON:       // %bb.0:
-; CHECK-NEON-NEXT:    shrn v0.2s, v0.2d, #32
-; CHECK-NEON-NEXT:    shrn v1.2s, v1.2d, #32
-; CHECK-NEON-NEXT:    smull v0.2d, v0.2s, v1.2s
-; CHECK-NEON-NEXT:    ret
-;
-; CHECK-SVE-LABEL: asr:
-; CHECK-SVE:       // %bb.0:
-; CHECK-SVE-NEXT:    shrn v0.2s, v0.2d, #32
-; CHECK-SVE-NEXT:    shrn v1.2s, v1.2d, #32
-; CHECK-SVE-NEXT:    smull v0.2d, v0.2s, v1.2s
-; CHECK-SVE-NEXT:    ret
-;
-; CHECK-GI-LABEL: asr:
-; CHECK-GI:       // %bb.0:
-; CHECK-GI-NEXT:    sshr v0.2d, v0.2d, #32
-; CHECK-GI-NEXT:    sshr v1.2d, v1.2d, #32
-; CHECK-GI-NEXT:    fmov x8, d0
-; CHECK-GI-NEXT:    fmov x9, d1
-; CHECK-GI-NEXT:    mov x10, v0.d[1]
-; CHECK-GI-NEXT:    mov x11, v1.d[1]
-; CHECK-GI-NEXT:    mul x8, x8, x9
-; CHECK-GI-NEXT:    mul x9, x10, x11
-; CHECK-GI-NEXT:    mov v0.d[0], x8
-; CHECK-GI-NEXT:    mov v0.d[1], x9
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: asr:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    shrn v0.2s, v0.2d, #32
+; CHECK-NEXT:    shrn v1.2s, v1.2d, #32
+; CHECK-NEXT:    smull v0.2d, v0.2s, v1.2s
+; CHECK-NEXT:    ret
   %x = ashr <2 x i64> %a,
   %y = ashr <2 x i64> %b,
   %z = mul nsw <2 x i64> %x, %y
@@ -2299,34 +2278,12 @@ define <2 x i64> @asr(<2 x i64> %a, <2 x i64> %b) {
 }
 
 define <2 x i64> @asr_const(<2 x i64> %a, <2 x i64> %b) {
-; CHECK-NEON-LABEL: asr_const:
-; CHECK-NEON:       // %bb.0:
-; CHECK-NEON-NEXT:    movi v1.2s, #31
-; CHECK-NEON-NEXT:    shrn v0.2s, v0.2d, #32
-; CHECK-NEON-NEXT:    smull v0.2d, v0.2s, v1.2s
-; CHECK-NEON-NEXT:    ret
-;
-; CHECK-SVE-LABEL: asr_const:
-; CHECK-SVE:       // %bb.0:
-; CHECK-SVE-NEXT:    movi v1.2s, #31
-; CHECK-SVE-NEXT:    shrn v0.2s, v0.2d, #32
-; CHECK-SVE-NEXT:    smull v0.2d, v0.2s, v1.2s
-; CHECK-SVE-NEXT:    ret
-;
-; CHECK-GI-LABEL: asr_const:
-; CHECK-GI:       // %bb.0:
-; CHECK-GI-NEXT:    adrp x8, .LCPI81_0
-; CHECK-GI-NEXT:    sshr v0.2d, v0.2d, #32
-; CHECK-GI-NEXT:    ldr q1, [x8, :lo12:.LCPI81_0]
-; CHECK-GI-NEXT:    fmov x8, d0
-; CHECK-GI-NEXT:    fmov x9, d1
-; CHECK-GI-NEXT:    mov x10, v0.d[1]
-; CHECK-GI-NEXT:    mov x11, v1.d[1]
-; CHECK-GI-NEXT:    mul x8, x8, x9
-; CHECK-GI-NEXT:    mul x9, x10, x11
-; CHECK-GI-NEXT:    mov v0.d[0], x8
-; CHECK-GI-NEXT:    mov v0.d[1], x9
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: asr_const:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    movi v1.2s, #31
+; CHECK-NEXT:    shrn v0.2s, v0.2d, #32
+; CHECK-NEXT:    smull v0.2d, v0.2s, v1.2s
+; CHECK-NEXT:    ret
   %x = ashr <2 x i64> %a,
   %z = mul nsw <2 x i64> %x,
   ret <2 x i64> %z
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-abs-rv64.mir b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-abs-rv64.mir
index 78a2227b84a3a..a7c1c6355bff6 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-abs-rv64.mir
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-abs-rv64.mir
@@ -88,8 +88,7 @@ body: |
     ; RV64I-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[ASSERT_SEXT]], [[ASHR]]
     ; RV64I-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s64) = G_SEXT_INREG [[ADD]], 32
     ; R
```
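To make the arithmetic behind the `G_ASHR` case concrete: an arithmetic shift right by a constant `C` replicates the sign bit `C` more times, so the result has at least `min(SignBits(src) + C, width)` sign bits. The following is a standalone sketch on plain integers, not LLVM code; `numSignBits`, `ashr`, and `signBitsAfterAshr` are hypothetical helpers introduced only for illustration. It clamps with `std::min`, which is an assumption about the intended bound; the hunk quoted above calls `std::max`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Count how many leading bits of a Width-bit value equal its sign bit.
// The sign bit itself counts, so the result is always in [1, Width].
unsigned numSignBits(uint64_t Value, unsigned Width) {
  assert(Width >= 1 && Width <= 64 && "unsupported width");
  unsigned SignBit = (Value >> (Width - 1)) & 1;
  unsigned Count = 1;
  for (int Bit = static_cast<int>(Width) - 2; Bit >= 0; --Bit) {
    if (((Value >> Bit) & 1) != SignBit)
      break;
    ++Count;
  }
  return Count;
}

// Arithmetic shift right of a Width-bit value stored in the low bits of a
// uint64_t (hypothetical helper; Amount must be smaller than Width).
uint64_t ashr(uint64_t Value, unsigned Width, unsigned Amount) {
  assert(Width >= 1 && Width <= 64 && Amount < Width);
  uint64_t Mask = Width == 64 ? ~0ULL : ((1ULL << Width) - 1);
  Value &= Mask;
  bool Negative = (Value >> (Width - 1)) & 1;
  uint64_t Shifted = Value >> Amount;
  if (Negative && Amount > 0)
    Shifted |= Mask & ~(Mask >> Amount); // Fill vacated high bits with ones.
  return Shifted;
}

// Lower bound on the sign bits of `ashr x, ShiftAmt`, given a lower bound
// on the sign bits of x. Assumption: the bound is clamped to the bit width.
unsigned signBitsAfterAshr(unsigned SrcSignBits, uint64_t ShiftAmt,
                           unsigned Width) {
  return static_cast<unsigned>(
      std::min<uint64_t>(SrcSignBits + ShiftAmt, Width));
}
```

Under these assumptions, a 64-bit lane shifted right by 32 has at least 33 sign bits, so it fits in a signed 32-bit lane. That would be consistent with the `shrn` + `smull` sequence that the updated `asr` CHECK lines now expect from GlobalISel as well.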
[llvm-branch-commits] [llvm] [GlobalISel] Add computeKnownBits for G_SHUFFLE_VECTOR (PR #139505)
llvmbot wrote:

@llvm/pr-subscribers-llvm-globalisel

Author: David Green (davemgreen)

Changes

The code is similar to computeKnownBits and the code in SelectionDAG::ComputeNumSignBits.

---
Full diff: https://github.com/llvm/llvm-project/pull/139505.diff

3 Files Affected:

- (modified) llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp (+24)
- (modified) llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll (+10-23)
- (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+15-26)

```diff
diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index 41e36e1e6640b..fb483ed962270 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -874,6 +874,30 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
                              SrcTy.getScalarSizeInBits());
     break;
   }
+  case TargetOpcode::G_SHUFFLE_VECTOR: {
+    // Collect the minimum number of sign bits that are shared by every vector
+    // element referenced by the shuffle.
+    APInt DemandedLHS, DemandedRHS;
+    unsigned NumElts = MRI.getType(MI.getOperand(1).getReg()).getNumElements();
+    if (!getShuffleDemandedElts(NumElts, MI.getOperand(3).getShuffleMask(),
+                                DemandedElts, DemandedLHS, DemandedRHS))
+      return 1;
+
+    unsigned Tmp = std::numeric_limits<unsigned>::max();
+    if (!!DemandedLHS)
+      Tmp =
+          computeNumSignBits(MI.getOperand(1).getReg(), DemandedLHS, Depth + 1);
+    if (!!DemandedRHS) {
+      unsigned Tmp2 =
+          computeNumSignBits(MI.getOperand(2).getReg(), DemandedRHS, Depth + 1);
+      Tmp = std::min(Tmp, Tmp2);
+    }
+    // If we don't know anything, early out and try computeKnownBits fall-back.
+    if (Tmp == 1)
+      break;
+    assert(Tmp <= TyBits && "Failed to determine minimum sign bits");
+    return Tmp;
+  }
   case TargetOpcode::G_INTRINSIC:
   case TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS:
   case TargetOpcode::G_INTRINSIC_CONVERGENT:
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
index 56393142726c7..d86cbf57a65f3 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
@@ -400,9 +400,10 @@ define <8 x i16> @missing_insert(<8 x i8> %b) {
 ;
 ; CHECK-GI-LABEL: missing_insert:
 ; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:    ext v1.16b, v0.16b, v0.16b, #4
-; CHECK-GI-NEXT:    mul v0.8h, v1.8h, v0.8h
+; CHECK-GI-NEXT:    sshll v1.8h, v0.8b, #0
+; CHECK-GI-NEXT:    ext v1.16b, v1.16b, v1.16b, #4
+; CHECK-GI-NEXT:    xtn v1.8b, v1.8h
+; CHECK-GI-NEXT:    smull v0.8h, v1.8b, v0.8b
 ; CHECK-GI-NEXT:    ret
 entry:
   %ext.b = sext <8 x i8> %b to <8 x i16>
@@ -421,10 +422,10 @@ define <8 x i16> @shufsext_v8i8_v8i16(<8 x i8> %src, <8 x i8> %b) {
 ; CHECK-GI-LABEL: shufsext_v8i8_v8i16:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:    sshll v1.8h, v1.8b, #0
 ; CHECK-GI-NEXT:    rev64 v0.8h, v0.8h
 ; CHECK-GI-NEXT:    ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:    mul v0.8h, v0.8h, v1.8h
+; CHECK-GI-NEXT:    xtn v0.8b, v0.8h
+; CHECK-GI-NEXT:    smull v0.8h, v0.8b, v1.8b
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext <8 x i8> %src to <8 x i16>
@@ -444,16 +445,9 @@ define <2 x i64> @shufsext_v2i32_v2i64(<2 x i32> %src, <2 x i32> %b) {
 ; CHECK-GI-LABEL: shufsext_v2i32_v2i64:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:    sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:    ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:    fmov x9, d1
-; CHECK-GI-NEXT:    mov x11, v1.d[1]
-; CHECK-GI-NEXT:    fmov x8, d0
-; CHECK-GI-NEXT:    mov x10, v0.d[1]
-; CHECK-GI-NEXT:    mul x8, x8, x9
-; CHECK-GI-NEXT:    mul x9, x10, x11
-; CHECK-GI-NEXT:    mov v0.d[0], x8
-; CHECK-GI-NEXT:    mov v0.d[1], x9
+; CHECK-GI-NEXT:    xtn v0.2s, v0.2d
+; CHECK-GI-NEXT:    smull v0.2d, v0.2s, v1.2s
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext <2 x i32> %src to <2 x i64>
@@ -496,16 +490,9 @@ define <2 x i64> @shufzext_v2i32_v2i64(<2 x i32> %src, <2 x i32> %b) {
 ; CHECK-GI-LABEL: shufzext_v2i32_v2i64:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:    sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:    ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:    fmov x9, d1
-; CHECK-GI-NEXT:    mov x11, v1.d[1]
-; CHECK-GI-NEXT:    fmov x8, d0
-; CHECK-GI-NEXT:    mov x10, v0.d[1]
-; CHECK-GI-NEXT:    mul x8, x8, x9
-; CHECK-GI-NEXT:    mul x9, x10, x11
-; CHECK-GI-NEXT:    mov v0.d[0], x8
-; CHECK-GI-NEXT:    mov v0.d[1], x9
+; CHECK-GI-NEXT:    xtn v0.2s, v0.2d
+; CHECK-GI-NEXT:    smull v0.2d, v0.2s, v1.2s
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext <2 x i32> %src to <2 x i64>
diff --git a/llvm/test/CodeGen/AA
```
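The `G_SHUFFLE_VECTOR` case above takes the minimum sign-bit count over the source lanes that the demanded result lanes actually read. A rough standalone model of that idea follows. It is a sketch under stated assumptions, not the LLVM implementation: `shuffleSignBits` is a hypothetical function, per-lane sign-bit counts are passed in as plain vectors instead of being computed recursively, and an undefined mask entry (`-1`) in a demanded lane is treated conservatively by returning 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical model: Mask[i] selects lane Mask[i] of the concatenation
// LHS ++ RHS for result lane i; Demanded[i] says whether result lane i
// matters. LHSSignBits/RHSSignBits hold known per-lane sign-bit counts.
unsigned shuffleSignBits(const std::vector<int> &Mask,
                         const std::vector<bool> &Demanded,
                         const std::vector<unsigned> &LHSSignBits,
                         const std::vector<unsigned> &RHSSignBits) {
  assert(Mask.size() == Demanded.size() && "one demand flag per mask entry");
  assert(LHSSignBits.size() == RHSSignBits.size() && "sources share a type");
  const int NumElts = static_cast<int>(LHSSignBits.size());

  unsigned Result = std::numeric_limits<unsigned>::max();
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (!Demanded[I])
      continue; // Lanes nobody reads cannot lower the bound.
    int M = Mask[I];
    if (M < 0)
      return 1; // Assumption: undefined lane, nothing is known.
    assert(M < 2 * NumElts && "mask index out of range");
    unsigned Bits = M < NumElts ? LHSSignBits[M] : RHSSignBits[M - NumElts];
    Result = std::min(Result, Bits);
  }
  // If no lane was demanded, fall back to the conservative answer.
  return Result == std::numeric_limits<unsigned>::max() ? 1 : Result;
}
```

For instance, a lane-swapping shuffle of a `sext <2 x i32> ... to <2 x i64>` value only reads lanes that each have at least 33 sign bits, so the shuffled value keeps that bound. That is plausibly why the updated `shufsext_v2i32_v2i64` checks can narrow with `xtn` and multiply with `smull`.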
[llvm-branch-commits] [llvm] [GlobalISel] Add computeKnownBits for G_SHUFFLE_VECTOR (PR #139505)
https://github.com/davemgreen created https://github.com/llvm/llvm-project/pull/139505

The code is similar to computeKnownBits and the code in SelectionDAG::ComputeNumSignBits.

>From 68fc0c493331eaa56ebc862ef7dfb7106cabad82 Mon Sep 17 00:00:00 2001
From: David Green
Date: Mon, 12 May 2025 07:36:16 +0100
Subject: [PATCH] [GlobalISel] Add computeKnownBits for G_SHUFFLE_VECTOR

The code is similar to computeKnownBits and the code in
SelectionDAG::ComputeNumSignBits
---
 .../CodeGen/GlobalISel/GISelValueTracking.cpp | 24 +++
 llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll | 33 +--
 .../AArch64/aarch64-matrix-umull-smull.ll | 41 +++
 3 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index 41e36e1e6640b..fb483ed962270 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -874,6 +874,30 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
                              SrcTy.getScalarSizeInBits());
     break;
   }
+  case TargetOpcode::G_SHUFFLE_VECTOR: {
+    // Collect the minimum number of sign bits that are shared by every vector
+    // element referenced by the shuffle.
+    APInt DemandedLHS, DemandedRHS;
+    unsigned NumElts = MRI.getType(MI.getOperand(1).getReg()).getNumElements();
+    if (!getShuffleDemandedElts(NumElts, MI.getOperand(3).getShuffleMask(),
+                                DemandedElts, DemandedLHS, DemandedRHS))
+      return 1;
+
+    unsigned Tmp = std::numeric_limits<unsigned>::max();
+    if (!!DemandedLHS)
+      Tmp =
+          computeNumSignBits(MI.getOperand(1).getReg(), DemandedLHS, Depth + 1);
+    if (!!DemandedRHS) {
+      unsigned Tmp2 =
+          computeNumSignBits(MI.getOperand(2).getReg(), DemandedRHS, Depth + 1);
+      Tmp = std::min(Tmp, Tmp2);
+    }
+    // If we don't know anything, early out and try computeKnownBits fall-back.
+    if (Tmp == 1)
+      break;
+    assert(Tmp <= TyBits && "Failed to determine minimum sign bits");
+    return Tmp;
+  }
   case TargetOpcode::G_INTRINSIC:
   case TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS:
   case TargetOpcode::G_INTRINSIC_CONVERGENT:
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
index 56393142726c7..d86cbf57a65f3 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
@@ -400,9 +400,10 @@ define <8 x i16> @missing_insert(<8 x i8> %b) {
 ;
 ; CHECK-GI-LABEL: missing_insert:
 ; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:    ext v1.16b, v0.16b, v0.16b, #4
-; CHECK-GI-NEXT:    mul v0.8h, v1.8h, v0.8h
+; CHECK-GI-NEXT:    sshll v1.8h, v0.8b, #0
+; CHECK-GI-NEXT:    ext v1.16b, v1.16b, v1.16b, #4
+; CHECK-GI-NEXT:    xtn v1.8b, v1.8h
+; CHECK-GI-NEXT:    smull v0.8h, v1.8b, v0.8b
 ; CHECK-GI-NEXT:    ret
 entry:
   %ext.b = sext <8 x i8> %b to <8 x i16>
@@ -421,10 +422,10 @@ define <8 x i16> @shufsext_v8i8_v8i16(<8 x i8> %src, <8 x i8> %b) {
 ; CHECK-GI-LABEL: shufsext_v8i8_v8i16:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:    sshll v1.8h, v1.8b, #0
 ; CHECK-GI-NEXT:    rev64 v0.8h, v0.8h
 ; CHECK-GI-NEXT:    ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:    mul v0.8h, v0.8h, v1.8h
+; CHECK-GI-NEXT:    xtn v0.8b, v0.8h
+; CHECK-GI-NEXT:    smull v0.8h, v0.8b, v1.8b
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext <8 x i8> %src to <8 x i16>
@@ -444,16 +445,9 @@ define <2 x i64> @shufsext_v2i32_v2i64(<2 x i32> %src, <2 x i32> %b) {
 ; CHECK-GI-LABEL: shufsext_v2i32_v2i64:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:    sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:    ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:    fmov x9, d1
-; CHECK-GI-NEXT:    mov x11, v1.d[1]
-; CHECK-GI-NEXT:    fmov x8, d0
-; CHECK-GI-NEXT:    mov x10, v0.d[1]
-; CHECK-GI-NEXT:    mul x8, x8, x9
-; CHECK-GI-NEXT:    mul x9, x10, x11
-; CHECK-GI-NEXT:    mov v0.d[0], x8
-; CHECK-GI-NEXT:    mov v0.d[1], x9
+; CHECK-GI-NEXT:    xtn v0.2s, v0.2d
+; CHECK-GI-NEXT:    smull v0.2d, v0.2s, v1.2s
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext <2 x i32> %src to <2 x i64>
@@ -496,16 +490,9 @@ define <2 x i64> @shufzext_v2i32_v2i64(<2 x i32> %src, <2 x i32> %b) {
 ; CHECK-GI-LABEL: shufzext_v2i32_v2i64:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:    sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:    ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:    fmov x9, d1
-; CHECK-GI-NEXT:    mov x11, v1.d[1]
-; CHECK-GI-NEXT:    fmov x8, d0
-; CHECK-GI-NEXT:    mov x10, v0.d[1]
-; CHECK-GI-NEXT:    mul x8, x8, x9
-; CHECK-GI-NEXT:    mul x9, x10, x11
-; CHECK-GI-NEXT:    mov v0.d[0], x8
-; CHECK-GI-NEX
[llvm-branch-commits] [llvm] [GlobalISel] Add computeKnownBits for G_SHUFFLE_VECTOR (PR #139505)
llvmbot wrote:

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

The code is similar to computeKnownBits and the code in SelectionDAG::ComputeNumSignBits.

---
Full diff: https://github.com/llvm/llvm-project/pull/139505.diff

3 Files Affected:

- (modified) llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp (+24)
- (modified) llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll (+10-23)
- (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+15-26)

```diff
diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index 41e36e1e6640b..fb483ed962270 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -874,6 +874,30 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
                              SrcTy.getScalarSizeInBits());
     break;
   }
+  case TargetOpcode::G_SHUFFLE_VECTOR: {
+    // Collect the minimum number of sign bits that are shared by every vector
+    // element referenced by the shuffle.
+    APInt DemandedLHS, DemandedRHS;
+    unsigned NumElts = MRI.getType(MI.getOperand(1).getReg()).getNumElements();
+    if (!getShuffleDemandedElts(NumElts, MI.getOperand(3).getShuffleMask(),
+                                DemandedElts, DemandedLHS, DemandedRHS))
+      return 1;
+
+    unsigned Tmp = std::numeric_limits<unsigned>::max();
+    if (!!DemandedLHS)
+      Tmp =
+          computeNumSignBits(MI.getOperand(1).getReg(), DemandedLHS, Depth + 1);
+    if (!!DemandedRHS) {
+      unsigned Tmp2 =
+          computeNumSignBits(MI.getOperand(2).getReg(), DemandedRHS, Depth + 1);
+      Tmp = std::min(Tmp, Tmp2);
+    }
+    // If we don't know anything, early out and try computeKnownBits fall-back.
+    if (Tmp == 1)
+      break;
+    assert(Tmp <= TyBits && "Failed to determine minimum sign bits");
+    return Tmp;
+  }
   case TargetOpcode::G_INTRINSIC:
   case TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS:
   case TargetOpcode::G_INTRINSIC_CONVERGENT:
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
index 56393142726c7..d86cbf57a65f3 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
@@ -400,9 +400,10 @@ define <8 x i16> @missing_insert(<8 x i8> %b) {
 ;
 ; CHECK-GI-LABEL: missing_insert:
 ; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:    ext v1.16b, v0.16b, v0.16b, #4
-; CHECK-GI-NEXT:    mul v0.8h, v1.8h, v0.8h
+; CHECK-GI-NEXT:    sshll v1.8h, v0.8b, #0
+; CHECK-GI-NEXT:    ext v1.16b, v1.16b, v1.16b, #4
+; CHECK-GI-NEXT:    xtn v1.8b, v1.8h
+; CHECK-GI-NEXT:    smull v0.8h, v1.8b, v0.8b
 ; CHECK-GI-NEXT:    ret
 entry:
   %ext.b = sext <8 x i8> %b to <8 x i16>
@@ -421,10 +422,10 @@ define <8 x i16> @shufsext_v8i8_v8i16(<8 x i8> %src, <8 x i8> %b) {
 ; CHECK-GI-LABEL: shufsext_v8i8_v8i16:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:    sshll v1.8h, v1.8b, #0
 ; CHECK-GI-NEXT:    rev64 v0.8h, v0.8h
 ; CHECK-GI-NEXT:    ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:    mul v0.8h, v0.8h, v1.8h
+; CHECK-GI-NEXT:    xtn v0.8b, v0.8h
+; CHECK-GI-NEXT:    smull v0.8h, v0.8b, v1.8b
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext <8 x i8> %src to <8 x i16>
@@ -444,16 +445,9 @@ define <2 x i64> @shufsext_v2i32_v2i64(<2 x i32> %src, <2 x i32> %b) {
 ; CHECK-GI-LABEL: shufsext_v2i32_v2i64:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:    sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:    ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:    fmov x9, d1
-; CHECK-GI-NEXT:    mov x11, v1.d[1]
-; CHECK-GI-NEXT:    fmov x8, d0
-; CHECK-GI-NEXT:    mov x10, v0.d[1]
-; CHECK-GI-NEXT:    mul x8, x8, x9
-; CHECK-GI-NEXT:    mul x9, x10, x11
-; CHECK-GI-NEXT:    mov v0.d[0], x8
-; CHECK-GI-NEXT:    mov v0.d[1], x9
+; CHECK-GI-NEXT:    xtn v0.2s, v0.2d
+; CHECK-GI-NEXT:    smull v0.2d, v0.2s, v1.2s
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext <2 x i32> %src to <2 x i64>
@@ -496,16 +490,9 @@ define <2 x i64> @shufzext_v2i32_v2i64(<2 x i32> %src, <2 x i32> %b) {
 ; CHECK-GI-LABEL: shufzext_v2i32_v2i64:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:    sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:    ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:    fmov x9, d1
-; CHECK-GI-NEXT:    mov x11, v1.d[1]
-; CHECK-GI-NEXT:    fmov x8, d0
-; CHECK-GI-NEXT:    mov x10, v0.d[1]
-; CHECK-GI-NEXT:    mul x8, x8, x9
-; CHECK-GI-NEXT:    mul x9, x10, x11
-; CHECK-GI-NEXT:    mov v0.d[0], x8
-; CHECK-GI-NEXT:    mov v0.d[1], x9
+; CHECK-GI-NEXT:    xtn v0.2s, v0.2d
+; CHECK-GI-NEXT:    smull v0.2d, v0.2s, v1.2s
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext <2 x i32> %src to <2 x i64>
diff --git a/llvm/test/CodeGen/AA
```
[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for G_BUILD_VECTOR. (PR #139506)
https://github.com/davemgreen created https://github.com/llvm/llvm-project/pull/139506

The code is similar to SelectionDAG::ComputeNumSignBits, but does not deal with truncating buildvectors.

>From c1286744212c2b2f09e923161a6e6fc4d894e216 Mon Sep 17 00:00:00 2001
From: David Green
Date: Mon, 12 May 2025 07:48:44 +0100
Subject: [PATCH] [GlobalISel] Add computeNumSignBits for G_BUILD_VECTOR.

---
 .../CodeGen/GlobalISel/GISelValueTracking.cpp | 17 +++
 llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll | 28 ---
 .../AArch64/aarch64-matrix-umull-smull.ll | 46 +--
 3 files changed, 49 insertions(+), 42 deletions(-)

diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index fb483ed962270..999bae6ccf42c 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -874,6 +874,23 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
                              SrcTy.getScalarSizeInBits());
     break;
   }
+  case TargetOpcode::G_BUILD_VECTOR: {
+    // Collect the known bits that are shared by every demanded vector element.
+    FirstAnswer = TyBits;
+    for (unsigned i = 0, e = MI.getNumOperands() - 1; i < e; ++i) {
+      if (!DemandedElts[i])
+        continue;
+
+      unsigned Tmp2 = computeNumSignBits(MI.getOperand(i + 1).getReg(),
+                                         APInt(1, 1), Depth + 1);
+      FirstAnswer = std::min(FirstAnswer, Tmp2);
+
+      // If we don't know any bits, early out.
+      if (FirstAnswer == 1)
+        break;
+    }
+    break;
+  }
   case TargetOpcode::G_SHUFFLE_VECTOR: {
     // Collect the minimum number of sign bits that are shared by every vector
     // element referenced by the shuffle.
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
index d86cbf57a65f3..295863f18fd41 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
@@ -61,9 +61,9 @@ define <4 x i32> @dupsext_v4i16_v4i32(i16 %src, <4 x i16> %b) {
 ; CHECK-GI-LABEL: dupsext_v4i16_v4i32:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sxth w8, w0
-; CHECK-GI-NEXT:    sshll v0.4s, v0.4h, #0
 ; CHECK-GI-NEXT:    dup v1.4s, w8
-; CHECK-GI-NEXT:    mul v0.4s, v1.4s, v0.4s
+; CHECK-GI-NEXT:    xtn v1.4h, v1.4s
+; CHECK-GI-NEXT:    smull v0.4s, v1.4h, v0.4h
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext i16 %src to i32
@@ -108,16 +108,9 @@ define <2 x i64> @dupsext_v2i32_v2i64(i32 %src, <2 x i32> %b) {
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    // kill: def $w0 killed $w0 def $x0
 ; CHECK-GI-NEXT:    sxtw x8, w0
-; CHECK-GI-NEXT:    sshll v0.2d, v0.2s, #0
 ; CHECK-GI-NEXT:    dup v1.2d, x8
-; CHECK-GI-NEXT:    fmov x9, d0
-; CHECK-GI-NEXT:    mov x11, v0.d[1]
-; CHECK-GI-NEXT:    fmov x8, d1
-; CHECK-GI-NEXT:    mov x10, v1.d[1]
-; CHECK-GI-NEXT:    mul x8, x8, x9
-; CHECK-GI-NEXT:    mul x9, x10, x11
-; CHECK-GI-NEXT:    mov v0.d[0], x8
-; CHECK-GI-NEXT:    mov v0.d[1], x9
+; CHECK-GI-NEXT:    xtn v1.2s, v1.2d
+; CHECK-GI-NEXT:    smull v0.2d, v1.2s, v0.2s
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext i32 %src to i64
@@ -293,15 +286,14 @@ define <4 x i32> @nonsplat_shuffleinsert2(<4 x i16> %b, i16 %b0, i16 %b1, i16 %b
 ; CHECK-GI-LABEL: nonsplat_shuffleinsert2:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sxth w8, w0
-; CHECK-GI-NEXT:    sshll v0.4s, v0.4h, #0
-; CHECK-GI-NEXT:    mov v1.s[0], w8
-; CHECK-GI-NEXT:    sxth w8, w1
-; CHECK-GI-NEXT:    mov v1.s[1], w8
+; CHECK-GI-NEXT:    sxth w9, w1
+; CHECK-GI-NEXT:    fmov s1, w8
 ; CHECK-GI-NEXT:    sxth w8, w2
-; CHECK-GI-NEXT:    mov v1.s[2], w8
+; CHECK-GI-NEXT:    mov v1.h[1], w9
+; CHECK-GI-NEXT:    mov v1.h[2], w8
 ; CHECK-GI-NEXT:    sxth w8, w3
-; CHECK-GI-NEXT:    mov v1.s[3], w8
-; CHECK-GI-NEXT:    mul v0.4s, v1.4s, v0.4s
+; CHECK-GI-NEXT:    mov v1.h[3], w8
+; CHECK-GI-NEXT:    smull v0.4s, v1.4h, v0.4h
 ; CHECK-GI-NEXT:    ret
 entry:
   %s0 = sext i16 %b0 to i32
diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
index b89b422c8c5ad..418113a4e4e09 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -108,11 +108,12 @@ define void @matrix_mul_signed(i32 %N, ptr nocapture %C, ptr nocapture readonly
 ;
 ; CHECK-GI-LABEL: matrix_mul_signed:
 ; CHECK-GI:       // %bb.0: // %vector.header
-; CHECK-GI-NEXT:    sxth w9, w3
+; CHECK-GI-NEXT:    sxth w8, w3
 ; CHECK-GI-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-GI-NEXT:    dup v0.4s, w8
 ; CHECK-GI-NEXT:    sxtw x8, w0
-; CHECK-GI-NEXT:    dup v0.4s, w9
 ; CHECK-GI-NEXT:    and x8, x8, #0xfff8
+; CHECK-GI-NEXT:    xtn v0.4h, v0.4s
 ; CHECK-GI-NEXT:  .LBB1_1: // %vector.body
 ; CHECK-GI-NEXT:    // =>This Inn
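The `G_BUILD_VECTOR` case in the patch above reduces to a minimum over the demanded source operands, starting from the full scalar width and stopping once the bound hits 1. The loop can be sketched in isolation as follows. This is only an illustration: `buildVectorSignBits` is a hypothetical function, and the per-operand sign-bit counts are supplied directly rather than computed by a recursive `computeNumSignBits` call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the loop: ElemSignBits[i] is the already-known
// sign-bit count of source operand i, Demanded[i] says whether lane i
// matters to the caller, and TyBits is the scalar width of the result.
unsigned buildVectorSignBits(const std::vector<unsigned> &ElemSignBits,
                             const std::vector<bool> &Demanded,
                             unsigned TyBits) {
  assert(ElemSignBits.size() == Demanded.size() && "one flag per lane");
  unsigned Answer = TyBits; // Start from the best case, as the hunk does.
  for (std::size_t I = 0; I < ElemSignBits.size(); ++I) {
    if (!Demanded[I])
      continue; // Lanes nobody reads cannot lower the bound.
    Answer = std::min(Answer, ElemSignBits[I]);
    if (Answer == 1)
      break; // 1 is the smallest possible answer, so stop early.
  }
  return Answer;
}
```

As a usage example, a `<4 x i32>` vector built from four `sext i16 ... to i32` scalars would get 17 sign bits per lane in this model, which is enough to narrow each lane to 16 bits. That appears to be what lets the updated `dupsext_v4i16_v4i32` and `nonsplat_shuffleinsert2` checks use `smull` instead of a full-width `mul`.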
[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for G_BUILD_VECTOR. (PR #139506)
llvmbot wrote:

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

The code is similar to SelectionDAG::ComputeNumSignBits, but does not deal with truncating buildvectors.

---
Full diff: https://github.com/llvm/llvm-project/pull/139506.diff

3 Files Affected:

- (modified) llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp (+17)
- (modified) llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll (+10-18)
- (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+22-24)

```diff
diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index fb483ed962270..999bae6ccf42c 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -874,6 +874,23 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
                              SrcTy.getScalarSizeInBits());
     break;
   }
+  case TargetOpcode::G_BUILD_VECTOR: {
+    // Collect the known bits that are shared by every demanded vector element.
+    FirstAnswer = TyBits;
+    for (unsigned i = 0, e = MI.getNumOperands() - 1; i < e; ++i) {
+      if (!DemandedElts[i])
+        continue;
+
+      unsigned Tmp2 = computeNumSignBits(MI.getOperand(i + 1).getReg(),
+                                         APInt(1, 1), Depth + 1);
+      FirstAnswer = std::min(FirstAnswer, Tmp2);
+
+      // If we don't know any bits, early out.
+      if (FirstAnswer == 1)
+        break;
+    }
+    break;
+  }
   case TargetOpcode::G_SHUFFLE_VECTOR: {
     // Collect the minimum number of sign bits that are shared by every vector
     // element referenced by the shuffle.
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
index d86cbf57a65f3..295863f18fd41 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
@@ -61,9 +61,9 @@ define <4 x i32> @dupsext_v4i16_v4i32(i16 %src, <4 x i16> %b) {
 ; CHECK-GI-LABEL: dupsext_v4i16_v4i32:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sxth w8, w0
-; CHECK-GI-NEXT:    sshll v0.4s, v0.4h, #0
 ; CHECK-GI-NEXT:    dup v1.4s, w8
-; CHECK-GI-NEXT:    mul v0.4s, v1.4s, v0.4s
+; CHECK-GI-NEXT:    xtn v1.4h, v1.4s
+; CHECK-GI-NEXT:    smull v0.4s, v1.4h, v0.4h
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext i16 %src to i32
@@ -108,16 +108,9 @@ define <2 x i64> @dupsext_v2i32_v2i64(i32 %src, <2 x i32> %b) {
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    // kill: def $w0 killed $w0 def $x0
 ; CHECK-GI-NEXT:    sxtw x8, w0
-; CHECK-GI-NEXT:    sshll v0.2d, v0.2s, #0
 ; CHECK-GI-NEXT:    dup v1.2d, x8
-; CHECK-GI-NEXT:    fmov x9, d0
-; CHECK-GI-NEXT:    mov x11, v0.d[1]
-; CHECK-GI-NEXT:    fmov x8, d1
-; CHECK-GI-NEXT:    mov x10, v1.d[1]
-; CHECK-GI-NEXT:    mul x8, x8, x9
-; CHECK-GI-NEXT:    mul x9, x10, x11
-; CHECK-GI-NEXT:    mov v0.d[0], x8
-; CHECK-GI-NEXT:    mov v0.d[1], x9
+; CHECK-GI-NEXT:    xtn v1.2s, v1.2d
+; CHECK-GI-NEXT:    smull v0.2d, v1.2s, v0.2s
 ; CHECK-GI-NEXT:    ret
 entry:
   %in = sext i32 %src to i64
@@ -293,15 +286,14 @@ define <4 x i32> @nonsplat_shuffleinsert2(<4 x i16> %b, i16 %b0, i16 %b1, i16 %b
 ; CHECK-GI-LABEL: nonsplat_shuffleinsert2:
 ; CHECK-GI:       // %bb.0: // %entry
 ; CHECK-GI-NEXT:    sxth w8, w0
-; CHECK-GI-NEXT:    sshll v0.4s, v0.4h, #0
-; CHECK-GI-NEXT:    mov v1.s[0], w8
-; CHECK-GI-NEXT:    sxth w8, w1
-; CHECK-GI-NEXT:    mov v1.s[1], w8
+; CHECK-GI-NEXT:    sxth w9, w1
+; CHECK-GI-NEXT:    fmov s1, w8
 ; CHECK-GI-NEXT:    sxth w8, w2
-; CHECK-GI-NEXT:    mov v1.s[2], w8
+; CHECK-GI-NEXT:    mov v1.h[1], w9
+; CHECK-GI-NEXT:    mov v1.h[2], w8
 ; CHECK-GI-NEXT:    sxth w8, w3
-; CHECK-GI-NEXT:    mov v1.s[3], w8
-; CHECK-GI-NEXT:    mul v0.4s, v1.4s, v0.4s
+; CHECK-GI-NEXT:    mov v1.h[3], w8
+; CHECK-GI-NEXT:    smull v0.4s, v1.4h, v0.4h
 ; CHECK-GI-NEXT:    ret
 entry:
   %s0 = sext i16 %b0 to i32
diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
index b89b422c8c5ad..418113a4e4e09 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -108,11 +108,12 @@ define void @matrix_mul_signed(i32 %N, ptr nocapture %C, ptr nocapture readonly
 ;
 ; CHECK-GI-LABEL: matrix_mul_signed:
 ; CHECK-GI:       // %bb.0: // %vector.header
-; CHECK-GI-NEXT:    sxth w9, w3
+; CHECK-GI-NEXT:    sxth w8, w3
 ; CHECK-GI-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-GI-NEXT:    dup v0.4s, w8
 ; CHECK-GI-NEXT:    sxtw x8, w0
-; CHECK-GI-NEXT:    dup v0.4s, w9
 ; CHECK-GI-NEXT:    and x8, x8, #0xfff8
+; CHECK-GI-NEXT:    xtn v0.4h, v0.4s
 ; CHECK-GI-NEXT:  .LBB1_1: // %vector.body
 ; CHECK-GI-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-GI-NEXT:    add x9, x2, w0, sxtw #1
@@ -120,10 +121,8 @@ define void @matrix_mul_signe
```
[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for G_BUILD_VECTOR. (PR #139506)
llvmbot wrote:

@llvm/pr-subscribers-llvm-globalisel

Author: David Green (davemgreen)

Changes

The code is similar to SelectionDAG::ComputeNumSignBits, but does not deal with truncating buildvectors.

---
Full diff: https://github.com/llvm/llvm-project/pull/139506.diff

3 Files Affected:

- (modified) llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp (+17)
- (modified) llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll (+10-18)
- (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+22-24)

``diff
diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index fb483ed962270..999bae6ccf42c 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -874,6 +874,23 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
                                        SrcTy.getScalarSizeInBits());
     break;
   }
+  case TargetOpcode::G_BUILD_VECTOR: {
+    // Collect the known bits that are shared by every demanded vector element.
+    FirstAnswer = TyBits;
+    for (unsigned i = 0, e = MI.getNumOperands() - 1; i < e; ++i) {
+      if (!DemandedElts[i])
+        continue;
+
+      unsigned Tmp2 = computeNumSignBits(MI.getOperand(i + 1).getReg(),
+                                         APInt(1, 1), Depth + 1);
+      FirstAnswer = std::min(FirstAnswer, Tmp2);
+
+      // If we don't know any bits, early out.
+      if (FirstAnswer == 1)
+        break;
+    }
+    break;
+  }
   case TargetOpcode::G_SHUFFLE_VECTOR: {
     // Collect the minimum number of sign bits that are shared by every vector
     // element referenced by the shuffle.
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll index d86cbf57a65f3..295863f18fd41 100644 --- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll +++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll @@ -61,9 +61,9 @@ define <4 x i32> @dupsext_v4i16_v4i32(i16 %src, <4 x i16> %b) { ; CHECK-GI-LABEL: dupsext_v4i16_v4i32: ; CHECK-GI: // %bb.0: // %entry ; CHECK-GI-NEXT:sxth w8, w0 -; CHECK-GI-NEXT:sshll v0.4s, v0.4h, #0 ; CHECK-GI-NEXT:dup v1.4s, w8 -; CHECK-GI-NEXT:mul v0.4s, v1.4s, v0.4s +; CHECK-GI-NEXT:xtn v1.4h, v1.4s +; CHECK-GI-NEXT:smull v0.4s, v1.4h, v0.4h ; CHECK-GI-NEXT:ret entry: %in = sext i16 %src to i32 @@ -108,16 +108,9 @@ define <2 x i64> @dupsext_v2i32_v2i64(i32 %src, <2 x i32> %b) { ; CHECK-GI: // %bb.0: // %entry ; CHECK-GI-NEXT:// kill: def $w0 killed $w0 def $x0 ; CHECK-GI-NEXT:sxtw x8, w0 -; CHECK-GI-NEXT:sshll v0.2d, v0.2s, #0 ; CHECK-GI-NEXT:dup v1.2d, x8 -; CHECK-GI-NEXT:fmov x9, d0 -; CHECK-GI-NEXT:mov x11, v0.d[1] -; CHECK-GI-NEXT:fmov x8, d1 -; CHECK-GI-NEXT:mov x10, v1.d[1] -; CHECK-GI-NEXT:mul x8, x8, x9 -; CHECK-GI-NEXT:mul x9, x10, x11 -; CHECK-GI-NEXT:mov v0.d[0], x8 -; CHECK-GI-NEXT:mov v0.d[1], x9 +; CHECK-GI-NEXT:xtn v1.2s, v1.2d +; CHECK-GI-NEXT:smull v0.2d, v1.2s, v0.2s ; CHECK-GI-NEXT:ret entry: %in = sext i32 %src to i64 @@ -293,15 +286,14 @@ define <4 x i32> @nonsplat_shuffleinsert2(<4 x i16> %b, i16 %b0, i16 %b1, i16 %b ; CHECK-GI-LABEL: nonsplat_shuffleinsert2: ; CHECK-GI: // %bb.0: // %entry ; CHECK-GI-NEXT:sxth w8, w0 -; CHECK-GI-NEXT:sshll v0.4s, v0.4h, #0 -; CHECK-GI-NEXT:mov v1.s[0], w8 -; CHECK-GI-NEXT:sxth w8, w1 -; CHECK-GI-NEXT:mov v1.s[1], w8 +; CHECK-GI-NEXT:sxth w9, w1 +; CHECK-GI-NEXT:fmov s1, w8 ; CHECK-GI-NEXT:sxth w8, w2 -; CHECK-GI-NEXT:mov v1.s[2], w8 +; CHECK-GI-NEXT:mov v1.h[1], w9 +; CHECK-GI-NEXT:mov v1.h[2], w8 ; CHECK-GI-NEXT:sxth w8, w3 -; CHECK-GI-NEXT:mov v1.s[3], w8 -; CHECK-GI-NEXT:mul v0.4s, v1.4s, v0.4s +; CHECK-GI-NEXT:mov v1.h[3], w8 +; 
CHECK-GI-NEXT:smull v0.4s, v1.4h, v0.4h ; CHECK-GI-NEXT:ret entry: %s0 = sext i16 %b0 to i32 diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll index b89b422c8c5ad..418113a4e4e09 100644 --- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll +++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll @@ -108,11 +108,12 @@ define void @matrix_mul_signed(i32 %N, ptr nocapture %C, ptr nocapture readonly ; ; CHECK-GI-LABEL: matrix_mul_signed: ; CHECK-GI: // %bb.0: // %vector.header -; CHECK-GI-NEXT:sxth w9, w3 +; CHECK-GI-NEXT:sxth w8, w3 ; CHECK-GI-NEXT:// kill: def $w0 killed $w0 def $x0 +; CHECK-GI-NEXT:dup v0.4s, w8 ; CHECK-GI-NEXT:sxtw x8, w0 -; CHECK-GI-NEXT:dup v0.4s, w9 ; CHECK-GI-NEXT:and x8, x8, #0xfff8 +; CHECK-GI-NEXT:xtn v0.4h, v0.4s ; CHECK-GI-NEXT: .LBB1_1: // %vector.body ; CHECK-GI-NEXT:// =>This Inner Loop Header: Depth=1 ; CHECK-GI-NEXT:add x9, x2, w0, sxtw #1 @@ -120,10 +121,8 @@ define void @matrix_mul_signe
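The G_BUILD_VECTOR case in this patch takes the minimum sign-bit count over the demanded elements and stops early once the answer drops to 1. A minimal standalone sketch of that idea follows. It works on concrete 32-bit values rather than on machine operands, and the names `numSignBits` and `buildVectorSignBits` are illustrative assumptions, not part of the LLVM API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of leading bits that equal the sign bit, counting the sign bit
// itself. Hypothetical helper operating on a known 32-bit value.
static unsigned numSignBits(int32_t V) {
  uint32_t U = static_cast<uint32_t>(V);
  if (V < 0)
    U = ~U; // flip so that we can count leading zeros in both cases
  unsigned N = 0;
  for (int Bit = 31; Bit >= 0 && ((U >> Bit) & 1u) == 0; --Bit)
    ++N;
  return N;
}

// Minimum sign-bit count over the demanded elements only, mirroring the
// shape of the loop in the patch (start at the type width, take the min,
// give up as soon as only the sign bit itself is known).
static unsigned buildVectorSignBits(const std::vector<int32_t> &Elts,
                                    const std::vector<bool> &Demanded) {
  unsigned Answer = 32;
  for (std::size_t I = 0; I < Elts.size(); ++I) {
    if (!Demanded[I])
      continue;
    Answer = std::min(Answer, numSignBits(Elts[I]));
    if (Answer == 1)
      break;
  }
  return Answer;
}
```

Elements that are not demanded do not limit the result, which is why a build vector of sign-extended i16 values can still report at least 17 sign bits even if an unused lane holds an arbitrary value.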
[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)
@@ -60388,6 +60393,35 @@ static SDValue combineINTRINSIC_VOID(SDNode *N, SelectionDAG &DAG,
   return SDValue();
 }
 
+static SDValue combineVZEXT_LOAD(SDNode *N, SelectionDAG &DAG,
+                                 TargetLowering::DAGCombinerInfo &DCI) {
+  // Find the TokenFactor to locate the associated AtomicLoad.
+  SDNode *ALD = nullptr;
+  for (auto &TF : DAG.allnodes())

arsenm wrote:

Looking at all nodes should never be necessary

https://github.com/llvm/llvm-project/pull/125432
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [SPARC][IAS] Add definitions for UA 2007 instructions (PR #138401)
https://github.com/s-barannikov approved this pull request. https://github.com/llvm/llvm-project/pull/138401
[llvm-branch-commits] [clang] [HLSL] Implicit resource binding for cbuffers (PR #139022)
@@ -539,19 +537,27 @@ static void initializeBuffer(CodeGenModule &CGM, llvm::GlobalVariable *GV,
 }
 
 static void initializeBufferFromBinding(CodeGenModule &CGM,
-                                        llvm::GlobalVariable *GV, unsigned Slot,
-                                        unsigned Space) {
+                                        llvm::GlobalVariable *GV,
+                                        HLSLResourceBindingAttr *RBA) {
   llvm::Type *Int1Ty = llvm::Type::getInt1Ty(CGM.getLLVMContext());
-  llvm::Value *Args[] = {
-      llvm::ConstantInt::get(CGM.IntTy, Space), /* reg_space */
-      llvm::ConstantInt::get(CGM.IntTy, Slot),  /* lower_bound */
-      llvm::ConstantInt::get(CGM.IntTy, 1),     /* range_size */
-      llvm::ConstantInt::get(CGM.IntTy, 0),     /* index */
-      llvm::ConstantInt::get(Int1Ty, false)     /* non-uniform */
-  };
-  initializeBuffer(CGM, GV,
-                   CGM.getHLSLRuntime().getCreateHandleFromBindingIntrinsic(),
-                   Args);
+  auto *False = llvm::ConstantInt::get(Int1Ty, false);
+  auto *Zero = llvm::ConstantInt::get(CGM.IntTy, 0);
+  auto *One = llvm::ConstantInt::get(CGM.IntTy, 1);

bogner wrote:

Shouldn't these be called `NonUniform`, `Index`, and `RangeSize`? Naming these after their values isn't very helpful.

https://github.com/llvm/llvm-project/pull/139022
[llvm-branch-commits] [llvm] 07bd645 - Revert "[RISCV] Implement codegen for XAndesPerf lea instructions (#137925)"
Author: Jim Lin
Date: 2025-05-12T10:41:56+08:00
New Revision: 07bd6454806aa8149809c49833b6e7c165a2eb51

URL: https://github.com/llvm/llvm-project/commit/07bd6454806aa8149809c49833b6e7c165a2eb51
DIFF: https://github.com/llvm/llvm-project/commit/07bd6454806aa8149809c49833b6e7c165a2eb51.diff

LOG: Revert "[RISCV] Implement codegen for XAndesPerf lea instructions (#137925)"

This reverts commit a788a1abd9c881aa113f5932d100e1a2e3898e14.

Added:

Modified:
    llvm/lib/Target/RISCV/RISCVISelLowering.cpp
    llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td
    llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
    llvm/test/CodeGen/RISCV/rv32zba.ll
    llvm/test/CodeGen/RISCV/rv64zba.ll

Removed:

diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 158a3afdb864c..134d82d84b237 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -14516,8 +14516,8 @@ static SDValue combineBinOpToReduce(SDNode *N, SelectionDAG &DAG,
 // (SLLI (SH*ADD x, y), c0), if c1-c0 equals to [1|2|3].
 static SDValue transformAddShlImm(SDNode *N, SelectionDAG &DAG,
                                   const RISCVSubtarget &Subtarget) {
-  // Perform this optimization only in the zba/xandesperf extension.
-  if (!Subtarget.hasStdExtZba() && !Subtarget.hasVendorXAndesPerf())
+  // Perform this optimization only in the zba extension.
+  if (!Subtarget.hasStdExtZba())
     return SDValue();
 
   // Skip for vector types and larger types.
@@ -15448,9 +15448,8 @@ static SDValue expandMul(SDNode *N, SelectionDAG &DAG, if (VT != Subtarget.getXLenVT()) return SDValue(); - const bool HasShlAdd = Subtarget.hasStdExtZba() || - Subtarget.hasVendorXTHeadBa() || - Subtarget.hasVendorXAndesPerf(); + const bool HasShlAdd = + Subtarget.hasStdExtZba() || Subtarget.hasVendorXTHeadBa(); ConstantSDNode *CNode = dyn_cast(N->getOperand(1)); if (!CNode) diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td b/llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td index 4e01b93d76e80..2ec768435259c 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td +++ b/llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td @@ -135,16 +135,6 @@ class NDSRVInstRR funct7, string opcodestr> let mayStore = 0; } -class NDSRVInstLEA funct7, string opcodestr> -: RVInstR, - Sched<[WriteIALU, ReadIALU, ReadIALU]> { - let hasSideEffects = 0; - let mayLoad = 0; - let mayStore = 0; -} - // GP: ADDI, LB, LBU class NDSRVInstLBGP funct2, string opcodestr> : RVInst<(outs GPR:$rd), (ins simm18:$imm18), @@ -331,9 +321,9 @@ def NDS_BNEC : NDSRVInstBC<0b110, "nds.bnec">; def NDS_BFOS : NDSRVInstBFO<0b011, "nds.bfos">; def NDS_BFOZ : NDSRVInstBFO<0b010, "nds.bfoz">; -def NDS_LEA_H : NDSRVInstLEA<0b101, "nds.lea.h">; -def NDS_LEA_W : NDSRVInstLEA<0b110, "nds.lea.w">; -def NDS_LEA_D : NDSRVInstLEA<0b111, "nds.lea.d">; +def NDS_LEA_H : NDSRVInstRR<0b101, "nds.lea.h">; +def NDS_LEA_W : NDSRVInstRR<0b110, "nds.lea.w">; +def NDS_LEA_D : NDSRVInstRR<0b111, "nds.lea.d">; let hasSideEffects = 0, mayLoad = 0, mayStore = 0 in def NDS_ADDIGP : NDSRVInstLBGP<0b01, "nds.addigp">; @@ -355,10 +345,10 @@ def NDS_FLMISM : NDSRVInstRR<0b0010011, "nds.flmism">; } // Predicates = [HasVendorXAndesPerf] let Predicates = [HasVendorXAndesPerf, IsRV64] in { -def NDS_LEA_B_ZE : NDSRVInstLEA<0b0001000, "nds.lea.b.ze">; -def NDS_LEA_H_ZE : NDSRVInstLEA<0b0001001, "nds.lea.h.ze">; -def NDS_LEA_W_ZE : NDSRVInstLEA<0b0001010, "nds.lea.w.ze">; -def NDS_LEA_D_ZE : NDSRVInstLEA<0b0001011, 
"nds.lea.d.ze">; +def NDS_LEA_B_ZE : NDSRVInstRR<0b0001000, "nds.lea.b.ze">; +def NDS_LEA_H_ZE : NDSRVInstRR<0b0001001, "nds.lea.h.ze">; +def NDS_LEA_W_ZE : NDSRVInstRR<0b0001010, "nds.lea.w.ze">; +def NDS_LEA_D_ZE : NDSRVInstRR<0b0001011, "nds.lea.d.ze">; def NDS_LWUGP : NDSRVInstLWGP<0b110, "nds.lwugp">; def NDS_LDGP : NDSRVInstLDGP<0b011, "nds.ldgp">; @@ -366,32 +356,3 @@ def NDS_LDGP : NDSRVInstLDGP<0b011, "nds.ldgp">; def NDS_SDGP : NDSRVInstSDGP<0b111, "nds.sdgp">; } // Predicates = [HasVendorXAndesPerf, IsRV64] } // DecoderNamespace = "XAndes" - -// Patterns - -let Predicates = [HasVendorXAndesPerf] in { - -defm : ShxAddPat<1, NDS_LEA_H>; -defm : ShxAddPat<2, NDS_LEA_W>; -defm : ShxAddPat<3, NDS_LEA_D>; - -def : CSImm12MulBy4Pat; -def : CSImm12MulBy8Pat; -} // Predicates = [HasVendorXAndesPerf] - -let Predicates = [HasVendorXAndesPerf, IsRV64] in { - -defm : ADD_UWPat; - -defm : ShxAdd_UWPat<1, NDS_LEA_H_ZE>; -defm : ShxAdd_UWPat<2, NDS_LEA_W_ZE>; -defm : ShxAdd_UWPat<3, NDS_LEA_D_ZE>; - -defm : Sh1Add_UWPat; -defm : Sh2Add_UWPat; -defm : Sh3Add_UWPat; - -def : Sh1AddPat; -def : Sh2AddPat; -def : Sh3AddPat; -} // Predicates = [HasVendorXAndesPerf, IsRV64] diff --git a/llvm/lib/Target/RISCV/RISCVInstr
[llvm-branch-commits] [llvm] [X86] Remove extra MOV after widening atomic load (PR #138635)
https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/138635
[llvm-branch-commits] [llvm] [X86] Remove extra MOV after widening atomic load (PR #138635)
@@ -1200,6 +1200,13 @@ def : Pat<(i16 (atomic_load_nonext_16 addr:$src)), (MOV16rm addr:$src)>;
 def : Pat<(i32 (atomic_load_nonext_32 addr:$src)), (MOV32rm addr:$src)>;
 def : Pat<(i64 (atomic_load_nonext_64 addr:$src)), (MOV64rm addr:$src)>;
 
+def : Pat<(v4i32 (scalar_to_vector (i32 (anyext (i16 (atomic_load_16 addr:$src)),
+          (MOVDI2PDIrm addr:$src)>; // load atomic <2 x i8>

jofrn wrote:

Switched it to a `zext` and now it dereferences 16 bits in the asm.

https://github.com/llvm/llvm-project/pull/138635
[llvm-branch-commits] [SPARC][IAS] Add definitions for OSA 2011 instructions (PR #138403)
https://github.com/koachan updated https://github.com/llvm/llvm-project/pull/138403
[llvm-branch-commits] [SPARC][IAS] Add definitions for cryptographic instructions (PR #139451)
llvmbot wrote: @llvm/pr-subscribers-mc Author: Koakuma (koachan) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/139451.diff 5 Files Affected: - (modified) llvm/lib/Target/Sparc/Sparc.td (+5-1) - (added) llvm/lib/Target/Sparc/SparcInstrCrypto.td (+98) - (modified) llvm/lib/Target/Sparc/SparcInstrInfo.td (+5) - (added) llvm/test/MC/Disassembler/Sparc/sparc-crypto.txt (+56) - (added) llvm/test/MC/Sparc/sparc-crypto.s (+88) ``diff diff --git a/llvm/lib/Target/Sparc/Sparc.td b/llvm/lib/Target/Sparc/Sparc.td index 6e6c887e60e12..7c26bf9061cb6 100644 --- a/llvm/lib/Target/Sparc/Sparc.td +++ b/llvm/lib/Target/Sparc/Sparc.td @@ -58,6 +58,9 @@ def FeatureUA2007 def FeatureOSA2011 : SubtargetFeature<"osa2011", "IsOSA2011", "true", "Enable Oracle SPARC Architecture 2011 extensions">; +def FeatureCrypto + : SubtargetFeature<"crypto", "IsCrypto", "true", + "Enable cryptographic extensions">; def FeatureLeon : SubtargetFeature<"leon", "IsLeon", "true", "Enable LEON extensions">; @@ -169,7 +172,8 @@ def : Proc<"niagara3",[FeatureV9, FeatureV8Deprecated, UsePopc, FeatureUA2005, FeatureUA2007]>; def : Proc<"niagara4",[FeatureV9, FeatureV8Deprecated, UsePopc, FeatureVIS, FeatureVIS2, FeatureVIS3, - FeatureUA2005, FeatureUA2007, FeatureOSA2011]>; + FeatureUA2005, FeatureUA2007, FeatureOSA2011, + FeatureCrypto]>; // LEON 2 FT generic def : Processor<"leon2", LEON2Itineraries, diff --git a/llvm/lib/Target/Sparc/SparcInstrCrypto.td b/llvm/lib/Target/Sparc/SparcInstrCrypto.td new file mode 100644 index 0..0e7063f99eb06 --- /dev/null +++ b/llvm/lib/Target/Sparc/SparcInstrCrypto.td @@ -0,0 +1,98 @@ +//===--- SparcInstrCrypto.td - cryptographic extensions ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This file contains instruction formats, definitions and patterns needed for +// cryptographic instructions on SPARC. +//===--===// + + +// Convenience template for 4-operand instructions +class FourOpImm op3val, bits<4> op5val, +RegisterClass RC> + : F3_4; + +let Predicates = [HasCrypto] in { +def AES_EROUND01 : FourOp<"aes_eround01", 0b011001, 0b, DFPRegs>; +def AES_EROUND23 : FourOp<"aes_eround23", 0b011001, 0b0001, DFPRegs>; +def AES_DROUND01 : FourOp<"aes_dround01", 0b011001, 0b0010, DFPRegs>; +def AES_DROUND23 : FourOp<"aes_dround23", 0b011001, 0b0011, DFPRegs>; +def AES_EROUND01_LAST : FourOp<"aes_eround01_l", 0b011001, 0b0100, DFPRegs>; +def AES_EROUND23_LAST : FourOp<"aes_eround23_l", 0b011001, 0b0101, DFPRegs>; +def AES_DROUND01_LAST : FourOp<"aes_dround01_l", 0b011001, 0b0110, DFPRegs>; +def AES_DROUND23_LAST : FourOp<"aes_dround23_l", 0b011001, 0b0111, DFPRegs>; +def AES_KEXPAND0 : F3_3<2, 0b110110, 0b10011, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2), +"aes_kexpand0 $rs1, $rs2, $rd", []>; +def AES_KEXPAND1 : FourOpImm<"aes_kexpand1", 0b011001, 0b1000, DFPRegs>; +def AES_KEXPAND2 : F3_3<2, 0b110110, 0b100110001, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2), +"aes_kexpand2 $rs1, $rs2, $rd", []>; + +def CAMELLIA_F : FourOp<"camellia_f", 0b011001, 0b1100, DFPRegs>; +def CAMELLIA_FL : F3_3<2, 0b110110, 0b10000, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2), +"camellia_fl $rs1, $rs2, $rd", []>; +def CAMELLIA_FLI : F3_3<2, 0b110110, 0b10001, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2), +"camellia_fli $rs1, $rs2, $rd", []>; + +def CRC32C : F3_3<2, 0b110110, 0b101000111, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2), +"crc32c $rs1, $rs2, $rd", []>; + +def DES_ROUND : FourOp<"des_round", 0b011001, 0b1001, DFPRegs>; +let rs2 = 0 in { +def DES_IP : F3_3<2, 0b110110, 0b100110100, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1), +"des_ip $rs1, 
$rd", []>; +def DES_IIP : F3_3<2, 0b110110, 0b100110101, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1), +"des_iip $rs1, $rd", []>; +} +def DES_KEXPAND : F3_3<2, 0b110110, 0b100110110, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, simm5Op:$rs2), +"des_kexpand $rs1, $rs2, $rd", []>; + +let rs1 = 0, rs2 = 0, rd = 0 in { +let Uses = [D0, D1, D2, D5, D6, D7, D8, D9, D10, D11], +Def
[llvm-branch-commits] [SPARC][IAS] Add definitions for cryptographic instructions (PR #139451)
https://github.com/koachan created https://github.com/llvm/llvm-project/pull/139451 None
[llvm-branch-commits] [SPARC][IAS] Add definitions for OSA 2011 instructions (PR #138403)
https://github.com/koachan updated https://github.com/llvm/llvm-project/pull/138403
[llvm-branch-commits] [llvm] [SPARC][IAS] Add definitions for UA 2005 instructions (PR #138400)
https://github.com/koachan updated https://github.com/llvm/llvm-project/pull/138400

>From b2e8de55ea9e54239a017eb932f7107f29f465a4 Mon Sep 17 00:00:00 2001
From: Koakuma
Date: Sun, 4 May 2025 08:57:07 +0700
Subject: [PATCH 1/2] Add other instructions & fix typo

Created using spr 1.3.5
---
 llvm/lib/Target/Sparc/SparcInstrUAOSA.td         | 17 -
 llvm/test/MC/Disassembler/Sparc/sparc-ua-osa.txt |  6 ++
 llvm/test/MC/Sparc/sparc-ua2005.s                |  9 +
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/Sparc/SparcInstrUAOSA.td b/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
index d883e517db89d..5ecc02ed10bfb 100644
--- a/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
+++ b/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
@@ -1,4 +1,4 @@
-//=== SparcInstrVIS.td - Visual Instruction Set extensions (VIS) -===//
+//=== SparcInstrUAOSA.td - UltraSPARC/Oracle SPARC Architecture extensions ===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -18,4 +18,19 @@ def ALLCLEAN : InstSP<(outs), (ins), "allclean", []> {
     let Inst{29-19} = 0b00010110001;
     let Inst{18-0} = 0;
 }
+def INVALW : InstSP<(outs), (ins), "invalw", []> {
+    let op = 2;
+    let Inst{29-19} = 0b00101110001;
+    let Inst{18-0} = 0;
+}
+def NORMALW : InstSP<(outs), (ins), "normalw", []> {
+    let op = 2;
+    let Inst{29-19} = 0b00100110001;
+    let Inst{18-0} = 0;
+}
+def OTHERW : InstSP<(outs), (ins), "otherw", []> {
+    let op = 2;
+    let Inst{29-19} = 0b00011110001;
+    let Inst{18-0} = 0;
+}
 } // Predicates = [HasUA2005]
diff --git a/llvm/test/MC/Disassembler/Sparc/sparc-ua-osa.txt b/llvm/test/MC/Disassembler/Sparc/sparc-ua-osa.txt
index dc3d196091c6b..4a2de98e03fe3 100644
--- a/llvm/test/MC/Disassembler/Sparc/sparc-ua-osa.txt
+++ b/llvm/test/MC/Disassembler/Sparc/sparc-ua-osa.txt
@@ -4,3 +4,9 @@
 
 # CHECK: allclean
 0x85,0x88,0x00,0x00
+# CHECK: invalw
+0x8b,0x88,0x00,0x00
+# CHECK: otherw
+0x87,0x88,0x00,0x00
+# CHECK: normalw
+0x89,0x88,0x00,0x00
diff --git a/llvm/test/MC/Sparc/sparc-ua2005.s b/llvm/test/MC/Sparc/sparc-ua2005.s
index 2214b91b335cd..b07c99a20033b 100644
--- a/llvm/test/MC/Sparc/sparc-ua2005.s
+++ b/llvm/test/MC/Sparc/sparc-ua2005.s
@@ -6,3 +6,12 @@
 ! NO-UA2005: error: instruction requires a CPU feature not currently enabled
 ! UA2005: allclean ! encoding: [0x85,0x88,0x00,0x00]
 allclean
+! NO-UA2005: error: instruction requires a CPU feature not currently enabled
+! UA2005: invalw ! encoding: [0x8b,0x88,0x00,0x00]
+invalw
+! NO-UA2005: error: instruction requires a CPU feature not currently enabled
+! UA2005: otherw ! encoding: [0x87,0x88,0x00,0x00]
+otherw
+! NO-UA2005: error: instruction requires a CPU feature not currently enabled
+! UA2005: normalw ! encoding: [0x89,0x88,0x00,0x00]
+normalw

>From a2c49c5b9ecf2451a20d660cdc059c3301a8b816 Mon Sep 17 00:00:00 2001
From: Koakuma
Date: Mon, 12 May 2025 07:26:35 +0700
Subject: [PATCH 2/2] Fix indentation

Created using spr 1.3.5
---
 llvm/lib/Target/Sparc/SparcInstrUAOSA.td | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/Target/Sparc/SparcInstrUAOSA.td b/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
index 8a833636301d0..b00995a960968 100644
--- a/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
+++ b/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
@@ -12,9 +12,9 @@
 
 class UA2005RegWin<string asmstr, bits<5> fcn>
     : F3_1<2, 0b110001, (outs), (ins), asmstr, []> {
-    let rd = fcn;
-    let rs1 = 0;
-    let rs2 = 0;
+  let rd = fcn;
+  let rs1 = 0;
+  let rs2 = 0;
 }
 
 // UltraSPARC Architecture 2005 Instructions
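The encodings checked in the tests above follow directly from the bit fields set in the instruction definitions. A small sketch, assuming the layout implied by the patch (op in bits 31-30, the function code in bits 29-25, op3 in bits 24-19, everything else zero); `encodeRegWin` is a hypothetical helper written for illustration and is not part of LLVM:

```cpp
#include <cassert>
#include <cstdint>

// Build the 32-bit instruction word for a register-window instruction.
// Assumed layout: op (2 bits) | fcn (5 bits) | op3 (6 bits) | 19 zero bits.
static uint32_t encodeRegWin(uint32_t Fcn) {
  const uint32_t Op = 2;     // "let op = 2" in the definitions
  const uint32_t Op3 = 0x31; // 0b110001, the op3 value used by the patch
  return (Op << 30) | ((Fcn & 0x1F) << 25) | (Op3 << 19);
}
```

With function codes 2, 5, 3 and 4 this yields the words whose big-endian bytes appear in the tests for allclean, invalw, otherw and normalw respectively. The function code for otherw is inferred here from its tested encoding bytes 0x87,0x88.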
[llvm-branch-commits] [SPARC][IAS] Add definitions for cryptographic instructions (PR #139451)
llvmbot wrote: @llvm/pr-subscribers-backend-sparc Author: Koakuma (koachan) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/139451.diff 5 Files Affected: - (modified) llvm/lib/Target/Sparc/Sparc.td (+5-1) - (added) llvm/lib/Target/Sparc/SparcInstrCrypto.td (+98) - (modified) llvm/lib/Target/Sparc/SparcInstrInfo.td (+5) - (added) llvm/test/MC/Disassembler/Sparc/sparc-crypto.txt (+56) - (added) llvm/test/MC/Sparc/sparc-crypto.s (+88) ``diff diff --git a/llvm/lib/Target/Sparc/Sparc.td b/llvm/lib/Target/Sparc/Sparc.td index 6e6c887e60e12..7c26bf9061cb6 100644 --- a/llvm/lib/Target/Sparc/Sparc.td +++ b/llvm/lib/Target/Sparc/Sparc.td @@ -58,6 +58,9 @@ def FeatureUA2007 def FeatureOSA2011 : SubtargetFeature<"osa2011", "IsOSA2011", "true", "Enable Oracle SPARC Architecture 2011 extensions">; +def FeatureCrypto + : SubtargetFeature<"crypto", "IsCrypto", "true", + "Enable cryptographic extensions">; def FeatureLeon : SubtargetFeature<"leon", "IsLeon", "true", "Enable LEON extensions">; @@ -169,7 +172,8 @@ def : Proc<"niagara3",[FeatureV9, FeatureV8Deprecated, UsePopc, FeatureUA2005, FeatureUA2007]>; def : Proc<"niagara4",[FeatureV9, FeatureV8Deprecated, UsePopc, FeatureVIS, FeatureVIS2, FeatureVIS3, - FeatureUA2005, FeatureUA2007, FeatureOSA2011]>; + FeatureUA2005, FeatureUA2007, FeatureOSA2011, + FeatureCrypto]>; // LEON 2 FT generic def : Processor<"leon2", LEON2Itineraries, diff --git a/llvm/lib/Target/Sparc/SparcInstrCrypto.td b/llvm/lib/Target/Sparc/SparcInstrCrypto.td new file mode 100644 index 0..0e7063f99eb06 --- /dev/null +++ b/llvm/lib/Target/Sparc/SparcInstrCrypto.td @@ -0,0 +1,98 @@ +//===--- SparcInstrCrypto.td - cryptographic extensions ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This file contains instruction formats, definitions and patterns needed for +// cryptographic instructions on SPARC. +//===--===// + + +// Convenience template for 4-operand instructions +class FourOpImm op3val, bits<4> op5val, +RegisterClass RC> + : F3_4; + +let Predicates = [HasCrypto] in { +def AES_EROUND01 : FourOp<"aes_eround01", 0b011001, 0b, DFPRegs>; +def AES_EROUND23 : FourOp<"aes_eround23", 0b011001, 0b0001, DFPRegs>; +def AES_DROUND01 : FourOp<"aes_dround01", 0b011001, 0b0010, DFPRegs>; +def AES_DROUND23 : FourOp<"aes_dround23", 0b011001, 0b0011, DFPRegs>; +def AES_EROUND01_LAST : FourOp<"aes_eround01_l", 0b011001, 0b0100, DFPRegs>; +def AES_EROUND23_LAST : FourOp<"aes_eround23_l", 0b011001, 0b0101, DFPRegs>; +def AES_DROUND01_LAST : FourOp<"aes_dround01_l", 0b011001, 0b0110, DFPRegs>; +def AES_DROUND23_LAST : FourOp<"aes_dround23_l", 0b011001, 0b0111, DFPRegs>; +def AES_KEXPAND0 : F3_3<2, 0b110110, 0b10011, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2), +"aes_kexpand0 $rs1, $rs2, $rd", []>; +def AES_KEXPAND1 : FourOpImm<"aes_kexpand1", 0b011001, 0b1000, DFPRegs>; +def AES_KEXPAND2 : F3_3<2, 0b110110, 0b100110001, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2), +"aes_kexpand2 $rs1, $rs2, $rd", []>; + +def CAMELLIA_F : FourOp<"camellia_f", 0b011001, 0b1100, DFPRegs>; +def CAMELLIA_FL : F3_3<2, 0b110110, 0b10000, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2), +"camellia_fl $rs1, $rs2, $rd", []>; +def CAMELLIA_FLI : F3_3<2, 0b110110, 0b10001, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2), +"camellia_fli $rs1, $rs2, $rd", []>; + +def CRC32C : F3_3<2, 0b110110, 0b101000111, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2), +"crc32c $rs1, $rs2, $rd", []>; + +def DES_ROUND : FourOp<"des_round", 0b011001, 0b1001, DFPRegs>; +let rs2 = 0 in { +def DES_IP : F3_3<2, 0b110110, 0b100110100, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1), +"des_ip $rs1, 
$rd", []>; +def DES_IIP : F3_3<2, 0b110110, 0b100110101, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1), +"des_iip $rs1, $rd", []>; +} +def DES_KEXPAND : F3_3<2, 0b110110, 0b100110110, +(outs DFPRegs:$rd), (ins DFPRegs:$rs1, simm5Op:$rs2), +"des_kexpand $rs1, $rs2, $rd", []>; + +let rs1 = 0, rs2 = 0, rd = 0 in { +let Uses = [D0, D1, D2, D5, D6, D7, D8, D9, D10, D11
[llvm-branch-commits] [libcxx] [llvm] release/20.x: [libcxx] Provide locale conversions to tests through lit substitution (#105651) (PR #139468)
https://github.com/mstorsjo created https://github.com/llvm/llvm-project/pull/139468

Backport f909b2229ac16ae3898d8b158bee85c384173dfa, the follow-up fix from 297f6d9f6b215bd7f58cf500b979b94dedbba7bb, plus two commits for updating the CI with regard to macOS.

From 79e10b190029b749e042d1aaec3ee697a2f5d41a Mon Sep 17 00:00:00 2001
From: Martin Storsjö
Date: Fri, 28 Feb 2025 20:43:46 -0100
Subject: [PATCH 1/4] [libcxx] Provide locale conversions to tests through lit substitution (#105651)

There are 2 problems today that this PR resolves:

libcxx tests assume the thousands separator for the fr_FR locale is 0x00A0 on Windows. This currently fails when run on newer versions of Windows (it seems to have been updated to the new correct value of 0x202F around Windows 11; the exact Windows version where it changed doesn't seem to be documented anywhere). Depending on the OS version, you need different values.

There are several ifdefs to determine the environment/platform-specific locale conversion values, which creates a maintenance burden as things change over time.

This PR includes the following changes:
- Provide the environment's locale conversion values through a substitution. The test can opt in by placing the substitution value in a define flag.
- Remove the platform ifdefs (the swapping of values between Windows, Linux, Apple, AIX).

This is accomplished through a lit feature action that fetches the environment's locale conversions (lconv) for members like 'thousands_sep' that we need to provide. This should ensure that we don't lose the effectiveness of the test itself.

In addition, as a result of the above, this PR:
- Fixes a handful of locale tests which unexpectedly fail on newer Windows versions.
- Resolves 3 XFAIL FIX-MEs.

Originally submitted in https://github.com/llvm/llvm-project/pull/86649.
Co-authored-by: Rodrigo Salazar <4rodrigosala...@gmail.com> (cherry picked from commit f909b2229ac16ae3898d8b158bee85c384173dfa) --- .../get_long_double_fr_FR.pass.cpp| 5 +- .../get_long_double_ru_RU.pass.cpp| 5 +- .../put_long_double_fr_FR.pass.cpp| 5 +- .../put_long_double_ru_RU.pass.cpp| 5 +- .../thousands_sep.pass.cpp| 34 ++- .../thousands_sep.pass.cpp| 20 ++-- .../time.duration.nonmember/ostream.pass.cpp | 24 ++--- libcxx/test/support/locale_helpers.h | 37 ++-- libcxx/utils/libcxx/test/features.py | 91 ++- 9 files changed, 138 insertions(+), 88 deletions(-) diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp index bbb67d694970a..f02241ad36a5b 100644 --- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp +++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp @@ -13,6 +13,8 @@ // REQUIRES: locale.fr_FR.UTF-8 +// ADDITIONAL_COMPILE_FLAGS: -DFR_MON_THOU_SEP=%{LOCALE_CONV_FR_FR_UTF_8_MON_THOUSANDS_SEP} + // // class money_get @@ -59,7 +61,8 @@ class my_facetw }; static std::wstring convert_thousands_sep(std::wstring const& in) { - return LocaleHelpers::convert_thousands_sep_fr_FR(in); + const wchar_t fr_sep = LocaleHelpers::mon_thousands_sep_or_default(FR_MON_THOU_SEP); + return LocaleHelpers::convert_thousands_sep(in, fr_sep); } #endif // TEST_HAS_NO_WIDE_CHARACTERS diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp 
index e680f2ea8816a..371cf0e90c8d3 100644 --- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp +++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp @@ -11,6 +11,8 @@ // REQUIRES: locale.ru_RU.UTF-8 +// ADDITIONAL_COMPILE_FLAGS: -DRU_MON_THOU_SEP=%{LOCALE_CONV_RU_RU_UTF_8_MON_THOUSANDS_SEP} + // XFAIL: glibc-old-ru_RU-decimal-point // @@ -52,7 +54,8 @@ class my_facetw }; static std::wstring convert_thousands_sep(std::wstring const& in) { - return LocaleHelpers::convert_thousands_sep_ru_RU(in); + const wchar_t ru_sep = LocaleHelpers::mon_thousands_sep_or_default(RU_MON_THOU_SEP); + return LocaleHelpers::convert_thousands_sep(in, ru_sep); } #endif // TEST_HAS_NO_WIDE_CHARACTERS diff --git a/libcxx/test/std/localization/locale.categories/category.monetar
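The test changes above pass a separator obtained at configuration time into a conversion helper instead of selecting it with platform ifdefs. A rough sketch of that idea follows; `replace_group_separator` is a hypothetical stand-in written for illustration and is not the actual implementation in `locale_helpers.h`. It assumes the expected strings in a test are written with a plain space as the grouping character.

```cpp
#include <cassert>
#include <string>

// Rewrite an expected string, written with a plain space between digit
// groups, so that it uses the separator the platform's locale reports.
// Hypothetical helper; the separator would come from the lit substitution.
static std::wstring replace_group_separator(const std::wstring& in, wchar_t sep) {
  std::wstring out;
  out.reserve(in.size());
  for (wchar_t c : in)
    out.push_back(c == L' ' ? sep : c);
  return out;
}
```

Because the separator is an argument, the same test source works whether the environment reports U+00A0, U+202F, or something else, which is the property the patch is after.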
[llvm-branch-commits] [libcxx] [llvm] release/20.x: [libcxx] Provide locale conversions to tests through lit substitution (#105651) (PR #139468)
llvmbot wrote:

@llvm/pr-subscribers-github-workflow

Author: Martin Storsjö (mstorsjo)

Changes

Backport f909b2229ac16ae3898d8b158bee85c384173dfa, the follow-up fix from 297f6d9f6b215bd7f58cf500b979b94dedbba7bb, plus two commits for updating the CI with regards to macOS.

---

Patch is 38.04 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139468.diff

32 Files Affected:

- (modified) .github/workflows/libcxx-build-and-test.yaml (+12-2)
- (modified) libcxx/test/libcxx/input.output/iostreams.base/ios.base/ios.base.cons/dtor.uninitialized.pass.cpp (+4-1)
- (modified) libcxx/test/libcxx/strings/basic.string/string.capacity/allocation_size.pass.cpp (-5)
- (modified) libcxx/test/std/input.output/file.streams/fstreams/filebuf.virtuals/setbuf.pass.cpp (+6-2)
- (modified) libcxx/test/std/input.output/iostream.format/input.streams/istream.unformatted/sync.pass.cpp (+4-1)
- (modified) libcxx/test/std/localization/locale.categories/category.collate/locale.collate.byname/compare.pass.cpp (+3)
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp (+8-2)
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp (+7-1)
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_zh_CN.pass.cpp (+4-1)
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp (+8-2)
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_ru_RU.pass.cpp (+7-1)
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_zh_CN.pass.cpp (+4-1)
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/curr_symbol.pass.cpp (+4-1)
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/grouping.pass.cpp (+5-2)
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/neg_format.pass.cpp (+4-1)
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/thousands_sep.pass.cpp (+9-25)
- (modified) libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp (+7)
- (modified) libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float.pass.cpp (+7)
- (modified) libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_long_double.pass.cpp (+7)
- (modified) libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp (+4-1)
- (modified) libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp (+11-12)
- (modified) libcxx/test/std/strings/basic.string/string.capacity/max_size.pass.cpp (+1-5)
- (modified) libcxx/test/std/strings/basic.string/string.capacity/over_max_size.pass.cpp (+6)
- (modified) libcxx/test/std/time/time.duration/time.duration.nonmember/ostream.pass.cpp (+12-15)
- (modified) libcxx/test/std/time/time.syn/formatter.duration.pass.cpp (+3)
- (modified) libcxx/test/std/time/time.syn/formatter.file_time.pass.cpp (+3)
- (modified) libcxx/test/std/time/time.syn/formatter.hh_mm_ss.pass.cpp (+3)
- (modified) libcxx/test/std/time/time.syn/formatter.local_time.pass.cpp (+3)
- (modified) libcxx/test/std/time/time.syn/formatter.sys_time.pass.cpp (+3)
- (modified) libcxx/test/support/locale_helpers.h (+6-31)
- (modified) libcxx/utils/generate_feature_test_macro_components.py (+1)
- (modified) libcxx/utils/libcxx/test/features.py (+91-1)

```diff
diff --git a/.github/workflows/libcxx-build-and-test.yaml b/.github/workflows/libcxx-build-and-test.yaml
index 3346c1322a07c..84b2e104d260a 100644
--- a/.github/workflows/libcxx-build-and-test.yaml
+++ b/.github/workflows/libcxx-build-and-test.yaml
@@ -197,10 +197,20 @@ jobs:
           os: macos-15
         - config: apple-configuration
           os: macos-15
+        # TODO: These jobs are intended to test back-deployment (building against ToT libc++ but running against an
+        # older system-provided libc++.dylib). Doing this properly would require building the test suite on a
+        # recent macOS using a recent Clang (hence recent Xcode), and then running the actual test suite on an
+        # older mac. We could do that by e.g. sharing artifacts between the two jobs.
+        #
+        # However, our Lit config
```
[llvm-branch-commits] [libcxx] [llvm] release/20.x: [libcxx] Provide locale conversions to tests through lit substitution (#105651) (PR #139468)
mstorsjo wrote:

This is a manual backport attempt of the same change as #136449, with some more fixes included. This should unbreak the libcxx CI on macOS on the release branch, which seems to be broken as is.

https://github.com/llvm/llvm-project/pull/139468
[llvm-branch-commits] [libcxx] release/20.x: [libcxx] Provide locale conversions to tests through lit substitution (#105651) (PR #136449)
mstorsjo wrote:

> Do we still want to try to backport this one?

I made a new backport attempt in #139468; let's see if it works. If not, it seems like the libcxx CI on the release branch is broken with respect to macOS, and we can either choose to ignore it in all libcxx backports to 20.x, or just stop doing backports that touch libcxx on this release branch (or we'd need to do even more CI fixing for the release branch).

https://github.com/llvm/llvm-project/pull/136449
[llvm-branch-commits] [clang] [llvm] release/20.x: [RISCV] Allow `Zicsr`/`Zifencei` to duplicate with `g` (#136842) (PR #137490)
wangpc-pp wrote:

Thanks @tstellar! All checks have passed now!

https://github.com/llvm/llvm-project/pull/137490