[llvm-branch-commits] [llvm] [SPARC][IAS] Add definitions for UA 2005 instructions (PR #138400)

2025-05-11 Thread Sergei Barannikov via llvm-branch-commits


@@ -0,0 +1,28 @@
+//=== SparcInstrUAOSA.td - UltraSPARC/Oracle SPARC Architecture extensions ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains instruction formats, definitions and patterns needed for
+// UA 2005 instructions on SPARC.
+//===----------------------------------------------------------------------===//
+
+class UA2005RegWin<string asmstr, bits<5> fcn>
+    : F3_1<2, 0b110001, (outs), (ins), asmstr, []> {
+    let rd = fcn;
+    let rs1 = 0;
+    let rs2 = 0;

s-barannikov wrote:

The body is usually indented by 2 (the colon should still be indented by 4):

```suggestion
  let rd = fcn;
  let rs1 = 0;
  let rs2 = 0;
```


https://github.com/llvm/llvm-project/pull/138400
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SPARC][IAS] Add definitions for UA 2005 instructions (PR #138400)

2025-05-11 Thread Sergei Barannikov via llvm-branch-commits

https://github.com/s-barannikov approved this pull request.


https://github.com/llvm/llvm-project/pull/138400


[llvm-branch-commits] [libcxx] [llvm] release/20.x: [libcxx] Provide locale conversions to tests through lit substitution (#105651) (PR #139468)

2025-05-11 Thread Martin Storsjö via llvm-branch-commits

https://github.com/mstorsjo updated 
https://github.com/llvm/llvm-project/pull/139468

From 79e10b190029b749e042d1aaec3ee697a2f5d41a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Storsj=C3=B6?= 
Date: Fri, 28 Feb 2025 20:43:46 -0100
Subject: [PATCH 1/4] [libcxx] Provide locale conversions to tests through lit
 substitution (#105651)

There are 2 problems today that this PR resolves:

libcxx tests assume the thousands separator for the fr_FR locale is 0x00A0
on Windows. This currently fails when run on newer versions of Windows: the
separator seems to have been updated to the new correct value of 0x202F
around Windows 11, although the exact Windows version where it changed
doesn't seem to be documented anywhere. Depending on the OS version, you
need different values.

There are several ifdefs to determine the environment/platform-specific
locale conversion values, which creates a maintenance burden as things
change over time.

This PR includes the following changes:

- Provide the environment's locale conversion values through a
  substitution. The test can opt in by placing the substitution value in a
  define flag.
- Remove the platform ifdefs (the swapping of values between Windows,
  Linux, Apple, AIX).

This is accomplished through a lit feature action that fetches the
environment's locale conversions (lconv) for members like
'thousands_sep' that we need to provide. This should ensure that we
don't lose the effectiveness of the test itself.
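
For readers unfamiliar with `lconv`, the kind of query this feature action
relies on can be sketched in plain C++. This is only an illustration under
assumptions: the helper name `thousands_sep_for` is hypothetical, and the real
feature action lives in libcxx/utils/libcxx/test/features.py and is not
reproduced here.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Hypothetical helper: returns the thousands separator the C library reports
// for `locale_name`, or an empty string if that locale is not installed.
std::string thousands_sep_for(const char* locale_name) {
  assert(locale_name != nullptr);

  // Remember the current locale so it can be restored afterwards.
  const char* current = std::setlocale(LC_ALL, nullptr);
  const std::string saved = current ? current : "C";

  std::string result;
  if (std::setlocale(LC_ALL, locale_name) != nullptr) {
    // localeconv() exposes the lconv members (thousands_sep,
    // mon_thousands_sep, ...) of the locale that is currently active.
    const std::lconv* conv = std::localeconv();
    result = conv->thousands_sep;
  }

  std::setlocale(LC_ALL, saved.c_str());
  return result;
}
```

Asking the environment at configuration time, rather than hard-coding one
value per platform, is what lets the same test pass on both older and newer
Windows versions.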

In addition, as a result of the above, this PR:

- Fixes a handful of locale tests which unexpectedly fail on newer
  Windows versions.
- Resolves 3 XFAIL FIX-MEs.

Originally submitted in https://github.com/llvm/llvm-project/pull/86649.

Co-authored-by: Rodrigo Salazar <4rodrigosala...@gmail.com>
(cherry picked from commit f909b2229ac16ae3898d8b158bee85c384173dfa)
---
 .../get_long_double_fr_FR.pass.cpp|  5 +-
 .../get_long_double_ru_RU.pass.cpp|  5 +-
 .../put_long_double_fr_FR.pass.cpp|  5 +-
 .../put_long_double_ru_RU.pass.cpp|  5 +-
 .../thousands_sep.pass.cpp| 34 ++-
 .../thousands_sep.pass.cpp| 20 ++--
 .../time.duration.nonmember/ostream.pass.cpp  | 24 ++---
 libcxx/test/support/locale_helpers.h  | 37 ++--
 libcxx/utils/libcxx/test/features.py  | 91 ++-
 9 files changed, 138 insertions(+), 88 deletions(-)

diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
index bbb67d694970a..f02241ad36a5b 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
@@ -13,6 +13,8 @@
 
 // REQUIRES: locale.fr_FR.UTF-8
 
+// ADDITIONAL_COMPILE_FLAGS: -DFR_MON_THOU_SEP=%{LOCALE_CONV_FR_FR_UTF_8_MON_THOUSANDS_SEP}
+
 // <locale>
 
 // class money_get<charT, InputIterator>
@@ -59,7 +61,8 @@ class my_facetw
 };
 
 static std::wstring convert_thousands_sep(std::wstring const& in) {
-  return LocaleHelpers::convert_thousands_sep_fr_FR(in);
+  const wchar_t fr_sep = LocaleHelpers::mon_thousands_sep_or_default(FR_MON_THOU_SEP);
+  return LocaleHelpers::convert_thousands_sep(in, fr_sep);
 }
 #endif // TEST_HAS_NO_WIDE_CHARACTERS
 
diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
index e680f2ea8816a..371cf0e90c8d3 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
@@ -11,6 +11,8 @@
 
 // REQUIRES: locale.ru_RU.UTF-8
 
+// ADDITIONAL_COMPILE_FLAGS: -DRU_MON_THOU_SEP=%{LOCALE_CONV_RU_RU_UTF_8_MON_THOUSANDS_SEP}
+
 // XFAIL: glibc-old-ru_RU-decimal-point
 
 // <locale>
@@ -52,7 +54,8 @@ class my_facetw
 };
 
 static std::wstring convert_thousands_sep(std::wstring const& in) {
-  return LocaleHelpers::convert_thousands_sep_ru_RU(in);
+  const wchar_t ru_sep = LocaleHelpers::mon_thousands_sep_or_default(RU_MON_THOU_SEP);
+  return LocaleHelpers::convert_thousands_sep(in, ru_sep);
 }
 #endif // TEST_HAS_NO_WIDE_CHARACTERS
 
diff --git 
a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp
 
b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.m

[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120640

From 4fccbd69f8ee5b6f16b08da38cb65d989450c8aa Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG] Split vector types for atomic load

Vector types that aren't widened are split so that a single ATOMIC_LOAD is
issued for the entire vector at once. This change utilizes the load
vectorization infrastructure in SelectionDAG in order to group the vectors.
This enables SelectionDAG to translate vectors of type bfloat and half.
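
The splitting done by SplitVecRes_ATOMIC_LOAD in the diff below (one wide
integer atomic load, then TRUNCATE for the low half and SRL plus TRUNCATE for
the high half) can be mimicked in ordinary C++. This is a rough sketch, not
LLVM code: the names `splitHalves` and `loadAndSplit` are hypothetical, and it
assumes a 32-bit value holding two 16-bit elements with element 0 in the low
bits, as on little-endian x86.

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

// Splits a 32-bit value into {low 16 bits, high 16 bits}.
std::pair<std::uint16_t, std::uint16_t> splitHalves(std::uint32_t whole) {
  const auto lo = static_cast<std::uint16_t>(whole);        // like TRUNCATE
  const auto hi = static_cast<std::uint16_t>(whole >> 16);  // like SRL + TRUNCATE
  return {lo, hi};
}

// Performs exactly one atomic load for the whole value, then splits the
// result without touching memory again.
std::pair<std::uint16_t, std::uint16_t>
loadAndSplit(const std::atomic<std::uint32_t>& src) {
  return splitHalves(src.load(std::memory_order_acquire));
}
```

Loading once and splitting in registers keeps the access atomic as a whole,
which two separate 16-bit loads would not guarantee.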

commit-id:3a045357
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  37 
 llvm/test/CodeGen/X86/atomic-load-store.ll| 171 ++
 3 files changed, 209 insertions(+)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index bdfa5f7741ad3..d8f402f529632 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f88b4d5693979..a3b30943c8e7d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
     SplitVecRes_STEP_VECTOR(N, Lo, Hi);
     break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+    SplitVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N), Lo, Hi);
+    break;
   case ISD::LOAD:
     SplitVecRes_LOAD(cast<LoadSDNode>(N), Lo, Hi);
     break;
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
     SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo,
+                                               SDValue &Hi) {
+  assert(LD->getExtensionType() == ISD::NON_EXTLOAD &&
+         "Extended load during type legalization!");
+  SDLoc dl(LD);
+  EVT VT = LD->getValueType(0);
+  EVT LoVT, HiVT;
+  std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(VT);
+
+  SDValue Ch = LD->getChain();
+  SDValue Ptr = LD->getBasePtr();
+
+  EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), VT.getSizeInBits());
+  EVT MemIntVT =
+      EVT::getIntegerVT(*DAG.getContext(), LD->getMemoryVT().getSizeInBits());
+  SDValue ALD = DAG.getAtomicLoad(ISD::NON_EXTLOAD, dl, MemIntVT, IntVT, Ch,
+                                  Ptr, LD->getMemOperand());
+
+  EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits());
+  EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits());
+  SDValue ExtractLo = DAG.getNode(ISD::TRUNCATE, dl, LoIntVT, ALD);
+  SDValue ExtractHi =
+      DAG.getNode(ISD::SRL, dl, IntVT, ALD,
+                  DAG.getIntPtrConstant(VT.getSizeInBits() / 2, dl));
+  ExtractHi = DAG.getNode(ISD::TRUNCATE, dl, HiIntVT, ExtractHi);
+
+  Lo = DAG.getBitcast(LoVT, ExtractLo);
+  Hi = DAG.getBitcast(HiVT, ExtractHi);
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(LD, 1), ALD.getValue(1));
+}
+
 void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT,
                                         MachinePointerInfo &MPI, SDValue &Ptr,
                                         uint64_t *ScaledOffset) {
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 3cf9e3c1a8dfa..6e2e9d4b21891 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -205,6 +205,68 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) {
   ret <2 x float> %ret
 }
 
+define <2 x half> @atomic_vec2_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec2_half:
+; CHECK3:       ## %bb.0:
+; CHECK3-NEXT:    movl (%rdi), %eax
+; CHECK3-NEXT:    pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:    shrl $16, %eax
+; CHECK3-NEXT:    pinsrw $0, %eax, %xmm1
+; CHECK3-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
+; CHECK3-NEXT:    retq
+;
+; CHECK0-LABEL: atomic_vec2_half:
+; CHECK0:       ## %bb.0:
+; CHECK0-NEXT:    movl (%rdi), %eax
+; CHECK0-NEXT:    movl %eax, %ecx
+; CHECK0-NEXT:shrl 

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

From 717ea645df30178ab0873da4191d41bc7ba4b761 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.
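
The two-step conversion in the diff below (bitcast the scalar libcall result
to a vector of pointer-sized integers, then inttoptr to the pointer vector)
has a rough analogue in plain C++. The sketch is illustrative only: the names
`bytesToPointers` and `roundTripsTwoPointers` are hypothetical, and it assumes
`std::uintptr_t` exists and has the same size as `void*`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reinterprets N pointer-sized chunks of raw bytes as N pointers.
template <std::size_t N>
std::array<void*, N>
bytesToPointers(const std::array<unsigned char, N * sizeof(void*)>& raw) {
  static_assert(sizeof(std::uintptr_t) == sizeof(void*),
                "sketch assumes pointer-sized uintptr_t");
  // Step 1, like a bitcast: same bits, viewed as N integers.
  std::array<std::uintptr_t, N> ints{};
  std::memcpy(ints.data(), raw.data(), raw.size());
  // Step 2, like inttoptr: convert each integer lane to a pointer.
  std::array<void*, N> ptrs{};
  for (std::size_t i = 0; i < N; ++i)
    ptrs[i] = reinterpret_cast<void*>(ints[i]);
  return ptrs;
}

// Usage example: pack two addresses into raw bytes and recover them.
bool roundTripsTwoPointers() {
  int a = 0, b = 0;
  const std::array<std::uintptr_t, 2> ints = {
      reinterpret_cast<std::uintptr_t>(&a),
      reinterpret_cast<std::uintptr_t>(&b)};
  std::array<unsigned char, 2 * sizeof(void*)> raw{};
  std::memcpy(raw.data(), ints.data(), raw.size());
  const auto ptrs = bytesToPointers<2>(raw);
  return ptrs[0] == &a && ptrs[1] == &b;
}
```

The intermediate integer vector is there because a scalar integer cannot be
converted to a vector of pointers in a single cast.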

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 15 -
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 30 +
 .../X86/expand-atomic-non-integer.ll  | 65 +++
 4 files changed, 158 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index c376de877ac7d..70f59eafc6ecb 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2066,9 +2066,18 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
     I->replaceAllUsesWith(V);
   } else if (HasResult) {
     Value *V;
-    if (UseSizedLibcall)
-      V = Builder.CreateBitOrPointerCast(Result, I->getType());
-    else {
+    if (UseSizedLibcall) {
+      // Add bitcasts from Result's scalar type to I's <n x ptr> vector type
+      auto *PtrTy = dyn_cast<PointerType>(I->getType()->getScalarType());
+      auto *VTy = dyn_cast<VectorType>(I->getType());
+      if (VTy && PtrTy && !Result->getType()->isVectorTy()) {
+        unsigned AS = PtrTy->getAddressSpace();
+        Value *BC = Builder.CreateBitCast(
+            Result, VTy->getWithNewType(DL.getIntPtrType(Ctx, AS)));
+        V = Builder.CreateIntToPtr(BC, I->getType());
+      } else
+        V = Builder.CreateBitOrPointerCast(Result, I->getType());
+    } else {
       V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
                                     AllocaAlignment);
       Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29..eaa2ffd9b2731 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:       @ %bb.0:
+; ARM-NEXT:    ldr r0, [r0]
+; ARM-NEXT:    dmb ish
+; ARM-NEXT:    bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:       @ %bb.0:
+; ARMOPTNONE-NEXT:    ldr r0, [r0]
+; ARMOPTNONE-NEXT:    dmb ish
+; ARMOPTNONE-NEXT:    bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:       @ %bb.0:
+; THUMBTWO-NEXT:    ldr r0, [r0]
+; THUMBTWO-NEXT:    dmb ish
+; THUMBTWO-NEXT:    bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:       @ %bb.0:
+; THUMBONE-NEXT:    push {r7, lr}
+; THUMBONE-NEXT:    movs r1, #0
+; THUMBONE-NEXT:    mov r2, r1
+; THUMBONE-NEXT:    bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:    pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:       @ %bb.0:
+; ARMV4-NEXT:    push {r11, lr}
+; ARMV4-NEXT:    mov r1, #2
+; ARMV4-NEXT:    bl __atomic_load_4
+; ARMV4-NEXT:    pop {r11, lr}
+; ARMV4-NEXT:    mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:       @ %bb.0:
+; ARMV6-NEXT:    ldr r0, [r0]
+; ARMV6-NEXT:    mov r1, #0
+; ARMV6-NEXT:    mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:    bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:       @ %bb.0:
+; THUMBM-NEXT:    ldr r0, [r0]
+; THUMBM-NEXT:    dmb sy
+; THUMBM-NEXT:    bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index f72970d12b6eb..d3027e799 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -382,6 +382,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    pushq %rax
+; CHECK-NEXT:    movl $2, %esi
+; CHECK-NEXT:    callq ___atomic_load_16
+; CHECK-NEXT:    movq %rdx, %xmm1
+; CHECK-NEXT:    movq %rax, %xmm0
+; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    popq %rax
+; CHECK-NEXT:    retq
+  %ret = load atomic <2 x ptr>, ptr %x acquire, align 16
+  ret <2 x ptr> %ret
+}
+
 define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
 ; CHECK3-LABEL: atomic_vec4_i8:
 ; CHECK3:   ## %bb.0:
@@ -405,6 +420,21 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind {
   ret <4 x i16> %ret
 }
 
+define <4 x ptr addrspace(270)> @atomic_vec4_ptr270(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_ptr270:
+; CHECK:   ## %b

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/125432

From 684a54284458cae0b700737126715384b9fddab1 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 31 Jan 2025 13:12:56 -0500
Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic
 vector.

After splitting, all elements are created. The two components must
be found by looking at the upper and lower half of EXTRACT_ELEMENT.
This change extends EltsFromConsecutiveLoads
to understand AtomicSDNode so that unused elements can be removed.
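
The idea behind merging element loads can be modelled outside LLVM. The sketch
below is a simplified stand-in, not the logic of EltsFromConsecutiveLoads
itself: `EltSource` and `isSingleConsecutiveLoad` are hypothetical names, and
each element is reduced to the id of the load it comes from plus a byte offset
into that load.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Where one vector element comes from: which load, and at what byte offset.
struct EltSource {
  int loadId;
  std::int64_t byteOffset;
};

// Returns true when every element reads from the same load and the elements
// sit back to back, so one wide load can replace the per-element extracts.
bool isSingleConsecutiveLoad(const std::vector<EltSource>& elts,
                             std::int64_t eltSizeInBytes) {
  assert(eltSizeInBytes > 0);
  if (elts.empty())
    return false;
  for (std::size_t i = 0; i < elts.size(); ++i) {
    if (elts[i].loadId != elts[0].loadId)
      return false;
    const std::int64_t expected =
        elts[0].byteOffset + static_cast<std::int64_t>(i) * eltSizeInBytes;
    if (elts[i].byteOffset != expected)
      return false;
  }
  return true;
}
```

In this model, the upper and lower halves produced by splitting one atomic
load share a `loadId` and differ only in `byteOffset`, which is what allows
them to be recognised as pieces of the same load.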

commit-id:b83937a8
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |   2 +-
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp |   2 +-
 llvm/lib/Target/X86/X86ISelLowering.cpp   |  65 ++--
 llvm/test/CodeGen/X86/atomic-load-store.ll| 149 ++
 4 files changed, 65 insertions(+), 153 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index 87b6914f8a0ee..40550d96a5b3d 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1873,7 +1873,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index bbf1b0fd590ef..d6e5cd1078776 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -12215,7 +12215,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain,
   return TokenFactor;
 }
 
-SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad,
+SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad,
                                                    SDValue NewMemOp) {
   assert(isa<MemSDNode>(NewMemOp.getNode()) && "Expected a memop node");
   SDValue OldChain = SDValue(OldLoad, 1);
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 3ab548f64d04c..409a8c7e73c0e 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7193,15 +7193,19 @@ static SDValue LowerAsSplatVectorLoad(SDValue SrcOp, MVT VT, const SDLoc &dl,
 }
 
 // Recurse to find a LoadSDNode source and the accumulated ByteOffest.
-static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) {
-  if (ISD::isNON_EXTLoad(Elt.getNode())) {
-    auto *BaseLd = cast<LoadSDNode>(Elt);
-    if (!BaseLd->isSimple())
-      return false;
+static bool findEltLoadSrc(SDValue Elt, MemSDNode *&Ld, int64_t &ByteOffset) {
+  if (auto *BaseLd = dyn_cast<AtomicSDNode>(Elt)) {
     Ld = BaseLd;
     ByteOffset = 0;
     return true;
-  }
+  } else if (auto *BaseLd = dyn_cast<LoadSDNode>(Elt))
+    if (ISD::isNON_EXTLoad(Elt.getNode())) {
+      if (!BaseLd->isSimple())
+        return false;
+      Ld = BaseLd;
+      ByteOffset = 0;
+      return true;
+    }
 
   switch (Elt.getOpcode()) {
   case ISD::BITCAST:
@@ -7254,7 +7258,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
   APInt ZeroMask = APInt::getZero(NumElems);
   APInt UndefMask = APInt::getZero(NumElems);
 
-  SmallVector<LoadSDNode*, 8> Loads(NumElems, nullptr);
+  SmallVector<MemSDNode *, 8> Loads(NumElems, nullptr);
   SmallVector<int64_t, 8> ByteOffsets(NumElems, 0);
 
   // For each element in the initializer, see if we've found a load, zero or an
@@ -7304,7 +7308,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
   EVT EltBaseVT = EltBase.getValueType();
   assert(EltBaseVT.getSizeInBits() == EltBaseVT.getStoreSizeInBits() &&
  "Register/Memory size mismatch");
-  LoadSDNode *LDBase = Loads[FirstLoadedElt];
+  MemSDNode *LDBase = Loads[FirstLoadedElt];
   assert(LDBase && "Did not find base load for merging consecutive loads");
   unsigned BaseSizeInBits = EltBaseVT.getStoreSizeInBits();
   unsigned BaseSizeInBytes = BaseSizeInBits / 8;
@@ -7318,16 +7322,18 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
 
   // Check to see if the element's load is consecutive to the base load
   // or offset from a previous (already checked) load.
-  auto CheckConsecutiveLoad = [&](LoadSDNode *Base, int EltIdx) {
-    LoadSDNode *Ld = Loads[EltIdx];
+  auto CheckConsecutiveLoad = [&](MemSDNode *Base, int EltIdx) {
+    MemSDNode *Ld = Loads[EltIdx];
     int64_t ByteOffset = ByteOffsets[EltIdx];
     if (ByteOffset && (ByteOffset % BaseSizeInBytes) == 0) {
       int64_t BaseIdx = EltIdx - (ByteOffset / BaseSizeInBytes);
       return (0 <= BaseIdx && BaseIdx < (int)NumElems && LoadMask[BaseIdx] &&
   Loads[Bas

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/125432

>From 684a54284458cae0b700737126715384b9fddab1 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 31 Jan 2025 13:12:56 -0500
Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic
 vector.

After splitting, all elements are created. The two components must
be found by looking at the upper and lower half of EXTRACT_ELEMENT.
This change extends EltsFromConsecutiveLoads
to understand AtomicSDNode so that unused elements can be removed.

commit-id:b83937a8
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |   2 +-
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp |   2 +-
 llvm/lib/Target/X86/X86ISelLowering.cpp   |  65 ++--
 llvm/test/CodeGen/X86/atomic-load-store.ll| 149 ++
 4 files changed, 65 insertions(+), 153 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index 87b6914f8a0ee..40550d96a5b3d 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1873,7 +1873,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will 
have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index bbf1b0fd590ef..d6e5cd1078776 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -12215,7 +12215,7 @@ SDValue 
SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain,
   return TokenFactor;
 }
 
-SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad,
+SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad,
SDValue NewMemOp) {
   assert(isa(NewMemOp.getNode()) && "Expected a memop node");
   SDValue OldChain = SDValue(OldLoad, 1);
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 3ab548f64d04c..409a8c7e73c0e 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7193,15 +7193,19 @@ static SDValue LowerAsSplatVectorLoad(SDValue SrcOp, 
MVT VT, const SDLoc &dl,
 }
 
 // Recurse to find a LoadSDNode source and the accumulated ByteOffest.
-static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) {
-  if (ISD::isNON_EXTLoad(Elt.getNode())) {
-auto *BaseLd = cast(Elt);
-if (!BaseLd->isSimple())
-  return false;
+static bool findEltLoadSrc(SDValue Elt, MemSDNode *&Ld, int64_t &ByteOffset) {
+  if (auto *BaseLd = dyn_cast(Elt)) {
 Ld = BaseLd;
 ByteOffset = 0;
 return true;
-  }
+  } else if (auto *BaseLd = dyn_cast(Elt))
+if (ISD::isNON_EXTLoad(Elt.getNode())) {
+  if (!BaseLd->isSimple())
+return false;
+  Ld = BaseLd;
+  ByteOffset = 0;
+  return true;
+}
 
   switch (Elt.getOpcode()) {
   case ISD::BITCAST:
@@ -7254,7 +7258,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, 
ArrayRef Elts,
   APInt ZeroMask = APInt::getZero(NumElems);
   APInt UndefMask = APInt::getZero(NumElems);
 
-  SmallVector Loads(NumElems, nullptr);
+  SmallVector Loads(NumElems, nullptr);
   SmallVector ByteOffsets(NumElems, 0);
 
   // For each element in the initializer, see if we've found a load, zero or an
@@ -7304,7 +7308,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, 
ArrayRef Elts,
   EVT EltBaseVT = EltBase.getValueType();
   assert(EltBaseVT.getSizeInBits() == EltBaseVT.getStoreSizeInBits() &&
  "Register/Memory size mismatch");
-  LoadSDNode *LDBase = Loads[FirstLoadedElt];
+  MemSDNode *LDBase = Loads[FirstLoadedElt];
   assert(LDBase && "Did not find base load for merging consecutive loads");
   unsigned BaseSizeInBits = EltBaseVT.getStoreSizeInBits();
   unsigned BaseSizeInBytes = BaseSizeInBits / 8;
@@ -7318,16 +7322,18 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, 
ArrayRef Elts,
 
   // Check to see if the element's load is consecutive to the base load
   // or offset from a previous (already checked) load.
-  auto CheckConsecutiveLoad = [&](LoadSDNode *Base, int EltIdx) {
-LoadSDNode *Ld = Loads[EltIdx];
+  auto CheckConsecutiveLoad = [&](MemSDNode *Base, int EltIdx) {
+MemSDNode *Ld = Loads[EltIdx];
 int64_t ByteOffset = ByteOffsets[EltIdx];
 if (ByteOffset && (ByteOffset % BaseSizeInBytes) == 0) {
   int64_t BaseIdx = EltIdx - (ByteOffset / BaseSizeInBytes);
   return (0 <= BaseIdx && BaseIdx < (int)NumElems && LoadMask[BaseIdx] &&
   Loads[Bas

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/125432

>From 684a54284458cae0b700737126715384b9fddab1 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 31 Jan 2025 13:12:56 -0500
Subject: [PATCH] [SelectionDAG][X86] Remove unused elements from atomic
 vector.

After splitting, all elements are created. The two components must
be found by looking at the upper and lower half of EXTRACT_ELEMENT.
This change extends EltsFromConsecutiveLoads
to understand AtomicSDNode so that unused elements can be removed.

commit-id:b83937a8
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |   2 +-
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp |   2 +-
 llvm/lib/Target/X86/X86ISelLowering.cpp   |  65 ++--
 llvm/test/CodeGen/X86/atomic-load-store.ll| 149 ++
 4 files changed, 65 insertions(+), 153 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index 87b6914f8a0ee..40550d96a5b3d 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1873,7 +1873,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will 
have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index bbf1b0fd590ef..d6e5cd1078776 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -12215,7 +12215,7 @@ SDValue 
SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain,
   return TokenFactor;
 }
 
-SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad,
+SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad,
SDValue NewMemOp) {
   assert(isa<MemSDNode>(NewMemOp.getNode()) && "Expected a memop node");
   SDValue OldChain = SDValue(OldLoad, 1);
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 3ab548f64d04c..409a8c7e73c0e 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7193,15 +7193,19 @@ static SDValue LowerAsSplatVectorLoad(SDValue SrcOp, 
MVT VT, const SDLoc &dl,
 }
 
 // Recurse to find a LoadSDNode source and the accumulated ByteOffset.
-static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) {
-  if (ISD::isNON_EXTLoad(Elt.getNode())) {
-auto *BaseLd = cast<LoadSDNode>(Elt);
-if (!BaseLd->isSimple())
-  return false;
+static bool findEltLoadSrc(SDValue Elt, MemSDNode *&Ld, int64_t &ByteOffset) {
+  if (auto *BaseLd = dyn_cast<AtomicSDNode>(Elt)) {
 Ld = BaseLd;
 ByteOffset = 0;
 return true;
-  }
+  } else if (auto *BaseLd = dyn_cast<LoadSDNode>(Elt))
+if (ISD::isNON_EXTLoad(Elt.getNode())) {
+  if (!BaseLd->isSimple())
+return false;
+  Ld = BaseLd;
+  ByteOffset = 0;
+  return true;
+}
 
   switch (Elt.getOpcode()) {
   case ISD::BITCAST:
@@ -7254,7 +7258,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, 
ArrayRef Elts,
   APInt ZeroMask = APInt::getZero(NumElems);
   APInt UndefMask = APInt::getZero(NumElems);
 
-  SmallVector Loads(NumElems, nullptr);
+  SmallVector Loads(NumElems, nullptr);
   SmallVector ByteOffsets(NumElems, 0);
 
   // For each element in the initializer, see if we've found a load, zero or an
@@ -7304,7 +7308,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, 
ArrayRef Elts,
   EVT EltBaseVT = EltBase.getValueType();
   assert(EltBaseVT.getSizeInBits() == EltBaseVT.getStoreSizeInBits() &&
  "Register/Memory size mismatch");
-  LoadSDNode *LDBase = Loads[FirstLoadedElt];
+  MemSDNode *LDBase = Loads[FirstLoadedElt];
   assert(LDBase && "Did not find base load for merging consecutive loads");
   unsigned BaseSizeInBits = EltBaseVT.getStoreSizeInBits();
   unsigned BaseSizeInBytes = BaseSizeInBits / 8;
@@ -7318,16 +7322,18 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, 
ArrayRef Elts,
 
   // Check to see if the element's load is consecutive to the base load
   // or offset from a previous (already checked) load.
-  auto CheckConsecutiveLoad = [&](LoadSDNode *Base, int EltIdx) {
-LoadSDNode *Ld = Loads[EltIdx];
+  auto CheckConsecutiveLoad = [&](MemSDNode *Base, int EltIdx) {
+MemSDNode *Ld = Loads[EltIdx];
 int64_t ByteOffset = ByteOffsets[EltIdx];
 if (ByteOffset && (ByteOffset % BaseSizeInBytes) == 0) {
   int64_t BaseIdx = EltIdx - (ByteOffset / BaseSizeInBytes);
   return (0 <= BaseIdx && BaseIdx < (int)NumElems && LoadMask[BaseIdx] &&
   Loads[Bas

[llvm-branch-commits] [llvm] [SelectionDAG] Widen <2 x T> vector types for atomic load (PR #120598)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

From 730b40b39dfa3ed5d802bbb1270d49273a5de7fb Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG] Widen <2 x T> vector types for atomic load

Vector types of 2 elements must be widened. This change widens
vector types of atomic loads in SelectionDAG so that aligned
vectors of size >1 can be translated.
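
As a rough model of the widening path in this patch (load one scalar, SCALAR_TO_VECTOR, then BITCAST), the following sketch may help. It assumes a `<2 x i16>` value widened to `<8 x i16>`; `widenAtomicLoad` is a hypothetical name, and zeroing the upper lanes is a choice made for the sketch, since those lanes are undefined in the DAG.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Models widening <2 x i16> to <8 x i16>: one 32-bit atomic load, placed in
// lane 0 of a <4 x i32>, then reinterpreted as the wide vector type.
std::array<std::uint16_t, 8>
widenAtomicLoad(const std::atomic<std::uint32_t> &Mem) {
  // A single atomic load covers both original elements.
  std::uint32_t Scalar = Mem.load(std::memory_order_acquire);

  // SCALAR_TO_VECTOR: the scalar becomes lane 0; other lanes are zeroed here.
  std::array<std::uint32_t, 4> Vec{};
  Vec[0] = Scalar;

  // BITCAST: same bits, viewed as the widened element type.
  std::array<std::uint16_t, 8> Wide;
  static_assert(sizeof(Wide) == sizeof(Vec), "bitcast needs equal sizes");
  std::memcpy(Wide.data(), Vec.data(), sizeof(Wide));
  return Wide;
}
```

The point of the sketch is that memory is touched exactly once, so the atomicity of the original load is kept while the result type grows.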

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 97 ++-
 llvm/test/CodeGen/X86/atomic-load-store.ll| 78 +++
 3 files changed, 153 insertions(+), 23 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 89ea7ef4dbe89..bdfa5f7741ad3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 8eee7a4c61fe6..f88b4d5693979 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4625,6 +4625,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, 
unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -6014,6 +6017,74 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+/// Either return the same load or provide appropriate casts
+/// from the load and return that.
+static SDValue coerceLoadedValue(SDValue LdOp, EVT FirstVT, EVT WidenVT,
+ TypeSize LdWidth, TypeSize FirstVTWidth,
+ SDLoc dl, SelectionDAG &DAG) {
+  assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth));
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  if (!FirstVT.isVector()) {
+unsigned NumElts =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts);
+SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp);
+return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp);
+  }
+  assert(FirstVT == WidenVT);
+  return LdOp;
+}
+
+static std::optional<EVT> findMemType(SelectionDAG &DAG,
+  const TargetLowering &TLI, unsigned Width,
+  EVT WidenVT, unsigned Align,
+  unsigned WidenEx);
+
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  EVT WidenVT =
+  TLI.getTypeToTransformTo(*DAG.getContext(), LD->getValueType(0));
+  EVT LdVT = LD->getMemoryVT();
+  SDLoc dl(LD);
+  assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors");
+  assert(LdVT.isScalableVector() == WidenVT.isScalableVector() &&
+ "Must be scalable");
+  assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() &&
+ "Expected equivalent element types");
+
+  // Load information
+  SDValue Chain = LD->getChain();
+  SDValue BasePtr = LD->getBasePtr();
+  MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags();
+  AAMDNodes AAInfo = LD->getAAInfo();
+
+  TypeSize LdWidth = LdVT.getSizeInBits();
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  TypeSize WidthDiff = WidenWidth - LdWidth;
+
+  // Find the vector type that can be loaded from.
+  std::optional<EVT> FirstVT =
+  findMemType(DAG, TLI, LdWidth.getKnownMinValue(), WidenVT, /*LdAlign=*/0,
+  WidthDiff.getKnownMinValue());
+
+  if (!FirstVT)
+return SDValue();
+
+  SmallVector MemVTs;
+  TypeSize FirstVTWidth = FirstVT->getSizeInBits();
+
+  SDValue LdOp = DAG.getAtomicLoad(ISD::NON_EXTLOAD, dl, *FirstVT, *FirstVT,
+   Chain, BasePtr, LD->getMemOperand());
+
+  // Load the element with one instruction.
+  SDValue Result = coerceLoadedValue(LdOp, *FirstVT, WidenVT, LdWidth,
+ FirstVTWidth, dl, DAG);
+
+  // Modified the chain - switch anything that used the old chain to use
+  // the new 

[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

From 717ea645df30178ab0873da4191d41bc7ba4b761 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.
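
The two-step conversion in the hunk below (bitcast the scalar result to a vector of pointer-sized integers, then inttoptr to the pointer vector) can be modelled in plain C++. This is a sketch under assumptions: `WideScalar` and `scalarToPtrVector` are hypothetical names, and a byte buffer stands in for the wide integer a sized libcall would return.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the wide scalar returned by a sized libcall.
struct WideScalar {
  unsigned char Bytes[2 * sizeof(std::uintptr_t)];
};

std::array<void *, 2> scalarToPtrVector(const WideScalar &Result) {
  // Step 1 (bitcast): reinterpret the scalar as a vector of
  // pointer-sized integers without changing any bits.
  std::array<std::uintptr_t, 2> Ints;
  static_assert(sizeof(Ints) == sizeof(WideScalar), "sizes must match");
  std::memcpy(Ints.data(), Result.Bytes, sizeof(Ints));

  // Step 2 (inttoptr): convert each integer lane to a pointer.
  return {reinterpret_cast<void *>(Ints[0]),
          reinterpret_cast<void *>(Ints[1])};
}
```

Splitting the conversion this way mirrors the restriction that a scalar integer cannot be cast directly to a vector of pointers; the integer vector is the intermediate type that makes both casts legal.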

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 15 -
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 30 +
 .../X86/expand-atomic-non-integer.ll  | 65 +++
 4 files changed, 158 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index c376de877ac7d..70f59eafc6ecb 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2066,9 +2066,18 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's scalar type to I's <n x ptr> vector type
+  auto *PtrTy = dyn_cast<PointerType>(I->getType()->getScalarType());
+  auto *VTy = dyn_cast<VectorType>(I->getType());
+  if (VTy && PtrTy && !Result->getType()->isVectorTy()) {
+unsigned AS = PtrTy->getAddressSpace();
+Value *BC = Builder.CreateBitCast(
+Result, VTy->getWithNewType(DL.getIntPtrType(Ctx, AS)));
+V = Builder.CreateIntToPtr(BC, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll 
b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29..eaa2ffd9b2731 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double 
%val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index f72970d12b6eb..d3027e799 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -382,6 +382,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:popq %rax
+; CHECK-NEXT:retq
+  %ret = load atomic <2 x ptr>, ptr %x acquire, align 16
+  ret <2 x ptr> %ret
+}
+
 define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
 ; CHECK3-LABEL: atomic_vec4_i8:
 ; CHECK3:   ## %bb.0:
@@ -405,6 +420,21 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind {
   ret <4 x i16> %ret
 }
 
+define <4 x ptr addrspace(270)> @atomic_vec4_ptr270(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_ptr270:
+; CHECK:   ## %b

[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #120640)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120640

From 4fccbd69f8ee5b6f16b08da38cb65d989450c8aa Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 16:25:55 -0500
Subject: [PATCH] [SelectionDAG] Split vector types for atomic load

Vector types that aren't widened are split
so that a single ATOMIC_LOAD is issued for the entire vector at once.
This change utilizes the load vectorization infrastructure in
SelectionDAG in order to group the vectors. This enables SelectionDAG
to translate vectors with element types bfloat and half.
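
The splitting scheme in SplitVecRes_ATOMIC_LOAD below (one integer-typed atomic load, TRUNCATE for the low half, SRL plus TRUNCATE for the high half) can be sketched as follows. The sketch assumes a 32-bit value holding two 16-bit elements, as for `<2 x half>`; `splitAtomicLoad` is a hypothetical name, and the final bitcasts to the element type are left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <utility>

// Models splitting a two-element vector: a single 32-bit atomic load, then
// Lo = truncate(v) and Hi = truncate(v >> 16).
std::pair<std::uint16_t, std::uint16_t>
splitAtomicLoad(const std::atomic<std::uint32_t> &Mem) {
  // The whole vector is read with one atomic operation.
  std::uint32_t Whole = Mem.load(std::memory_order_acquire);

  // Low half: keep the bottom 16 bits.
  auto Lo = static_cast<std::uint16_t>(Whole);
  // High half: shift right by half the width, then truncate.
  auto Hi = static_cast<std::uint16_t>(Whole >> 16);
  return {Lo, Hi};
}
```

For example, a stored value of 0xBEEF1234 yields a low half of 0x1234 and a high half of 0xBEEF, and no second memory access is needed to obtain the upper element.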

commit-id:3a045357
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  37 
 llvm/test/CodeGen/X86/atomic-load-store.ll| 171 ++
 3 files changed, 209 insertions(+)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index bdfa5f7741ad3..d8f402f529632 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -960,6 +960,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   void SplitVecRes_FPOp_MultiType(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_IS_FPCLASS(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f88b4d5693979..a3b30943c8e7d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1172,6 +1172,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SplitVecRes_STEP_VECTOR(N, Lo, Hi);
 break;
   case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
+  case ISD::ATOMIC_LOAD:
+SplitVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N), Lo, Hi);
+break;
   case ISD::LOAD:
 SplitVecRes_LOAD(cast<LoadSDNode>(N), Lo, Hi);
 break;
@@ -1421,6 +1424,40 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, 
unsigned ResNo) {
 SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_ATOMIC_LOAD(AtomicSDNode *LD, SDValue &Lo,
+   SDValue &Hi) {
+  assert(LD->getExtensionType() == ISD::NON_EXTLOAD &&
+ "Extended load during type legalization!");
+  SDLoc dl(LD);
+  EVT VT = LD->getValueType(0);
+  EVT LoVT, HiVT;
+  std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(VT);
+
+  SDValue Ch = LD->getChain();
+  SDValue Ptr = LD->getBasePtr();
+
+  EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), VT.getSizeInBits());
+  EVT MemIntVT =
+  EVT::getIntegerVT(*DAG.getContext(), LD->getMemoryVT().getSizeInBits());
+  SDValue ALD = DAG.getAtomicLoad(ISD::NON_EXTLOAD, dl, MemIntVT, IntVT, Ch,
+  Ptr, LD->getMemOperand());
+
+  EVT LoIntVT = EVT::getIntegerVT(*DAG.getContext(), LoVT.getSizeInBits());
+  EVT HiIntVT = EVT::getIntegerVT(*DAG.getContext(), HiVT.getSizeInBits());
+  SDValue ExtractLo = DAG.getNode(ISD::TRUNCATE, dl, LoIntVT, ALD);
+  SDValue ExtractHi =
+  DAG.getNode(ISD::SRL, dl, IntVT, ALD,
+  DAG.getIntPtrConstant(VT.getSizeInBits() / 2, dl));
+  ExtractHi = DAG.getNode(ISD::TRUNCATE, dl, HiIntVT, ExtractHi);
+
+  Lo = DAG.getBitcast(LoVT, ExtractLo);
+  Hi = DAG.getBitcast(HiVT, ExtractHi);
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(LD, 1), ALD.getValue(1));
+}
+
 void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT,
 MachinePointerInfo &MPI, SDValue &Ptr,
 uint64_t *ScaledOffset) {
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 3cf9e3c1a8dfa..6e2e9d4b21891 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -205,6 +205,68 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) {
   ret <2 x float> %ret
 }
 
+define <2 x half> @atomic_vec2_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec2_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:shrl $16, %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm1
+; CHECK3-NEXT:punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movl (%rdi), %eax
+; CHECK0-NEXT:movl %eax, %ecx
+; CHECK0-NEXT:shrl 

[llvm-branch-commits] [llvm] Add a GUIDLIST table to bitcode (PR #139497)

2025-05-11 Thread Owen Rodley via llvm-branch-commits

https://github.com/orodley edited 
https://github.com/llvm/llvm-project/pull/139497


[llvm-branch-commits] [BOLT] Print heatmap from perf2bolt (PR #139194)

2025-05-11 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov edited 
https://github.com/llvm/llvm-project/pull/139194


[llvm-branch-commits] [BOLT] Print heatmap section scores in perf2bolt (PR #139194)

2025-05-11 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/139194




[llvm-branch-commits] [llvm] [SelectionDAG] Legalize <1 x T> vector types for atomic load (PR #120385)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120385

From 192b17cf42a818acb1f10c2a81481e58b25ff238 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:37:17 -0500
Subject: [PATCH] [SelectionDAG] Legalize <1 x T> vector types for atomic load

`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

commit-id:5c36cc8c
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  15 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 121 +-
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 720393158aa5e..89ea7ef4dbe89 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -874,6 +874,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
   SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
+  SDValue ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue ScalarizeVecRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue ScalarizeVecRes_VSELECT(SDNode *N);
   SDValue ScalarizeVecRes_SELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index d0b69b88748a9..8eee7a4c61fe6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -64,6 +64,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, 
unsigned ResNo) {
 R = ScalarizeVecRes_UnaryOpWithExtraInput(N);
 break;
   case ISD::INSERT_VECTOR_ELT: R = ScalarizeVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+R = ScalarizeVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:   R = ScalarizeVecRes_LOAD(cast<LoadSDNode>(N));break;
   case ISD::SCALAR_TO_VECTOR:  R = ScalarizeVecRes_SCALAR_TO_VECTOR(N); break;
   case ISD::SIGN_EXTEND_INREG: R = ScalarizeVecRes_InregOp(N); break;
@@ -458,6 +461,18 @@ SDValue 
DAGTypeLegalizer::ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N) {
   return Op;
 }
 
+SDValue DAGTypeLegalizer::ScalarizeVecRes_ATOMIC_LOAD(AtomicSDNode *N) {
+  SDValue Result = DAG.getAtomicLoad(
+  ISD::NON_EXTLOAD, SDLoc(N), N->getMemoryVT().getVectorElementType(),
+  N->getValueType(0).getVectorElementType(), N->getChain(), N->getBasePtr(),
+  N->getMemOperand());
+
+  // Legalize the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
+  return Result;
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_LOAD(LoadSDNode *N) {
   assert(N->isUnindexed() && "Indexed vector load?");
 
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 5bce4401f7bdb..d23cfb89f9fc8 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s
-; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,CHECK3
+; RUN: llc < %s -mtriple=x86_64-apple-macosx10.7.0 -verify-machineinstrs -O0 | FileCheck %s --check-prefixes=CHECK,CHECK0
 
 define void @test1(ptr %ptr, i32 %val1) {
 ; CHECK-LABEL: test1:
@@ -28,3 +28,120 @@ define i32 @test3(ptr %ptr) {
   %val = load atomic i32, ptr %ptr seq_cst, align 4
   ret i32 %val
 }
+
+define <1 x i32> @atomic_vec1_i32(ptr %x) {
+; CHECK-LABEL: atomic_vec1_i32:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movl (%rdi), %eax
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x i32>, ptr %x acquire, align 4
+  ret <1 x i32> %ret
+}
+
+define <1 x i8> @atomic_vec1_i8(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzbl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movb (%rdi), %al
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i8>, ptr %x acquire, align 1
+  ret <1 x i8> %ret
+}
+
+define <1 x i16> @atomic_vec1_i16(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %ax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i16>, ptr %x acquire, align 2
+  ret <1 x i16> %ret
+}
+
+define <1 x i32> @atomic_vec1_i8_zext(ptr %x) {
+; CHECK3-LABEL: atomic_ve

[llvm-branch-commits] [llvm] [X86] Add atomic vector tests for unaligned >1 sizes. (PR #120387)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120387

From d212710191a62be5ad7257f8825b71230d715041 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:40:32 -0500
Subject: [PATCH] [X86] Add atomic vector tests for unaligned >1 sizes.

Unaligned atomic vectors with size >1 are lowered to calls.
Add their tests separately here.
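
The CHECK lines in this patch show the shape of the generic call: a size of 8 in %edi, the source pointer in %rsi, a stack slot for the result in %rdx, and the ordering value 2 (acquire) in %ecx. A lock-based model of such a call might look like the sketch below. `genericAtomicLoad` and `fallbackLock` are hypothetical names, and the single global mutex is an assumption of the sketch, not a description of how the runtime library implements `__atomic_load`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>

// One global lock shared by every access in this sketch.
inline std::mutex &fallbackLock() {
  static std::mutex M;
  return M;
}

// Copies Size bytes from Src to Dst while holding the lock, so that other
// accesses going through the same lock cannot interleave with the copy.
void genericAtomicLoad(std::size_t Size, const void *Src, void *Dst,
                       int /*MemOrder*/) {
  assert(Src && Dst && "expected valid pointers");
  std::lock_guard<std::mutex> Guard(fallbackLock());
  std::memcpy(Dst, Src, Size);
}
```

The result is returned through memory rather than in a register, which matches the extra load from the stack slot that follows each call in the expected assembly.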

commit-id:a06a5cc6
---
 llvm/test/CodeGen/X86/atomic-load-store.ll | 253 +
 1 file changed, 253 insertions(+)

diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 6efcbb80c0ce6..39e9fdfa5e62b 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -146,6 +146,34 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   ret <1 x i64> %ret
 }
 
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_ptr:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_ptr:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
+
 define <1 x half> @atomic_vec1_half(ptr %x) {
 ; CHECK3-LABEL: atomic_vec1_half:
 ; CHECK3:   ## %bb.0:
@@ -182,3 +210,228 @@ define <1 x double> @atomic_vec1_double_align(ptr %x) 
nounwind {
   %ret = load atomic <1 x double>, ptr %x acquire, align 8
   ret <1 x double> %ret
 }
+
+define <1 x i64> @atomic_vec1_i64(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_i64:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movq (%rsp), %rax
+; CHECK3-NEXT:popq %rcx
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_i64:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq (%rsp), %rax
+; CHECK0-NEXT:popq %rcx
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x i64>, ptr %x acquire, align 4
+  ret <1 x i64> %ret
+}
+
+define <1 x double> @atomic_vec1_double(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec1_double:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_double:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 4
+  ret <1 x double> %ret
+}
+
+define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
+; CHECK3-LABEL: atomic_vec2_i32:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:pushq %rax
+; CHECK3-NEXT:movq %rdi, %rsi
+; CHECK3-NEXT:movq %rsp, %rdx
+; CHECK3-NEXT:movl $8, %edi
+; CHECK3-NEXT:movl $2, %ecx
+; CHECK3-NEXT:callq ___atomic_load
+; CHECK3-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK3-NEXT:popq %rax
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i32:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:pushq %rax
+; CHECK0-NEXT:movq %rdi, %rsi
+; CHECK0-NEXT:movl $8, %edi
+; CHECK0-NEXT:movq %rsp, %rdx
+; CHECK0-NEXT:movl $2, %ecx
+; CHECK0-NEXT:callq ___atomic_load
+; CHECK0-NEXT:movq {{.*#+}} xmm0 = mem[0],zero
+; CHECK0-NEXT:popq %rax
+; CHECK0-NEXT:retq
+  %ret = load atomic <2 x i32>, ptr %x acquire, align 4
+  ret <2 x i32> %ret
+}
+
+define <4 x float> @atomic_vec4_float_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_float_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[

[llvm-branch-commits] [llvm] [X86] Remove extra MOV after widening atomic load (PR #138635)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/138635

From 6312f8c4dbc5272b5f2c741a46fe7623ace49bf8 Mon Sep 17 00:00:00 2001
From: jofernau_amdeng 
Date: Tue, 6 May 2025 01:48:11 -0400
Subject: [PATCH] [X86] Remove extra MOV after widening atomic load

This change adds patterns to optimize out an extra MOV
present after widening the atomic load.

commit-id:45989503
---
 llvm/lib/Target/X86/X86InstrCompiler.td|  7 
 llvm/test/CodeGen/X86/atomic-load-store.ll | 40 --
 2 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td 
b/llvm/lib/Target/X86/X86InstrCompiler.td
index efa1e8bd7f3e3..786d0567280f9 100644
--- a/llvm/lib/Target/X86/X86InstrCompiler.td
+++ b/llvm/lib/Target/X86/X86InstrCompiler.td
@@ -1204,6 +1204,13 @@ def : Pat<(i16 (atomic_load_nonext_16 addr:$src)), 
(MOV16rm addr:$src)>;
 def : Pat<(i32 (atomic_load_nonext_32 addr:$src)), (MOV32rm addr:$src)>;
 def : Pat<(i64 (atomic_load_nonext_64 addr:$src)), (MOV64rm addr:$src)>;
 
+def : Pat<(v4i32 (scalar_to_vector (i32 (zext (i16 (atomic_load_16 addr:$src)))))),
+   (MOVDI2PDIrm addr:$src)>;   // load atomic <2 x i8>
+def : Pat<(v4i32 (scalar_to_vector (i32 (atomic_load_32 addr:$src)))),
+   (MOVDI2PDIrm addr:$src)>;   // load atomic <2 x i16>
+def : Pat<(v2i64 (scalar_to_vector (i64 (atomic_load_64 addr:$src)))),
+   (MOV64toPQIrm  addr:$src)>; // load atomic <2 x i32,float>
+
 // Floating point loads/stores.
 def : Pat<(atomic_store_32 (i32 (bitconvert (f32 FR32:$src))), addr:$dst),
   (MOVSSmr addr:$dst, FR32:$src)>, Requires<[UseSSE1]>;
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll 
b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 9ee8b4fc5ac7f..3cf9e3c1a8dfa 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -165,11 +165,15 @@ define <2 x i8> @atomic_vec2_i8(ptr %x) {
 }
 
 define <2 x i16> @atomic_vec2_i16(ptr %x) {
-; CHECK-LABEL: atomic_vec2_i16:
-; CHECK:   ## %bb.0:
-; CHECK-NEXT:movl (%rdi), %eax
-; CHECK-NEXT:movd %eax, %xmm0
-; CHECK-NEXT:retq
+; CHECK3-LABEL: atomic_vec2_i16:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec2_i16:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK0-NEXT:retq
   %ret = load atomic <2 x i16>, ptr %x acquire, align 4
   ret <2 x i16> %ret
 }
@@ -177,8 +181,7 @@ define <2 x i16> @atomic_vec2_i16(ptr %x) {
 define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) {
 ; CHECK-LABEL: atomic_vec2_ptr270:
 ; CHECK:   ## %bb.0:
-; CHECK-NEXT:movq (%rdi), %rax
-; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:movq (%rdi), %xmm0
 ; CHECK-NEXT:retq
   %ret = load atomic <2 x ptr addrspace(270)>, ptr %x acquire, align 8
   ret <2 x ptr addrspace(270)> %ret
@@ -187,8 +190,7 @@ define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) {
 define <2 x i32> @atomic_vec2_i32_align(ptr %x) {
 ; CHECK-LABEL: atomic_vec2_i32_align:
 ; CHECK:   ## %bb.0:
-; CHECK-NEXT:movq (%rdi), %rax
-; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:movq (%rdi), %xmm0
 ; CHECK-NEXT:retq
   %ret = load atomic <2 x i32>, ptr %x acquire, align 8
   ret <2 x i32> %ret
@@ -197,8 +199,7 @@ define <2 x i32> @atomic_vec2_i32_align(ptr %x) {
 define <2 x float> @atomic_vec2_float_align(ptr %x) {
 ; CHECK-LABEL: atomic_vec2_float_align:
 ; CHECK:   ## %bb.0:
-; CHECK-NEXT:movq (%rdi), %rax
-; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:movq (%rdi), %xmm0
 ; CHECK-NEXT:retq
   %ret = load atomic <2 x float>, ptr %x acquire, align 8
   ret <2 x float> %ret
@@ -354,11 +355,15 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
 }
 
 define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
-; CHECK-LABEL: atomic_vec4_i8:
-; CHECK:   ## %bb.0:
-; CHECK-NEXT:movl (%rdi), %eax
-; CHECK-NEXT:movd %eax, %xmm0
-; CHECK-NEXT:retq
+; CHECK3-LABEL: atomic_vec4_i8:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec4_i8:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK0-NEXT:retq
   %ret = load atomic <4 x i8>, ptr %x acquire, align 4
   ret <4 x i8> %ret
 }
@@ -366,8 +371,7 @@ define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
 define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind {
 ; CHECK-LABEL: atomic_vec4_i16:
 ; CHECK:   ## %bb.0:
-; CHECK-NEXT:movq (%rdi), %rax
-; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:movq (%rdi), %xmm0
 ; CHECK-NEXT:retq
   %ret = load atomic <4 x i16>, ptr %x acquire, align 8
   ret <4 x i16> %ret

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bi

[llvm-branch-commits] [llvm] [X86] Manage atomic load of fp -> int promotion in DAG (PR #120386)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120386

>From ce52d5295249681faf782a15ebe56599152e8491 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Wed, 18 Dec 2024 03:38:23 -0500
Subject: [PATCH] [X86] Manage atomic load of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.
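
A minimal sketch of the idea in plain C++, not the patch itself: the float is loaded atomically as an integer of the same width, and the bits are reinterpreted afterwards. The helper name `atomicLoadFloat` is hypothetical, and `std::atomic` only stands in for the DAG-level atomic load.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the patch): load the bits of a float
// atomically as a 32-bit integer, then reinterpret them as a float.
// This mirrors promoting an f32 atomic load to i32 followed by a bitcast.
inline float atomicLoadFloat(const std::atomic<std::uint32_t> &Slot) {
  std::uint32_t Bits = Slot.load(std::memory_order_acquire);
  float Result;
  static_assert(sizeof(Result) == sizeof(Bits),
                "f32 and i32 must have the same size");
  std::memcpy(&Result, &Bits, sizeof(Result)); // the "bitcast" step
  return Result;
}
```

The same shape would apply to f16/i16 and f64/i64; only the integer width changes.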

commit-id:f9d761c5
---
 llvm/lib/Target/X86/X86ISelLowering.cpp|  4 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll | 37 ++
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index ac4fb157a6026..3ab548f64d04c 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2653,6 +2653,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
 setOperationAction(Op, MVT::f32, Promote);
   }
 
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f16, MVT::i16);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f32, MVT::i32);
+  setOperationPromotedToType(ISD::ATOMIC_LOAD, MVT::f64, MVT::i64);
+
   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
ISD::SCALAR_TO_VECTOR,
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index d23cfb89f9fc8..6efcbb80c0ce6 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -145,3 +145,40 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
   %ret = load atomic <1 x i64>, ptr %x acquire, align 8
   ret <1 x i64> %ret
 }
+
+define <1 x half> @atomic_vec1_half(ptr %x) {
+; CHECK3-LABEL: atomic_vec1_half:
+; CHECK3:   ## %bb.0:
+; CHECK3-NEXT:movzwl (%rdi), %eax
+; CHECK3-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK3-NEXT:retq
+;
+; CHECK0-LABEL: atomic_vec1_half:
+; CHECK0:   ## %bb.0:
+; CHECK0-NEXT:movw (%rdi), %cx
+; CHECK0-NEXT:## implicit-def: $eax
+; CHECK0-NEXT:movw %cx, %ax
+; CHECK0-NEXT:## implicit-def: $xmm0
+; CHECK0-NEXT:pinsrw $0, %eax, %xmm0
+; CHECK0-NEXT:retq
+  %ret = load atomic <1 x half>, ptr %x acquire, align 2
+  ret <1 x half> %ret
+}
+
+define <1 x float> @atomic_vec1_float(ptr %x) {
+; CHECK-LABEL: atomic_vec1_float:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x float>, ptr %x acquire, align 4
+  ret <1 x float> %ret
+}
+
+define <1 x double> @atomic_vec1_double_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec1_double_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:retq
+  %ret = load atomic <1 x double>, ptr %x acquire, align 8
+  ret <1 x double> %ret
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #120716)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120716

>From 717ea645df30178ab0873da4191d41bc7ba4b761 Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Fri, 20 Dec 2024 06:14:28 -0500
Subject: [PATCH] [AtomicExpand] Add bitcasts when expanding load atomic vector

AtomicExpand fails for aligned `load atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered.
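
A rough C++ model of the two-step cast the diff below performs for pointer vectors, offered only as an illustration: the scalar result of the sized libcall is first reinterpreted as a vector of pointer-sized integers, and each lane is then converted to a pointer. `RawResult` and `rawToPointerVector` are hypothetical names, and modelling the libcall result as raw bytes is an assumption made for portability.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Raw result of a sized libcall, modelled as opaque bytes wide enough for
// two pointers (an assumption; the real call returns a scalar integer).
using RawResult = std::array<unsigned char, 2 * sizeof(void *)>;

// Hypothetical helper mirroring the two casts: reinterpret the scalar bits
// as a vector of pointer-sized integers ("bitcast"), then turn each lane
// into a pointer ("inttoptr").
inline std::array<void *, 2> rawToPointerVector(const RawResult &Raw) {
  std::array<std::uintptr_t, 2> Lanes;
  static_assert(sizeof(Lanes) == sizeof(Raw), "bitcast needs equal sizes");
  std::memcpy(Lanes.data(), Raw.data(), sizeof(Lanes)); // "bitcast"
  std::array<void *, 2> Ptrs;
  for (std::size_t I = 0; I < Lanes.size(); ++I)
    Ptrs[I] = reinterpret_cast<void *>(Lanes[I]); // "inttoptr"
  return Ptrs;
}
```

A direct scalar-to-pointer-vector cast has no single IR equivalent, which is why the model goes through the integer vector first.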

commit-id:f430c1af
---
 llvm/lib/CodeGen/AtomicExpandPass.cpp | 15 -
 llvm/test/CodeGen/ARM/atomic-load-store.ll| 51 +++
 llvm/test/CodeGen/X86/atomic-load-store.ll| 30 +
 .../X86/expand-atomic-non-integer.ll  | 65 +++
 4 files changed, 158 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index c376de877ac7d..70f59eafc6ecb 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -2066,9 +2066,18 @@ bool AtomicExpandImpl::expandAtomicOpToLibcall(
 I->replaceAllUsesWith(V);
   } else if (HasResult) {
 Value *V;
-if (UseSizedLibcall)
-  V = Builder.CreateBitOrPointerCast(Result, I->getType());
-else {
+if (UseSizedLibcall) {
+  // Add bitcasts from Result's scalar type to I's <n x ptr> vector type
+  auto *PtrTy = dyn_cast<PointerType>(I->getType()->getScalarType());
+  auto *VTy = dyn_cast<VectorType>(I->getType());
+  if (VTy && PtrTy && !Result->getType()->isVectorTy()) {
+unsigned AS = PtrTy->getAddressSpace();
+Value *BC = Builder.CreateBitCast(
+Result, VTy->getWithNewType(DL.getIntPtrType(Ctx, AS)));
+V = Builder.CreateIntToPtr(BC, I->getType());
+  } else
+V = Builder.CreateBitOrPointerCast(Result, I->getType());
+} else {
   V = Builder.CreateAlignedLoad(I->getType(), AllocaResult,
 AllocaAlignment);
   Builder.CreateLifetimeEnd(AllocaResult, SizeVal64);
diff --git a/llvm/test/CodeGen/ARM/atomic-load-store.ll b/llvm/test/CodeGen/ARM/atomic-load-store.ll
index 560dfde356c29..eaa2ffd9b2731 100644
--- a/llvm/test/CodeGen/ARM/atomic-load-store.ll
+++ b/llvm/test/CodeGen/ARM/atomic-load-store.ll
@@ -983,3 +983,54 @@ define void @store_atomic_f64__seq_cst(ptr %ptr, double %val1) {
   store atomic double %val1, ptr %ptr seq_cst, align 8
   ret void
 }
+
+define <1 x ptr> @atomic_vec1_ptr(ptr %x) #0 {
+; ARM-LABEL: atomic_vec1_ptr:
+; ARM:   @ %bb.0:
+; ARM-NEXT:ldr r0, [r0]
+; ARM-NEXT:dmb ish
+; ARM-NEXT:bx lr
+;
+; ARMOPTNONE-LABEL: atomic_vec1_ptr:
+; ARMOPTNONE:   @ %bb.0:
+; ARMOPTNONE-NEXT:ldr r0, [r0]
+; ARMOPTNONE-NEXT:dmb ish
+; ARMOPTNONE-NEXT:bx lr
+;
+; THUMBTWO-LABEL: atomic_vec1_ptr:
+; THUMBTWO:   @ %bb.0:
+; THUMBTWO-NEXT:ldr r0, [r0]
+; THUMBTWO-NEXT:dmb ish
+; THUMBTWO-NEXT:bx lr
+;
+; THUMBONE-LABEL: atomic_vec1_ptr:
+; THUMBONE:   @ %bb.0:
+; THUMBONE-NEXT:push {r7, lr}
+; THUMBONE-NEXT:movs r1, #0
+; THUMBONE-NEXT:mov r2, r1
+; THUMBONE-NEXT:bl __sync_val_compare_and_swap_4
+; THUMBONE-NEXT:pop {r7, pc}
+;
+; ARMV4-LABEL: atomic_vec1_ptr:
+; ARMV4:   @ %bb.0:
+; ARMV4-NEXT:push {r11, lr}
+; ARMV4-NEXT:mov r1, #2
+; ARMV4-NEXT:bl __atomic_load_4
+; ARMV4-NEXT:pop {r11, lr}
+; ARMV4-NEXT:mov pc, lr
+;
+; ARMV6-LABEL: atomic_vec1_ptr:
+; ARMV6:   @ %bb.0:
+; ARMV6-NEXT:ldr r0, [r0]
+; ARMV6-NEXT:mov r1, #0
+; ARMV6-NEXT:mcr p15, #0, r1, c7, c10, #5
+; ARMV6-NEXT:bx lr
+;
+; THUMBM-LABEL: atomic_vec1_ptr:
+; THUMBM:   @ %bb.0:
+; THUMBM-NEXT:ldr r0, [r0]
+; THUMBM-NEXT:dmb sy
+; THUMBM-NEXT:bx lr
+  %ret = load atomic <1 x ptr>, ptr %x acquire, align 4
+  ret <1 x ptr> %ret
+}
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index f72970d12b6eb..d3027e799 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -382,6 +382,21 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
   ret <2 x i32> %ret
 }
 
+define <2 x ptr> @atomic_vec2_ptr_align(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec2_ptr_align:
+; CHECK:   ## %bb.0:
+; CHECK-NEXT:pushq %rax
+; CHECK-NEXT:movl $2, %esi
+; CHECK-NEXT:callq ___atomic_load_16
+; CHECK-NEXT:movq %rdx, %xmm1
+; CHECK-NEXT:movq %rax, %xmm0
+; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:popq %rax
+; CHECK-NEXT:retq
+  %ret = load atomic <2 x ptr>, ptr %x acquire, align 16
+  ret <2 x ptr> %ret
+}
+
 define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
 ; CHECK3-LABEL: atomic_vec4_i8:
 ; CHECK3:   ## %bb.0:
@@ -405,6 +420,21 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind {
   ret <4 x i16> %ret
 }
 
+define <4 x ptr addrspace(270)> @atomic_vec4_ptr270(ptr %x) nounwind {
+; CHECK-LABEL: atomic_vec4_ptr270:
+; CHECK:   ## %b

[llvm-branch-commits] [llvm] [RISCV][Scheduler] Add scheduler definitions for the Q extension (PR #139495)

2025-05-11 Thread via llvm-branch-commits

llvmbot wrote:

@llvm/pr-subscribers-backend-risc-v

Author: Iris Shi (el-ev)

Changes

---

Patch is 24.97 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/139495.diff


14 Files Affected:

- (modified) llvm/lib/Target/RISCV/RISCVInstrInfoQ.td (+61-39) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedGenericOOO.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedMIPSP8700.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedRocket.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedSiFive7.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedSiFiveP400.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedSiFiveP500.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedSiFiveP600.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR345.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR7.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedTTAscalonD8.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedXiangShanNanHu.td (+1) 
- (modified) llvm/lib/Target/RISCV/RISCVSchedule.td (+85-3) 


``diff
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td b/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td
index 7d216b5dd87c0..8cc965ccc515d 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td
@@ -25,97 +25,119 @@ defvar QExtsRV64 = [QExt];
 //===----------------------------------------------------------------------===//
 
 let Predicates = [HasStdExtQ] in {
-  // def FLQ : FPLoad_r<0b100, "flq", FPR128, WriteFLD128>;
-  let hasSideEffects = 0, mayLoad = 1, mayStore = 0 in
-  def FLQ : RVInstI<0b100, OPC_LOAD_FP, (outs FPR128:$rd),
-  (ins GPRMem:$rs1, simm12:$imm12),
-  "flq", "$rd, ${imm12}(${rs1})">;
+  def FLQ : FPLoad_r<0b100, "flq", FPR128, WriteFLD128>;
+
   // Operands for stores are in the order srcreg, base, offset rather than
   // reflecting the order these fields are specified in the instruction
   // encoding.
-  // def FSQ : FPStore_r<0b100, "fsq", FPR128, WriteFST128>;
-  let hasSideEffects = 0, mayLoad = 0, mayStore = 1 in
-  def FSQ : RVInstS<0b100, OPC_STORE_FP, (outs),
-  (ins FPR128:$rs2, GPRMem:$rs1, simm12:$imm12),
-  "fsq", "$rs2, ${imm12}(${rs1})">;
+  def FSQ : FPStore_r<0b100, "fsq", FPR128, WriteFST128>;
 } // Predicates = [HasStdExtQ]
 
 foreach Ext = QExts in {
-  defm FMADD_Q : FPFMA_rrr_frm_m;
-  defm FMSUB_Q : FPFMA_rrr_frm_m;
-  defm FNMSUB_Q : FPFMA_rrr_frm_m;
-  defm FNMADD_Q : FPFMA_rrr_frm_m;
+  let SchedRW = [WriteFMA128, ReadFMA128, ReadFMA128, ReadFMA128Addend] in {
+defm FMADD_Q : FPFMA_rrr_frm_m;
+defm FMSUB_Q : FPFMA_rrr_frm_m;
+defm FNMSUB_Q : FPFMA_rrr_frm_m;
+defm FNMADD_Q : FPFMA_rrr_frm_m;
+  }
 
-  defm FADD_Q : FPALU_rr_frm_m<0b011, "fadd.q", Ext>;
-  defm FSUB_Q : FPALU_rr_frm_m<0b111, "fsub.q", Ext>;
+  let SchedRW = [WriteFAdd128, ReadFAdd128, ReadFAdd128] in {
+defm FADD_Q : FPALU_rr_frm_m<0b011, "fadd.q", Ext>;
+defm FSUB_Q : FPALU_rr_frm_m<0b111, "fsub.q", Ext>;
+  }
 
+  let SchedRW = [WriteFMul128, ReadFMul128, ReadFMul128] in 
   defm FMUL_Q : FPALU_rr_frm_m<0b0001011, "fmul.q", Ext>;
 
+  let SchedRW = [WriteFDiv128, ReadFDiv128, ReadFDiv128] in 
   defm FDIV_Q : FPALU_rr_frm_m<0b000, "fdiv.q", Ext>;
 
   defm FSQRT_Q : FPUnaryOp_r_frm_m<0b010, 0b0, Ext, Ext.PrimaryTy,
-   Ext.PrimaryTy, "fsqrt.q">;
+   Ext.PrimaryTy, "fsqrt.q">,
+ Sched<[WriteFSqrt128, ReadFSqrt128]>;
 
-  let mayRaiseFPException = 0 in {
+  let SchedRW = [WriteFSGNJ128, ReadFSGNJ128, ReadFSGNJ128],
+  mayRaiseFPException = 0 in {
 defm FSGNJ_Q : FPALU_rr_m<0b0010011, 0b000, "fsgnj.q", Ext>;
 defm FSGNJN_Q : FPALU_rr_m<0b0010011, 0b001, "fsgnjn.q", Ext>;
 defm FSGNJX_Q : FPALU_rr_m<0b0010011, 0b010, "fsgnjx.q", Ext>;
   }
 
-  defm FMIN_Q : FPALU_rr_m<0b0010111, 0b000, "fmin.q", Ext, Commutable = 1>;
-  defm FMAX_Q : FPALU_rr_m<0b0010111, 0b001, "fmax.q", Ext, Commutable = 1>;
+  let SchedRW = [WriteFMinMax128, ReadFMinMax128, ReadFMinMax128] in {
+defm FMIN_Q : FPALU_rr_m<0b0010111, 0b000, "fmin.q", Ext, Commutable = 1>;
+defm FMAX_Q : FPALU_rr_m<0b0010111, 0b001, "fmax.q", Ext, Commutable = 1>;
+  }
 
   defm FCVT_S_Q : FPUnaryOp_r_frm_m<0b010, 0b00011, Ext, Ext.F32Ty,
-Ext.PrimaryTy, "fcvt.s.q">;
+Ext.PrimaryTy, "fcvt.s.q">,
+  Sched<[WriteFCvtF128ToF32, ReadFCvtF128ToF32]>;
 
   defm FCVT_Q_S : FPUnaryOp_r_frmlegacy_m<0b0100011, 0b0, Ext,
-  Ext.PrimaryTy, Ext.F32Ty, 
"fcvt.q.s">;
+  Ext.PrimaryTy, Ext.F32Ty, 
+  "fcvt.q.s">,
+  Sched<[WriteFCvtF32ToF128, Read

[llvm-branch-commits] [llvm] [SelectionDAG] Widen <2 x T> vector types for atomic load (PR #120598)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn updated 
https://github.com/llvm/llvm-project/pull/120598

>From 730b40b39dfa3ed5d802bbb1270d49273a5de7fb Mon Sep 17 00:00:00 2001
From: jofrn 
Date: Thu, 19 Dec 2024 11:19:39 -0500
Subject: [PATCH] [SelectionDAG] Widen <2 x T> vector types for atomic load

Vector types of 2 elements must be widened. This change does this
for vector types of atomic load in SelectionDAG
so that it can translate aligned vectors of >1 size.
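
As a hedged illustration of the widening strategy (a plain C++ model, not the SelectionDAG code in this patch): a `<2 x i16>` value is fetched with one 32-bit atomic load, placed in lane 0 of a wider vector, and the result is then viewed with the original element type. `widenAtomicLoad` is a hypothetical name; the upper lanes are zeroed here, whereas in the DAG they would be undefined.

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical model of widening <2 x i16> to <8 x i16>: one 32-bit atomic
// load, placed in lane 0 of a 4 x 32-bit vector, then viewed as 8 x 16-bit.
inline std::array<std::uint16_t, 8>
widenAtomicLoad(const std::atomic<std::uint32_t> &Src) {
  std::array<std::uint32_t, 4> Wide{};           // "scalar_to_vector" target
  Wide[0] = Src.load(std::memory_order_acquire); // the single atomic load
  std::array<std::uint16_t, 8> Result;
  static_assert(sizeof(Result) == sizeof(Wide), "bitcast needs equal sizes");
  std::memcpy(Result.data(), Wide.data(), sizeof(Result)); // "bitcast"
  return Result;
}
```

Keeping the access as a single load is the point of the exercise: splitting it per element would lose atomicity.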

commit-id:2894ccd1
---
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |  1 +
 .../SelectionDAG/LegalizeVectorTypes.cpp  | 97 ++-
 llvm/test/CodeGen/X86/atomic-load-store.ll| 78 +++
 3 files changed, 153 insertions(+), 23 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 89ea7ef4dbe89..bdfa5f7741ad3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1062,6 +1062,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
   SDValue WidenVecRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
+  SDValue WidenVecRes_ATOMIC_LOAD(AtomicSDNode *N);
   SDValue WidenVecRes_LOAD(SDNode* N);
   SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
   SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 8eee7a4c61fe6..f88b4d5693979 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4625,6 +4625,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
 break;
   case ISD::EXTRACT_SUBVECTOR: Res = WidenVecRes_EXTRACT_SUBVECTOR(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = WidenVecRes_INSERT_VECTOR_ELT(N); break;
+  case ISD::ATOMIC_LOAD:
+Res = WidenVecRes_ATOMIC_LOAD(cast<AtomicSDNode>(N));
+break;
   case ISD::LOAD:  Res = WidenVecRes_LOAD(N); break;
   case ISD::STEP_VECTOR:
   case ISD::SPLAT_VECTOR:
@@ -6014,6 +6017,74 @@ SDValue DAGTypeLegalizer::WidenVecRes_INSERT_VECTOR_ELT(SDNode *N) {
  N->getOperand(1), N->getOperand(2));
 }
 
+/// Either return the same load or provide appropriate casts
+/// from the load and return that.
+static SDValue coerceLoadedValue(SDValue LdOp, EVT FirstVT, EVT WidenVT,
+ TypeSize LdWidth, TypeSize FirstVTWidth,
+ SDLoc dl, SelectionDAG &DAG) {
+  assert(TypeSize::isKnownLE(LdWidth, FirstVTWidth));
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  if (!FirstVT.isVector()) {
+unsigned NumElts =
+WidenWidth.getFixedValue() / FirstVTWidth.getFixedValue();
+EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), FirstVT, NumElts);
+SDValue VecOp = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, NewVecVT, LdOp);
+return DAG.getNode(ISD::BITCAST, dl, WidenVT, VecOp);
+  }
+  assert(FirstVT == WidenVT);
+  return LdOp;
+}
+
+static std::optional<EVT> findMemType(SelectionDAG &DAG,
+  const TargetLowering &TLI, unsigned Width,
+  EVT WidenVT, unsigned Align,
+  unsigned WidenEx);
+
+SDValue DAGTypeLegalizer::WidenVecRes_ATOMIC_LOAD(AtomicSDNode *LD) {
+  EVT WidenVT =
+  TLI.getTypeToTransformTo(*DAG.getContext(), LD->getValueType(0));
+  EVT LdVT = LD->getMemoryVT();
+  SDLoc dl(LD);
+  assert(LdVT.isVector() && WidenVT.isVector() && "Expected vectors");
+  assert(LdVT.isScalableVector() == WidenVT.isScalableVector() &&
+ "Must be scalable");
+  assert(LdVT.getVectorElementType() == WidenVT.getVectorElementType() &&
+ "Expected equivalent element types");
+
+  // Load information
+  SDValue Chain = LD->getChain();
+  SDValue BasePtr = LD->getBasePtr();
+  MachineMemOperand::Flags MMOFlags = LD->getMemOperand()->getFlags();
+  AAMDNodes AAInfo = LD->getAAInfo();
+
+  TypeSize LdWidth = LdVT.getSizeInBits();
+  TypeSize WidenWidth = WidenVT.getSizeInBits();
+  TypeSize WidthDiff = WidenWidth - LdWidth;
+
+  // Find the vector type that can load from.
+  std::optional<EVT> FirstVT =
+  findMemType(DAG, TLI, LdWidth.getKnownMinValue(), WidenVT, /*LdAlign=*/0,
+  WidthDiff.getKnownMinValue());
+
+  if (!FirstVT)
+return SDValue();
+
+  SmallVector MemVTs;
+  TypeSize FirstVTWidth = FirstVT->getSizeInBits();
+
+  SDValue LdOp = DAG.getAtomicLoad(ISD::NON_EXTLOAD, dl, *FirstVT, *FirstVT,
+   Chain, BasePtr, LD->getMemOperand());
+
+  // Load the element with one instruction.
+  SDValue Result = coerceLoadedValue(LdOp, *FirstVT, WidenVT, LdWidth,
+ FirstVTWidth, dl, DAG);
+
+  // Modified the chain - switch anything that used the old chain to use
+  // the new 

[llvm-branch-commits] [llvm] [RISCV][Scheduler] Add scheduler definitions for the Q extension (PR #139495)

2025-05-11 Thread Iris Shi via llvm-branch-commits

https://github.com/el-ev created 
https://github.com/llvm/llvm-project/pull/139495

None

>From 55a551de62d325a8e5e23c503f81abe89aead549 Mon Sep 17 00:00:00 2001
From: Iris Shi <0...@owo.li>
Date: Mon, 12 May 2025 13:32:41 +0800
Subject: [PATCH] [RISCV][Scheduler] Add scheduler definitions for the Q
 extension

---
 llvm/lib/Target/RISCV/RISCVInstrInfoQ.td  | 100 +++---
 llvm/lib/Target/RISCV/RISCVSchedGenericOOO.td |   1 +
 llvm/lib/Target/RISCV/RISCVSchedMIPSP8700.td  |   1 +
 llvm/lib/Target/RISCV/RISCVSchedRocket.td |   1 +
 llvm/lib/Target/RISCV/RISCVSchedSiFive7.td|   1 +
 llvm/lib/Target/RISCV/RISCVSchedSiFiveP400.td |   1 +
 llvm/lib/Target/RISCV/RISCVSchedSiFiveP500.td |   1 +
 llvm/lib/Target/RISCV/RISCVSchedSiFiveP600.td |   1 +
 .../lib/Target/RISCV/RISCVSchedSpacemitX60.td |   1 +
 .../Target/RISCV/RISCVSchedSyntacoreSCR345.td |   1 +
 .../Target/RISCV/RISCVSchedSyntacoreSCR7.td   |   1 +
 .../lib/Target/RISCV/RISCVSchedTTAscalonD8.td |   1 +
 .../Target/RISCV/RISCVSchedXiangShanNanHu.td  |   1 +
 llvm/lib/Target/RISCV/RISCVSchedule.td|  88 ++-
 14 files changed, 158 insertions(+), 42 deletions(-)

diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td b/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td
index 7d216b5dd87c0..8cc965ccc515d 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoQ.td
@@ -25,97 +25,119 @@ defvar QExtsRV64 = [QExt];
 //===----------------------------------------------------------------------===//
 
 let Predicates = [HasStdExtQ] in {
-  // def FLQ : FPLoad_r<0b100, "flq", FPR128, WriteFLD128>;
-  let hasSideEffects = 0, mayLoad = 1, mayStore = 0 in
-  def FLQ : RVInstI<0b100, OPC_LOAD_FP, (outs FPR128:$rd),
-  (ins GPRMem:$rs1, simm12:$imm12),
-  "flq", "$rd, ${imm12}(${rs1})">;
+  def FLQ : FPLoad_r<0b100, "flq", FPR128, WriteFLD128>;
+
   // Operands for stores are in the order srcreg, base, offset rather than
   // reflecting the order these fields are specified in the instruction
   // encoding.
-  // def FSQ : FPStore_r<0b100, "fsq", FPR128, WriteFST128>;
-  let hasSideEffects = 0, mayLoad = 0, mayStore = 1 in
-  def FSQ : RVInstS<0b100, OPC_STORE_FP, (outs),
-  (ins FPR128:$rs2, GPRMem:$rs1, simm12:$imm12),
-  "fsq", "$rs2, ${imm12}(${rs1})">;
+  def FSQ : FPStore_r<0b100, "fsq", FPR128, WriteFST128>;
 } // Predicates = [HasStdExtQ]
 
 foreach Ext = QExts in {
-  defm FMADD_Q : FPFMA_rrr_frm_m;
-  defm FMSUB_Q : FPFMA_rrr_frm_m;
-  defm FNMSUB_Q : FPFMA_rrr_frm_m;
-  defm FNMADD_Q : FPFMA_rrr_frm_m;
+  let SchedRW = [WriteFMA128, ReadFMA128, ReadFMA128, ReadFMA128Addend] in {
+defm FMADD_Q : FPFMA_rrr_frm_m;
+defm FMSUB_Q : FPFMA_rrr_frm_m;
+defm FNMSUB_Q : FPFMA_rrr_frm_m;
+defm FNMADD_Q : FPFMA_rrr_frm_m;
+  }
 
-  defm FADD_Q : FPALU_rr_frm_m<0b011, "fadd.q", Ext>;
-  defm FSUB_Q : FPALU_rr_frm_m<0b111, "fsub.q", Ext>;
+  let SchedRW = [WriteFAdd128, ReadFAdd128, ReadFAdd128] in {
+defm FADD_Q : FPALU_rr_frm_m<0b011, "fadd.q", Ext>;
+defm FSUB_Q : FPALU_rr_frm_m<0b111, "fsub.q", Ext>;
+  }
 
+  let SchedRW = [WriteFMul128, ReadFMul128, ReadFMul128] in 
   defm FMUL_Q : FPALU_rr_frm_m<0b0001011, "fmul.q", Ext>;
 
+  let SchedRW = [WriteFDiv128, ReadFDiv128, ReadFDiv128] in 
   defm FDIV_Q : FPALU_rr_frm_m<0b000, "fdiv.q", Ext>;
 
   defm FSQRT_Q : FPUnaryOp_r_frm_m<0b010, 0b0, Ext, Ext.PrimaryTy,
-   Ext.PrimaryTy, "fsqrt.q">;
+   Ext.PrimaryTy, "fsqrt.q">,
+ Sched<[WriteFSqrt128, ReadFSqrt128]>;
 
-  let mayRaiseFPException = 0 in {
+  let SchedRW = [WriteFSGNJ128, ReadFSGNJ128, ReadFSGNJ128],
+  mayRaiseFPException = 0 in {
 defm FSGNJ_Q : FPALU_rr_m<0b0010011, 0b000, "fsgnj.q", Ext>;
 defm FSGNJN_Q : FPALU_rr_m<0b0010011, 0b001, "fsgnjn.q", Ext>;
 defm FSGNJX_Q : FPALU_rr_m<0b0010011, 0b010, "fsgnjx.q", Ext>;
   }
 
-  defm FMIN_Q : FPALU_rr_m<0b0010111, 0b000, "fmin.q", Ext, Commutable = 1>;
-  defm FMAX_Q : FPALU_rr_m<0b0010111, 0b001, "fmax.q", Ext, Commutable = 1>;
+  let SchedRW = [WriteFMinMax128, ReadFMinMax128, ReadFMinMax128] in {
+defm FMIN_Q : FPALU_rr_m<0b0010111, 0b000, "fmin.q", Ext, Commutable = 1>;
+defm FMAX_Q : FPALU_rr_m<0b0010111, 0b001, "fmax.q", Ext, Commutable = 1>;
+  }
 
   defm FCVT_S_Q : FPUnaryOp_r_frm_m<0b010, 0b00011, Ext, Ext.F32Ty,
-Ext.PrimaryTy, "fcvt.s.q">;
+Ext.PrimaryTy, "fcvt.s.q">,
+  Sched<[WriteFCvtF128ToF32, ReadFCvtF128ToF32]>;
 
   defm FCVT_Q_S : FPUnaryOp_r_frmlegacy_m<0b0100011, 0b0, Ext,
-  Ext.PrimaryTy, Ext.F32Ty, 
"fcvt.q.s">;
+  Ext.PrimaryTy, Ext.F32Ty, 
+  "fcvt.q.s">,
+  Sched<[WriteFCvtF32ToF128,

[llvm-branch-commits] [llvm] [RISCV][Scheduler] Add scheduler definitions for the Q extension (PR #139495)

2025-05-11 Thread Iris Shi via llvm-branch-commits

el-ev wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/139495?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#139495** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/139495?utm_source=stack-comment-view-in-graphite)
* **#139369** (https://app.graphite.dev/github/pr/llvm/llvm-project/139369?utm_source=stack-comment-icon)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment).
Learn more about stacking: https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/139495
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV][Scheduler] Add scheduler definitions for the Q extension (PR #139495)

2025-05-11 Thread Iris Shi via llvm-branch-commits

https://github.com/el-ev ready_for_review 
https://github.com/llvm/llvm-project/pull/139495
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Add a GUIDLIST table to bitcode (PR #139497)

2025-05-11 Thread Owen Rodley via llvm-branch-commits

https://github.com/orodley created 
https://github.com/llvm/llvm-project/pull/139497

None

>From bfb6cb21243f043ea1edf6f00cf27d08549066dc Mon Sep 17 00:00:00 2001
From: Owen Rodley 
Date: Mon, 12 May 2025 15:50:22 +1000
Subject: [PATCH] Add a GUIDLIST table to bitcode

---
 llvm/include/llvm/Bitcode/LLVMBitCodes.h  |  3 +++
 llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 11 +++---
 llvm/lib/Bitcode/Writer/BitcodeWriter.cpp | 25 +++
 3 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
index 92b6e68d9d0a7..8acba6477c4a1 100644
--- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h
+++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -120,6 +120,9 @@ enum ModuleCodes {
 
   // IFUNC: [ifunc value type, addrspace, resolver val#, linkage, visibility]
   MODULE_CODE_IFUNC = 18,
+
+  // GUIDLIST: [n x i64]
+  MODULE_CODE_GUIDLIST = 19,
 };
 
 /// PARAMATTR blocks have code for defining a parameter attribute set.
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index 1d7aa189026a5..6d36b007956a0 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -980,6 +980,9 @@ class ModuleSummaryIndexBitcodeReader : public BitcodeReaderBase {
   /// the CallStackRadixTreeBuilder class in ProfileData/MemProf.h for format.
   std::vector RadixArray;
 
+  // A table which maps ValueID to the GUID for that value.
+  std::vector DefinedGUIDs;
+
 public:
   ModuleSummaryIndexBitcodeReader(
   BitstreamCursor Stream, StringRef Strtab, ModuleSummaryIndex &TheIndex,
@@ -7164,9 +7167,7 @@ ModuleSummaryIndexBitcodeReader::getValueInfoFromValueId(unsigned ValueId) {
 void ModuleSummaryIndexBitcodeReader::setValueGUID(
 uint64_t ValueID, StringRef ValueName, GlobalValue::LinkageTypes Linkage,
 StringRef SourceFileName) {
-  std::string GlobalId =
-  GlobalValue::getGlobalIdentifier(ValueName, Linkage, SourceFileName);
-  auto ValueGUID = GlobalValue::getGUIDAssumingExternalLinkage(GlobalId);
+  auto ValueGUID = DefinedGUIDs[ValueID];
   auto OriginalNameID = ValueGUID;
   if (GlobalValue::isLocalLinkage(Linkage))
 OriginalNameID = GlobalValue::getGUIDAssumingExternalLinkage(ValueName);
@@ -7389,6 +7390,10 @@ Error ModuleSummaryIndexBitcodeReader::parseModule() {
   // was historically always the start of the regular bitcode header.
   VSTOffset = Record[0] - 1;
   break;
+// MODULE_CODE_GUIDLIST: [i64 x N]
+case bitc::MODULE_CODE_GUIDLIST:
+  llvm::append_range(DefinedGUIDs, Record);
+  break;
 // v1 GLOBALVAR: [pointer type, isconst, initid,   linkage, ...]
 // v1 FUNCTION:  [type, callingconv, isproto,  linkage, ...]
 // v1 ALIAS: [alias type,   addrspace,   aliasee val#, linkage, ...]
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index 73bed85c65b3d..3e19220d1bde7 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -227,6 +227,7 @@ class ModuleBitcodeWriterBase : public BitcodeWriterBase {
 
 protected:
   void writePerModuleGlobalValueSummary();
+  void writeGUIDList();
 
 private:
   void writePerModuleFunctionSummaryRecord(
@@ -1560,6 +1561,8 @@ void ModuleBitcodeWriter::writeModuleInfo() {
 Vals.clear();
   }
 
+  writeGUIDList();
+
   // Emit the global variable information.
   for (const GlobalVariable &GV : M.globals()) {
 unsigned AbbrevToUse = 0;
@@ -4755,6 +4758,26 @@ void ModuleBitcodeWriterBase::writePerModuleGlobalValueSummary() {
   Stream.ExitBlock();
 }
 
+void ModuleBitcodeWriterBase::writeGUIDList() {
+  std::vector GUIDs;
+  GUIDs.reserve(M.global_size() + M.size() + M.alias_size());
+
+  for (const GlobalValue &GV : M.global_objects()) {
+if (GV.isDeclaration()) {
+  GUIDs.push_back(
+  GlobalValue::getGUIDAssumingExternalLinkage(GV.getName()));
+} else {
+  GUIDs.push_back(GV.getGUID());
+}
+  }
+  for (const GlobalAlias &GA : M.aliases()) {
+// Equivalent to the above loop, as GlobalAliases are always definitions.
+GUIDs.push_back(GA.getGUID());
+  }
+
+  Stream.EmitRecord(bitc::MODULE_CODE_GUIDLIST, GUIDs);
+}
+
 /// Emit the combined summary section into the combined index file.
 void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
   Stream.EnterSubblock(bitc::GLOBALVAL_SUMMARY_BLOCK_ID, 4);
@@ -5538,6 +5561,8 @@ void ThinLinkBitcodeWriter::writeSimplifiedModuleInfo() {
 Vals.clear();
   }
 
+  writeGUIDList();
+
   // Emit the global variable information.
   for (const GlobalVariable &GV : M.globals()) {
 // GLOBALVAR: [strtab offset, strtab size, 0, 0, 0, linkage]

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin

[llvm-branch-commits] [llvm] Add a GUIDLIST table to bitcode (PR #139497)

2025-05-11 Thread Owen Rodley via llvm-branch-commits

orodley wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/139497?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#139497** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/139497?utm_source=stack-comment-view-in-graphite)
* **#133682**
* **#129644**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment). Learn
more about stacking: https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/139497
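
For readers following this PR: the diff adds a MODULE_CODE_GUIDLIST record that the writer fills with one GUID per global object, followed by one per alias, and that the summary reader then indexes by ValueID instead of recomputing the GUID from name, linkage and source file. The snippet below is only a rough standalone model of that round trip, not LLVM code. `Global`, `hashName`, `writeGUIDList` and `GUIDTable` are hypothetical names, FNV-1a is an assumed stand-in for LLVM's real GUID hash, and the bounds check in `lookup` is an addition of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using GUID = std::uint64_t;

// Stand-in hash (64-bit FNV-1a). This is an assumption for the sketch only;
// LLVM computes GUIDs with its own hashing scheme.
GUID hashName(const std::string &Name) {
  GUID Hash = 0xcbf29ce484222325ULL;
  for (unsigned char C : Name) {
    Hash ^= C;
    Hash *= 0x100000001b3ULL;
  }
  return Hash;
}

// Hypothetical, heavily reduced model of a global value.
struct Global {
  std::string Name;
  bool IsDeclaration;
  GUID StoredGUID; // GUID a definition already carries (ignored for declarations)
};

// Writer side: one entry per global object, then one per alias, in order.
std::vector<GUID> writeGUIDList(const std::vector<Global> &Objects,
                                const std::vector<Global> &Aliases) {
  std::vector<GUID> Record;
  Record.reserve(Objects.size() + Aliases.size());
  for (const Global &G : Objects)
    Record.push_back(G.IsDeclaration ? hashName(G.Name) : G.StoredGUID);
  for (const Global &A : Aliases) // aliases are always definitions
    Record.push_back(A.StoredGUID);
  return Record;
}

// Reader side: records are appended as they are parsed, then indexed by
// value ID.
class GUIDTable {
public:
  void append(const std::vector<GUID> &Record) {
    Table.insert(Table.end(), Record.begin(), Record.end());
  }
  std::optional<GUID> lookup(std::size_t ValueID) const {
    if (ValueID >= Table.size())
      return std::nullopt; // out-of-range ID: nothing recorded
    return Table[ValueID];
  }

private:
  std::vector<GUID> Table;
};
```

The point of the design, as far as the diff shows, is that the reader no longer needs the value's name to obtain its GUID; it only needs the position of the value in the list.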
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Add a GUIDLIST table to bitcode (PR #139497)

2025-05-11 Thread Owen Rodley via llvm-branch-commits

https://github.com/orodley edited 
https://github.com/llvm/llvm-project/pull/139497
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Print heatmap from perf2bolt (PR #139194)

2025-05-11 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov edited 
https://github.com/llvm/llvm-project/pull/139194
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Drop perf2bolt cold samples diagnostic (PR #139337)

2025-05-11 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov edited 
https://github.com/llvm/llvm-project/pull/139337
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [SPARC][IAS] Add definitions for UA 2007 instructions (PR #138401)

2025-05-11 Thread via llvm-branch-commits

https://github.com/koachan updated 
https://github.com/llvm/llvm-project/pull/138401


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [SPARC][IAS][NFC] Rename CBCOND -> CPBCOND (PR #138402)

2025-05-11 Thread via llvm-branch-commits

https://github.com/koachan updated 
https://github.com/llvm/llvm-project/pull/138402


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [SPARC][IAS] Add definitions for OSA 2011 instructions (PR #138403)

2025-05-11 Thread via llvm-branch-commits

https://github.com/koachan updated 
https://github.com/llvm/llvm-project/pull/138403



  



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for ASHR (PR #139503)

2025-05-11 Thread David Green via llvm-branch-commits

https://github.com/davemgreen created 
https://github.com/llvm/llvm-project/pull/139503

None
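
Since the description is empty, a rough sketch of the rule the G_ASHR case relies on may help: an arithmetic shift right by a known constant copies the sign bit into the vacated positions, so it adds that many sign bits, capped at the scalar width. The snippet below is a standalone illustration, not LLVM code. `countSignBits` and `signBitsAfterAshr` are hypothetical helper names, and the cap is written with `std::min` on the assumption that a value can never have more sign bits than it has bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Number of leading bits (sign bit included) of a BitWidth-wide value that
// are equal to its sign bit. Bits above BitWidth are ignored.
unsigned countSignBits(std::uint64_t Bits, unsigned BitWidth) {
  assert(BitWidth >= 1 && BitWidth <= 64);
  const std::uint64_t Sign = (Bits >> (BitWidth - 1)) & 1;
  unsigned Count = 1;
  for (unsigned Bit = BitWidth - 1; Bit-- > 0;) {
    if (((Bits >> Bit) & 1) != Sign)
      break;
    ++Count;
  }
  return Count;
}

// Lower bound on the sign bits of (Src >>s ShAmt), given a lower bound for
// Src. Without a known constant shift amount the source bound is kept.
unsigned signBitsAfterAshr(unsigned SrcSignBits,
                           std::optional<std::uint64_t> ShAmt,
                           unsigned BitWidth) {
  if (!ShAmt)
    return SrcSignBits;
  return static_cast<unsigned>(
      std::min<std::uint64_t>(SrcSignBits + *ShAmt, BitWidth));
}
```

For a 64-bit lane with only one known sign bit, a shift by 32 gives a lower bound of 33, which appears to be the fact the updated `asr` tests depend on when they select `smull`.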



  



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [SPARC][IAS] Add definitions for cryptographic instructions (PR #139451)

2025-05-11 Thread via llvm-branch-commits

https://github.com/koachan updated 
https://github.com/llvm/llvm-project/pull/139451


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for ASHR (PR #139503)

2025-05-11 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/139503.diff


4 Files Affected:

- (modified) llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp (+10) 
- (modified) llvm/test/CodeGen/AArch64/aarch64-smull.ll (+12-55) 
- (modified) llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-abs-rv64.mir (+1-2) 
- (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll (+1-2) 


``diff
diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index 21990be21bbf7..41e36e1e6640b 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -864,6 +864,16 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
   return TyBits - 1; // Every always-zero bit is a sign bit.
 break;
   }
+  case TargetOpcode::G_ASHR: {
+Register Src1 = MI.getOperand(1).getReg();
+Register Src2 = MI.getOperand(2).getReg();
+LLT SrcTy = MRI.getType(Src1);
+FirstAnswer = computeNumSignBits(Src1, DemandedElts, Depth + 1);
+if (auto C = getIConstantSplatVal(Src2, MRI))
+  FirstAnswer = std::min<uint64_t>(FirstAnswer + C->getZExtValue(),
+   SrcTy.getScalarSizeInBits());
+break;
+  }
   case TargetOpcode::G_INTRINSIC:
   case TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS:
   case TargetOpcode::G_INTRINSIC_CONVERGENT:
diff --git a/llvm/test/CodeGen/AArch64/aarch64-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-smull.ll
index 951001c84aed0..591bc65bf3226 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-smull.ll
@@ -2265,33 +2265,12 @@ define <2 x i64> @lsr_const(<2 x i64> %a, <2 x i64> %b) {
 }
 
 define <2 x i64> @asr(<2 x i64> %a, <2 x i64> %b) {
-; CHECK-NEON-LABEL: asr:
-; CHECK-NEON:   // %bb.0:
-; CHECK-NEON-NEXT:shrn v0.2s, v0.2d, #32
-; CHECK-NEON-NEXT:shrn v1.2s, v1.2d, #32
-; CHECK-NEON-NEXT:smull v0.2d, v0.2s, v1.2s
-; CHECK-NEON-NEXT:ret
-;
-; CHECK-SVE-LABEL: asr:
-; CHECK-SVE:   // %bb.0:
-; CHECK-SVE-NEXT:shrn v0.2s, v0.2d, #32
-; CHECK-SVE-NEXT:shrn v1.2s, v1.2d, #32
-; CHECK-SVE-NEXT:smull v0.2d, v0.2s, v1.2s
-; CHECK-SVE-NEXT:ret
-;
-; CHECK-GI-LABEL: asr:
-; CHECK-GI:   // %bb.0:
-; CHECK-GI-NEXT:sshr v0.2d, v0.2d, #32
-; CHECK-GI-NEXT:sshr v1.2d, v1.2d, #32
-; CHECK-GI-NEXT:fmov x8, d0
-; CHECK-GI-NEXT:fmov x9, d1
-; CHECK-GI-NEXT:mov x10, v0.d[1]
-; CHECK-GI-NEXT:mov x11, v1.d[1]
-; CHECK-GI-NEXT:mul x8, x8, x9
-; CHECK-GI-NEXT:mul x9, x10, x11
-; CHECK-GI-NEXT:mov v0.d[0], x8
-; CHECK-GI-NEXT:mov v0.d[1], x9
-; CHECK-GI-NEXT:ret
+; CHECK-LABEL: asr:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:shrn v0.2s, v0.2d, #32
+; CHECK-NEXT:shrn v1.2s, v1.2d, #32
+; CHECK-NEXT:smull v0.2d, v0.2s, v1.2s
+; CHECK-NEXT:ret
 %x = ashr <2 x i64> %a, 
 %y = ashr <2 x i64> %b, 
 %z = mul nsw <2 x i64> %x, %y
@@ -2299,34 +2278,12 @@ define <2 x i64> @asr(<2 x i64> %a, <2 x i64> %b) {
 }
 
 define <2 x i64> @asr_const(<2 x i64> %a, <2 x i64> %b) {
-; CHECK-NEON-LABEL: asr_const:
-; CHECK-NEON:   // %bb.0:
-; CHECK-NEON-NEXT:movi v1.2s, #31
-; CHECK-NEON-NEXT:shrn v0.2s, v0.2d, #32
-; CHECK-NEON-NEXT:smull v0.2d, v0.2s, v1.2s
-; CHECK-NEON-NEXT:ret
-;
-; CHECK-SVE-LABEL: asr_const:
-; CHECK-SVE:   // %bb.0:
-; CHECK-SVE-NEXT:movi v1.2s, #31
-; CHECK-SVE-NEXT:shrn v0.2s, v0.2d, #32
-; CHECK-SVE-NEXT:smull v0.2d, v0.2s, v1.2s
-; CHECK-SVE-NEXT:ret
-;
-; CHECK-GI-LABEL: asr_const:
-; CHECK-GI:   // %bb.0:
-; CHECK-GI-NEXT:adrp x8, .LCPI81_0
-; CHECK-GI-NEXT:sshr v0.2d, v0.2d, #32
-; CHECK-GI-NEXT:ldr q1, [x8, :lo12:.LCPI81_0]
-; CHECK-GI-NEXT:fmov x8, d0
-; CHECK-GI-NEXT:fmov x9, d1
-; CHECK-GI-NEXT:mov x10, v0.d[1]
-; CHECK-GI-NEXT:mov x11, v1.d[1]
-; CHECK-GI-NEXT:mul x8, x8, x9
-; CHECK-GI-NEXT:mul x9, x10, x11
-; CHECK-GI-NEXT:mov v0.d[0], x8
-; CHECK-GI-NEXT:mov v0.d[1], x9
-; CHECK-GI-NEXT:ret
+; CHECK-LABEL: asr_const:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:movi v1.2s, #31
+; CHECK-NEXT:shrn v0.2s, v0.2d, #32
+; CHECK-NEXT:smull v0.2d, v0.2s, v1.2s
+; CHECK-NEXT:ret
 %x = ashr <2 x i64> %a, 
 %z = mul nsw <2 x i64> %x, 
 ret <2 x i64> %z
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-abs-rv64.mir b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-abs-rv64.mir
index 78a2227b84a3a..a7c1c6355bff6 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-abs-rv64.mir
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-abs-rv64.mir
@@ -88,8 +88,7 @@ body: |
 ; RV64I-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[ASSERT_SEXT]], [[ASHR]]
 ; RV64I-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s64) = G_SEXT_INREG [[ADD]], 32
 ; R

[llvm-branch-commits] [llvm] [GlobalISel] Add computeKnownBits for G_SHUFFLE_VECTOR (PR #139505)

2025-05-11 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-globalisel

Author: David Green (davemgreen)


Changes

The code is similar to computeKnownBits and the code in 
SelectionDAG::ComputeNumSignBits.

---
Full diff: https://github.com/llvm/llvm-project/pull/139505.diff


3 Files Affected:

- (modified) llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp (+24) 
- (modified) llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll (+10-23) 
- (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+15-26) 


``diff
diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index 41e36e1e6640b..fb483ed962270 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -874,6 +874,30 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
SrcTy.getScalarSizeInBits());
 break;
   }
+  case TargetOpcode::G_SHUFFLE_VECTOR: {
+// Collect the minimum number of sign bits that are shared by every vector
+// element referenced by the shuffle.
+APInt DemandedLHS, DemandedRHS;
+unsigned NumElts = MRI.getType(MI.getOperand(1).getReg()).getNumElements();
+if (!getShuffleDemandedElts(NumElts, MI.getOperand(3).getShuffleMask(),
+DemandedElts, DemandedLHS, DemandedRHS))
+  return 1;
+
+unsigned Tmp = std::numeric_limits<unsigned>::max();
+if (!!DemandedLHS)
+  Tmp =
+  computeNumSignBits(MI.getOperand(1).getReg(), DemandedLHS, Depth + 1);
+if (!!DemandedRHS) {
+  unsigned Tmp2 =
+  computeNumSignBits(MI.getOperand(2).getReg(), DemandedRHS, Depth + 1);
+  Tmp = std::min(Tmp, Tmp2);
+}
+// If we don't know anything, early out and try computeKnownBits fall-back.
+if (Tmp == 1)
+  break;
+assert(Tmp <= TyBits && "Failed to determine minimum sign bits");
+return Tmp;
+  }
   case TargetOpcode::G_INTRINSIC:
   case TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS:
   case TargetOpcode::G_INTRINSIC_CONVERGENT:
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
index 56393142726c7..d86cbf57a65f3 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
@@ -400,9 +400,10 @@ define <8 x i16> @missing_insert(<8 x i8> %b) {
 ;
 ; CHECK-GI-LABEL: missing_insert:
 ; CHECK-GI:   // %bb.0: // %entry
-; CHECK-GI-NEXT:sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:ext v1.16b, v0.16b, v0.16b, #4
-; CHECK-GI-NEXT:mul v0.8h, v1.8h, v0.8h
+; CHECK-GI-NEXT:sshll v1.8h, v0.8b, #0
+; CHECK-GI-NEXT:ext v1.16b, v1.16b, v1.16b, #4
+; CHECK-GI-NEXT:xtn v1.8b, v1.8h
+; CHECK-GI-NEXT:smull v0.8h, v1.8b, v0.8b
 ; CHECK-GI-NEXT:ret
 entry:
   %ext.b = sext <8 x i8> %b to <8 x i16>
@@ -421,10 +422,10 @@ define <8 x i16> @shufsext_v8i8_v8i16(<8 x i8> %src, <8 x i8> %b) {
 ; CHECK-GI-LABEL: shufsext_v8i8_v8i16:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:sshll v1.8h, v1.8b, #0
 ; CHECK-GI-NEXT:rev64 v0.8h, v0.8h
 ; CHECK-GI-NEXT:ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:mul v0.8h, v0.8h, v1.8h
+; CHECK-GI-NEXT:xtn v0.8b, v0.8h
+; CHECK-GI-NEXT:smull v0.8h, v0.8b, v1.8b
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext <8 x i8> %src to <8 x i16>
@@ -444,16 +445,9 @@ define <2 x i64> @shufsext_v2i32_v2i64(<2 x i32> %src, <2 x i32> %b) {
 ; CHECK-GI-LABEL: shufsext_v2i32_v2i64:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:fmov x9, d1
-; CHECK-GI-NEXT:mov x11, v1.d[1]
-; CHECK-GI-NEXT:fmov x8, d0
-; CHECK-GI-NEXT:mov x10, v0.d[1]
-; CHECK-GI-NEXT:mul x8, x8, x9
-; CHECK-GI-NEXT:mul x9, x10, x11
-; CHECK-GI-NEXT:mov v0.d[0], x8
-; CHECK-GI-NEXT:mov v0.d[1], x9
+; CHECK-GI-NEXT:xtn v0.2s, v0.2d
+; CHECK-GI-NEXT:smull v0.2d, v0.2s, v1.2s
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext <2 x i32> %src to <2 x i64>
@@ -496,16 +490,9 @@ define <2 x i64> @shufzext_v2i32_v2i64(<2 x i32> %src, <2 x i32> %b) {
 ; CHECK-GI-LABEL: shufzext_v2i32_v2i64:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:fmov x9, d1
-; CHECK-GI-NEXT:mov x11, v1.d[1]
-; CHECK-GI-NEXT:fmov x8, d0
-; CHECK-GI-NEXT:mov x10, v0.d[1]
-; CHECK-GI-NEXT:mul x8, x8, x9
-; CHECK-GI-NEXT:mul x9, x10, x11
-; CHECK-GI-NEXT:mov v0.d[0], x8
-; CHECK-GI-NEXT:mov v0.d[1], x9
+; CHECK-GI-NEXT:xtn v0.2s, v0.2d
+; CHECK-GI-NEXT:smull v0.2d, v0.2s, v1.2s
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext <2 x i32> %src to <2 x i64>
diff --git a/llvm/test/CodeGen/AA

[llvm-branch-commits] [llvm] [GlobalISel] Add computeKnownBits for G_SHUFFLE_VECTOR (PR #139505)

2025-05-11 Thread David Green via llvm-branch-commits

https://github.com/davemgreen created 
https://github.com/llvm/llvm-project/pull/139505

The code is similar to computeKnownBits and the code in 
SelectionDAG::ComputeNumSignBits.
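
As a rough illustration of what the G_SHUFFLE_VECTOR case in this patch computes (the minimum sign-bit count over the source elements that the demanded lanes of the shuffle mask actually read), here is a small standalone model. It is not the LLVM implementation: vectors are reduced to per-element sign-bit counts, and `minSignBits` and `shuffleSignBits` are hypothetical names. A negative mask entry stands for an undefined lane and makes the sketch give up with 1, mirroring the early `return 1` in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Minimum sign-bit count over the demanded elements of one source vector.
// Returns the largest unsigned value when no element is demanded.
unsigned minSignBits(const std::vector<unsigned> &ElemSignBits,
                     const std::vector<bool> &Demanded) {
  unsigned Min = std::numeric_limits<unsigned>::max();
  for (std::size_t I = 0; I < ElemSignBits.size(); ++I)
    if (Demanded[I])
      Min = std::min(Min, ElemSignBits[I]);
  return Min;
}

// Lower bound on the sign bits of every demanded lane of
// shuffle(LHS, RHS, Mask). Mask indices below LHS.size() select from LHS,
// the rest select from RHS.
unsigned shuffleSignBits(const std::vector<unsigned> &LHS,
                         const std::vector<unsigned> &RHS,
                         const std::vector<int> &Mask,
                         const std::vector<bool> &DemandedElts) {
  assert(LHS.size() == RHS.size() && Mask.size() == DemandedElts.size());
  const std::size_t NumSrcElts = LHS.size();
  std::vector<bool> DemandedLHS(NumSrcElts, false);
  std::vector<bool> DemandedRHS(NumSrcElts, false);

  // Translate demanded result lanes into demanded source elements.
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (!DemandedElts[I])
      continue;
    if (Mask[I] < 0)
      return 1; // undefined lane: nothing is known
    const std::size_t Src = static_cast<std::size_t>(Mask[I]);
    assert(Src < 2 * NumSrcElts && "mask index out of range");
    if (Src < NumSrcElts)
      DemandedLHS[Src] = true;
    else
      DemandedRHS[Src - NumSrcElts] = true;
  }

  // An operand with no demanded element does not constrain the result.
  return std::min(minSignBits(LHS, DemandedLHS),
                  minSignBits(RHS, DemandedRHS));
}
```

Restricting the query to demanded source elements matters: an operand element with few sign bits does not weaken the answer if no demanded lane reads it.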

>From 68fc0c493331eaa56ebc862ef7dfb7106cabad82 Mon Sep 17 00:00:00 2001
From: David Green 
Date: Mon, 12 May 2025 07:36:16 +0100
Subject: [PATCH] [GlobalISel] Add computeKnownBits for G_SHUFFLE_VECTOR

The code is similar to computeKnownBits and the code in
SelectionDAG::ComputeNumSignBits
---
 .../CodeGen/GlobalISel/GISelValueTracking.cpp | 24 +++
 llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll  | 33 +--
 .../AArch64/aarch64-matrix-umull-smull.ll | 41 +++
 3 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index 41e36e1e6640b..fb483ed962270 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -874,6 +874,30 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
SrcTy.getScalarSizeInBits());
 break;
   }
+  case TargetOpcode::G_SHUFFLE_VECTOR: {
+// Collect the minimum number of sign bits that are shared by every vector
+// element referenced by the shuffle.
+APInt DemandedLHS, DemandedRHS;
+unsigned NumElts = MRI.getType(MI.getOperand(1).getReg()).getNumElements();
+if (!getShuffleDemandedElts(NumElts, MI.getOperand(3).getShuffleMask(),
+DemandedElts, DemandedLHS, DemandedRHS))
+  return 1;
+
+unsigned Tmp = std::numeric_limits<unsigned>::max();
+if (!!DemandedLHS)
+  Tmp =
+  computeNumSignBits(MI.getOperand(1).getReg(), DemandedLHS, Depth + 1);
+if (!!DemandedRHS) {
+  unsigned Tmp2 =
+  computeNumSignBits(MI.getOperand(2).getReg(), DemandedRHS, Depth + 1);
+  Tmp = std::min(Tmp, Tmp2);
+}
+// If we don't know anything, early out and try computeKnownBits fall-back.
+if (Tmp == 1)
+  break;
+assert(Tmp <= TyBits && "Failed to determine minimum sign bits");
+return Tmp;
+  }
   case TargetOpcode::G_INTRINSIC:
   case TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS:
   case TargetOpcode::G_INTRINSIC_CONVERGENT:
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
index 56393142726c7..d86cbf57a65f3 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
@@ -400,9 +400,10 @@ define <8 x i16> @missing_insert(<8 x i8> %b) {
 ;
 ; CHECK-GI-LABEL: missing_insert:
 ; CHECK-GI:   // %bb.0: // %entry
-; CHECK-GI-NEXT:sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:ext v1.16b, v0.16b, v0.16b, #4
-; CHECK-GI-NEXT:mul v0.8h, v1.8h, v0.8h
+; CHECK-GI-NEXT:sshll v1.8h, v0.8b, #0
+; CHECK-GI-NEXT:ext v1.16b, v1.16b, v1.16b, #4
+; CHECK-GI-NEXT:xtn v1.8b, v1.8h
+; CHECK-GI-NEXT:smull v0.8h, v1.8b, v0.8b
 ; CHECK-GI-NEXT:ret
 entry:
   %ext.b = sext <8 x i8> %b to <8 x i16>
@@ -421,10 +422,10 @@ define <8 x i16> @shufsext_v8i8_v8i16(<8 x i8> %src, <8 x i8> %b) {
 ; CHECK-GI-LABEL: shufsext_v8i8_v8i16:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:sshll v1.8h, v1.8b, #0
 ; CHECK-GI-NEXT:rev64 v0.8h, v0.8h
 ; CHECK-GI-NEXT:ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:mul v0.8h, v0.8h, v1.8h
+; CHECK-GI-NEXT:xtn v0.8b, v0.8h
+; CHECK-GI-NEXT:smull v0.8h, v0.8b, v1.8b
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext <8 x i8> %src to <8 x i16>
@@ -444,16 +445,9 @@ define <2 x i64> @shufsext_v2i32_v2i64(<2 x i32> %src, <2 x i32> %b) {
 ; CHECK-GI-LABEL: shufsext_v2i32_v2i64:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:fmov x9, d1
-; CHECK-GI-NEXT:mov x11, v1.d[1]
-; CHECK-GI-NEXT:fmov x8, d0
-; CHECK-GI-NEXT:mov x10, v0.d[1]
-; CHECK-GI-NEXT:mul x8, x8, x9
-; CHECK-GI-NEXT:mul x9, x10, x11
-; CHECK-GI-NEXT:mov v0.d[0], x8
-; CHECK-GI-NEXT:mov v0.d[1], x9
+; CHECK-GI-NEXT:xtn v0.2s, v0.2d
+; CHECK-GI-NEXT:smull v0.2d, v0.2s, v1.2s
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext <2 x i32> %src to <2 x i64>
@@ -496,16 +490,9 @@ define <2 x i64> @shufzext_v2i32_v2i64(<2 x i32> %src, <2 x i32> %b) {
 ; CHECK-GI-LABEL: shufzext_v2i32_v2i64:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:fmov x9, d1
-; CHECK-GI-NEXT:mov x11, v1.d[1]
-; CHECK-GI-NEXT:fmov x8, d0
-; CHECK-GI-NEXT:mov x10, v0.d[1]
-; CHECK-GI-NEXT:mul x8, x8, x9
-; CHECK-GI-NEXT:mul x9, x10, x11
-; CHECK-GI-NEXT:mov v0.d[0], x8
-; CHECK-GI-NEX

[llvm-branch-commits] [llvm] [GlobalISel] Add computeKnownBits for G_SHUFFLE_VECTOR (PR #139505)

2025-05-11 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)


Changes

The code is similar to computeKnownBits and the code in 
SelectionDAG::ComputeNumSignBits.

---
Full diff: https://github.com/llvm/llvm-project/pull/139505.diff


3 Files Affected:

- (modified) llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp (+24) 
- (modified) llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll (+10-23) 
- (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+15-26) 


``diff
diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index 41e36e1e6640b..fb483ed962270 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -874,6 +874,30 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
SrcTy.getScalarSizeInBits());
 break;
   }
+  case TargetOpcode::G_SHUFFLE_VECTOR: {
+// Collect the minimum number of sign bits that are shared by every vector
+// element referenced by the shuffle.
+APInt DemandedLHS, DemandedRHS;
+unsigned NumElts = MRI.getType(MI.getOperand(1).getReg()).getNumElements();
+if (!getShuffleDemandedElts(NumElts, MI.getOperand(3).getShuffleMask(),
+DemandedElts, DemandedLHS, DemandedRHS))
+  return 1;
+
+unsigned Tmp = std::numeric_limits<unsigned>::max();
+if (!!DemandedLHS)
+  Tmp =
+  computeNumSignBits(MI.getOperand(1).getReg(), DemandedLHS, Depth + 1);
+if (!!DemandedRHS) {
+  unsigned Tmp2 =
+  computeNumSignBits(MI.getOperand(2).getReg(), DemandedRHS, Depth + 1);
+  Tmp = std::min(Tmp, Tmp2);
+}
+// If we don't know anything, early out and try computeKnownBits fall-back.
+if (Tmp == 1)
+  break;
+assert(Tmp <= TyBits && "Failed to determine minimum sign bits");
+return Tmp;
+  }
   case TargetOpcode::G_INTRINSIC:
   case TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS:
   case TargetOpcode::G_INTRINSIC_CONVERGENT:
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
index 56393142726c7..d86cbf57a65f3 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
@@ -400,9 +400,10 @@ define <8 x i16> @missing_insert(<8 x i8> %b) {
 ;
 ; CHECK-GI-LABEL: missing_insert:
 ; CHECK-GI:   // %bb.0: // %entry
-; CHECK-GI-NEXT:sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:ext v1.16b, v0.16b, v0.16b, #4
-; CHECK-GI-NEXT:mul v0.8h, v1.8h, v0.8h
+; CHECK-GI-NEXT:sshll v1.8h, v0.8b, #0
+; CHECK-GI-NEXT:ext v1.16b, v1.16b, v1.16b, #4
+; CHECK-GI-NEXT:xtn v1.8b, v1.8h
+; CHECK-GI-NEXT:smull v0.8h, v1.8b, v0.8b
 ; CHECK-GI-NEXT:ret
 entry:
   %ext.b = sext <8 x i8> %b to <8 x i16>
@@ -421,10 +422,10 @@ define <8 x i16> @shufsext_v8i8_v8i16(<8 x i8> %src, <8 x i8> %b) {
 ; CHECK-GI-LABEL: shufsext_v8i8_v8i16:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sshll v0.8h, v0.8b, #0
-; CHECK-GI-NEXT:sshll v1.8h, v1.8b, #0
 ; CHECK-GI-NEXT:rev64 v0.8h, v0.8h
 ; CHECK-GI-NEXT:ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:mul v0.8h, v0.8h, v1.8h
+; CHECK-GI-NEXT:xtn v0.8b, v0.8h
+; CHECK-GI-NEXT:smull v0.8h, v0.8b, v1.8b
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext <8 x i8> %src to <8 x i16>
@@ -444,16 +445,9 @@ define <2 x i64> @shufsext_v2i32_v2i64(<2 x i32> %src, <2 x i32> %b) {
 ; CHECK-GI-LABEL: shufsext_v2i32_v2i64:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:fmov x9, d1
-; CHECK-GI-NEXT:mov x11, v1.d[1]
-; CHECK-GI-NEXT:fmov x8, d0
-; CHECK-GI-NEXT:mov x10, v0.d[1]
-; CHECK-GI-NEXT:mul x8, x8, x9
-; CHECK-GI-NEXT:mul x9, x10, x11
-; CHECK-GI-NEXT:mov v0.d[0], x8
-; CHECK-GI-NEXT:mov v0.d[1], x9
+; CHECK-GI-NEXT:xtn v0.2s, v0.2d
+; CHECK-GI-NEXT:smull v0.2d, v0.2s, v1.2s
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext <2 x i32> %src to <2 x i64>
@@ -496,16 +490,9 @@ define <2 x i64> @shufzext_v2i32_v2i64(<2 x i32> %src, <2 
x i32> %b) {
 ; CHECK-GI-LABEL: shufzext_v2i32_v2i64:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sshll v0.2d, v0.2s, #0
-; CHECK-GI-NEXT:sshll v1.2d, v1.2s, #0
 ; CHECK-GI-NEXT:ext v0.16b, v0.16b, v0.16b, #8
-; CHECK-GI-NEXT:fmov x9, d1
-; CHECK-GI-NEXT:mov x11, v1.d[1]
-; CHECK-GI-NEXT:fmov x8, d0
-; CHECK-GI-NEXT:mov x10, v0.d[1]
-; CHECK-GI-NEXT:mul x8, x8, x9
-; CHECK-GI-NEXT:mul x9, x10, x11
-; CHECK-GI-NEXT:mov v0.d[0], x8
-; CHECK-GI-NEXT:mov v0.d[1], x9
+; CHECK-GI-NEXT:xtn v0.2s, v0.2d
+; CHECK-GI-NEXT:smull v0.2d, v0.2s, v1.2s
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext <2 x i32> %src to <2 x i64>
diff --git a/llvm/test/CodeGen/AA

[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for G_BUILD_VECTOR. (PR #139506)

2025-05-11 Thread David Green via llvm-branch-commits

https://github.com/davemgreen created 
https://github.com/llvm/llvm-project/pull/139506

The code is similar to SelectionDAG::ComputeNumSignBits, but does not deal with 
truncating buildvectors.
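
As a rough illustration of the idea only, not the patch itself: the number of sign bits known for a build vector is the minimum over its demanded elements, and the loop can stop once that minimum reaches 1. The sketch below works on concrete `int32_t` values in plain C++; `numSignBits` and `buildVectorSignBits` are hypothetical helper names, whereas the real code recurses over the source registers through `computeNumSignBits`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count the leading bits of V that equal its sign bit (the sign bit included).
// Hypothetical stand-in for a per-element sign-bit query.
unsigned numSignBits(int32_t V) {
  const uint32_t U = static_cast<uint32_t>(V);
  const uint32_t Sign = U >> 31;
  unsigned N = 1;
  for (int Bit = 30; Bit >= 0 && ((U >> Bit) & 1u) == Sign; --Bit)
    ++N;
  return N;
}

// Minimum sign-bit count shared by every demanded element, mirroring the
// shape of the G_BUILD_VECTOR case: start at the full width and narrow down.
unsigned buildVectorSignBits(const std::vector<int32_t> &Elts,
                             const std::vector<bool> &Demanded) {
  unsigned Answer = 32; // Plays the role of TyBits for 32-bit elements.
  for (std::size_t I = 0; I < Elts.size(); ++I) {
    if (!Demanded[I])
      continue;
    Answer = std::min(Answer, numSignBits(Elts[I]));
    // Nothing is known beyond the sign bit itself, so stop early.
    if (Answer == 1)
      break;
  }
  return Answer;
}
```

With elements that are all sign-extended from i8, every element has at least 25 sign bits, which is the kind of fact that lets a widening multiply such as `smull` be selected in the tests below.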

>From c1286744212c2b2f09e923161a6e6fc4d894e216 Mon Sep 17 00:00:00 2001
From: David Green 
Date: Mon, 12 May 2025 07:48:44 +0100
Subject: [PATCH] [GlobalISel] Add computeNumSignBits for G_BUILD_VECTOR.

---
 .../CodeGen/GlobalISel/GISelValueTracking.cpp | 17 +++
 llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll  | 28 ---
 .../AArch64/aarch64-matrix-umull-smull.ll | 46 +--
 3 files changed, 49 insertions(+), 42 deletions(-)

diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp 
b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index fb483ed962270..999bae6ccf42c 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -874,6 +874,23 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
SrcTy.getScalarSizeInBits());
 break;
   }
+  case TargetOpcode::G_BUILD_VECTOR: {
+// Collect the known bits that are shared by every demanded vector element.
+FirstAnswer = TyBits;
+for (unsigned i = 0, e = MI.getNumOperands() - 1; i < e; ++i) {
+  if (!DemandedElts[i])
+continue;
+
+  unsigned Tmp2 = computeNumSignBits(MI.getOperand(i + 1).getReg(),
+ APInt(1, 1), Depth + 1);
+  FirstAnswer = std::min(FirstAnswer, Tmp2);
+
+  // If we don't know any bits, early out.
+  if (FirstAnswer == 1)
+break;
+}
+break;
+  }
   case TargetOpcode::G_SHUFFLE_VECTOR: {
 // Collect the minimum number of sign bits that are shared by every vector
 // element referenced by the shuffle.
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll 
b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
index d86cbf57a65f3..295863f18fd41 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
@@ -61,9 +61,9 @@ define <4 x i32> @dupsext_v4i16_v4i32(i16 %src, <4 x i16> %b) 
{
 ; CHECK-GI-LABEL: dupsext_v4i16_v4i32:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sxth w8, w0
-; CHECK-GI-NEXT:sshll v0.4s, v0.4h, #0
 ; CHECK-GI-NEXT:dup v1.4s, w8
-; CHECK-GI-NEXT:mul v0.4s, v1.4s, v0.4s
+; CHECK-GI-NEXT:xtn v1.4h, v1.4s
+; CHECK-GI-NEXT:smull v0.4s, v1.4h, v0.4h
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext i16 %src to i32
@@ -108,16 +108,9 @@ define <2 x i64> @dupsext_v2i32_v2i64(i32 %src, <2 x i32> 
%b) {
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:// kill: def $w0 killed $w0 def $x0
 ; CHECK-GI-NEXT:sxtw x8, w0
-; CHECK-GI-NEXT:sshll v0.2d, v0.2s, #0
 ; CHECK-GI-NEXT:dup v1.2d, x8
-; CHECK-GI-NEXT:fmov x9, d0
-; CHECK-GI-NEXT:mov x11, v0.d[1]
-; CHECK-GI-NEXT:fmov x8, d1
-; CHECK-GI-NEXT:mov x10, v1.d[1]
-; CHECK-GI-NEXT:mul x8, x8, x9
-; CHECK-GI-NEXT:mul x9, x10, x11
-; CHECK-GI-NEXT:mov v0.d[0], x8
-; CHECK-GI-NEXT:mov v0.d[1], x9
+; CHECK-GI-NEXT:xtn v1.2s, v1.2d
+; CHECK-GI-NEXT:smull v0.2d, v1.2s, v0.2s
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext i32 %src to i64
@@ -293,15 +286,14 @@ define <4 x i32> @nonsplat_shuffleinsert2(<4 x i16> %b, 
i16 %b0, i16 %b1, i16 %b
 ; CHECK-GI-LABEL: nonsplat_shuffleinsert2:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sxth w8, w0
-; CHECK-GI-NEXT:sshll v0.4s, v0.4h, #0
-; CHECK-GI-NEXT:mov v1.s[0], w8
-; CHECK-GI-NEXT:sxth w8, w1
-; CHECK-GI-NEXT:mov v1.s[1], w8
+; CHECK-GI-NEXT:sxth w9, w1
+; CHECK-GI-NEXT:fmov s1, w8
 ; CHECK-GI-NEXT:sxth w8, w2
-; CHECK-GI-NEXT:mov v1.s[2], w8
+; CHECK-GI-NEXT:mov v1.h[1], w9
+; CHECK-GI-NEXT:mov v1.h[2], w8
 ; CHECK-GI-NEXT:sxth w8, w3
-; CHECK-GI-NEXT:mov v1.s[3], w8
-; CHECK-GI-NEXT:mul v0.4s, v1.4s, v0.4s
+; CHECK-GI-NEXT:mov v1.h[3], w8
+; CHECK-GI-NEXT:smull v0.4s, v1.4h, v0.4h
 ; CHECK-GI-NEXT:ret
 entry:
   %s0 = sext i16 %b0 to i32
diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll 
b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
index b89b422c8c5ad..418113a4e4e09 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -108,11 +108,12 @@ define void @matrix_mul_signed(i32 %N, ptr nocapture %C, 
ptr nocapture readonly
 ;
 ; CHECK-GI-LABEL: matrix_mul_signed:
 ; CHECK-GI:   // %bb.0: // %vector.header
-; CHECK-GI-NEXT:sxth w9, w3
+; CHECK-GI-NEXT:sxth w8, w3
 ; CHECK-GI-NEXT:// kill: def $w0 killed $w0 def $x0
+; CHECK-GI-NEXT:dup v0.4s, w8
 ; CHECK-GI-NEXT:sxtw x8, w0
-; CHECK-GI-NEXT:dup v0.4s, w9
 ; CHECK-GI-NEXT:and x8, x8, #0xfff8
+; CHECK-GI-NEXT:xtn v0.4h, v0.4s
 ; CHECK-GI-NEXT:  .LBB1_1: // %vector.body
 ; CHECK-GI-NEXT:// =>This Inn

[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for G_BUILD_VECTOR. (PR #139506)

2025-05-11 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)


Changes

The code is similar to SelectionDAG::ComputeNumSignBits, but does not deal with 
truncating buildvectors.

---
Full diff: https://github.com/llvm/llvm-project/pull/139506.diff


3 Files Affected:

- (modified) llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp (+17) 
- (modified) llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll (+10-18) 
- (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+22-24) 


``diff
diff --git a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp 
b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
index fb483ed962270..999bae6ccf42c 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
@@ -874,6 +874,23 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
SrcTy.getScalarSizeInBits());
 break;
   }
+  case TargetOpcode::G_BUILD_VECTOR: {
+// Collect the known bits that are shared by every demanded vector element.
+FirstAnswer = TyBits;
+for (unsigned i = 0, e = MI.getNumOperands() - 1; i < e; ++i) {
+  if (!DemandedElts[i])
+continue;
+
+  unsigned Tmp2 = computeNumSignBits(MI.getOperand(i + 1).getReg(),
+ APInt(1, 1), Depth + 1);
+  FirstAnswer = std::min(FirstAnswer, Tmp2);
+
+  // If we don't know any bits, early out.
+  if (FirstAnswer == 1)
+break;
+}
+break;
+  }
   case TargetOpcode::G_SHUFFLE_VECTOR: {
 // Collect the minimum number of sign bits that are shared by every vector
 // element referenced by the shuffle.
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll 
b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
index d86cbf57a65f3..295863f18fd41 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
@@ -61,9 +61,9 @@ define <4 x i32> @dupsext_v4i16_v4i32(i16 %src, <4 x i16> %b) 
{
 ; CHECK-GI-LABEL: dupsext_v4i16_v4i32:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sxth w8, w0
-; CHECK-GI-NEXT:sshll v0.4s, v0.4h, #0
 ; CHECK-GI-NEXT:dup v1.4s, w8
-; CHECK-GI-NEXT:mul v0.4s, v1.4s, v0.4s
+; CHECK-GI-NEXT:xtn v1.4h, v1.4s
+; CHECK-GI-NEXT:smull v0.4s, v1.4h, v0.4h
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext i16 %src to i32
@@ -108,16 +108,9 @@ define <2 x i64> @dupsext_v2i32_v2i64(i32 %src, <2 x i32> 
%b) {
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:// kill: def $w0 killed $w0 def $x0
 ; CHECK-GI-NEXT:sxtw x8, w0
-; CHECK-GI-NEXT:sshll v0.2d, v0.2s, #0
 ; CHECK-GI-NEXT:dup v1.2d, x8
-; CHECK-GI-NEXT:fmov x9, d0
-; CHECK-GI-NEXT:mov x11, v0.d[1]
-; CHECK-GI-NEXT:fmov x8, d1
-; CHECK-GI-NEXT:mov x10, v1.d[1]
-; CHECK-GI-NEXT:mul x8, x8, x9
-; CHECK-GI-NEXT:mul x9, x10, x11
-; CHECK-GI-NEXT:mov v0.d[0], x8
-; CHECK-GI-NEXT:mov v0.d[1], x9
+; CHECK-GI-NEXT:xtn v1.2s, v1.2d
+; CHECK-GI-NEXT:smull v0.2d, v1.2s, v0.2s
 ; CHECK-GI-NEXT:ret
 entry:
   %in = sext i32 %src to i64
@@ -293,15 +286,14 @@ define <4 x i32> @nonsplat_shuffleinsert2(<4 x i16> %b, 
i16 %b0, i16 %b1, i16 %b
 ; CHECK-GI-LABEL: nonsplat_shuffleinsert2:
 ; CHECK-GI:   // %bb.0: // %entry
 ; CHECK-GI-NEXT:sxth w8, w0
-; CHECK-GI-NEXT:sshll v0.4s, v0.4h, #0
-; CHECK-GI-NEXT:mov v1.s[0], w8
-; CHECK-GI-NEXT:sxth w8, w1
-; CHECK-GI-NEXT:mov v1.s[1], w8
+; CHECK-GI-NEXT:sxth w9, w1
+; CHECK-GI-NEXT:fmov s1, w8
 ; CHECK-GI-NEXT:sxth w8, w2
-; CHECK-GI-NEXT:mov v1.s[2], w8
+; CHECK-GI-NEXT:mov v1.h[1], w9
+; CHECK-GI-NEXT:mov v1.h[2], w8
 ; CHECK-GI-NEXT:sxth w8, w3
-; CHECK-GI-NEXT:mov v1.s[3], w8
-; CHECK-GI-NEXT:mul v0.4s, v1.4s, v0.4s
+; CHECK-GI-NEXT:mov v1.h[3], w8
+; CHECK-GI-NEXT:smull v0.4s, v1.4h, v0.4h
 ; CHECK-GI-NEXT:ret
 entry:
   %s0 = sext i16 %b0 to i32
diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll 
b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
index b89b422c8c5ad..418113a4e4e09 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -108,11 +108,12 @@ define void @matrix_mul_signed(i32 %N, ptr nocapture %C, 
ptr nocapture readonly
 ;
 ; CHECK-GI-LABEL: matrix_mul_signed:
 ; CHECK-GI:   // %bb.0: // %vector.header
-; CHECK-GI-NEXT:sxth w9, w3
+; CHECK-GI-NEXT:sxth w8, w3
 ; CHECK-GI-NEXT:// kill: def $w0 killed $w0 def $x0
+; CHECK-GI-NEXT:dup v0.4s, w8
 ; CHECK-GI-NEXT:sxtw x8, w0
-; CHECK-GI-NEXT:dup v0.4s, w9
 ; CHECK-GI-NEXT:and x8, x8, #0xfff8
+; CHECK-GI-NEXT:xtn v0.4h, v0.4s
 ; CHECK-GI-NEXT:  .LBB1_1: // %vector.body
 ; CHECK-GI-NEXT:// =>This Inner Loop Header: Depth=1
 ; CHECK-GI-NEXT:add x9, x2, w0, sxtw #1
@@ -120,10 +121,8 @@ define void @matrix_mul_signe

[llvm-branch-commits] [llvm] [SelectionDAG][X86] Remove unused elements from atomic vector. (PR #125432)

2025-05-11 Thread Matt Arsenault via llvm-branch-commits


@@ -60388,6 +60393,35 @@ static SDValue combineINTRINSIC_VOID(SDNode *N, 
SelectionDAG &DAG,
   return SDValue();
 }
 
+static SDValue combineVZEXT_LOAD(SDNode *N, SelectionDAG &DAG,
+ TargetLowering::DAGCombinerInfo &DCI) {
+  // Find the TokenFactor to locate the associated AtomicLoad.
+  SDNode *ALD = nullptr;
+  for (auto &TF : DAG.allnodes())

arsenm wrote:

Looking at all nodes should never be necessary 

https://github.com/llvm/llvm-project/pull/125432


[llvm-branch-commits] [SPARC][IAS] Add definitions for UA 2007 instructions (PR #138401)

2025-05-11 Thread Sergei Barannikov via llvm-branch-commits

https://github.com/s-barannikov approved this pull request.


https://github.com/llvm/llvm-project/pull/138401


[llvm-branch-commits] [clang] [HLSL] Implicit resource binding for cbuffers (PR #139022)

2025-05-11 Thread Justin Bogner via llvm-branch-commits


@@ -539,19 +537,27 @@ static void initializeBuffer(CodeGenModule &CGM, 
llvm::GlobalVariable *GV,
 }
 
 static void initializeBufferFromBinding(CodeGenModule &CGM,
-llvm::GlobalVariable *GV, unsigned 
Slot,
-unsigned Space) {
+llvm::GlobalVariable *GV,
+HLSLResourceBindingAttr *RBA) {
   llvm::Type *Int1Ty = llvm::Type::getInt1Ty(CGM.getLLVMContext());
-  llvm::Value *Args[] = {
-  llvm::ConstantInt::get(CGM.IntTy, Space), /* reg_space */
-  llvm::ConstantInt::get(CGM.IntTy, Slot),  /* lower_bound */
-  llvm::ConstantInt::get(CGM.IntTy, 1), /* range_size */
-  llvm::ConstantInt::get(CGM.IntTy, 0), /* index */
-  llvm::ConstantInt::get(Int1Ty, false) /* non-uniform */
-  };
-  initializeBuffer(CGM, GV,
-   CGM.getHLSLRuntime().getCreateHandleFromBindingIntrinsic(),
-   Args);
+  auto *False = llvm::ConstantInt::get(Int1Ty, false);
+  auto *Zero = llvm::ConstantInt::get(CGM.IntTy, 0);
+  auto *One = llvm::ConstantInt::get(CGM.IntTy, 1);

bogner wrote:

Shouldn't these be called `NonUniform`, `Index`, and `RangeSize`? Naming these 
after their values isn't very helpful.
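
A minimal sketch of the naming being suggested, using a hypothetical plain struct (`BindingArgs`) and helper (`makeBindingArgs`) instead of `llvm::ConstantInt` values so that it compiles on its own; the field names follow the comments on the removed `Args` array (reg_space, lower_bound, range_size, index, non-uniform):

```cpp
// Hypothetical stand-in for the intrinsic's argument list.
struct BindingArgs {
  unsigned RegSpace;
  unsigned LowerBound;
  unsigned RangeSize;
  unsigned Index;
  bool NonUniform;
};

BindingArgs makeBindingArgs(unsigned Space, unsigned Slot) {
  // Named after the role each constant plays, not after its value.
  const unsigned RangeSize = 1;
  const unsigned Index = 0;
  const bool NonUniform = false;
  return BindingArgs{Space, Slot, RangeSize, Index, NonUniform};
}
```

Read this way, a call site shows which argument is the range size and which is the index without the reader having to count positions.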

https://github.com/llvm/llvm-project/pull/139022


[llvm-branch-commits] [llvm] 07bd645 - Revert "[RISCV] Implement codegen for XAndesPerf lea instructions (#137925)"

2025-05-11 Thread via llvm-branch-commits

Author: Jim Lin
Date: 2025-05-12T10:41:56+08:00
New Revision: 07bd6454806aa8149809c49833b6e7c165a2eb51

URL: 
https://github.com/llvm/llvm-project/commit/07bd6454806aa8149809c49833b6e7c165a2eb51
DIFF: 
https://github.com/llvm/llvm-project/commit/07bd6454806aa8149809c49833b6e7c165a2eb51.diff

LOG: Revert "[RISCV] Implement codegen for XAndesPerf lea instructions 
(#137925)"

This reverts commit a788a1abd9c881aa113f5932d100e1a2e3898e14.

Added: 


Modified: 
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td
llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
llvm/test/CodeGen/RISCV/rv32zba.ll
llvm/test/CodeGen/RISCV/rv64zba.ll

Removed: 




diff  --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp 
b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 158a3afdb864c..134d82d84b237 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -14516,8 +14516,8 @@ static SDValue combineBinOpToReduce(SDNode *N, 
SelectionDAG &DAG,
 //  (SLLI (SH*ADD x, y), c0), if c1-c0 equals to [1|2|3].
 static SDValue transformAddShlImm(SDNode *N, SelectionDAG &DAG,
   const RISCVSubtarget &Subtarget) {
-  // Perform this optimization only in the zba/xandesperf extension.
-  if (!Subtarget.hasStdExtZba() && !Subtarget.hasVendorXAndesPerf())
+  // Perform this optimization only in the zba extension.
+  if (!Subtarget.hasStdExtZba())
 return SDValue();
 
   // Skip for vector types and larger types.
@@ -15448,9 +15448,8 @@ static SDValue expandMul(SDNode *N, SelectionDAG &DAG,
   if (VT != Subtarget.getXLenVT())
 return SDValue();
 
-  const bool HasShlAdd = Subtarget.hasStdExtZba() ||
- Subtarget.hasVendorXTHeadBa() ||
- Subtarget.hasVendorXAndesPerf();
+  const bool HasShlAdd =
+  Subtarget.hasStdExtZba() || Subtarget.hasVendorXTHeadBa();
 
   ConstantSDNode *CNode = dyn_cast<ConstantSDNode>(N->getOperand(1));
   if (!CNode)

diff  --git a/llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td 
b/llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td
index 4e01b93d76e80..2ec768435259c 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td
@@ -135,16 +135,6 @@ class NDSRVInstRR funct7, string opcodestr>
   let mayStore = 0;
 }
 
-class NDSRVInstLEA funct7, string opcodestr>
-: RVInstR,
-  Sched<[WriteIALU, ReadIALU, ReadIALU]> {
-  let hasSideEffects = 0;
-  let mayLoad = 0;
-  let mayStore = 0;
-}
-
 // GP: ADDI, LB, LBU
 class NDSRVInstLBGP funct2, string opcodestr>
 : RVInst<(outs GPR:$rd), (ins simm18:$imm18),
@@ -331,9 +321,9 @@ def NDS_BNEC : NDSRVInstBC<0b110, "nds.bnec">;
 def NDS_BFOS : NDSRVInstBFO<0b011, "nds.bfos">;
 def NDS_BFOZ : NDSRVInstBFO<0b010, "nds.bfoz">;
 
-def NDS_LEA_H : NDSRVInstLEA<0b101, "nds.lea.h">;
-def NDS_LEA_W : NDSRVInstLEA<0b110, "nds.lea.w">;
-def NDS_LEA_D : NDSRVInstLEA<0b111, "nds.lea.d">;
+def NDS_LEA_H : NDSRVInstRR<0b101, "nds.lea.h">;
+def NDS_LEA_W : NDSRVInstRR<0b110, "nds.lea.w">;
+def NDS_LEA_D : NDSRVInstRR<0b111, "nds.lea.d">;
 
 let hasSideEffects = 0, mayLoad = 0, mayStore = 0 in
 def NDS_ADDIGP : NDSRVInstLBGP<0b01, "nds.addigp">;
@@ -355,10 +345,10 @@ def NDS_FLMISM  : NDSRVInstRR<0b0010011, "nds.flmism">;
 } // Predicates = [HasVendorXAndesPerf]
 
 let Predicates = [HasVendorXAndesPerf, IsRV64] in {
-def NDS_LEA_B_ZE : NDSRVInstLEA<0b0001000, "nds.lea.b.ze">;
-def NDS_LEA_H_ZE : NDSRVInstLEA<0b0001001, "nds.lea.h.ze">;
-def NDS_LEA_W_ZE : NDSRVInstLEA<0b0001010, "nds.lea.w.ze">;
-def NDS_LEA_D_ZE : NDSRVInstLEA<0b0001011, "nds.lea.d.ze">;
+def NDS_LEA_B_ZE : NDSRVInstRR<0b0001000, "nds.lea.b.ze">;
+def NDS_LEA_H_ZE : NDSRVInstRR<0b0001001, "nds.lea.h.ze">;
+def NDS_LEA_W_ZE : NDSRVInstRR<0b0001010, "nds.lea.w.ze">;
+def NDS_LEA_D_ZE : NDSRVInstRR<0b0001011, "nds.lea.d.ze">;
 
 def NDS_LWUGP : NDSRVInstLWGP<0b110, "nds.lwugp">;
 def NDS_LDGP  : NDSRVInstLDGP<0b011, "nds.ldgp">;
@@ -366,32 +356,3 @@ def NDS_LDGP  : NDSRVInstLDGP<0b011, "nds.ldgp">;
 def NDS_SDGP  : NDSRVInstSDGP<0b111, "nds.sdgp">;
 } // Predicates = [HasVendorXAndesPerf, IsRV64]
 } // DecoderNamespace = "XAndes"
-
-// Patterns
-
-let Predicates = [HasVendorXAndesPerf] in {
-
-defm : ShxAddPat<1, NDS_LEA_H>;
-defm : ShxAddPat<2, NDS_LEA_W>;
-defm : ShxAddPat<3, NDS_LEA_D>;
-
-def : CSImm12MulBy4Pat;
-def : CSImm12MulBy8Pat;
-} // Predicates = [HasVendorXAndesPerf]
-
-let Predicates = [HasVendorXAndesPerf, IsRV64] in {
-
-defm : ADD_UWPat;
-
-defm : ShxAdd_UWPat<1, NDS_LEA_H_ZE>;
-defm : ShxAdd_UWPat<2, NDS_LEA_W_ZE>;
-defm : ShxAdd_UWPat<3, NDS_LEA_D_ZE>;
-
-defm : Sh1Add_UWPat;
-defm : Sh2Add_UWPat;
-defm : Sh3Add_UWPat;
-
-def : Sh1AddPat;
-def : Sh2AddPat;
-def : Sh3AddPat;
-} // Predicates = [HasVendorXAndesPerf, IsRV64]

diff  --git a/llvm/lib/Target/RISCV/RISCVInstr

[llvm-branch-commits] [llvm] [X86] Remove extra MOV after widening atomic load (PR #138635)

2025-05-11 Thread via llvm-branch-commits

https://github.com/jofrn edited https://github.com/llvm/llvm-project/pull/138635


[llvm-branch-commits] [llvm] [X86] Remove extra MOV after widening atomic load (PR #138635)

2025-05-11 Thread via llvm-branch-commits


@@ -1200,6 +1200,13 @@ def : Pat<(i16 (atomic_load_nonext_16 addr:$src)), 
(MOV16rm addr:$src)>;
 def : Pat<(i32 (atomic_load_nonext_32 addr:$src)), (MOV32rm addr:$src)>;
 def : Pat<(i64 (atomic_load_nonext_64 addr:$src)), (MOV64rm addr:$src)>;
 
+def : Pat<(v4i32 (scalar_to_vector (i32 (anyext (i16 (atomic_load_16 
addr:$src)),
+   (MOVDI2PDIrm addr:$src)>;   // load atomic <2 x i8>

jofrn wrote:

Switched it to a `zext` and now it dereferences 16 bits in the asm.

https://github.com/llvm/llvm-project/pull/138635


[llvm-branch-commits] [SPARC][IAS] Add definitions for OSA 2011 instructions (PR #138403)

2025-05-11 Thread via llvm-branch-commits

https://github.com/koachan updated 
https://github.com/llvm/llvm-project/pull/138403




[llvm-branch-commits] [SPARC][IAS] Add definitions for cryptographic instructions (PR #139451)

2025-05-11 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mc

Author: Koakuma (koachan)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/139451.diff


5 Files Affected:

- (modified) llvm/lib/Target/Sparc/Sparc.td (+5-1) 
- (added) llvm/lib/Target/Sparc/SparcInstrCrypto.td (+98) 
- (modified) llvm/lib/Target/Sparc/SparcInstrInfo.td (+5) 
- (added) llvm/test/MC/Disassembler/Sparc/sparc-crypto.txt (+56) 
- (added) llvm/test/MC/Sparc/sparc-crypto.s (+88) 


``diff
diff --git a/llvm/lib/Target/Sparc/Sparc.td b/llvm/lib/Target/Sparc/Sparc.td
index 6e6c887e60e12..7c26bf9061cb6 100644
--- a/llvm/lib/Target/Sparc/Sparc.td
+++ b/llvm/lib/Target/Sparc/Sparc.td
@@ -58,6 +58,9 @@ def FeatureUA2007
 def FeatureOSA2011
   : SubtargetFeature<"osa2011", "IsOSA2011", "true",
  "Enable Oracle SPARC Architecture 2011 extensions">;
+def FeatureCrypto
+  : SubtargetFeature<"crypto", "IsCrypto", "true",
+ "Enable cryptographic extensions">;
 def FeatureLeon
   : SubtargetFeature<"leon", "IsLeon", "true",
  "Enable LEON extensions">;
@@ -169,7 +172,8 @@ def : Proc<"niagara3",[FeatureV9, 
FeatureV8Deprecated, UsePopc,
FeatureUA2005, FeatureUA2007]>;
 def : Proc<"niagara4",[FeatureV9, FeatureV8Deprecated, UsePopc,
FeatureVIS, FeatureVIS2, FeatureVIS3,
-   FeatureUA2005, FeatureUA2007, FeatureOSA2011]>;
+   FeatureUA2005, FeatureUA2007, FeatureOSA2011,
+   FeatureCrypto]>;
 
 // LEON 2 FT generic
 def : Processor<"leon2", LEON2Itineraries,
diff --git a/llvm/lib/Target/Sparc/SparcInstrCrypto.td 
b/llvm/lib/Target/Sparc/SparcInstrCrypto.td
new file mode 100644
index 0..0e7063f99eb06
--- /dev/null
+++ b/llvm/lib/Target/Sparc/SparcInstrCrypto.td
@@ -0,0 +1,98 @@
+//===--- SparcInstrCrypto.td - cryptographic extensions 
---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains instruction formats, definitions and patterns needed for
+// cryptographic instructions on SPARC.
+//===----------------------------------------------------------------------===//
+
+
+// Convenience template for 4-operand instructions
+class FourOpImm op3val, bits<4> op5val,
+RegisterClass RC>
+  : F3_4;
+
+let Predicates = [HasCrypto] in {
+def AES_EROUND01 : FourOp<"aes_eround01", 0b011001, 0b0000, DFPRegs>;
+def AES_EROUND23 : FourOp<"aes_eround23", 0b011001, 0b0001, DFPRegs>;
+def AES_DROUND01 : FourOp<"aes_dround01", 0b011001, 0b0010, DFPRegs>;
+def AES_DROUND23 : FourOp<"aes_dround23", 0b011001, 0b0011, DFPRegs>;
+def AES_EROUND01_LAST : FourOp<"aes_eround01_l", 0b011001, 0b0100, DFPRegs>;
+def AES_EROUND23_LAST : FourOp<"aes_eround23_l", 0b011001, 0b0101, DFPRegs>;
+def AES_DROUND01_LAST : FourOp<"aes_dround01_l", 0b011001, 0b0110, DFPRegs>;
+def AES_DROUND23_LAST : FourOp<"aes_dround23_l", 0b011001, 0b0111, DFPRegs>;
+def AES_KEXPAND0  : F3_3<2, 0b110110, 0b100110000,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2),
+"aes_kexpand0 $rs1, $rs2, $rd", []>;
+def AES_KEXPAND1 : FourOpImm<"aes_kexpand1", 0b011001, 0b1000, DFPRegs>;
+def AES_KEXPAND2  : F3_3<2, 0b110110, 0b100110001,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2),
+"aes_kexpand2 $rs1, $rs2, $rd", []>;
+
+def CAMELLIA_F : FourOp<"camellia_f", 0b011001, 0b1100, DFPRegs>;
+def CAMELLIA_FL  : F3_3<2, 0b110110, 0b100111100,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2),
+"camellia_fl $rs1, $rs2, $rd", []>;
+def CAMELLIA_FLI : F3_3<2, 0b110110, 0b100111101,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2),
+"camellia_fli $rs1, $rs2, $rd", []>;
+
+def CRC32C : F3_3<2, 0b110110, 0b101000111,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2),
+"crc32c $rs1, $rs2, $rd", []>;
+
+def DES_ROUND : FourOp<"des_round", 0b011001, 0b1001, DFPRegs>;
+let rs2 = 0 in {
+def DES_IP  : F3_3<2, 0b110110, 0b100110100,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1),
+"des_ip $rs1, $rd", []>;
+def DES_IIP  : F3_3<2, 0b110110, 0b100110101,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1),
+"des_iip $rs1, $rd", []>;
+}
+def DES_KEXPAND : F3_3<2, 0b110110, 0b100110110,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, simm5Op:$rs2),
+"des_kexpand $rs1, $rs2, $rd", []>;
+
+let rs1 = 0, rs2 = 0, rd = 0 in {
+let Uses = [D0, D1, D2, D5, D6, D7, D8, D9, D10, D11],
+Def

[llvm-branch-commits] [SPARC][IAS] Add definitions for cryptographic instructions (PR #139451)

2025-05-11 Thread via llvm-branch-commits

https://github.com/koachan created 
https://github.com/llvm/llvm-project/pull/139451

None




[llvm-branch-commits] [llvm] [SPARC][IAS] Add definitions for UA 2005 instructions (PR #138400)

2025-05-11 Thread via llvm-branch-commits

https://github.com/koachan updated 
https://github.com/llvm/llvm-project/pull/138400

>From b2e8de55ea9e54239a017eb932f7107f29f465a4 Mon Sep 17 00:00:00 2001
From: Koakuma 
Date: Sun, 4 May 2025 08:57:07 +0700
Subject: [PATCH 1/2] Add other instructions & fix typo

Created using spr 1.3.5
---
 llvm/lib/Target/Sparc/SparcInstrUAOSA.td| 17 -
 .../test/MC/Disassembler/Sparc/sparc-ua-osa.txt |  6 ++
 llvm/test/MC/Sparc/sparc-ua2005.s   |  9 +
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/Sparc/SparcInstrUAOSA.td 
b/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
index d883e517db89d..5ecc02ed10bfb 100644
--- a/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
+++ b/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
@@ -1,4 +1,4 @@
-//=== SparcInstrVIS.td - Visual Instruction Set extensions (VIS) -===//
+//=== SparcInstrUAOSA.td - UltraSPARC/Oracle SPARC Architecture extensions 
===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -18,4 +18,19 @@ def ALLCLEAN : InstSP<(outs), (ins), "allclean", []> {
 let Inst{29-19} = 0b00010110001;
 let Inst{18-0} = 0;
 }
+def INVALW : InstSP<(outs), (ins), "invalw", []> {
+let op = 2;
+let Inst{29-19} = 0b00101110001;
+let Inst{18-0} = 0;
+}
+def NORMALW : InstSP<(outs), (ins), "normalw", []> {
+let op = 2;
+let Inst{29-19} = 0b00100110001;
+let Inst{18-0} = 0;
+}
+def OTHERW : InstSP<(outs), (ins), "otherw", []> {
+let op = 2;
+let Inst{29-19} = 0b00011110001;
+let Inst{18-0} = 0;
+}
 } // Predicates = [HasUA2005]
diff --git a/llvm/test/MC/Disassembler/Sparc/sparc-ua-osa.txt 
b/llvm/test/MC/Disassembler/Sparc/sparc-ua-osa.txt
index dc3d196091c6b..4a2de98e03fe3 100644
--- a/llvm/test/MC/Disassembler/Sparc/sparc-ua-osa.txt
+++ b/llvm/test/MC/Disassembler/Sparc/sparc-ua-osa.txt
@@ -4,3 +4,9 @@
 
 # CHECK: allclean
 0x85,0x88,0x00,0x00
+# CHECK: invalw
+0x8b,0x88,0x00,0x00
+# CHECK: otherw
+0x87,0x88,0x00,0x00
+# CHECK: normalw
+0x89,0x88,0x00,0x00
diff --git a/llvm/test/MC/Sparc/sparc-ua2005.s 
b/llvm/test/MC/Sparc/sparc-ua2005.s
index 2214b91b335cd..b07c99a20033b 100644
--- a/llvm/test/MC/Sparc/sparc-ua2005.s
+++ b/llvm/test/MC/Sparc/sparc-ua2005.s
@@ -6,3 +6,12 @@
 ! NO-UA2005: error: instruction requires a CPU feature not currently enabled
 ! UA2005: allclean   ! encoding: [0x85,0x88,0x00,0x00]
 allclean
+! NO-UA2005: error: instruction requires a CPU feature not currently enabled
+! UA2005: invalw ! encoding: [0x8b,0x88,0x00,0x00]
+invalw
+! NO-UA2005: error: instruction requires a CPU feature not currently enabled
+! UA2005: otherw ! encoding: [0x87,0x88,0x00,0x00]
+otherw
+! NO-UA2005: error: instruction requires a CPU feature not currently enabled
+! UA2005: normalw! encoding: [0x89,0x88,0x00,0x00]
+normalw

>From a2c49c5b9ecf2451a20d660cdc059c3301a8b816 Mon Sep 17 00:00:00 2001
From: Koakuma 
Date: Mon, 12 May 2025 07:26:35 +0700
Subject: [PATCH 2/2] Fix indentation

Created using spr 1.3.5
---
 llvm/lib/Target/Sparc/SparcInstrUAOSA.td | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/Target/Sparc/SparcInstrUAOSA.td b/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
index 8a833636301d0..b00995a960968 100644
--- a/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
+++ b/llvm/lib/Target/Sparc/SparcInstrUAOSA.td
@@ -12,9 +12,9 @@
 
 class UA2005RegWin<string asmstr, bits<5> fcn>
 : F3_1<2, 0b110001, (outs), (ins), asmstr, []> {
-let rd = fcn;
-let rs1 = 0;
-let rs2 = 0;
+  let rd = fcn;
+  let rs1 = 0;
+  let rs2 = 0;
 }
 
 // UltraSPARC Architecture 2005 Instructions
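
As a rough cross-check of the encodings in the tests above, the four register-window instructions can be assembled by hand. This is only a sketch, assuming the format-3 layout the patch implies (`op` in bits 31-30, `rd` in bits 29-25, `op3` in bits 24-19, all remaining bits zero); the `FCN` table and the `encode` helper are illustrative names, with the `fcn` values read off the expected byte sequences in the tests rather than taken from the TableGen source.

```python
# Sketch: assemble the 32-bit word for the UA 2005 register-window
# instructions, assuming op = bits 31-30, rd (fcn) = bits 29-25,
# op3 = bits 24-19, and every other bit zero.
OP = 2
OP3 = 0b110001

# fcn values inferred from the expected encodings in the tests
# (hypothetical table, not copied from the .td file).
FCN = {"allclean": 2, "otherw": 3, "normalw": 4, "invalw": 5}


def encode(mnemonic: str) -> bytes:
    """Return the big-endian instruction bytes for a mnemonic in FCN."""
    word = (OP << 30) | (FCN[mnemonic] << 25) | (OP3 << 19)
    return word.to_bytes(4, "big")


for name in FCN:
    print(name, ",".join(f"0x{b:02x}" for b in encode(name)))
```

Comparing the printed bytes with the `encoding:` comments in sparc-ua2005.s is a quick way to spot a mistyped `Inst{29-19}` value.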



[llvm-branch-commits] [SPARC][IAS] Add definitions for cryptographic instructions (PR #139451)

2025-05-11 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-sparc

Author: Koakuma (koachan)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/139451.diff


5 Files Affected:

- (modified) llvm/lib/Target/Sparc/Sparc.td (+5-1) 
- (added) llvm/lib/Target/Sparc/SparcInstrCrypto.td (+98) 
- (modified) llvm/lib/Target/Sparc/SparcInstrInfo.td (+5) 
- (added) llvm/test/MC/Disassembler/Sparc/sparc-crypto.txt (+56) 
- (added) llvm/test/MC/Sparc/sparc-crypto.s (+88) 


``diff
diff --git a/llvm/lib/Target/Sparc/Sparc.td b/llvm/lib/Target/Sparc/Sparc.td
index 6e6c887e60e12..7c26bf9061cb6 100644
--- a/llvm/lib/Target/Sparc/Sparc.td
+++ b/llvm/lib/Target/Sparc/Sparc.td
@@ -58,6 +58,9 @@ def FeatureUA2007
 def FeatureOSA2011
   : SubtargetFeature<"osa2011", "IsOSA2011", "true",
  "Enable Oracle SPARC Architecture 2011 extensions">;
+def FeatureCrypto
+  : SubtargetFeature<"crypto", "IsCrypto", "true",
+ "Enable cryptographic extensions">;
 def FeatureLeon
   : SubtargetFeature<"leon", "IsLeon", "true",
  "Enable LEON extensions">;
@@ -169,7 +172,8 @@ def : Proc<"niagara3",[FeatureV9, FeatureV8Deprecated, UsePopc,
FeatureUA2005, FeatureUA2007]>;
 def : Proc<"niagara4",[FeatureV9, FeatureV8Deprecated, UsePopc,
FeatureVIS, FeatureVIS2, FeatureVIS3,
-   FeatureUA2005, FeatureUA2007, FeatureOSA2011]>;
+   FeatureUA2005, FeatureUA2007, FeatureOSA2011,
+   FeatureCrypto]>;
 
 // LEON 2 FT generic
 def : Processor<"leon2", LEON2Itineraries,
diff --git a/llvm/lib/Target/Sparc/SparcInstrCrypto.td b/llvm/lib/Target/Sparc/SparcInstrCrypto.td
new file mode 100644
index 0000000000000..0e7063f99eb06
--- /dev/null
+++ b/llvm/lib/Target/Sparc/SparcInstrCrypto.td
@@ -0,0 +1,98 @@
+//===----------- SparcInstrCrypto.td - cryptographic extensions -----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains instruction formats, definitions and patterns needed for
+// cryptographic instructions on SPARC.
+//===----------------------------------------------------------------------===//
+
+
+// Convenience template for 4-operand instructions
+class FourOpImm op3val, bits<4> op5val,
+RegisterClass RC>
+  : F3_4;
+
+let Predicates = [HasCrypto] in {
+def AES_EROUND01 : FourOp<"aes_eround01", 0b011001, 0b0000, DFPRegs>;
+def AES_EROUND23 : FourOp<"aes_eround23", 0b011001, 0b0001, DFPRegs>;
+def AES_DROUND01 : FourOp<"aes_dround01", 0b011001, 0b0010, DFPRegs>;
+def AES_DROUND23 : FourOp<"aes_dround23", 0b011001, 0b0011, DFPRegs>;
+def AES_EROUND01_LAST : FourOp<"aes_eround01_l", 0b011001, 0b0100, DFPRegs>;
+def AES_EROUND23_LAST : FourOp<"aes_eround23_l", 0b011001, 0b0101, DFPRegs>;
+def AES_DROUND01_LAST : FourOp<"aes_dround01_l", 0b011001, 0b0110, DFPRegs>;
+def AES_DROUND23_LAST : FourOp<"aes_dround23_l", 0b011001, 0b0111, DFPRegs>;
+def AES_KEXPAND0  : F3_3<2, 0b110110, 0b100110000,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2),
+"aes_kexpand0 $rs1, $rs2, $rd", []>;
+def AES_KEXPAND1 : FourOpImm<"aes_kexpand1", 0b011001, 0b1000, DFPRegs>;
+def AES_KEXPAND2  : F3_3<2, 0b110110, 0b100110001,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2),
+"aes_kexpand2 $rs1, $rs2, $rd", []>;
+
+def CAMELLIA_F : FourOp<"camellia_f", 0b011001, 0b1100, DFPRegs>;
+def CAMELLIA_FL  : F3_3<2, 0b110110, 0b100111100,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2),
+"camellia_fl $rs1, $rs2, $rd", []>;
+def CAMELLIA_FLI : F3_3<2, 0b110110, 0b100111101,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2),
+"camellia_fli $rs1, $rs2, $rd", []>;
+
+def CRC32C : F3_3<2, 0b110110, 0b101000111,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, DFPRegs:$rs2),
+"crc32c $rs1, $rs2, $rd", []>;
+
+def DES_ROUND : FourOp<"des_round", 0b011001, 0b1001, DFPRegs>;
+let rs2 = 0 in {
+def DES_IP  : F3_3<2, 0b110110, 0b100110100,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1),
+"des_ip $rs1, $rd", []>;
+def DES_IIP  : F3_3<2, 0b110110, 0b100110101,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1),
+"des_iip $rs1, $rd", []>;
+}
+def DES_KEXPAND : F3_3<2, 0b110110, 0b100110110,
+(outs DFPRegs:$rd), (ins DFPRegs:$rs1, simm5Op:$rs2),
+"des_kexpand $rs1, $rs2, $rd", []>;
+
+let rs1 = 0, rs2 = 0, rd = 0 in {
+let Uses = [D0, D1, D2, D5, D6, D7, D8, D9, D10, D11

[llvm-branch-commits] [libcxx] [llvm] release/20.x: [libcxx] Provide locale conversions to tests through lit substitution (#105651) (PR #139468)

2025-05-11 Thread Martin Storsjö via llvm-branch-commits

https://github.com/mstorsjo created 
https://github.com/llvm/llvm-project/pull/139468

Backport f909b2229ac16ae3898d8b158bee85c384173dfa, the follow-up fix from 
297f6d9f6b215bd7f58cf500b979b94dedbba7bb, plus two commits for updating the CI 
with regards to macOS.

From 79e10b190029b749e042d1aaec3ee697a2f5d41a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Storsj=C3=B6?= 
Date: Fri, 28 Feb 2025 20:43:46 -0100
Subject: [PATCH 1/4] [libcxx] Provide locale conversions to tests through lit
 substitution (#105651)

There are 2 problems today that this PR resolves:

libcxx tests assume the thousands separator for the fr_FR locale is 0x00A0 on
Windows. This currently fails on newer versions of Windows (the value seems
to have been updated to the new, correct 0x202F around Windows 11; the exact
Windows version where it changed doesn't seem to be documented anywhere).
Depending on the OS version, you need different values.

There are several ifdefs to determine the environment/platform-specific
locale conversion values, which creates a maintenance burden as things
change over time.

This PR includes the following changes:

- Provide the environment's locale conversion values through a
  substitution. The test can opt in by placing the substitution value in a
  define flag.
- Remove the platform ifdefs (the swapping of values between Windows,
  Linux, Apple, AIX).

This is accomplished through a lit feature action that fetches the
environment's locale conversions (lconv) for members like
'thousands_sep' that we need to provide. This should ensure that we
don't lose the effectiveness of the test itself.

In addition, as a result of the above, this PR:

- Fixes a handful of locale tests which unexpectedly fail on newer
  Windows versions.
- Resolves 3 XFAIL FIX-MEs.
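
To make the mechanism described above more concrete, here is a minimal sketch of the two pieces involved: querying an `lconv` member for a locale, and deriving a substitution name of the shape used in the tests (`%{LOCALE_CONV_FR_FR_UTF_8_MON_THOUSANDS_SEP}`). It uses Python's standard `locale` module as a stand-in and is not the code in libcxx/utils/libcxx/test/features.py; `substitution_name` and `query_member` are hypothetical helpers, and the naming rule is inferred from the substitution names visible in the diff.

```python
import locale
import re


def substitution_name(loc: str, member: str) -> str:
    """Build a lit-style substitution name (assumed rule: uppercase the
    locale and member, replacing non-alphanumerics with underscores)."""
    return "LOCALE_CONV_" + re.sub(r"[^A-Za-z0-9]", "_", f"{loc}_{member}").upper()


def query_member(loc: str, member: str):
    """Return the lconv member for `loc`, or None if the locale is missing."""
    saved = locale.setlocale(locale.LC_ALL)  # remember the current setting
    try:
        locale.setlocale(locale.LC_ALL, loc)
        return locale.localeconv()[member]
    except locale.Error:
        return None  # locale not installed on this machine
    finally:
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    for loc in ("fr_FR.UTF-8", "ru_RU.UTF-8"):
        name = substitution_name(loc, "mon_thousands_sep")
        print(name, repr(query_member(loc, "mon_thousands_sep")))
```

Because the value is read from the host at configuration time, the same test source can pass on systems that report 0x00A0 and on systems that report 0x202F, which is the point of dropping the per-platform ifdefs.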

Originally submitted in https://github.com/llvm/llvm-project/pull/86649.

Co-authored-by: Rodrigo Salazar <4rodrigosala...@gmail.com>
(cherry picked from commit f909b2229ac16ae3898d8b158bee85c384173dfa)
---
 .../get_long_double_fr_FR.pass.cpp|  5 +-
 .../get_long_double_ru_RU.pass.cpp|  5 +-
 .../put_long_double_fr_FR.pass.cpp|  5 +-
 .../put_long_double_ru_RU.pass.cpp|  5 +-
 .../thousands_sep.pass.cpp| 34 ++-
 .../thousands_sep.pass.cpp| 20 ++--
 .../time.duration.nonmember/ostream.pass.cpp  | 24 ++---
 libcxx/test/support/locale_helpers.h  | 37 ++--
 libcxx/utils/libcxx/test/features.py  | 91 ++-
 9 files changed, 138 insertions(+), 88 deletions(-)

diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
index bbb67d694970a..f02241ad36a5b 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
@@ -13,6 +13,8 @@
 
 // REQUIRES: locale.fr_FR.UTF-8
 
+// ADDITIONAL_COMPILE_FLAGS: -DFR_MON_THOU_SEP=%{LOCALE_CONV_FR_FR_UTF_8_MON_THOUSANDS_SEP}
+
 // 
 
 // class money_get
@@ -59,7 +61,8 @@ class my_facetw
 };
 
 static std::wstring convert_thousands_sep(std::wstring const& in) {
-  return LocaleHelpers::convert_thousands_sep_fr_FR(in);
+  const wchar_t fr_sep = LocaleHelpers::mon_thousands_sep_or_default(FR_MON_THOU_SEP);
+  return LocaleHelpers::convert_thousands_sep(in, fr_sep);
 }
 #endif // TEST_HAS_NO_WIDE_CHARACTERS
 
diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
index e680f2ea8816a..371cf0e90c8d3 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
@@ -11,6 +11,8 @@
 
 // REQUIRES: locale.ru_RU.UTF-8
 
+// ADDITIONAL_COMPILE_FLAGS: -DRU_MON_THOU_SEP=%{LOCALE_CONV_RU_RU_UTF_8_MON_THOUSANDS_SEP}
+
 // XFAIL: glibc-old-ru_RU-decimal-point
 
 // 
@@ -52,7 +54,8 @@ class my_facetw
 };
 
 static std::wstring convert_thousands_sep(std::wstring const& in) {
-  return LocaleHelpers::convert_thousands_sep_ru_RU(in);
+  const wchar_t ru_sep = LocaleHelpers::mon_thousands_sep_or_default(RU_MON_THOU_SEP);
+  return LocaleHelpers::convert_thousands_sep(in, ru_sep);
 }
 #endif // TEST_HAS_NO_WIDE_CHARACTERS
 
diff --git 
a/libcxx/test/std/localization/locale.categories/category.monetar

[llvm-branch-commits] [libcxx] [llvm] release/20.x: [libcxx] Provide locale conversions to tests through lit substitution (#105651) (PR #139468)

2025-05-11 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-github-workflow

Author: Martin Storsjö (mstorsjo)


Changes

Backport f909b2229ac16ae3898d8b158bee85c384173dfa, the follow-up fix from 
297f6d9f6b215bd7f58cf500b979b94dedbba7bb, plus two commits for updating the CI 
with regards to macOS.

---

Patch is 38.04 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/139468.diff


32 Files Affected:

- (modified) .github/workflows/libcxx-build-and-test.yaml (+12-2) 
- (modified) libcxx/test/libcxx/input.output/iostreams.base/ios.base/ios.base.cons/dtor.uninitialized.pass.cpp (+4-1) 
- (modified) libcxx/test/libcxx/strings/basic.string/string.capacity/allocation_size.pass.cpp (-5) 
- (modified) libcxx/test/std/input.output/file.streams/fstreams/filebuf.virtuals/setbuf.pass.cpp (+6-2) 
- (modified) libcxx/test/std/input.output/iostream.format/input.streams/istream.unformatted/sync.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.collate/locale.collate.byname/compare.pass.cpp (+3) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp (+8-2) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp (+7-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_zh_CN.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp (+8-2) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_ru_RU.pass.cpp (+7-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_zh_CN.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/curr_symbol.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/grouping.pass.cpp (+5-2) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/neg_format.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/thousands_sep.pass.cpp (+9-25) 
- (modified) libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp (+7) 
- (modified) libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float.pass.cpp (+7) 
- (modified) libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_long_double.pass.cpp (+7) 
- (modified) libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp (+11-12) 
- (modified) libcxx/test/std/strings/basic.string/string.capacity/max_size.pass.cpp (+1-5) 
- (modified) libcxx/test/std/strings/basic.string/string.capacity/over_max_size.pass.cpp (+6) 
- (modified) libcxx/test/std/time/time.duration/time.duration.nonmember/ostream.pass.cpp (+12-15) 
- (modified) libcxx/test/std/time/time.syn/formatter.duration.pass.cpp (+3) 
- (modified) libcxx/test/std/time/time.syn/formatter.file_time.pass.cpp (+3) 
- (modified) libcxx/test/std/time/time.syn/formatter.hh_mm_ss.pass.cpp (+3) 
- (modified) libcxx/test/std/time/time.syn/formatter.local_time.pass.cpp (+3) 
- (modified) libcxx/test/std/time/time.syn/formatter.sys_time.pass.cpp (+3) 
- (modified) libcxx/test/support/locale_helpers.h (+6-31) 
- (modified) libcxx/utils/generate_feature_test_macro_components.py (+1) 
- (modified) libcxx/utils/libcxx/test/features.py (+91-1) 


``diff
diff --git a/.github/workflows/libcxx-build-and-test.yaml b/.github/workflows/libcxx-build-and-test.yaml
index 3346c1322a07c..84b2e104d260a 100644
--- a/.github/workflows/libcxx-build-and-test.yaml
+++ b/.github/workflows/libcxx-build-and-test.yaml
@@ -197,10 +197,20 @@ jobs:
   os: macos-15
 - config: apple-configuration
   os: macos-15
+# TODO: These jobs are intended to test back-deployment (building against ToT libc++ but running against an
+#   older system-provided libc++.dylib). Doing this properly would require building the test suite on a
+#   recent macOS using a recent Clang (hence recent Xcode), and then running the actual test suite on an
+#   older mac. We could do that by e.g. sharing artifacts between the two jobs.
+#
+#   However, our L

[llvm-branch-commits] [libcxx] [llvm] release/20.x: [libcxx] Provide locale conversions to tests through lit substitution (#105651) (PR #139468)

2025-05-11 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-libcxx

Author: Martin Storsjö (mstorsjo)


Changes

Backport f909b2229ac16ae3898d8b158bee85c384173dfa, the follow-up fix from 
297f6d9f6b215bd7f58cf500b979b94dedbba7bb, plus two commits for updating the CI 
with regards to macOS.

---

Patch is 38.04 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/139468.diff


32 Files Affected:

- (modified) .github/workflows/libcxx-build-and-test.yaml (+12-2) 
- (modified) libcxx/test/libcxx/input.output/iostreams.base/ios.base/ios.base.cons/dtor.uninitialized.pass.cpp (+4-1) 
- (modified) libcxx/test/libcxx/strings/basic.string/string.capacity/allocation_size.pass.cpp (-5) 
- (modified) libcxx/test/std/input.output/file.streams/fstreams/filebuf.virtuals/setbuf.pass.cpp (+6-2) 
- (modified) libcxx/test/std/input.output/iostream.format/input.streams/istream.unformatted/sync.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.collate/locale.collate.byname/compare.pass.cpp (+3) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp (+8-2) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp (+7-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_zh_CN.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp (+8-2) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_ru_RU.pass.cpp (+7-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_zh_CN.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/curr_symbol.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/grouping.pass.cpp (+5-2) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/neg_format.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/thousands_sep.pass.cpp (+9-25) 
- (modified) libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp (+7) 
- (modified) libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float.pass.cpp (+7) 
- (modified) libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_long_double.pass.cpp (+7) 
- (modified) libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp (+4-1) 
- (modified) libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp (+11-12) 
- (modified) libcxx/test/std/strings/basic.string/string.capacity/max_size.pass.cpp (+1-5) 
- (modified) libcxx/test/std/strings/basic.string/string.capacity/over_max_size.pass.cpp (+6) 
- (modified) libcxx/test/std/time/time.duration/time.duration.nonmember/ostream.pass.cpp (+12-15) 
- (modified) libcxx/test/std/time/time.syn/formatter.duration.pass.cpp (+3) 
- (modified) libcxx/test/std/time/time.syn/formatter.file_time.pass.cpp (+3) 
- (modified) libcxx/test/std/time/time.syn/formatter.hh_mm_ss.pass.cpp (+3) 
- (modified) libcxx/test/std/time/time.syn/formatter.local_time.pass.cpp (+3) 
- (modified) libcxx/test/std/time/time.syn/formatter.sys_time.pass.cpp (+3) 
- (modified) libcxx/test/support/locale_helpers.h (+6-31) 
- (modified) libcxx/utils/generate_feature_test_macro_components.py (+1) 
- (modified) libcxx/utils/libcxx/test/features.py (+91-1) 


``diff
diff --git a/.github/workflows/libcxx-build-and-test.yaml b/.github/workflows/libcxx-build-and-test.yaml
index 3346c1322a07c..84b2e104d260a 100644
--- a/.github/workflows/libcxx-build-and-test.yaml
+++ b/.github/workflows/libcxx-build-and-test.yaml
@@ -197,10 +197,20 @@ jobs:
   os: macos-15
 - config: apple-configuration
   os: macos-15
+# TODO: These jobs are intended to test back-deployment (building against ToT libc++ but running against an
+#   older system-provided libc++.dylib). Doing this properly would require building the test suite on a
+#   recent macOS using a recent Clang (hence recent Xcode), and then running the actual test suite on an
+#   older mac. We could do that by e.g. sharing artifacts between the two jobs.
+#
+#   However, our Lit config

[llvm-branch-commits] [libcxx] [llvm] release/20.x: [libcxx] Provide locale conversions to tests through lit substitution (#105651) (PR #139468)

2025-05-11 Thread Martin Storsjö via llvm-branch-commits

mstorsjo wrote:

This is a manual backport attempt of the same as #136449, with some more fixes 
included. This should unbreak the libcxx CI on macOS on the release branch, 
which seems to be broken as is.

https://github.com/llvm/llvm-project/pull/139468


[llvm-branch-commits] [libcxx] release/20.x: [libcxx] Provide locale conversions to tests through lit substitution (#105651) (PR #136449)

2025-05-11 Thread Martin Storsjö via llvm-branch-commits

mstorsjo wrote:

> Do we still want to try to backport this one?

I made a new backport attempt in #139468, let's see if it works. If not, it 
seems like the libcxx CI on the release branch is broken wrt macOS, and we can 
either choose to ignore it in all libcxx backports to 20.x, or just stop doing 
backports touching libcxx to this release branch (or we'd need to do even more 
CI fixing for the release branch).


https://github.com/llvm/llvm-project/pull/136449


[llvm-branch-commits] [clang] [llvm] release/20.x: [RISCV] Allow `Zicsr`/`Zifencei` to duplicate with `g` (#136842) (PR #137490)

2025-05-11 Thread Pengcheng Wang via llvm-branch-commits

wangpc-pp wrote:

Thanks @tstellar! Now all checks are passed!

https://github.com/llvm/llvm-project/pull/137490